The cryptocurrency market has been enormously volatile for the past couple of years. This volatility is also its biggest asset: it leads people to invest heavily, often in the expectation that cryptocurrency is the currency of the future.

In mid-2017, the price of a Bitcoin was only a fraction of the all-time peak it reached in December 2017, after which it declined steeply. Swings this large over such short periods make its movement almost impossible to predict. We will explore how well Machine Learning can forecast such a series.

### Methods

Earlier forecasting techniques were mainly econometric models with strong theoretical foundations, such as Holt-Winters, Vector Autoregression and ARIMA. They used few variables and were not robust enough for complex datasets.

Traditional techniques struggle in scenarios such as demand spikes during promotions or offers, market entry of new products for which no historical data is available, and economic crashes that affect demand. This is where Machine Learning algorithms can help: they tend to improve as the data grows in volume and complexity.

Machine learning paved the way for regression and tree-based models. With NLP techniques, it is now possible to extract meaningful data from massive sources such as newspapers and web data, in order to account for as many features as possible and get closer to accurate forecasts. With the recent surge of interest in deep learning, these models have been reported to perform surprisingly well even on smaller datasets.

Recurrent neural networks are designed to learn dependencies across time steps, but plain RNNs struggle with long-term ones. Long Short-Term Memory networks (LSTMs) were introduced to address this, and various LSTM-based architectures have been proposed to achieve state-of-the-art results in time-series forecasting.

LSTMs are trained with backpropagation through time, and their gated cell state mitigates the vanishing-gradient problem that affects plain RNNs. Architectures combining Convolutional Neural Networks (CNNs) and LSTMs have also been published, and these kept beating previous state-of-the-art results.
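To make the gating idea concrete, here is a minimal NumPy sketch of one forward step of a standard LSTM cell. The weight shapes, the gate ordering inside the stacked matrices, and the random toy weights are assumptions made for illustration; this is not the Keras implementation used later in the article.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W: (4*hidden, input), U: (4*hidden, hidden), b: (4*hidden,)
    Gate order in the stacked weights (arbitrary choice for this sketch):
    input gate, forget gate, cell candidate, output gate.
    """
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:hidden])                # input gate
    f = sigmoid(z[hidden:2 * hidden])       # forget gate
    g = np.tanh(z[2 * hidden:3 * hidden])   # candidate cell values
    o = sigmoid(z[3 * hidden:4 * hidden])   # output gate
    c = f * c_prev + i * g                  # additive cell-state update
    h = o * np.tanh(c)                      # new hidden state
    return h, c

# Toy weights and a toy scaled price sequence (made up for illustration).
rng = np.random.default_rng(42)
input_size, hidden_size = 1, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size))
U = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for price in [0.1, 0.2, 0.15]:
    h, c = lstm_step(np.array([price]), h, c, W, U, b)

print(h.shape, c.shape)
```

The cell state `c` is updated additively (`f * c_prev + i * g`), which is the part that lets gradients survive across many time steps better than in a plain RNN.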

### Dataset

The dataset of historical Bitcoin prices can be downloaded from here as a CSV file. It covers prices from January 2012 to January 2018, approximately 3,161,057 data points at one-minute intervals, each with OHLC (Open, High, Low, Close) values, volume in BTC and in the indicated currency, and the weighted Bitcoin price.

Alternatively, CoinRanking.com provides an API for fetching Bitcoin historical data directly, which is preferable because it is always up to date.

### Methodology along with code

Before we start exploring and visualizing the data, we need to clean the dataset. Cleaning includes removing null values and irrelevant zero values.
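As a rough illustration of this cleaning step, the sketch below filters a small made-up NumPy array. The two-column layout (timestamp, weighted price) and the values are assumptions; in the article's pipeline the same filter would be applied to the pandas DataFrame.

```python
import numpy as np

# Made-up rows: [timestamp, weighted price]
rows = np.array([
    [1.0, 4.5],
    [2.0, np.nan],   # missing price
    [3.0, 0.0],      # irrelevant zero price
    [4.0, 4.7],
])

has_nan = np.isnan(rows).any(axis=1)   # rows with any null value
zero_price = rows[:, 1] == 0.0         # rows with a zero price

clean = rows[~has_nan & ~zero_price]
print(clean)
```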

Since Bitcoin price data is a time series, we first explore it before forecasting. That includes checking whether the series is stationary; a stationary series has statistical properties that are not driven by trend or seasonality. To check, we first perform a seasonal decomposition of the data to estimate its trend and seasonality. If the series is non-stationary, we can apply techniques such as differencing to make it stationary. Then we check the autocorrelation, which is the similarity between observations as a function of the lag between them.
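The effect of differencing on autocorrelation can be sketched with NumPy alone. The series below is a synthetic random walk with drift, not Bitcoin data, and `autocorrelation` is a hypothetical helper written for this example.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a positive lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic trending series: a random walk with upward drift.
rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(loc=1.0, scale=0.5, size=200))

# First-order differencing removes the trend.
returns = np.diff(prices)

acf_prices = autocorrelation(prices, lag=1)
acf_diff = autocorrelation(returns, lag=1)
print(f"lag-1 autocorrelation, raw: {acf_prices:.3f}, differenced: {acf_diff:.3f}")
```

A lag-1 autocorrelation close to 1 on the raw series is typical of a trending, non-stationary series; after differencing it should drop towards 0.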

Once this is done, we are ready to forecast. We first split the dataset into train and test sets. The method we follow uses the look-back window approach for modeling.
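For a time series the split should be chronological rather than random, so that the test set lies strictly after the training set. A minimal sketch, where `chronological_split` is a hypothetical helper and the data is made up:

```python
import numpy as np

def chronological_split(series, test_size):
    """Keep the last `test_size` observations for testing, the rest for training."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]

daily_prices = np.arange(10.0)  # stand-in for ten daily prices
train, test = chronological_split(daily_prices, test_size=3)
print(train, test)
```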

We will predict ‘m’ time steps based on the previous ‘n’ time steps. In this case, we choose n and m both to be one, i.e. we predict *t* from the input at *t-1*. Many practitioners in the open-source community report that ARIMA does not give the best results, so we use a more recent approach: an LSTM with the moving-window method.

Thus, we’ll reshape the dataset to match the design of our final architecture. Let’s create a *look_back* function that takes the prices dataset and reshapes it into an (n+m) × (total data points – n – m + 1) matrix, so that each column holds the prices X(t-n), X(t-(n-1)), …, X(t-1), Y(t), Y(t+1), …, Y(t+m-1), where X and Y are the input and ground-truth matrices. With the look_back function, ‘n’ and ‘m’ become tuning parameters of the model; we can vary them to find the best values for a particular use case.
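A generalized look-back function with both n and m as parameters could look like the sketch below. It returns one window per row rather than per column, which is a layout assumption of this example, and it yields (total data points – n – m + 1) windows.

```python
import numpy as np

def look_back(prices, n=1, m=1):
    """Build (inputs, targets) windows: n past prices predict the next m prices."""
    prices = np.asarray(prices, dtype=float)
    n_windows = len(prices) - n - m + 1
    if n_windows <= 0:
        raise ValueError("series is too short for the requested n and m")
    X = np.array([prices[i:i + n] for i in range(n_windows)])
    Y = np.array([prices[i + n:i + n + m] for i in range(n_windows)])
    return X, Y

# Toy price series (made up for illustration).
X, Y = look_back([10, 11, 12, 13, 14, 15], n=3, m=2)
print(X)
print(Y)

# The article's setting: one step in, one step out.
X1, Y1 = look_back([10, 11, 12, 13, 14, 15], n=1, m=1)
print(X1.shape, Y1.shape)
```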

Then we will scale the dataset, because LSTMs are sensitive to the scale of their inputs. StandardScaler or MinMaxScaler from scikit-learn will do the job; fit the scaler on the training set only and reuse it for the test set.
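To show what min-max scaling does, here is a hand-rolled NumPy sketch that mirrors the idea behind MinMaxScaler with its default (0, 1) range. The `scale` and `unscale` helpers and the prices are made up for this example; the statistics come from the training set only, so test values can fall outside [0, 1].

```python
import numpy as np

train = np.array([[100.0], [150.0], [200.0]])
test = np.array([[175.0], [250.0]])

# Statistics are computed on the training set only.
lo = train.min(axis=0)
hi = train.max(axis=0)

def scale(x):
    return (x - lo) / (hi - lo)

def unscale(x):
    return x * (hi - lo) + lo

train_scaled = scale(train)
test_scaled = scale(test)        # may exceed 1 for prices above the training maximum
restored = unscale(test_scaled)  # back on the original price scale
print(train_scaled.ravel(), test_scaled.ravel(), restored.ravel())
```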

Now we will use a sequence-to-sequence type of LSTM architecture with multiple LSTM layers, so the number of layers also becomes a hyperparameter. Our architecture takes ‘n’ time steps as input and produces ‘m’ time steps as output.

### Training the model

We’ll use Keras with the TensorFlow backend for training. After splitting the dataset into train and test sets, we stack multiple LSTM layers, each with 256 units, followed by a densely connected output layer with one neuron. We use the Mean Squared Error (MSE) as the loss function and train with the Adam optimizer, as it showed the best convergence.
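For reference, the MSE loss is the average of the squared differences between true and predicted values. A small NumPy sketch on made-up numbers, where `mse` is a hypothetical helper and not the Keras loss object:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [100.0, 110.0, 120.0]
y_pred = [102.0, 108.0, 123.0]

loss = mse(y_true, y_pred)   # (4 + 4 + 9) / 3
rmse = loss ** 0.5           # same units as the prices
print(loss, rmse)
```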

After testing and tuning the parameters, we see that with one LSTM layer the model isn’t able to learn all the patterns properly, whereas with two layers it gives much better predictions. We also set n and m to 1. We stop training after a sufficient number of iterations, before the model starts to overfit, and the model can then be used to make predictions. Make sure to invert the scaling after making the predictions to get them back on the real price scale.
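Stopping before the model overfits is usually automated with a patience rule on the validation loss. The sketch below illustrates that rule in plain Python; `early_stopping_epoch` is a hypothetical helper and the loss values are made up, so treat it as an illustration of the idea rather than the Keras callback itself.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None if it never stops.

    An epoch counts as an improvement only if the validation loss drops
    below the best value seen so far by more than `min_delta`.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: improvement stalls after epoch 2.
losses = [0.9, 0.5, 0.4, 0.41, 0.42, 0.43, 0.2]
stopped_at = early_stopping_epoch(losses, patience=3)
print(stopped_at)
```

Note that training stops before the late improvement at the last epoch is ever seen, which is the trade-off a small patience value makes.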

TensorFlow, released by Google, is widely used for the heavy numerical computation that Machine Learning requires.

### Results

It was observed that the ARMA model failed to give good results and that ARIMA did better. However, the LSTM model gave the best predictions of the three, helped by the fairly large size of the dataset. Thus, for time-series forecasting tasks like Bitcoin price prediction, LSTMs performed best in this comparison.

### Code for the Methodology

```python
# importing all the libraries
from math import sqrt

import numpy as np
import pandas as pd
import statsmodels.api as sm
from matplotlib import pyplot as plt
import plotly.offline as py
import plotly.graph_objs as go
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.callbacks import EarlyStopping

# Jupyter-only setup: render plotly and matplotlib figures inline
py.init_notebook_mode(connected=True)
%matplotlib inline

# loading the dataset
data = pd.read_csv('<file_name>.csv')
print(data.isnull().values.any())  # prints False if there are no null values

# aggregate the one-minute rows into daily mean prices
data['date'] = pd.to_datetime(data['Timestamp'], unit='s').dt.date
group = data.groupby('date')
Daily_Price = group['Weighted_Price'].mean()
print(Daily_Price.head())

# NOTE: days_look, days_from_end and days_from_train are not defined in the
# original listing. The values below are placeholders assumed for illustration.
days_look = 650
days_from_end = 5
days_from_train = 60

df_train = Daily_Price[len(Daily_Price) - days_look - days_from_end:len(Daily_Price) - days_from_train]
df_test = Daily_Price[len(Daily_Price) - days_from_train:]
print(len(df_train), len(df_test))

# checking seasonality
working_data = pd.concat([df_train, df_test])
working_data = working_data.reset_index()
working_data['date'] = pd.to_datetime(working_data['date'])
working_data = working_data.set_index('date')

# `period` replaces the older `freq` argument in recent statsmodels releases
s = sm.tsa.seasonal_decompose(working_data.Weighted_Price.values, period=60)

trace1 = go.Scatter(x=np.arange(0, len(s.trend), 1), y=s.trend, mode='lines', name='Trend',
                    line=dict(color='rgb(244, 146, 65)', width=4))
trace2 = go.Scatter(x=np.arange(0, len(s.seasonal), 1), y=s.seasonal, mode='lines', name='Seasonal',
                    line=dict(color='rgb(66, 244, 155)', width=2))
trace3 = go.Scatter(x=np.arange(0, len(s.resid), 1), y=s.resid, mode='lines', name='Residual',
                    line=dict(color='rgb(209, 244, 66)', width=2))
trace4 = go.Scatter(x=np.arange(0, len(s.observed), 1), y=s.observed, mode='lines', name='Observed',
                    line=dict(color='rgb(66, 134, 244)', width=2))

# named `traces` so the `data` DataFrame above is not overwritten
traces = [trace1, trace2, trace3, trace4]
layout = dict(title='Seasonal decomposition', xaxis=dict(title='Time'), yaxis=dict(title='Price, USD'))
fig = dict(data=traces, layout=layout)
py.iplot(fig, filename='seasonal_decomposition')

# checking auto-correlation
plt.figure(figsize=(15, 7))
ax = plt.subplot(211)
sm.graphics.tsa.plot_acf(working_data.Weighted_Price.values.squeeze(), lags=48, ax=ax)
ax = plt.subplot(212)
sm.graphics.tsa.plot_pacf(working_data.Weighted_Price.values.squeeze(), lags=48, ax=ax)
plt.tight_layout()
plt.show()

# hold out the last 60 days for testing
df_train = working_data[:-60]
df_test = working_data[-60:]


# look_back window method
def create_lookback(dataset, look_back=1):
    X, Y = [], []
    for i in range(len(dataset) - look_back):
        a = dataset[i:(i + look_back), 0]
        X.append(a)
        Y.append(dataset[i + look_back, 0])
    return np.array(X), np.array(Y)


training_set = df_train.values
training_set = np.reshape(training_set, (len(training_set), 1))
test_set = df_test.values
test_set = np.reshape(test_set, (len(test_set), 1))

# scale datasets: fit on the training set only, then reuse the scaler on the test set
scaler = MinMaxScaler()
training_set = scaler.fit_transform(training_set)
test_set = scaler.transform(test_set)

# create datasets which are suitable for time series forecasting
look_back = 1
X_train, Y_train = create_lookback(training_set, look_back)
X_test, Y_test = create_lookback(test_set, look_back)

# reshape to (samples, time steps, features), as required by the LSTM layer in Keras
X_train = np.reshape(X_train, (len(X_train), 1, X_train.shape[1]))
X_test = np.reshape(X_test, (len(X_test), 1, X_test.shape[1]))

# initialize sequential model, add 2 stacked LSTM layers and densely connected output neuron
model = Sequential()
model.add(LSTM(256, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(LSTM(256))
model.add(Dense(1))

# compile and fit the model
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(X_train, Y_train, epochs=100, batch_size=16, shuffle=False,
                    validation_data=(X_test, Y_test),
                    callbacks=[EarlyStopping(monitor='val_loss', min_delta=5e-5, patience=20, verbose=1)])

# prediction[i] is the one-step-ahead forecast for Y_test[i], so the two arrays
# are already aligned. The original listing appended an extra input point and
# dropped the first prediction, which compared each forecast with its own input.
prediction = model.predict(X_test)

# invert the scaling so the RMSE is expressed in USD
prediction_inverse = scaler.inverse_transform(prediction.reshape(-1, 1))
Y_test_inverse = scaler.inverse_transform(Y_test.reshape(-1, 1))
rmse = sqrt(mean_squared_error(Y_test_inverse, prediction_inverse))
print('Test RMSE: %.3f USD' % rmse)
```

*Disclaimer*

*It is to be noted that the information in this article along with the algorithm mentioned is solely for educational or research purposes and is not to be taken as an investment strategy.*