Predicting Bitcoin Price Using Machine Learning


The cryptocurrency market has been enormously volatile for the past couple of years. This volatility is also its biggest asset: it leads people to invest heavily, often in the expectation that it is the currency of the future.

In mid-2017, the price of a Bitcoin was almost half of what it was a year ago, and in two months it reached an all-time peak in December 2017, after which it declined steeply. Swings this large over such short periods make its movement almost impossible to predict. We will explore how well machine learning can predict such a series of data.


Earlier forecasting techniques were mainly statistical and econometric models with strong theoretical foundations, such as Holt-Winters, Vector Autoregression, and ARIMA. They used few variables and were not robust enough for complex datasets.

Traditional techniques struggle in scenarios such as demand spikes during promotions or offers, the market entry of new products for which no historical data is available, and economic crashes that affect demand. This is where machine learning algorithms can help: they tend to perform better as the data grows in volume and complexity.

Machine learning paved the way for regression and tree-based models. With various NLP techniques, it is now possible to extract meaningful data from massive sources such as newspapers and web data, so that a forecast can account for as many features as possible. With the recent surge of interest in deep learning, these models have, somewhat surprisingly, performed beyond expectations even on smaller datasets.

Recurrent neural networks carry information from one time step to the next, and Long Short-Term Memory networks (LSTMs) in particular can learn long-term dependencies. Various LSTM-based architectures have been proposed that exploit this to achieve state-of-the-art results in time-series forecasting.

LSTMs are trained with backpropagation through time, and their gated cell state mitigates the vanishing-gradient problem that affects plain recurrent networks. Architectures combining Convolutional Neural Networks (CNNs) with LSTMs have since been published and kept beating previous state-of-the-art results.


Predicting Bitcoin price

The dataset of historical Bitcoin prices can be downloaded from here as a CSV file. It covers prices from January 2012 to January 2018, about 3,161,057 data points at one-minute intervals, each with OHLC (Open, High, Low, Close) values, the volume in BTC and in the indicated currency, and the weighted Bitcoin price.

An API is also available for getting the historical Bitcoin data directly, which is the better option because it is always up to date.

Methodology along with code

Before we start exploring and visualizing the data, we need to clean it. Cleaning includes removing null values and irrelevant zero values.
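The cleaning step can be sketched with the standard library alone. The snippet below is a minimal illustration, not the article's pipeline: the helper names clean_rows and daily_mean and the sample rows are hypothetical. It drops missing, NaN, and zero prices, then averages the remaining one-minute prices per UTC calendar day.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone

def clean_rows(rows):
    """Drop rows whose price is missing, NaN, or not positive."""
    return [
        (ts, price)
        for ts, price in rows
        if price is not None and not math.isnan(price) and price > 0
    ]

def daily_mean(rows):
    """Average the price over each UTC calendar day."""
    buckets = defaultdict(list)
    for ts, price in rows:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        buckets[day].append(price)
    return {day: sum(prices) / len(prices) for day, prices in sorted(buckets.items())}

# Hypothetical (timestamp in seconds, weighted price) rows, not real market data
rows = [
    (1325376000, 4.0),           # 2012-01-01 00:00 UTC
    (1325376060, 6.0),           # one minute later
    (1325376120, float("nan")),  # missing value
    (1325376180, 0.0),           # irrelevant zero
    (1325462400, 10.0),          # 2012-01-02 00:00 UTC
]
daily = daily_mean(clean_rows(rows))
for day, price in daily.items():
    print(day, price)
```

With pandas, the same idea is usually expressed with dropna, a boolean filter, and a groupby on the date.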

Since Bitcoin data is time-series data, we first have to explore it before forecasting. That includes checking whether the time series is stationary, i.e. whether statistical properties such as its mean and variance stay constant over time. A series that is influenced by factors like trend and seasonality is not stationary. To check, we first perform a seasonal decomposition of the data to estimate its trend and seasonality. If the series is non-stationary, we can apply transformations such as differencing to make it stationary. Then we check the autocorrelation, which is the similarity between observations as a function of the lag between them.
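To make the ideas of autocorrelation and stationarizing concrete, here is a small NumPy sketch. It is illustrative only: the helper names are hypothetical and the series is a synthetic random walk with drift, not Bitcoin data. A trending series is almost perfectly correlated with its own previous value, while its first difference is close to uncorrelated.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a positive lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    # Covariance between the series and its lagged copy, normalised by the variance
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def difference(series, order=1):
    """Differencing, a common way to remove a trend from a series."""
    return np.diff(np.asarray(series, dtype=float), n=order)

# Synthetic stand-in for a trending price series (assumption: not real data)
rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(loc=1.0, scale=0.5, size=500)) + 100.0

acf_raw = autocorrelation(prices, lag=1)
acf_diff = autocorrelation(difference(prices), lag=1)
print(f"lag-1 autocorrelation, raw series:         {acf_raw:.3f}")
print(f"lag-1 autocorrelation, differenced series: {acf_diff:.3f}")
```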

Once this is done, we are ready to forecast. We first split the dataset into train and test sets. For modeling, we use the look-back window approach.

We will predict ‘m’ time steps from the previous ‘n’ time steps. In this case, we choose n and m both to be one, i.e. we’ll predict the value at t from the input at t-1. Many people in the open-source community report that ARIMA does not give the best results. Hence we will use a more recent approach, the LSTM with the moving-window method.
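For a time series the split has to be chronological, so the model is never trained on data that comes after the test period. A minimal sketch follows; the function name chronological_split is hypothetical, and the 60-observation test size mirrors the 60-day hold-out used in the code further down.

```python
def chronological_split(series, n_test):
    """Keep the last n_test observations for testing and the rest for training."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]

daily_prices = list(range(100))  # placeholder for a chronologically ordered price series
train, test = chronological_split(daily_prices, n_test=60)
print(len(train), len(test))  # 40 60
```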

Thus, we’ll reshape the dataset to match the design of our final architecture. Let’s create a look_back function that takes the prices dataset and reshapes it into a matrix of size (n+m) x (total data points - n - m + 1), so that each column holds the prices X(t-n), X(t-n+1), ..., X(t-1), Y(t), Y(t+1), ..., Y(t+m-1), where X and Y are the input and ground-truth matrices. With the look_back function, ‘n’ and ‘m’ become tuning parameters of the model. We can vary them to find the best values for our particular use case.
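A general version of the look-back idea for arbitrary n and m might look like the sketch below. The name make_windows is hypothetical and this is not the exact function used later in the article, which fixes m to 1. Each row of X holds n consecutive inputs and the matching row of Y holds the m values that follow them.

```python
import numpy as np

def make_windows(prices, n=1, m=1):
    """Split a 1-D series into inputs of n past steps and targets of m future steps."""
    prices = np.asarray(prices, dtype=float)
    n_windows = len(prices) - n - m + 1
    X = np.array([prices[i:i + n] for i in range(n_windows)])
    Y = np.array([prices[i + n:i + n + m] for i in range(n_windows)])
    return X, Y

series = np.arange(10.0)  # toy series 0.0, 1.0, ..., 9.0
X, Y = make_windows(series, n=3, m=2)
print(X.shape, Y.shape)   # (6, 3) (6, 2)
print(X[0], Y[0])         # [0. 1. 2.] [3. 4.]
```

Calling make_windows(series, n=1, m=1) gives the one-step-ahead setup used in this article.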

Then we scale the dataset, because LSTMs tend to train more reliably when their inputs lie in a small, consistent range. StandardScaler or MinMaxScaler from scikit-learn will do the job. The scaler should be fitted on the training set only and then applied to the test set.

Now we will use a sequence-to-sequence type of LSTM architecture with multiple LSTM layers. This way the number of layers also becomes a hyperparameter. Our architecture takes ‘n’ input time steps and produces ‘m’ output time steps.
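What min-max scaling does can be shown by hand. The sketch below uses made-up numbers and hypothetical helper names; it follows the same formula as MinMaxScaler with its default (0, 1) range. The minimum and maximum come from the training data only, so test values may land outside [0, 1], and the inverse transform brings predictions back to the price scale.

```python
import numpy as np

train = np.array([100.0, 150.0, 200.0, 300.0])  # made-up training prices
test = np.array([250.0, 400.0])                 # made-up test prices

# Fit the scaling parameters on the training data only
lo, hi = train.min(), train.max()

def scale(x):
    """Map prices to the range defined by the training minimum and maximum."""
    return (x - lo) / (hi - lo)

def unscale(x):
    """Invert the scaling to get back to the original price units."""
    return x * (hi - lo) + lo

train_scaled = scale(train)
test_scaled = scale(test)    # values above the training maximum exceed 1
print(train_scaled)
print(test_scaled)
print(unscale(test_scaled))  # recovers the original test prices
```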

Training the model


We’ll use Keras with the TensorFlow backend for training. After splitting the dataset into train and test sets, we stack multiple LSTM layers with 256 units each, followed by a densely connected output layer with one neuron. We’ll use the Mean Squared Error (MSE) as the loss function and train with the Adam optimizer, as it showed the best convergence.

After testing and tuning the parameters, we find that with one LSTM layer the model isn’t able to learn all the patterns properly, whereas with two layers it gives much better predictions. We also set n and m to 1. We stop training after a sufficient number of iterations, before the model starts to overfit, and the model can then be used to make predictions. Make sure to invert the scaling after making the predictions to get them back on the real price scale.
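Stopping before the model overfits is usually automated with early stopping on the validation loss. The pure-Python sketch below illustrates the idea with a patience counter and a minimum improvement threshold. It is a simplified illustration rather than the Keras implementation; the function name and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=20, min_delta=5e-5):
    """Return the epoch index at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Validation loss improved enough: remember it and reset the counter
            best = loss
            wait = 0
        else:
            # No meaningful improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return None

losses = [0.50, 0.30, 0.20, 0.21, 0.22, 0.23, 0.24]  # made-up validation losses
print(early_stopping_epoch(losses, patience=3))  # 5
```

A larger patience tolerates longer plateaus at the cost of more training time.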



It was observed that although the ARMA model failed to give good results, ARIMA did better. The LSTM model, however, gave the best predictions of the three, helped by the fairly large size of the dataset. For time-series forecasting tasks such as Bitcoin price prediction, LSTMs were therefore the best performers in this comparison.

Code for the Methodology

# importing all the libraries

from math import sqrt

import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.callbacks import EarlyStopping
from matplotlib import pyplot as plt
import plotly.offline as py
import plotly.graph_objs as go

# notebook-only setup: render matplotlib and plotly figures inline
%matplotlib inline
py.init_notebook_mode(connected=True)

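# loading the dataset
data = pd.read_csv('<file_name>.csv')
print(data.isnull().values.any())  # prints False if there are no null values

# convert the Unix timestamps to calendar dates and average the weighted price per day
data['date'] = pd.to_datetime(data['Timestamp'], unit='s').dt.date
group = data.groupby('date')
Daily_Price = group['Weighted_Price'].mean()

# window sizes in days; the original values are not given, so these are illustrative assumptions
days_look = 650        # length of the training history
days_from_train = 60   # length of the test period
days_from_end = 5      # extra days counted back from the end of the series

df_train = Daily_Price[len(Daily_Price)-days_look-days_from_end:len(Daily_Price)-days_from_train]
df_test = Daily_Price[len(Daily_Price)-days_from_train:]
print(len(df_train), len(df_test))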

# checking seasonality
working_data = [df_train, df_test]
working_data = pd.concat(working_data)

working_data = working_data.reset_index()
working_data['date'] = pd.to_datetime(working_data['date'])
working_data = working_data.set_index('date')

# decompose with a 60-day period (older statsmodels releases call this argument freq)
s = sm.tsa.seasonal_decompose(working_data.Weighted_Price.values, period=60)

trace1 = go.Scatter(x = np.arange(0, len(s.trend), 1),y = s.trend,mode = 'lines',name = 'Trend',
    line = dict(color = ('rgb(244, 146, 65)'), width = 4))
trace2 = go.Scatter(x = np.arange(0, len(s.seasonal), 1),y = s.seasonal,mode = 'lines',name = 'Seasonal',
    line = dict(color = ('rgb(66, 244, 155)'), width = 2))

trace3 = go.Scatter(x = np.arange(0, len(s.resid), 1),y = s.resid,mode = 'lines',name = 'Residual',
    line = dict(color = ('rgb(209, 244, 66)'), width = 2))

trace4 = go.Scatter(x = np.arange(0, len(s.observed), 1),y = s.observed,mode = 'lines',name = 'Observed',
    line = dict(color = ('rgb(66, 134, 244)'), width = 2))

data = [trace1, trace2, trace3, trace4]
layout = dict(title = 'Seasonal decomposition', xaxis = dict(title = 'Time'), yaxis = dict(title = 'Price, USD'))
fig = dict(data=data, layout=layout)
py.iplot(fig, filename='seasonal_decomposition')

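# checking auto-correlation
# autocorrelation (top) and partial autocorrelation (bottom) for the first 48 lags
ax = plt.subplot(211)
sm.graphics.tsa.plot_acf(working_data.Weighted_Price.values.squeeze(), lags=48, ax=ax)
ax = plt.subplot(212)
sm.graphics.tsa.plot_pacf(working_data.Weighted_Price.values.squeeze(), lags=48, ax=ax)
plt.tight_layout()
plt.show()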

df_train = working_data[:-60]
df_test = working_data[-60:]

# look_back window method
def create_lookback(dataset, look_back=1):
    # X holds windows of `look_back` consecutive prices, Y the price that follows each window
    X, Y = [], []
    for i in range(len(dataset) - look_back):
        a = dataset[i:(i + look_back), 0]
        X.append(a)
        Y.append(dataset[i + look_back, 0])
    return np.array(X), np.array(Y)

from sklearn.preprocessing import MinMaxScaler

training_set = df_train.values
training_set = np.reshape(training_set, (len(training_set), 1))
test_set = df_test.values
test_set = np.reshape(test_set, (len(test_set), 1))

# scale datasets
scaler = MinMaxScaler()
training_set = scaler.fit_transform(training_set)
test_set = scaler.transform(test_set)

# create datasets which are suitable for time series forecasting
look_back = 1
X_train, Y_train = create_lookback(training_set, look_back)
X_test, Y_test = create_lookback(test_set, look_back)

# reshape datasets so that they will be ok for the requirements of the LSTM model in Keras
X_train = np.reshape(X_train, (len(X_train), 1, X_train.shape[1]))
X_test = np.reshape(X_test, (len(X_test), 1, X_test.shape[1]))

# initialize sequential model, add 2 stacked LSTM layers and densely connected output neuron
model = Sequential()
model.add(LSTM(256, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(LSTM(256))
model.add(Dense(1))

# compile and fit the model
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(X_train, Y_train, epochs=100, batch_size=16, shuffle=False,
                    validation_data=(X_test, Y_test),
                    callbacks = [EarlyStopping(monitor='val_loss', min_delta=5e-5, patience=20, verbose=1)])

# add one additional data point to align shapes of the predictions and true labels
# (the scaler expects a 2-D array, so the last row is selected with a list index)
X_test = np.append(X_test, scaler.transform(working_data.iloc[[-1]].values))
X_test = np.reshape(X_test, (len(X_test), 1, 1))

# get predictions and then make some transformations to be able to calculate RMSE properly in USD
prediction = model.predict(X_test)
prediction_inverse = scaler.inverse_transform(prediction.reshape(-1, 1))
Y_test_inverse = scaler.inverse_transform(Y_test.reshape(-1, 1))
# prediction i forecasts the value that follows input i, i.e. Y_test[i];
# the last prediction is for the day after the test set and has no label, so it is dropped
prediction2_inverse = np.array(prediction_inverse[:,0][:-1])
Y_test2_inverse = np.array(Y_test_inverse[:,0])
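The listing prepares the arrays for an RMSE calculation but stops before computing it. A minimal sketch of that last step is given below. The rmse helper and the sample numbers are illustrative assumptions; in the listing above, the arguments would be Y_test2_inverse and prediction2_inverse.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the inputs (here USD)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up values standing in for the inverse-scaled labels and predictions
actual = [6000.0, 6100.0, 6050.0]
predicted = [5900.0, 6200.0, 6050.0]
print(f"Test RMSE: {rmse(actual, predicted):.2f} USD")
```

The same number can be obtained with sqrt(mean_squared_error(actual, predicted)) using the imports at the top of the listing.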


It is to be noted that the information in this article along with the algorithm mentioned is solely for educational or research purposes and is not to be taken as an investment strategy.


