Building A Logistic Regression in Python

Logistic Regression is a very popular Machine Learning algorithm, and Python is widely used for writing Machine Learning programs. In this article, we will learn how to build a Logistic Regression algorithm using TensorFlow, a popular Python machine learning library. Note that the code below uses the TensorFlow 1.x API (tf.Session and tf.placeholder). The complete Python code is given first, followed by a step-by-step explanation.

from sklearn import datasets
import tensorflow as tf
import random
import numpy as np

iris = datasets.load_iris()


LEARNING_RATE = 0.1
BATCH_SIZE = 120
ITERATIONS = 10000

X = iris.data
Y = iris.target

a = list(range(len(X)))
random.shuffle(a)

X_train = []
Y_train = []
X_test = []
Y_test = []

partition = int(0.8 * len(a))

train_indices = a[:partition]
test_indices = a[partition:]

for i in train_indices:
    X_train.append(X[i])
    val = [0, 0, 0]
    val[Y[i]] = 1
    Y_train.append(val)

for i in test_indices:
    X_test.append(X[i])
    val = [0, 0, 0]
    val[Y[i]] = 1
    Y_test.append(val)

n = len(X_train[0])
k = len(Y_train[0])

weight_vector = tf.Variable(tf.random_normal(shape=[n, k]))
constant_term = tf.Variable(tf.random_normal(shape=[1, k]))

sess = tf.Session()
sess.run(tf.global_variables_initializer())

input_data = tf.placeholder(dtype=tf.float32, shape=[None, n])
output_data = tf.placeholder(dtype=tf.float32, shape=[None, k])

output = tf.matmul(input_data, weight_vector) + constant_term

loss_value = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=output,
    labels=output_data))

optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE)
goal = optimizer.minimize(loss_value)

predictions = tf.argmax(output, axis=1)


for epoch in range(ITERATIONS):

    sess.run(goal, feed_dict={
        input_data: X_train,
        output_data: Y_train
    })
    if epoch % 1000 == 0:
        print(epoch // 1000)
        cnt = 0
        # Pull the current weights and bias out of the session once,
        # then score every test point in NumPy.
        W, b = sess.run([weight_vector, constant_term])
        for i in range(len(X_test)):
            if np.argmax(Y_test[i]) == np.argmax(np.array([X_test[i]]).dot(W) + b):
                cnt += 1
        print("Accurate: ", cnt)

correct = 0

# Final evaluation on the test set using the trained weights
W, b = sess.run([weight_vector, constant_term])
for i in range(len(X_test)):
    if np.argmax(Y_test[i]) == np.argmax(np.array([X_test[i]]).dot(W) + b):
        correct += 1

print(correct * 100.0 / len(X_test))

Now, let’s understand this code step by step.

from sklearn import datasets
import tensorflow as tf
import random
import numpy as np

The above code snippet imports all the libraries required by the program.

iris = datasets.load_iris()

This line loads the Iris Dataset.
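
If you are curious about what load_iris() returns, a quick inspection looks like this (a minimal sketch; these shapes are what scikit-learn's bundled copy of the dataset provides):

print(iris.data.shape)     # (150, 4): 150 flowers, 4 measurements each
print(iris.target.shape)   # (150,): integer class labels 0, 1 or 2
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']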

LEARNING_RATE = 0.1
BATCH_SIZE = 120
ITERATIONS = 10000

The above code snippet declares the training constants. You can tweak them and figure out what works best for you. Note that BATCH_SIZE is declared but never used in this implementation; every iteration feeds the full training set.

X = iris.data
Y = iris.target

The above loads the data points of the Iris dataset into X and their labels into Y.

a = list(range(len(X)))
random.shuffle(a)

X_train = []
Y_train = []
X_test = []
Y_test = []

partition = int(0.8 * len(a))

train_indices = a[:partition]
test_indices = a[partition:]

The list ‘a’ holds the numbers from 0 to len(X) – 1 and is then shuffled. The variable ‘partition’ decides how the shuffled indices are divided between training data and test data. In this case, the training data comprises 80% of the Iris dataset and the test data the remaining 20%.
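
For reference, scikit-learn can produce the same kind of shuffled 80/20 split in a single call; this is an equivalent alternative, not what the code above uses:

from sklearn.model_selection import train_test_split

# Shuffled 80/20 split, equivalent in spirit to the manual index shuffle
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, shuffle=True)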

for i in train_indices:
    X_train.append(X[i])
    val = [0, 0, 0]
    val[Y[i]] = 1
    Y_train.append(val)

The above code snippet prepares the training set: each data point is appended to X_train, and its integer label is converted into a one-hot vector (for example, class 1 becomes [0, 1, 0]) before being appended to Y_train.
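
The same one-hot encoding can be produced in a single step with NumPy; this is an equivalent sketch using the variables defined above, not part of the original program:

# np.eye(3) holds the three possible one-hot rows; indexing it with the
# integer labels picks the right row for each training example.
Y_train_onehot = np.eye(3)[[Y[i] for i in train_indices]]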

for i in test_indices:
    X_test.append(X[i])
    val = [0, 0, 0]
    val[Y[i]] = 1
    Y_test.append(val)

The above code snippet prepares the test set in the same one-hot format.

n = len(X_train[0])
k = len(Y_train[0])

Here, ‘n’ denotes the number of features in a data point and ‘k’ denotes the total number of classes (which is 3 in the case of the Iris dataset).

weight_vector = tf.Variable(tf.random_normal(shape=[n, k]))
constant_term = tf.Variable(tf.random_normal(shape=[1, k]))

The above statements declare the TensorFlow variables for the weights (weight_vector) and the bias (constant_term), both initialized with values drawn from a normal distribution.

sess = tf.Session()
sess.run(tf.global_variables_initializer())

Here, we start a TensorFlow session and initialize all the variables.

input_data = tf.placeholder(dtype=tf.float32, shape=[None, n])
output_data = tf.placeholder(dtype=tf.float32, shape=[None, k])

In the above statements, we declare TensorFlow placeholders for the input data and the output data; their values will be supplied at run time through feed_dict.

output = tf.matmul(input_data, weight_vector) + constant_term

Here, the input data is matrix-multiplied by the weight matrix and the bias is added, producing one logit (raw class score) per class for each data point.
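
In terms of shapes, a [batch, n] input times an [n, k] weight matrix, plus a [1, k] bias broadcast across the batch, gives a [batch, k] matrix of logits. A NumPy illustration of the same arithmetic:

import numpy as np

X_batch = np.random.rand(5, 4)  # 5 samples with n = 4 features
W = np.random.rand(4, 3)        # n = 4 features, k = 3 classes
b = np.random.rand(1, 3)        # one bias per class, broadcast over rows

logits = X_batch.dot(W) + b
print(logits.shape)             # (5, 3): one score per class per sample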

loss_value = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=output,
    labels=output_data))

Here, the loss is calculated using sigmoid cross entropy and averaged over all elements with tf.reduce_mean. The aim of training is to minimize this loss.
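
For a single logit x and a 0/1 target z, sigmoid cross entropy is -z·log(σ(x)) - (1 - z)·log(1 - σ(x)), where σ(x) = 1 / (1 + e^(-x)). A NumPy sketch of this quantity (illustrative only; TensorFlow's implementation uses a numerically stabilized form):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_cross_entropy(logits, labels):
    # Element-wise cross entropy between sigmoid(logits) and 0/1 labels
    p = sigmoid(logits)
    return -(labels * np.log(p) + (1 - labels) * np.log(1 - p))

logits = np.array([[2.0, -1.0, 0.5]])
labels = np.array([[1.0, 0.0, 0.0]])
print(np.mean(sigmoid_cross_entropy(logits, labels)))

Note that for mutually exclusive classes such as the three Iris species, softmax cross entropy is the more conventional choice; sigmoid cross entropy treats each class as an independent yes/no decision, which still works here.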

optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE)
goal = optimizer.minimize(loss_value)

In the above statements, we create the optimizer for minimizing the loss, which in this case is the Gradient Descent optimizer, and define ‘goal’, the operation that performs one optimization step.
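
Conceptually, every time ‘goal’ is run, each trainable variable v is updated as v ← v − LEARNING_RATE · ∂loss/∂v. A minimal NumPy illustration of one such step (TensorFlow computes the gradient for you; here it is simply given):

import numpy as np

def gradient_descent_step(param, grad, learning_rate=0.1):
    # Move the parameter against the gradient, scaled by the step size
    return param - learning_rate * grad

W = np.zeros((4, 3))
grad_W = np.ones((4, 3))  # stand-in for the gradient of the loss w.r.t. W
W = gradient_descent_step(W, grad_W)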

predictions = tf.argmax(output, axis=1)

This defines the prediction tensor: tf.argmax with axis=1 selects, for each data point, the index of the class with the largest logit.

for epoch in range(ITERATIONS):

    sess.run(goal, feed_dict={
        input_data: X_train,
        output_data: Y_train
    })
    if epoch % 1000 == 0:
        print(epoch // 1000)
        cnt = 0
        W, b = sess.run([weight_vector, constant_term])
        for i in range(len(X_test)):
            if np.argmax(Y_test[i]) == np.argmax(np.array([X_test[i]]).dot(W) + b):
                cnt += 1
        print("Accurate: ", cnt)

In the above code snippet, we actually run the TensorFlow session, feeding the training dataset through ‘feed_dict’. Every 1000 iterations, we print the number of correct predictions the model currently makes on the test set.
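
The accuracy check above pulls the weights out of the session and redoes the matrix multiplication in NumPy. An equivalent, tidier check (a sketch using the ‘predictions’ tensor defined earlier) would be:

# Run the graph's own argmax on the test set and compare against the
# true classes recovered from the one-hot test labels.
predicted = sess.run(predictions, feed_dict={input_data: X_test})
actual = np.argmax(np.array(Y_test), axis=1)
print("Accurate: ", int(np.sum(predicted == actual)))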

correct = 0

W, b = sess.run([weight_vector, constant_term])
for i in range(len(X_test)):
    if np.argmax(Y_test[i]) == np.argmax(np.array([X_test[i]]).dot(W) + b):
        correct += 1

print(correct * 100.0 / len(X_test))

In the above code snippet, we estimate the accuracy of our trained Logistic Regression model by running it on the test dataset and printing the percentage of correct predictions.

Logistic Regression is an interesting algorithm and is widely used in Machine Learning. We hope you found this article helpful. Keep exploring the depths of Machine Learning!
