Building a simple NLP based Chatbot in Python

According to Gartner research, 80% of CEOs believe that customer service will be the most important factor differentiating a brand from its competitors. In other words, top CEOs are betting that they need to improve their customer service in order to get ahead of their competitors. In this blog post, we will take a look at an NLP based ChatBot built from scratch in Python.

Currently, customer service is mostly handled through call centers: customers call in, get on the IVR and eventually reach a human after a long wait. This is irritating for customers, who have to wait, and expensive for organizations, because running a call center is costly.

Companies are therefore looking for ways to improve their customer service. One way to achieve this is to go digital: driving customers to the website rather than the call center. Maintaining a website is easier and far more economical than managing a call center. To help customers navigate the pages of a website easily, companies have come up with an interesting tool called a ChatBot.

A ChatBot is an extension of a “chat box”: instead of chatting with a human (a call center or customer service representative), the customer chats with a computer that tries to understand what the customer is typing and sends out a response based on that. ChatBots are widely used by organizations such as banks and insurance companies, which have so many customers that serving all of them through a call center becomes uneconomical.

Two of the most popular Machine Learning libraries in Python are:

  • Keras: an excellent library for building powerful Neural Networks in Python
  • scikit-learn: a general purpose Machine Learning library in Python

We will be using Keras for our purpose. Let’s get started and write actual code to build a simple NLP based ChatBot.

from keras.models import Sequential
from keras.losses import categorical_crossentropy
from keras.optimizers import SGD
from keras.layers import Dense

from numpy import argmax
import numpy as np
import re

Here, we’ve imported numpy as well. For instance, the argmax function in numpy finds the index of the largest number in a vector.

Let us now build some dummy training data for our ChatBot. The sentences are stored in the list X, which looks like the following.

X = ['Hi',
     'Hello',
     'How are you?',
     'I am studying',
     'studying',
     'see you later',
     'bye',
     'goodbye']

Observe that the first 3 sentences are greetings, the next 2 are about studying, and the last 3 say goodbye. greeting, studying and bye are essentially the “intents” of these sentences. An intent indicates the “meaning” of a sentence. Corresponding to each sentence in X, we have an intent in the list Y as follows:

Y = ['greeting',
     'greeting',
     'greeting',
     'studying',
     'studying',
     'bye',
     'bye',
     'bye']

The idea of a ChatBot is fairly simple – from the sentence, we identify the intent using Machine Learning. Later, for each intent, we have a predefined answer. For instance, for the intent greeting, we may have an answer – hey, what’s up? such that whenever someone mentions things related to greeting, this is the answer that is sent.
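As a minimal sketch of the second half of this idea, the predefined answers can live in a plain dictionary keyed by intent. The reply texts, the fallback message and the name reply_for are illustrative assumptions, not part of the code in this post.

```python
# Hypothetical replies; the intent names match the labels used in this post.
REPLIES = {
    'greeting': "Hey, what's up?",
    'studying': "Good luck with your studies!",
    'bye': "See you later!",
}

# Assumed fallback for an intent that has no predefined answer.
FALLBACK_REPLY = "Sorry, I did not understand that."


def reply_for(intent):
    """Return the predefined answer for an intent, or the fallback."""
    return REPLIES.get(intent, FALLBACK_REPLY)


print(reply_for('greeting'))
```

Once the classifier has predicted an intent, a lookup like this is all that is needed to turn it into a response.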

Now, the question to be answered is as follows – given a sentence, how do we identify its intent? This is a classic Machine Learning problem – sentence classification. The way it works is as follows:

  • We create training data (X and Y as above) which contains a list of sentences along with their intents.
  • A classifier is then fit on this training data. This classifier could be as simple as a multi-class logistic classifier or as complex as a deep Neural Network.
  • To classify a sentence, we first convert each sentence into a vector of numbers. This is achieved by creating a frequency based encoding of the sentence. Basically, we first create a vocabulary and then map each sentence to a vector that records the frequency of each word.

To give you an example on the last point – suppose we have the following vocabulary: {a, the, hi, hello, you, are, hey, it, how}. In such a situation, for the sentence “hi, how are you”, the frequency based encoding would be: {0, 0, 1, 0, 1, 1, 0, 0, 1}

This indicates that the first word “a” is present 0 times, the second word “the” is present 0 times, the 3rd word “hi” is present 1 time and so on.
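The example above can be reproduced with a few lines of Python. This is only a sketch: the helper name frequency_encode is an assumption made for illustration, and it uses collections.Counter rather than the encoding function written later in this post.

```python
from collections import Counter

# The example vocabulary from the text, in the same order.
vocabulary = ['a', 'the', 'hi', 'hello', 'you', 'are', 'hey', 'it', 'how']


def frequency_encode(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    # Lowercase and keep only letters, digits and spaces.
    cleaned = ''.join(ch for ch in sentence.lower()
                      if ch.isalnum() or ch == ' ')
    counts = Counter(cleaned.split())
    # A Counter returns 0 for words that never occurred.
    return [counts[word] for word in vocabulary]


encoded = frequency_encode('hi, how are you', vocabulary)
print(encoded)  # [0, 0, 1, 0, 1, 1, 0, 0, 1]
```

A word that occurs twice in a sentence would get the value 2 at its position, which is what makes this a frequency encoding rather than a presence flag.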

However, before we get into frequency based encoding, let us first do some preprocessing on the sentence by removing non alphanumeric characters:

def remove_non_alpha_numeric_characters(sentence):
    # Keep letters, digits and spaces; drop everything else
    new_sentence = ''
    for alphabet in sentence:
        if alphabet.isalnum() or alphabet == ' ':
            new_sentence += alphabet
    return new_sentence

A few other preprocessing steps are also applied to each sentence. For instance:

  • Convert it to all lower case.
  • Remove leading and trailing spaces.
  • Remove multiple spaces from the sentence.

Take a look at the preprocess_data function that does all of this.

def preprocess_data(X):
    X = [data_point.lower() for data_point in X]
    X = [remove_non_alpha_numeric_characters(
        sentence) for sentence in X]
    X = [data_point.strip() for data_point in X]
    X = [re.sub(' +', ' ',
                data_point) for data_point in X]
    return X
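To see what these steps do to a single messy sentence, here is a compact, self-contained sketch that performs the same cleaning with regular expressions. The function name clean is hypothetical, and the character class assumes plain ASCII text, which is narrower than the character-by-character check used above.

```python
import re


def clean(sentence):
    """Lowercase, drop punctuation, collapse spaces and trim the ends."""
    sentence = sentence.lower()
    # Assumption: ASCII input, so only a-z, 0-9 and spaces are kept.
    sentence = re.sub(r'[^a-z0-9 ]', '', sentence)
    # Collapse runs of spaces into a single space.
    sentence = re.sub(r' +', ' ', sentence)
    return sentence.strip()


print(clean('  How   are YOU?? '))  # how are you
```

Normalizing sentences this way means that “Hi!”, “hi” and “ HI ” all map to the same word in the vocabulary.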

Next, let us create the vocabulary for the frequency based encoding. As a simple example, we can take the vocabulary as the union of all words present across all the sentences in the training data.

X = preprocess_data(X)

vocabulary = set()
for data_point in X:
    for word in data_point.split(' '):
        vocabulary.add(word)

vocabulary = list(vocabulary)

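Now, let us encode each of the sentences present in the training data. Note that the encode_sentence function below only records whether a word is present (1) or absent (0), which is a binary simplification of the frequency based encoding. Since no word repeats within any of our training sentences, the two encodings coincide here.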

X_encoded = []

def encode_sentence(sentence):
    sentence = preprocess_data([sentence])[0]
    sentence_encoded = [0] * len(vocabulary)
    for i in range(len(vocabulary)):
        if vocabulary[i] in sentence.split(' '):
            sentence_encoded[i] = 1
    return sentence_encoded

X_encoded = [encode_sentence(sentence) for sentence in X]

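Similarly, let us also encode the labels (the intents). For instance, we could represent greeting as [1, 0, 0], studying as [0, 1, 0] and bye as [0, 0, 1]. The actual position assigned to each intent depends on the order of the classes list built below.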

classes = list(set(Y))

Y_encoded = []
for data_point in Y:
    data_point_encoded = [0] * len(classes)
    for i in range(len(classes)):
        if classes[i] == data_point:
            data_point_encoded[i] = 1
    Y_encoded.append(data_point_encoded)
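Because a Python set has no guaranteed order for strings, the position of each intent in the encoding above can differ from one run to the next. The following sketch shows one way to make the mapping reproducible by sorting the class names. The helper names one_hot and decode are assumptions made for illustration.

```python
Y = ['greeting', 'greeting', 'greeting',
     'studying', 'studying',
     'bye', 'bye', 'bye']

# Sorting gives a stable order: ['bye', 'greeting', 'studying'].
classes = sorted(set(Y))
class_index = {label: i for i, label in enumerate(classes)}


def one_hot(label):
    """Return a vector with a 1 at the position of the given label."""
    vector = [0] * len(classes)
    vector[class_index[label]] = 1
    return vector


def decode(vector):
    """Map a one-hot (or probability) vector back to its label."""
    return classes[vector.index(max(vector))]


Y_encoded = [one_hot(label) for label in Y]
print(one_hot('greeting'))  # [0, 1, 0]
```

With a sorted class list, a model saved in one session can be reused in another without the intents silently swapping places.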

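Let us now prepare our data for training a simple Neural Network written in Keras. The train and the test data should normally be separate; here, for simplicity, we reuse the training data as the test data. The lists are converted to NumPy arrays because Keras expects arrays rather than plain Python lists.

# Keras expects NumPy arrays rather than plain Python lists
X_train = np.array(X_encoded)
y_train = np.array(Y_encoded)
X_test = np.array(X_encoded)
y_test = np.array(Y_encoded)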
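With a larger dataset, a separate test set could be held out before training. The sketch below is one simple way to do that with the standard library only. The function name train_test_split_simple, the 25% test fraction and the fixed seed are all assumptions for illustration; with just eight sentences a split like this would leave too little data to train on.

```python
import random


def train_test_split_simple(X, y, test_fraction=0.25, seed=0):
    """Shuffle the indices and hold out a fraction of them for testing."""
    indices = list(range(len(X)))
    # A seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    X_te = [X[i] for i in test_idx]
    y_te = [y[i] for i in test_idx]
    return X_tr, X_te, y_tr, y_te


# Toy data standing in for encoded sentences and labels.
demo_X = [[i] for i in range(8)]
demo_y = [i % 3 for i in range(8)]
X_tr, X_te, y_tr, y_te = train_test_split_simple(demo_X, demo_y)
print(len(X_tr), len(X_te))  # 6 2
```

Accuracy measured on held-out sentences says much more about how the ChatBot will behave on text it has never seen.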

Here, we are building a simple sequential Neural Network model. The hidden layer comprises 64 nodes with sigmoid activation, and the output layer uses softmax to produce one probability per intent. We are using categorical cross entropy as the cost function.

model = Sequential()
model.add(Dense(units=64, activation='sigmoid',
                input_dim=len(X_train[0])))
model.add(Dense(units=len(y_train[0]), activation='softmax'))
# Note: Keras releases before 2.3 name this argument `lr`
model.compile(loss=categorical_crossentropy,
              optimizer=SGD(learning_rate=0.01,
                            momentum=0.9, nesterov=True))
model.fit(X_train, y_train, epochs=100, batch_size=16)
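To make the softmax output and the categorical cross entropy cost a little more concrete, here is a small pure-Python sketch of both. It illustrates the formulas only and is not how Keras implements them internally; the function names and the example scores are assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow in exp().
    shifted = [s - max(scores) for s in scores]
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Categorical cross entropy between a one-hot label and probabilities."""
    # eps guards against log(0).
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))


probs = softmax([2.0, 1.0, 0.1])
loss_right = cross_entropy([1, 0, 0], probs)  # label matches the top score
loss_wrong = cross_entropy([0, 0, 1], probs)  # label matches the lowest score
print(probs, loss_right, loss_wrong)
```

The loss is small when the model puts high probability on the correct intent and large when it does not, which is what training tries to minimize.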

Let us now generate the predictions of the model.

predictions = [argmax(pred) for pred in model.predict(X_test)]

Next, let us measure the accuracy of the predictions on the test data.

correct = 0
for i in range(len(predictions)):
    if predictions[i] == argmax(y_test[i]):
        correct += 1

print("Correct:", correct)
print("Total:", len(predictions))

Finally, let us create a ChatBot prompt where the user can type in a sentence, for which the ChatBot predicts and prints the intent. Eventually, we can map the intent to a ChatBot reply which can be sent out to the user.

while True:
    print("Enter a sentence")
    sentence = input()
    prediction = model.predict(np.array([encode_sentence(sentence)]))
    print(classes[argmax(prediction)])

The simple ChatBot code above gives an accuracy of over 90%. Keep in mind that this number is measured on the same sentences the model was trained on. The following activities could be performed to increase it:

  • Training on more data: this is by far the best method to increase the accuracy of a ChatBot. The more data the ChatBot sees, the better it is able to learn and generalize, resulting in higher accuracy.
  • Using a deeper Neural Network: the deeper the network is, the more features it can extract. This often results in better performance, although with little training data a deeper network can also overfit and perform worse.

2 COMMENTS

  1. Will receive an error: “Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected.” unless you change model.fit(X_train, y_train, epochs=100, batch_size=16) to model.fit(np.array(X_train), np.array(y_train), epochs=100, batch_size=16).
