Machine Learning is progressing at a rapid pace. Google recently announced Google Duplex, where an artificially intelligent voice assistant actually makes a call on your behalf to book a service such as an appointment with a doctor, or a reservation at a restaurant. Computers are becoming smarter day-by-day due to continuous research in Machine Learning and Artificial Intelligence.
Broadly, there are 3 domains where Machine Learning is being applied extensively and the results are phenomenal. These are:
- Automated Speech Recognition: Automated Speech Recognition, also called ASR, is being applied extensively across languages to understand what humans are saying. For instance, if you take a look at your smartphone, you will notice that practically every application can be used via the voice button on the keyboard: simply press it, speak, and the smartphone converts your speech into text. These days, a lot of research is going into multilingual automated speech recognition systems.
- Natural Language Processing: Natural Language Processing is applied extensively to understand the text produced by an automated speech recognition system. For instance, the Google voice assistant on your Android phone, or Siri on your iPhone, is powered by cutting-edge Natural Language Processing: these assistants understand what you are saying and automatically trigger an action. Amazon has also joined the league of voice assistants with its Alexa-powered Echo device.
- Computer Vision: This is an interesting area where a lot of Machine Learning research is happening. Computers are being trained to identify various objects (Google Lens), places and even faces. An example of Computer Vision you have probably seen is the automatic tagging of photos by Facebook. These automatic tagging systems are powered by state-of-the-art Face Recognition systems, which are what we're going to talk about in this blog post.
What is a Face Recognition System?
A system that is capable of identifying a particular person from a photograph is called a Face Recognition System. Face Recognition Systems come in 2 variants:
- Face matching: Such Face Recognition Systems answer a simple question – given 2 photographs, do they show the same person? These systems are used in e-KYC (electronic know your customer), where the applicant is asked to upload a photograph through a webcam along with a photo identity proof. The Face Recognition System then emits a confidence score indicating the probability that the 2 faces (the webcam photo and the photo on the identity proof) belong to the same person.
- Face tagging: Such Face Recognition Systems are used to label each person given a photograph that contains multiple faces. A classic example is Facebook photo tagging as discussed above. Face tagging in videos is being used to categorize customers as and when they enter a store. For instance, some banks in Singapore are using Face Recognition systems to identify high net-worth customers entering a particular branch and giving them special service for being valuable customers of the bank. In China, a criminal was caught simply because a Face Recognition System installed at a mall identified the person’s face and raised an alarm.
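The face-matching variant above can be sketched with a few lines of code. The sketch below assumes each photo has already been converted into a numeric feature vector (how that happens is covered later in the post); the vectors and the 0.9 threshold are made-up illustrative values, and cosine similarity is just one common choice of score.

```python
import math

def cosine_similarity(a, b):
    # Score in [-1, 1]; closer to 1 means the two faces look more alike.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors for the webcam photo and the ID photo
webcam_vec = [0.12, 0.80, 0.35]
id_vec = [0.10, 0.78, 0.40]

score = cosine_similarity(webcam_vec, id_vec)
same_person = score > 0.9  # threshold is application-specific
```

In a real e-KYC flow the threshold would be tuned against the cost of false accepts versus false rejects.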
How do Face Recognition AI Systems work?
Face Recognition Systems employ Machine Learning on images. These days, deep Neural Networks are used to train highly accurate Face Recognition Systems. In order to understand how a Face Recognition System works, let us first understand the concept of a feature vector.
A feature vector is nothing but a mapping of an object to an array of numbers, where each number in the array represents some attribute of the object. Let us take an example. If you were to describe someone's face, what features can you think of? Here are some ideas:
- The color of the face: we can identify the color by 3 values – R, G and B indicating red, green and blue. Alternatively, we can use the hex color code to identify the color of the face.
- Width and height of the face: phrases like “long face” and “broad face” can be quantified by providing 2 numbers – width and height.
- Color of hair
- Spectacles: whether or not the person is wearing spectacles.
- Facial hair: does the person have a mustache/beard?
We can represent these features using an array that could look something like this:
- Face color (red)
- Face color (green)
- Face color (blue)
- Hair color (red)
- Hair color (green)
- Hair color (blue)
As you can see, each face will have its own feature vector which may or may not be unique. As and when we add more features, we can probably capture more diversity in the faces.
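A toy feature vector built from the features listed above might look like the following. All the values here are made up purely for illustration; real systems learn their features rather than hand-crafting them like this.

```python
# A hand-crafted feature vector for one hypothetical face.
# Components follow the features discussed above.
feature_vector = [
    210, 180, 160,  # face color (R, G, B)
    40, 30, 20,     # hair color (R, G, B)
    14.0, 20.0,     # face width and height (e.g. in cm)
    1,              # wearing spectacles (1 = yes, 0 = no)
    0,              # facial hair (1 = yes, 0 = no)
]
print(len(feature_vector))  # → 10
```

Adding more features simply appends more numbers to the array, which is how a richer vector captures more diversity across faces.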
Now that we know about feature vectors, understanding a Face Recognition System becomes fairly simple. A Face Recognition System maintains a corpus of faces where, for each face, it stores the feature vector obtained from a sample image. Whenever a new face is given to the Face Recognition System for identification, it computes the distance between this new feature vector and that of each face in the corpus. The face with the lowest distance wins, provided that distance is below a certain threshold.
To explain it via a simple program:
function RecognizeFace(new_face, corpus):
    closest = None
    for face_i in corpus:
        if closest is None or distance(new_face, face_i) < distance(new_face, closest):
            closest = face_i
    if closest is not None and distance(new_face, closest) < THRESHOLD_DISTANCE:
        return closest
    return None
As can be seen in the code above, we iterate over each face in the corpus and keep track of the closest one. If the closest face lies within a certain threshold distance, it is declared the recognized face; otherwise, the new face is treated as unknown.
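The pseudocode can be turned into a small runnable sketch. The corpus entries, 2-dimensional vectors, and threshold below are all illustrative, and Euclidean distance is just one possible metric.

```python
import math

THRESHOLD_DISTANCE = 0.6  # illustrative value; tuned per system

def distance(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize_face(new_face, corpus):
    # corpus maps a person's name to their stored feature vector
    closest_name, closest_dist = None, float("inf")
    for name, vec in corpus.items():
        d = distance(new_face, vec)
        if d < closest_dist:
            closest_name, closest_dist = name, d
    if closest_dist < THRESHOLD_DISTANCE:
        return closest_name
    return None  # no face in the corpus is close enough

corpus = {"alice": [0.1, 0.9], "bob": [0.8, 0.2]}
print(recognize_face([0.15, 0.85], corpus))  # → alice
```

A query vector far from every stored face falls outside the threshold and returns `None`, which is how the system avoids mislabeling strangers.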
How does Machine Learning come into the picture? Machine Learning is applied at 2 places:
- Creation of the feature vector: it is tough to create a “good” feature vector by identifying various features manually. Deep Neural Networks take the image as input and use a series of non-linear transformations to generate a feature vector out of it. Well-trained Neural Networks can identify highly complex features that humans cannot even think of.
- Identifying the distance metric: Neural Networks can be tuned to create various distance metrics depending on the training dataset.
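To see why the choice of distance metric matters, consider two common metrics applied to the same pair of vectors. The vectors are made up; the point is that Euclidean distance is sensitive to magnitude while cosine distance only compares direction.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 for identical directions, up to 2 for opposite
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

a, b = [1.0, 2.0], [2.0, 4.0]  # same direction, different magnitude
print(euclidean(a, b))         # large: magnitudes differ
print(cosine_distance(a, b))   # ~0: directions match
```

Two vectors the Euclidean metric calls far apart can be nearly identical under the cosine metric, so the metric must be chosen (or learned) to match how the feature vectors were trained.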
A lot of research is going into algorithms that can generate sophisticated feature vectors. For instance, Geoffrey Hinton, a pioneer of Deep Learning, recently published a paper on Capsule Networks, which some say could revolutionize Face Recognition algorithms. Many top organizations also conduct global competitions where participants present new ideas for building better Face Recognition Systems. The domain is quite interesting and has a long way to go.