“We need to be super careful with AI. Potentially more dangerous than nukes. I’m increasingly inclined to think there should be some regulatory oversight [of AI], maybe at the national and international level.”
-From Elon Musk’s Twitter
As much as we glorify AI and its applications in the modern era, we cannot deny the unethical uses it enables or might enable in the future. If we dive a little deeper into the possibilities AI offers to ethically flexible people, Mr. Musk’s statement sounds even more concerning.
Of course, every coin has two sides. You can use your mobile application development skills to help others, or to develop addictive and harmful apps and games. The same goes for Artificial Intelligence: while many researchers are using it to help us by building AI-powered chatbots, others are building applications like DeepFake.
Since Generative Adversarial Networks (GANs) were introduced to the deep learning community, the hype around fake versus original data has been at its peak, and Deepfake is just one part of the story.
What is DeepFake
DeepFake: It is used to alter existing scenes in videos to create new, fake but realistic-looking scenes. It is built using computer vision techniques such as Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs).
The name DeepFake combines “Deep”, which comes from Deep Learning, and “Fake”, which describes the output it produces.
What it does: It accepts an input in the form of audio or video (the original) and tweaks it by superimposing another input (the fake) to produce a fake but realistic-looking video. The output video is hard to detect as fake, even for humans.
In layman’s terms, it does things like swapping faces in a video, changing facial expressions, changing what a speaker says, or altering hand gestures. If you are not yet convinced about how terrifying this technology is, just have a look at the deepfaked Mark Zuckerberg video.
How Deepfake Works
Let us keep it simple and focus on one thing Deepfake does: face swapping in videos.
Suppose you have original video clips of Barack Obama and you want to replace his face with Donald Trump’s. First of all, you need to gather enough images of both Obama and Trump, and the images should be diverse. After collecting the data, we move on to the technical part of this technology.
Now we have a set of images of each person. We will use a convolutional neural network to process each image and compress the picture volume (width x height x depth) into a one-dimensional vector. This network is known as the encoder because it encodes the information in the picture into a single vector. We then connect a decoder to the one-dimensional vector to recreate the picture using only the encoded vector. The picture below, from Wikipedia, makes this clearer.
Now, what do you expect from the decoder? You guessed it right: it should create an output image as close as possible to the input image.
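The encoder and decoder idea can be sketched in a few lines of Python. This is only a minimal illustration, not how a production Deepfake tool is written: it assumes NumPy, replaces the convolutional network with a single fully connected layer to keep things short, and uses made-up names and sizes (`encode`, `decode`, `reconstruction_loss`, an 8 x 8 x 3 image, a 16-value vector). The weights are random, so the rebuilt image will be poor until the network is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

HEIGHT, WIDTH, DEPTH = 8, 8, 3       # tiny stand-in for a face crop (assumed size)
INPUT_DIM = HEIGHT * WIDTH * DEPTH   # number of values in one image
LATENT_DIM = 16                      # length of the encoded vector (assumed)

# Randomly initialised weights; training would adjust these.
W_enc = rng.normal(scale=0.1, size=(INPUT_DIM, LATENT_DIM))
W_dec = rng.normal(scale=0.1, size=(LATENT_DIM, INPUT_DIM))

def encode(image):
    """Compress a (height, width, depth) image into a one-dimensional vector."""
    return np.tanh(image.reshape(-1) @ W_enc)

def decode(latent):
    """Rebuild a (height, width, depth) image from the encoded vector alone."""
    return (latent @ W_dec).reshape(HEIGHT, WIDTH, DEPTH)

def reconstruction_loss(image, rebuilt):
    """Mean squared error between input and output; training tries to shrink it."""
    return float(np.mean((image - rebuilt) ** 2))

image = rng.random((HEIGHT, WIDTH, DEPTH))   # random pixels instead of a real photo
latent = encode(image)
rebuilt = decode(latent)
print(latent.shape, rebuilt.shape, reconstruction_loss(image, rebuilt))
```

The loss printed at the end is the number that measures "as close as possible to the input image": the smaller it is, the better the decoder has recreated the picture.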
We will separate the images into two sets: one of Obama and one of Trump.
For the face-swapping problem, we will use one encoder and two decoders.
The same encoder is shared by both sets of images. It encodes the information in each image into a one-dimensional vector. The first decoder is used to recreate Obama’s images from the vectors of Obama images, and the second decoder is used to recreate Trump’s images from the vectors of Trump images. Running the network over a large number of images in this way is known as training. During training, the network learns the hidden patterns that map the input to the output.
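A rough sketch of this training setup is shown below. It is an illustration under several assumptions: it uses NumPy, plain linear layers instead of a convolutional network, random arrays in place of real photos, and hypothetical names and settings (`train_step`, `LEARNING_RATE`, `STEPS`). The point it shows is the structure: the encoder is updated by both image sets, while each decoder is updated only by its own person's images.

```python
import numpy as np

rng = np.random.default_rng(0)

INPUT_DIM = 48        # flattened 4 x 4 x 3 stand-in for a face crop (assumed)
LATENT_DIM = 8        # length of the one-dimensional encoded vector (assumed)
LEARNING_RATE = 0.2   # assumed value, chosen for this toy data
STEPS = 500

# Synthetic stand-ins for the two image sets (each row is one flattened image).
faces_a = rng.random((64, INPUT_DIM))   # plays the role of the Obama set
faces_b = rng.random((64, INPUT_DIM))   # plays the role of the Trump set

# One shared encoder, one decoder per person.
W_enc = rng.normal(scale=0.1, size=(INPUT_DIM, LATENT_DIM))
W_dec_a = rng.normal(scale=0.1, size=(LATENT_DIM, INPUT_DIM))
W_dec_b = rng.normal(scale=0.1, size=(LATENT_DIM, INPUT_DIM))

def loss(images, W_enc, W_dec):
    """Mean squared reconstruction error for one person's images."""
    return float(np.mean((images @ W_enc @ W_dec - images) ** 2))

def train_step(images, W_enc, W_dec):
    """One gradient-descent step; modifies W_enc and W_dec in place."""
    latent = images @ W_enc
    error = latent @ W_dec - images
    grad_out = 2.0 * error / error.size            # gradient w.r.t. the reconstruction
    grad_dec = latent.T @ grad_out                 # gradient for this person's decoder
    grad_enc = images.T @ (grad_out @ W_dec.T)     # gradient for the shared encoder
    W_dec -= LEARNING_RATE * grad_dec
    W_enc -= LEARNING_RATE * grad_enc

loss_before_a = loss(faces_a, W_enc, W_dec_a)
loss_before_b = loss(faces_b, W_enc, W_dec_b)

for _ in range(STEPS):
    train_step(faces_a, W_enc, W_dec_a)   # shared encoder + first decoder
    train_step(faces_b, W_enc, W_dec_b)   # shared encoder + second decoder

loss_after_a = loss(faces_a, W_enc, W_dec_a)
loss_after_b = loss(faces_b, W_enc, W_dec_b)
print(loss_before_a, loss_after_a)
print(loss_before_b, loss_after_b)
```

Because both image sets pass through the same encoder, the encoded vectors of the two people end up in a common format, which is what makes the swap in the next step possible.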
After the training process, the fun part begins. It’s time to swap the faces.
Now, pick your favorite video clip of Barack Obama and extract all of the frames from it. Instead of a video, you are now left with a sequence of images.
We will use the same trained encoder to encode each original frame of Obama, but use Trump’s decoder to construct Trump’s face in place of Obama’s. This is how you swap faces in pictures.
Now you just have to combine those individual face-swapped images to create the new Deepfake video.
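The swap itself can be sketched as follows. Again, this is a simplified illustration rather than a real tool: it assumes NumPy, represents the video as an array of frames instead of reading a video file, and uses random placeholder weights where the trained shared encoder and the target person's decoder would go. The function names `swap_face` and `swap_video` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

HEIGHT, WIDTH, DEPTH = 8, 8, 3       # tiny stand-in for a video frame (assumed size)
INPUT_DIM = HEIGHT * WIDTH * DEPTH
LATENT_DIM = 16                      # length of the encoded vector (assumed)

# Placeholder weights. In practice these would come out of training:
# the shared encoder and the decoder trained on the target person's images.
W_enc = rng.normal(scale=0.1, size=(INPUT_DIM, LATENT_DIM))
W_dec_target = rng.normal(scale=0.1, size=(LATENT_DIM, INPUT_DIM))

def swap_face(frame, W_enc, W_dec_target):
    """Encode one source frame, then decode it with the target person's decoder."""
    latent = frame.reshape(-1) @ W_enc
    return (latent @ W_dec_target).reshape(frame.shape)

def swap_video(video, W_enc, W_dec_target):
    """Go through the clip frame by frame, swap each frame, and stack them back."""
    swapped_frames = [swap_face(frame, W_enc, W_dec_target) for frame in video]
    return np.stack(swapped_frames)

# A made-up 10-frame clip standing in for frames extracted from a real video.
video = rng.random((10, HEIGHT, WIDTH, DEPTH))
fake_video = swap_video(video, W_enc, W_dec_target)
print(fake_video.shape)   # same shape as the input clip: (10, 8, 8, 3)
```

The only difference from ordinary reconstruction is the choice of decoder: the frame is encoded as usual, but the decoder that rebuilds it belongs to the other person.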
This, in essence, is how all of the pieces of DeepFake technology work together.
Why Deepfake Matters More Now
You need to understand that our economy, the way we work, and the way we learn and communicate are all changing. We are living in the Information Era, where data is more valuable than gold. The industrial age is over. Our preferred form of input data is changing. A shift is happening right in front of your eyes, and you need to observe the change.
Vlogging is now preferred over blogging.
People are becoming more used to learning from animated videos than from boring textbooks.
Applications like TikTok are widely used.
Images and videos are everywhere. They are a new form of communication, and they are and will be used to communicate messages across communities.
News will be shared using video clips.
Video clips are used as evidence.
But now we have technology like DeepFake, which produces fake but realistic-looking videos. This poses a significant threat to our new way of communicating and learning.
The Good Side of Deepfake
Deepfake technology can be used heavily in the film industry. It can be used to edit actors’ dialogue and to alter existing facial expressions that are hard for an actor to perform.
Beyond this, stunts can be performed by stunt doubles, whose faces can then be swapped with those of the film’s real actors.
It is not limited to the film industry; this technology can be used in any situation that involves video editing that is hard for humans to do manually.
The Bad Side of Deepfake
The essence of how Deepfake works is this: you need a massive amount of data on both people to create realistic-looking videos.
Politicians and celebrities are the primary targets and victims of this technology. Why? Because there are lots of video and audio clips of these people on the internet, which helps in creating realistic fake videos. This is also the reason why we do not see Deepfakes of ordinary people on the internet.
Data is the goldmine of this information age. The more data you have, the better your chances of winning the game.
What Can Be Done Now?
After Photoshop became popular, we learned not to believe every image we come across on the internet. We learned to be more responsible and to fact-check the things we see on social media. The same goes for Deepfake. Fake videos and fake WhatsApp forwards can easily manipulate our views and spread false propaganda. We have to take responsibility for double-checking suspicious videos before believing or sharing them.
This seems like a temporary solution. We need to look further, for a more robust and trustworthy solution for the further development of Artificial Intelligence: a solution that prevents these harmful use cases of AI from being introduced.
There should be proper laws governing these harmful use cases of AI. We should start regulating AI.