Implementing Face ID for iPhone X with Python and Deep Learning

Last year, Apple's iPhone X drew the most attention for its new unlocking feature, FaceID, which replaced TouchID.

Apple built a full-screen phone (albeit one whose "notch" was both criticized and widely imitated), so it needed to develop a corresponding method to unlock the phone easily and quickly.

While Apple's competitors were still tweaking the placement of the fingerprint sensor, Apple revolutionized the way we unlock phones: you just look at yours. Through a small, advanced front-facing depth camera, the iPhone X creates a 3D map of the user's face. An infrared camera also captures an image of the user's face to adapt to changes in lighting and color across environments. Through deep learning, the phone learns the user's face and recognizes it every time the user lifts the phone. Remarkably, Apple claims the technology is even more secure than TouchID, with a one-in-a-million error rate.

Since Apple launched this revolutionary technology, more and more phones with face unlocking have appeared on the market, and face unlocking has become a new technology trend.

So what are the secrets behind the iPhone X's FaceID? As ordinary developers, can we implement it ourselves? Norman Di Palo, who studied artificial intelligence at the University of Rome, has been pondering exactly this question. After some exploration, he reverse-engineered the technology with the help of deep learning and Python programming. Let's see how he did it.


I (Norman Di Palo, the original author; translator's note) am very interested in how Apple implemented FaceID: everything runs on the device, and a short training session on the user's face makes it work smoothly every time the phone is lifted.

My focus is on how to implement this process with deep learning and how to optimize each step. In this article, I'll show how to implement a FaceID-like algorithm using Keras. I'll explain in detail the various architectural decisions I made and present the final experimental results. I used a Kinect, a popular RGB and depth camera, whose output is very similar to that of the front-facing camera on the iPhone X, just from a larger device. Grab a cup of tea and sit back while we reverse-engineer Apple's new industry-changing feature.

Understanding FaceID

The first step is to analyze how FaceID works on the iPhone X. Apple's FaceID white paper helps us understand its basic mechanics. With TouchID, the user had to touch the sensor several times to record a fingerprint; after 15 to 20 touches, TouchID was set up. FaceID is similar: the user first has to enroll their face. The process is simple: the user looks straight at the phone as usual, then slowly rotates their head in a circle so the face is captured from every angle.

And that's it: FaceID is set up and ready to unlock the phone. The remarkable speed of this enrollment process tells us a lot about the algorithm behind it. For instance, the neural network powering FaceID isn't just doing classification, as I'll explain next.

For a neural network, classification would mean learning to predict whether the face it sees is the user's: in simple terms, predicting "yes" or "no" from training data. But unlike many other deep learning use cases, that approach won't work here. First, the network would need to be trained from scratch on freshly captured data from the user's face, which would require a lot of time, computing resources, and many different faces to serve as negative training examples (little would change in the case of transfer learning and fine-tuning an already trained network).

What's more, this approach would make it impossible for Apple to train a more complex network "offline", i.e., train the network in the lab and then ship the trained, ready-to-use network on the phone. So I believe FaceID is powered by a siamese-like convolutional neural network that Apple trained "offline", which maps faces into a low-dimensional latent space, using a contrastive loss to maximize the distance between faces of different people. What you get is an architecture capable of one-shot learning, which Apple briefly mentioned in its Keynote. I know some readers may be unfamiliar with many of these terms; don't worry, keep reading and I'll explain step by step.

Neural Networks: From Faces to Numbers

A siamese neural network consists of two identical networks that share exactly the same weights. This architecture can learn to compute distances between specific data types, such as images. You pass a pair of data points through the siamese network (or pass them through the same network in two steps), the network maps them into a low-dimensional feature space, such as an n-dimensional vector, and you train the network so that data points from different classes end up as far apart as possible, while those from the same class end up as close as possible. In the long run, the network learns to extract the most meaningful features, compress them into a vector, and build a meaningful mapping. For a more intuitive understanding, imagine describing dog breeds with a small vector, so that similar dogs have similar vectors. You might use one value for the dog's coat color, one for its size, another for its coat length, and so on. This way, similar dogs will have similar vectors. Clever, isn't it? A siamese neural network can learn to do this for you, much like an autoencoder does.
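To make the dog-breed analogy concrete, here is a tiny numpy sketch. The three feature values (coat darkness, size, coat length) are invented for illustration, not learned by any network:

```python
import numpy as np

# Hypothetical hand-built "breed vectors": [coat darkness, size, coat length]
labrador  = np.array([0.8, 0.7, 0.2])
golden    = np.array([0.3, 0.7, 0.6])
chihuahua = np.array([0.5, 0.1, 0.1])

def distance(a, b):
    # Euclidean distance in the embedding space
    return np.linalg.norm(a - b)

# Two large retrievers end up closer to each other than to a chihuahua
print(distance(labrador, golden) < distance(labrador, chihuahua))  # True
```

A trained siamese network learns such a vector representation automatically instead of having it hand-crafted.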

The picture above, from a paper co-authored by Yann LeCun, shows how a neural network can learn the similarities between digits and automatically arrange them in two dimensions. The same technique can be applied to face data.

Using this technique, we can train such an architecture on a large number of faces to recognize which faces are most similar. Given a sufficient budget and computing power (as Apple has), the network can also become more robust, handling harder cases such as twins, adversarial attacks, and so on. And the final advantage of this approach? You have a plug-and-play model that needs no further training: it can recognize different users simply by taking a few photos during enrollment and computing where the user's face lands in the latent space. (Imagine, as before, building the breed vector for a new dog and storing it somewhere.) In addition, FaceID can adapt both to sudden changes in your appearance (glasses, hats, makeup) and to slow ones (hair growth): it simply adds new reference face vectors to the map, computed from your new appearance.

Implementing FaceID with Keras

As with any machine learning project, the first thing you need is data. Building our own dataset would require time and the collaboration of many people, which would be challenging. So I found a decent-looking RGB-D face dataset online. It consists of a series of RGB-D pictures of people facing different directions and making different expressions, just like the iPhone X use case.

If you want to see the final implementation, you can go to my GitHub repository, which contains a Jupyter Notebook.

I have also experimented with a Colab Notebook, which you can try out yourself.

I built a convolutional network based on the SqueezeNet architecture. The network takes as input the RGB-D image of a face, i.e., a 4-channel image, and outputs the distance between two embeddings. It is trained with a contrastive loss that reduces the distance between images of the same person and increases the distance between images of different people.
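As a sketch of what this training objective looks like, here is the standard contrastive loss (Hadsell, Chopra and LeCun, 2006) written in plain numpy for a single pair; the actual project implements it as a Keras loss over batches, and the toy embeddings below are invented for illustration:

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_person, margin=1.0):
    """Contrastive loss: pull embeddings of the same person together,
    push embeddings of different people at least `margin` apart."""
    d = np.linalg.norm(emb_a - emb_b)
    if same_person:
        return d ** 2                    # penalize any distance at all
    return max(margin - d, 0.0) ** 2     # penalize only pairs closer than margin

# Toy embeddings: two close vectors (same person), one far away (different person)
a = np.array([0.1, 0.2])
b = np.array([0.1, 0.25])
c = np.array([0.9, 0.8])

print(contrastive_loss(a, b, same_person=True))    # small: the pair is already close
print(contrastive_loss(a, c, same_person=False))   # ~0: the pair is already far apart
```

Minimizing this loss over many pairs is what shapes the embedding space described above.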

The figure shows the contrastive loss function.

After some training, the network was able to map faces into 128-dimensional vectors, such that images of the same person are grouped together, far from images of everyone else. This means that to unlock the device, the network just needs to compute the distance between the photo taken at unlock time and the photos stored at enrollment. If this distance falls below a certain threshold (the smaller the threshold, the more secure), the device unlocks.

I visualized the 128-dimensional embedding space in two dimensions using t-SNE (t-distributed stochastic neighbor embedding). Each color represents a different person: as you can see, the network has learned to group those photos tightly together. (When using t-SNE, the distances between clusters are meaningless.) An interesting picture also emerges when using the PCA dimensionality-reduction algorithm.
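The PCA view can be sketched in a few lines of numpy via SVD (t-SNE would instead require scikit-learn's `TSNE`); the clustered 128-D points below are synthetic stand-ins for the network's real embeddings:

```python
import numpy as np

def pca_2d(embeddings):
    """Project high-dimensional embeddings to 2-D with PCA via SVD."""
    X = embeddings - embeddings.mean(axis=0)   # center the data
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                        # project onto the top 2 components

rng = np.random.default_rng(2)
# Two "people": tight clusters around different centers in 128-D
person_a = rng.normal(0.0, 0.05, (10, 128))
person_b = rng.normal(1.0, 0.05, (10, 128))
points = pca_2d(np.vstack([person_a, person_b]))
print(points.shape)  # (20, 2) -- the two clusters separate along the first axis
```

Unlike t-SNE, the distances in a PCA projection do preserve (linear) structure, which is why both views are worth plotting.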

Pictured: clusters of face photos in the embedding space, created with t-SNE.

Pictured: clusters of face photos in the embedding space, created with PCA.

Time to Experiment!

We can now simulate a typical FaceID cycle to see how the model works: first, enrollment of the user's face; then unlocking, where the user's own face should successfully unlock the device while other people's faces should not. As mentioned earlier, the difference comes down to whether the distance the network computes between the face trying to unlock the device and the enrolled face is below a certain threshold.

Let's start with enrollment: I pulled a series of photos of the same person from the dataset and simulated the setup phase. The device computes the embedding of each pose and stores it locally.
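A minimal sketch of this enrollment step, using a fake stand-in for the trained embedding network (the real one is the SqueezeNet-based branch described above):

```python
import numpy as np

def enroll(embed, pose_images):
    """Simulated enrollment: run each captured pose through the embedding
    network and store the resulting vectors locally (here, just a list)."""
    return [embed(img) for img in pose_images]

def fake_embed(img):
    # Stand-in for the trained network: flatten and keep the first 128 values
    return img.reshape(-1)[:128].astype(float)

rng = np.random.default_rng(1)
poses = [rng.random((64, 64, 4)) for _ in range(5)]   # 5 RGB-D poses (4 channels)
stored = enroll(fake_embed, poses)
print(len(stored), stored[0].shape)  # 5 (128,)
```

On a real device, these stored reference vectors are all that is needed later; the raw photos can be discarded.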

Pictured: the simulated FaceID enrollment process.

Pictured: the enrollment process as seen by the depth camera.

When the same user tries to unlock the device, the distance between that user's different poses and expressions is low, around 0.3 on average.

Pictured: distances between faces of the same person in the embedding space.

Meanwhile, the distance between RGB-D images of different people averages about 1.1.

Pictured: distances between faces of different people in the embedding space.

Therefore, setting the threshold to around 0.4 should be enough to prevent strangers from unlocking your device.
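Putting the numbers together, the unlock decision reduces to a nearest-neighbor distance check against the enrolled embeddings. The sketch below uses synthetic 128-D vectors in place of real network outputs, together with the 0.4 threshold from the experiment:

```python
import numpy as np

THRESHOLD = 0.4  # accept a face whose distance is below this (value from the experiment)

def unlock(probe_embedding, enrolled_embeddings, threshold=THRESHOLD):
    """Unlock if the probe is within `threshold` of any enrolled pose."""
    dists = [np.linalg.norm(probe_embedding - e) for e in enrolled_embeddings]
    return min(dists) < threshold

# Toy 128-D embeddings standing in for the network's output
rng = np.random.default_rng(0)
owner_poses = [rng.normal(0.0, 0.01, 128) for _ in range(3)]   # tight cluster
owner_probe = rng.normal(0.0, 0.01, 128)                        # same cluster
stranger    = rng.normal(1.0, 0.01, 128)                        # far away

print(unlock(owner_probe, owner_poses))  # True
print(unlock(stranger, owner_poses))     # False
```

Lowering the threshold makes the check stricter (more secure, more false rejections); raising it does the opposite.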

Epilogue

In this blog post, I implemented a proof-of-concept of FaceID's face-unlock mechanism with Keras and Python, based on face embeddings and siamese convolutional neural networks. I hope you found it helpful and inspiring; you can find all the relevant Python code here.

For an introduction to Keras, the tool used in this article, and the deep neural network modules it provides, check out the concise tutorial on my website.
