"Damedane" takes over Bilibili: the secret is AI face-swapping, and you can learn it in five minutes

Introduction: AI face-swapping technologies keep emerging, each generation stronger than the last. Recently, the first order motion model, an AI face-swapping model published at NeurIPS 2019, went viral, with expression-transfer results better than other methods in the field. The technique has now kicked off a new trend on Bilibili...

Source | HyperAI Super Neural (ID: HyperAI)

Recently, a wave of exceptionally "grass" videos (Bilibili slang for absurdly funny) has swept Bilibili, racking up millions of views.

Skilled uploaders have used the first order motion model, an AI face-swapping project, to generate all kinds of delightfully offbeat videos.

For example, Jacky Cheung, Du Fu, Tang Seng, and even the panda-head meme have been made to sing "Damedane" and "Unravel" with full-throated feeling... The results look like this:

Tang Seng's version of "Unravel". Source: Bilibili uploader Rough Woolen Jun

Du Fu's version of "Unravel". Source: Bilibili uploader cold_joke

GIFs alone don't do it justice, so let's go straight to the video:

The crying-cat version of the earworm "Damedane" has been played 2.113 million times so far. Source: Bilibili uploader "Thick-haired Hutu Tutu"

I have to say, these are rather addictive... head over to Bilibili and search for more of these works.

These videos have left countless netizens itching to try it themselves, flooding the comments with requests for tutorials. So let's take a look at the technology (the root of all this mischief) behind the face-swapping effects: the first order motion model.

Bilibili's learning zone: plenty of tutorials teach you to lip-sync

Face-swapping and lip-syncing technologies have appeared one after another, and each new one sets off another wave of face-swap mania.

The first order motion model has become especially popular thanks to its superior handling of facial features and lip shapes, its ease of use, and its efficiency.

Bilibili uploaders have enthusiastically posted plenty of tutorials.

For example, the "Damedane" face swap at the start of this article takes only tens of seconds to generate and about five minutes to learn.

Most Bilibili uploaders build their tutorials around Google Drive and Colab. Given the access hurdle those pose in China, we picked one uploader's tutorial that uses a domestic machine learning compute container service ( https://openbayes.com ), which currently still gives away free vGPU hours every week, enough to complete the tutorial with ease.

You can finish your own "Damedane" in under 5 minutes

The instructional video walks through every step, so even complete beginners can easily pick up this face-swapping trick. The uploader has also published the notebook on the platform, so a one-click clone is all it takes to use it.

That said, many tech uploaders stress that beyond entertainment, these videos are meant for technical exchange, and they ask that nobody abuse the technique maliciously.

Address of the video tutorial above:

https://openbayes.com/console/openbayes/public/containers/BwZQj5wr3Jp

Original project Github address:

https://github.com/AliaksandrSiarohin/first-order-model
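
For readers who prefer to run the project locally rather than through the hosted notebook, here is a minimal Python sketch of the usual inference flow. It relies on the `load_checkpoints` and `make_animation` helpers that the repository's `demo.py` exposed at the time of writing; the file paths and the "vox" face checkpoint name are placeholders, so adjust them to whatever you actually download.

```python
import imageio
import numpy as np
from skimage.transform import resize

# Helper functions from demo.py in the first-order-model repo;
# run this script from the repository root so the import resolves.
from demo import load_checkpoints, make_animation

# Placeholder inputs: a still face photo and a driving clip, both resized
# to the 256x256 resolution the released face checkpoints expect.
source_image = resize(imageio.imread('source.png'), (256, 256))[..., :3]
driving_video = [resize(frame, (256, 256))[..., :3]
                 for frame in imageio.mimread('damedane.mp4', memtest=False)]

# The authors distribute pretrained checkpoints; the "vox" one covers faces.
generator, kp_detector = load_checkpoints(config_path='config/vox-256.yaml',
                                          checkpoint_path='vox-cpk.pth.tar')

# relative=True transfers motion relative to the driving video's first frame,
# which usually gives the most faithful expression transfer.
predictions = make_animation(source_image, driving_video,
                             generator, kp_detector, relative=True)

imageio.mimsave('result.mp4', [(255 * p).astype(np.uint8) for p in predictions])
```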

 

Another face-swapping marvel: what exactly makes it good?

The first order motion model comes from the NeurIPS 2019 paper "First Order Motion Model for Image Animation". The authors are from the University of Trento in Italy and Snap Inc.

Paper address: https://arxiv.org/pdf/2003.00196.pdf

As the title suggests, the paper's goal is to make static pictures move: given a source image and a driving video, the object in the source image is animated to follow the motion in the driving video. In short, anything can be brought to life.

The effect is shown in the figure below; the upper-left corner is the driving video, and the rest are static source images:

Composition of the model framework

Broadly speaking, the first order motion model framework is composed of two modules, a motion estimation module and an image generation module (a conceptual sketch in code follows below):

Motion estimation module: separates the target object's appearance from its motion via self-supervised learning and encodes them as feature representations.

Image generation module: models the occlusions that arise as the target moves, then extracts appearance information from the given source image (e.g. a celebrity photo) and combines it with the previously obtained feature representations to synthesize the video.
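
To make the division of labor concrete, here is a highly simplified, self-contained PyTorch sketch of the two-stage pipeline. The module internals are placeholders (the real model predicts keypoints plus local affine transformations, a dense flow field, and an occlusion map); only the overall "estimate motion, then generate" structure is meant to be illustrative.

```python
import torch
import torch.nn as nn


class MotionEstimator(nn.Module):
    """Toy stand-in for the motion estimation module: predicts keypoint
    locations from a frame via heatmaps and a soft-argmax."""

    def __init__(self, num_keypoints: int = 10):
        super().__init__()
        self.heatmap_net = nn.Conv2d(3, num_keypoints, kernel_size=7, padding=3)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        heatmaps = self.heatmap_net(frame)                      # (B, K, H, W)
        b, k, h, w = heatmaps.shape
        probs = heatmaps.view(b, k, -1).softmax(-1).view(b, k, h, w)
        ys = torch.linspace(-1, 1, h).view(1, 1, h, 1)
        xs = torch.linspace(-1, 1, w).view(1, 1, 1, w)
        kp_y = (probs * ys).sum(dim=(2, 3))
        kp_x = (probs * xs).sum(dim=(2, 3))
        return torch.stack([kp_x, kp_y], dim=-1)                # (B, K, 2)


class ImageGenerator(nn.Module):
    """Toy stand-in for the image generation module: the real model warps the
    source with a dense flow derived from the keypoint motion and inpaints
    occluded regions; here we only apply a refinement convolution."""

    def __init__(self):
        super().__init__()
        self.refine = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, source, kp_source, kp_driving):
        return self.refine(source)


# Toy run: animate one 256x256 source image with one driving frame.
source = torch.rand(1, 3, 256, 256)
driving_frame = torch.rand(1, 3, 256, 256)
motion, generator = MotionEstimator(), ImageGenerator()
output = generator(source, motion(source), motion(driving_frame))
print(output.shape)  # torch.Size([1, 3, 256, 256])
```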

Method overview

What makes it better than traditional models?

Some readers may wonder: how is this different from earlier AI face-swapping methods? The authors give an explanation.

Earlier face-swapping videos required the following operations:

  • Usually, pre-training on face image data from both parties involved in the swap;

  • Annotating keypoints on the source images and then training the corresponding model.

In practice, however, individuals have little face data and not much time for training. Traditional models therefore work well for a specific subject, but when applied to the general public, quality is hard to guarantee and results often go badly wrong.

Earlier methods can produce inaccurate expression transfer

The method proposed in this paper removes that dependence on subject-specific data and greatly improves generation efficiency: to transfer expressions or motions, you only need to train on a dataset of the same image category.

For example, for expression transfer you only need to train on a face dataset, no matter whose face you are swapping; for Tai Chi motion transfer, you train on a Tai Chi video dataset.

Once training is complete, the corresponding pretrained model can make any source image follow the driving video.
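
Put differently, one pretrained checkpoint per image category is all you need, and the same call animates any source image of that category. Below is a hypothetical sketch reusing the `load_checkpoints`/`make_animation` helpers from the earlier snippet; the config and checkpoint file names mirror the ones the repository distributes, but treat them as placeholders.

```python
# Hypothetical task -> (config, pretrained checkpoint) mapping; the file
# names follow the repository's released checkpoints but may change.
PRETRAINED = {
    'faces':  ('config/vox-256.yaml',    'vox-cpk.pth.tar'),
    'taichi': ('config/taichi-256.yaml', 'taichi-cpk.pth.tar'),
}


def animate(task, source_image, driving_video):
    """Animate any source image of the given category with any driving video."""
    from demo import load_checkpoints, make_animation  # see earlier sketch
    config_path, checkpoint_path = PRETRAINED[task]
    generator, kp_detector = load_checkpoints(config_path=config_path,
                                              checkpoint_path=checkpoint_path)
    return make_animation(source_image, driving_video,
                          generator, kp_detector, relative=True)
```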

Comparison of this method with others trained on the same dataset

The methods in the second and third columns show clear deviations when transferring human body motion

The authors compare their method with the state-of-the-art approaches in this area, X2Face and Monkey-Net, training on the same datasets. Their method improves on every metric, and on the two face datasets (VoxCeleb and Nemo) it also clearly outperforms X2Face, which was originally designed for face generation.

Full breakdown of the prototype behind the "Damedane" face-swap videos

