Faster than diffusion! OpenAI open-sources its new image generation model; first author Song Yang is a Tsinghua alumnus


By Jin Lei, Yu Yang, and Xiao Xiao
Reprinted from: QbitAI

The field of image generation seems to be changing again.

Just now, OpenAI open-sourced the consistency model, an image generator faster than diffusion models:

Generate high-quality images without adversarial training!

As soon as this big news dropped, it set the research community abuzz.


Although the paper itself was quietly released back in March, it was widely assumed at the time to be just another piece of cutting-edge OpenAI research whose details would never actually be made public.


Unexpectedly, this time it was open-sourced outright. Some netizens immediately put it to the test and found that generating a batch of 64 images at 256×256 takes only about 3.5 seconds:

game over!


These are the images that netizen generated, and they look pretty good:


Some netizens joked: OpenAI is finally Open this time!


It is worth mentioning that Song Yang, the OpenAI scientist who is the paper's first author, is a Tsinghua alumnus. He entered Tsinghua's Mathematics and Physics Basic Science Class through the university's leadership program at the age of 16.

Let's take a look at what kind of research OpenAI has open sourced this time.

What exactly has been open-sourced?

As an image-generating AI, the consistency model's biggest selling points are speed and quality.

Compared with the diffusion model, it has two main advantages:

First, it can directly generate high-quality image samples without adversarial training.

Second, whereas a diffusion model may require hundreds or even thousands of iterations, the consistency model can handle a variety of image tasks in just one or two steps——

including colorization, denoising, super-resolution, and so on, all without explicit training on those tasks. (Of course, with few-shot fine-tuning, the results get even better.)


So how exactly does the consistency model achieve this effect?

In terms of principle, the consistency model grows out of the ODE (ordinary differential equation) formulation of diffusion-based generation.

As shown in the figure, the ODE first converts image data into noise step by step, and is then solved in reverse to learn to generate images from noise.

It is exactly in this process that the authors map any point on the ODE trajectory (such as Xt, Xt′, and XT) back to its origin (such as X0) for generative modeling.

Models of this mapping are then named consistency models, because their outputs for points on the same trajectory all land at the same point:


Based on this idea, the consistency model can generate a relatively high-quality image in a single step, rather than through a long chain of iterations.
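To make the one-step idea concrete, here is a minimal numerical sketch. The parameterization f(x, t) = c_skip(t)·x + c_out(t)·F(x, t) follows the paper's boundary-condition trick (f reduces to the identity at the minimum time ε). The network F below is a toy placeholder rather than the actual trained model, and SIGMA_DATA, EPS, and T_MAX are assumed hyperparameter values, not the official ones.

```python
import numpy as np

SIGMA_DATA = 0.5   # assumed data standard deviation
EPS = 0.002        # minimum time; f must be the identity here
T_MAX = 80.0       # maximum noise scale

def F(x, t):
    """Toy stand-in for the trained network; the real model is a U-Net."""
    return -x / (1.0 + t)

def c_skip(t):
    # equals 1 at t = EPS, enforcing the boundary condition f(x, EPS) = x
    return SIGMA_DATA**2 / ((t - EPS)**2 + SIGMA_DATA**2)

def c_out(t):
    # equals 0 at t = EPS
    return SIGMA_DATA * (t - EPS) / np.sqrt(t**2 + SIGMA_DATA**2)

def consistency_fn(x, t):
    """f(x, t): map a noisy point at time t straight to the trajectory origin."""
    return c_skip(t) * x + c_out(t) * F(x, t)

def sample_one_step(shape, rng=np.random.default_rng(0)):
    # One-step generation: draw pure noise at the largest time, apply f once.
    x_T = rng.standard_normal(shape) * T_MAX
    return consistency_fn(x_T, T_MAX)

img = sample_one_step((64, 64))
```

The key property is that no iterative solver is needed at sampling time: one forward pass through f maps noise to a sample.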

The figure below compares the consistency model (CD) with a diffusion model (PD) on the FID image generation metric.

Here PD stands for progressive distillation, a recent diffusion-model method proposed by Stanford and Google Brain last year, and CD stands for consistency distillation.

As the figure shows, the consistency model's image generation outperforms the diffusion model on almost every dataset, the only exception being the 256×256 bedroom dataset:
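For readers curious what "consistency distillation" trains on, here is a hedged toy sketch of the objective: noise a clean sample to a higher time, take one teacher ODE-solver step back to the adjacent lower time, and pull the student's outputs at the two points together, with an EMA copy of the student as the target. Both the teacher step and the student function below are illustrative placeholders, not the paper's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_ode_step(x, t_hi, t_lo):
    """Toy stand-in for one step of a pretrained diffusion model's
    probability-flow ODE solver, moving from time t_hi down to t_lo."""
    return x * (t_lo / t_hi)

def f(x, t, theta):
    """Toy consistency function with a single scalar parameter theta."""
    return theta * x / (1.0 + t)

def consistency_distillation_loss(x0, t_lo, t_hi, theta, theta_ema):
    # 1. Noise a clean sample up to the higher time t_hi.
    x_hi = x0 + t_hi * rng.standard_normal(x0.shape)
    # 2. Run the frozen teacher one ODE step back down to t_lo.
    x_lo_hat = teacher_ode_step(x_hi, t_hi, t_lo)
    # 3. Match the student's outputs at the two adjacent points;
    #    the target branch uses EMA weights and gets no gradient.
    target = f(x_lo_hat, t_lo, theta_ema)
    pred = f(x_hi, t_hi, theta)
    return np.mean((pred - target) ** 2)

x0 = rng.standard_normal((8, 8))
loss = consistency_distillation_loss(x0, t_lo=1.0, t_hi=2.0,
                                     theta=1.0, theta_ema=1.0)
```

The design intuition: if the student's outputs agree at every adjacent pair of times along a trajectory, they agree along the whole trajectory, so everything maps to the same origin point.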


In addition, the authors compared diffusion models, consistency models, GANs, and other models on various other datasets:


However, some netizens noted that the images generated by the open-sourced consistency model are still too small:

Sadly, the images generated by the open-source version are still too small. An open-source version that generates larger images would be very exciting.


Some netizens also speculated that OpenAI may simply not have finished training larger models yet. But once training is done, we may not get the code (said half-jokingly).

As for the significance of this work, TechCrunch put it this way:

If you have a bunch of GPUs, then by all means run a diffusion model through 1,500+ iterations over a minute or two, and the generated images will of course be excellent.

But if you want to generate images in real time, on a phone or in the middle of a chat conversation, then the diffusion model is clearly not the best choice.

The consistency model is OpenAI's next important move.

Here's hoping OpenAI will open-source a wave of higher-resolution image generation models next~

First author Song Yang, a Tsinghua alumnus

Song Yang, the paper's first author, is currently a research scientist at OpenAI.


At 14, he was selected into Tsinghua University's New Centennial Leadership Program by a unanimous vote of 17 judges. In the college entrance examination the following year, he was the top science scorer in Lianyungang City and entered Tsinghua.

In 2016, Song Yang graduated from Tsinghua's Mathematics and Physics Basic Science Class and went on to Stanford for graduate study. In 2022, he received his Ph.D. in computer science from Stanford and then joined OpenAI.

During his Ph.D., his paper "Score-Based Generative Modeling through Stochastic Differential Equations" won an ICLR 2021 Outstanding Paper Award.


According to his personal homepage, starting in January 2024 Song Yang will officially join Caltech's Department of Electrical Engineering and Department of Computing and Mathematical Sciences as an assistant professor.

Project address:
https://github.com/openai/consistency_models

Paper address:
https://arxiv.org/abs/2303.01469

Reference link:
[1] https://twitter.com/alfredplpl/status/1646217811898011648
[2] https://twitter.com/_akhaliq/status/1646168119658831874


Source: blog.csdn.net/amusi1994/article/details/130164800