DALL·E 3 previews on Microsoft Bing! The "horse riding an astronaut" problem is solved, and you can specify 50 objects in one painting

By Mengchen, from Aofeisi
Reprinted from: Qubit AI (QbitAI)

The limited test of DALL·E 3 has launched first on Microsoft Bing. See whether you are one of the lucky ones.

(Image from WindowsLatest)

It doesn't matter if you haven't been granted access. Between third-party research previews and internal trials by OpenAI employees, test cases have been emerging one after another, and they are well worth a look.

The most extreme one asked for "50 different objects" to appear in a single picture, yet the result contained hundreds.

Beyond simple tiled arrangements, these objects can also be combined more creatively.

For the counterfactual concept of a horse riding an astronaut, earlier models from OpenAI and Google could only draw an astronaut riding a horse.

This was widely regarded as a failure case and was mocked at the time by Gary Marcus, a well-known AI pessimist.

Now, with the support of ChatGPT, DALL·E 3 can handle it easily.

This leap is not only the result of OpenAI's own efforts but also of its collaboration with Microsoft.

Although it is not stated explicitly, at least three Microsoft engineers and researchers appear in the research section of the contribution list, and most of the members of the inference optimization section come from the Microsoft DeepSpeed team.

GPT-4, by contrast, was developed mainly in-house by OpenAI and only then opened up to Microsoft and other research institutions for testing.

This change in the cooperation model also reflects a further deepening of the relationship between the two companies.

Specify 50 objects in one painting

Mikhail Parakhin, the CEO of Microsoft Bing, confirmed that a lucky few percent of users already have access to the test.

Because places are so limited, netizens who can't wait have been posting their ideas online and asking people with access to test them.

The netizen who proposed drawing 50 different objects wanted to find out how many objects DALL·E 3 can fit into one picture, a limit analogous to the "context window" of a large language model.

Nathan Shipley, a third-party designer with research-preview access, took on the challenge.

First, he asked ChatGPT to list 50 random everyday objects, and then simply asked it to draw them all in one painting. ChatGPT wrote the complete prompt itself, and DALL·E 3 then drew the image.

Next, Nathan asked, "Can you create an image of a surfer putting these objects into a pile while surfing hard?"

ChatGPT added some details to the prompt:

A middle-aged Polynesian male surfer with sun-kissed skin holds up 50 items as he surfs hard, including a tea cup, a bicycle, an umbrella... His expression shows determination and slight panic as the waves crash around him and he tries to balance these objects and his surfboard.

The first try produced an image that basically met the requirements, except that the character's expression looked more like extreme panic than mild panic.

Later, Nathan asked for a lower perspective. This time, ChatGPT changed the character description in the prompt to "an elderly Spanish woman."

Onlookers were amazed, but also asked why the pencil was out of proportion to the other objects.

Some speculated that the model knows only what an object looks like, not its relative size, especially since promotional pictures of pens often use close-up shots.

Nathan also asked ChatGPT which parameters of the image a user can control.

ChatGPT responded as follows:

  • Description: You can provide a detailed description of the desired image. The more specific and detailed you are, the closer the resulting image will be to your wishes.

  • Style and Medium: You can specify the style or medium of the image, such as Painting, Photo, Illustration, Cartoon, Watercolor, 3D Rendering, etc.

  • Resolution: By default, images are generated at 1792x1024 resolution. But if your description suggests a full-length portrait or something better suited to a tall aspect ratio, the image will be generated at 1024x1792. You can also request a square resolution (1024x1024) if needed.

  • Number of images: If you don't specify, I usually create four different images based on your description. However, you can request a specific number (up to four).

  • Modifications: If you are not satisfied with the initial results, you can provide feedback and request modifications. While I can't directly modify existing images, I can use your feedback to generate new images that better fit your vision.

  • Random Seed: If you want to modify a previous image, I can use the seed from that image's metadata to generate a new version. This ensures a certain consistency between the original and modified images.

  • Please note that certain restrictions apply. For example, images of public figures and the styles of artists from within the last 100 years cannot be created directly.

The key point comes at the end:

However, I can offer alternatives or use adjectives to capture the essence of a particular style without referencing it directly.
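
The controllable settings in ChatGPT's answer can be pictured as a small lookup plus two checks. The sketch below is only an illustration: `SIZES`, `MAX_IMAGES`, and `build_request` are hypothetical names that belong to no OpenAI or Bing API, and it assumes the default wide format is 1792x1024.

```python
# Hypothetical helper for illustration only; not part of any OpenAI or Bing API.
SIZES = {
    "landscape": "1792x1024",  # assumed default wide format
    "portrait": "1024x1792",   # full-length portraits and other tall subjects
    "square": "1024x1024",     # available on request
}
MAX_IMAGES = 4  # ChatGPT said it creates at most four images per request


def build_request(description, orientation="landscape", n=MAX_IMAGES):
    """Bundle the user-controllable settings into a plain dictionary."""
    if orientation not in SIZES:
        raise ValueError(f"unknown orientation: {orientation!r}")
    if not 1 <= n <= MAX_IMAGES:
        raise ValueError(f"n must be between 1 and {MAX_IMAGES}")
    return {"description": description, "size": SIZES[orientation], "n": n}


request = build_request("a surfer balancing 50 everyday objects", "portrait", n=2)
print(request["size"])
```

Style, medium, and modification requests are left out because, according to the answer, they are expressed in the description text itself.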

"Horse Riding an Astronaut" Puzzle Solved

Among OpenAI employees, Will DePue has been the most active in sharing DALL·E 3 trial results.

On the "horse riding an astronaut" test, he said that it does not succeed 100% of the time.

You can get it right in two or three attempts because GPT-4 works with you to refine the prompt until you get it right.

With some effort, you can get almost anything you want.
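
The generate, check, and refine cycle described here can be sketched as a short loop. This is a minimal illustration, not how ChatGPT and DALL·E 3 are actually wired together: `refine_until_accepted` and the three `fake_*` stand-ins are hypothetical, and no real model is called.

```python
from typing import Callable, Optional, Tuple


def refine_until_accepted(
    prompt: str,
    generate: Callable[[str], str],
    accept: Callable[[str], bool],
    rewrite: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[Optional[str], str, int]:
    """Generate, check, and rewrite the prompt until a result is accepted.

    Returns (accepted image or None, last prompt tried, attempts used).
    """
    for attempt in range(1, max_attempts + 1):
        image = generate(prompt)
        if accept(image):
            return image, prompt, attempt
        if attempt < max_attempts:
            # Ask for a better prompt only if another attempt remains.
            prompt = rewrite(prompt, image)
    return None, prompt, max_attempts


# Stand-in callables so the sketch runs without any model access.
def fake_generate(prompt: str) -> str:
    return f"<image for: {prompt}>"


def fake_accept(image: str) -> bool:
    return "horse on top" in image


def fake_rewrite(prompt: str, image: str) -> str:
    return prompt + ", horse on top"


image, final_prompt, attempts = refine_until_accepted(
    "a horse riding an astronaut", fake_generate, fake_accept, fake_rewrite
)
print(attempts, final_prompt)
```

In a real session the user plays the role of `accept` and the chat model plays the role of `rewrite`.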

Some netizens tried to get the same result with MidJourney, and the verdict was that it is not completely impossible but takes a great deal of effort.

Almost impossible; it requires a lot of prompt engineering and is hard to reproduce.

If you are an experienced MidJourney user, you might as well try it and see whether it works.

In the "8 giraffes drinking water" challenge proposed by netizens, DALL·E 3 once again showed its weakness at counting accurately.

(Caption: count how many giraffes there are in the picture.)

Other failed attempts even produced a two-headed giraffe.

Getting the AI to count correctly remains unsolved this time, but the problem of understanding spatial relationships has at least been addressed.

In the challenge "Four zebras running on the grassland, a lion chasing behind, and an eagle above, with no other animals in the picture," the spatial relationships are basically correct, but there is an extra zebra.

By comparison, both DALL·E 2 and Stable Diffusion understand spatial relationships worse.

Adam Goldberg, who is responsible for the enterprise version of ChatGPT at OpenAI, also posted many high-quality results but did not share the prompts.

Jerry Tworek, who works on getting AI to write code and call tools, created many paintings of abstract concepts, such as "Division of Mechanical Cells" and "Computer Program Trees Across the Galaxy."

Microsoft and OpenAI collaborate

DALL·E 3 is a huge improvement. Beyond the ChatGPT integration, how exactly does the image generation part work?

Unfortunately, given OpenAI's trend of becoming more and more closed, it will probably not publish a paper as it did for the previous two generations. We can only make a few guesses from the contribution list.

The DALL·E 2 paper has five authors.

For DALL·E 3, even leaving aside the product, safety, public communication, and legal teams, 18 people were involved in the research part alone.

Among them is Yang Song, a Tsinghua alumnus who proposed Consistency Models.

Consistency models sample faster than today's most popular diffusion models and can generate 64 images at 256×256 resolution in 3.5 seconds.
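
To put that figure in perspective, the arithmetic below converts it into an average time per image and a throughput. It assumes the 64 images are produced as one batch, so the per-image number is an amortized average, not the latency of generating a single image.

```python
images = 64      # images generated, as quoted above
seconds = 3.5    # total time, as quoted above

per_image = seconds / images    # amortized seconds per image
throughput = images / seconds   # images per second

print(f"{per_image * 1000:.1f} ms per image, {throughput:.1f} images per second")
# prints "54.7 ms per image, 18.3 images per second"
```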

However, Yang Song's research contribution this time is minor. It is unclear whether DALL·E 3 uses a consistency model; more likely, the improved diffusion model borrows from his methods.

Besides the DALL·E 2 authors and Long Ouyang of the ChatGPT team, at least three of the researchers are from Microsoft.

Dr. Jianfeng Wang graduated from the University of Science and Technology of China and is a principal researcher at Microsoft.

Dr. Lijuan Wang graduated from Tsinghua University and is a principal research manager at Microsoft.

Both took part in the research on NUWA-Infinity, a model for infinite-canvas image generation.

Lindsey Li is an alumna of Beijing Institute of Technology and holds two master's degrees, from Purdue University and UC San Diego. She is a senior researcher at Microsoft and has published many papers on multimodality at top conferences.

Beyond research, the Microsoft DeepSpeed team was deeply involved in the inference optimization of DALL·E 3.

DeepSpeed is an open-source deep learning optimization library that reduces compute and memory usage, making it possible to train and run inference on large distributed models with better parallelism on existing hardware.

Many team members said they were happy to take part in this work and excited about the release of DALL·E 3.

Finally, the special contributions list includes Microsoft's Bing CEO Mikhail Parakhin and Azure Cloud Chief Vice President Misha Bilenko.

Microsoft also confirmed at its earlier launch event that Bing will integrate DALL·E 3 directly.

Under the current rules, DALL·E 2 on Bing is free. Users are issued 99 acceleration tokens; without tokens, they simply wait longer in the queue.

On ChatGPT, DALL·E 3 will require the US$20-per-month ChatGPT Plus subscription when it arrives in October, but since GPT-4 is already available for free on Bing, we can look forward to a wave of free DALL·E 3 as well.

Reference links:
[1]https://x.com/willdepue/status/1705005718666158107
[2]https://x.com/CitizenPlain/status/1705248617131291032
[3]https://x.com/MParakhin/status/1704563792645079143

Origin blog.csdn.net/amusi1994/article/details/133287006