Joining forces with ChatGPT, OpenAI creates the most powerful AI painting tool DALL·E 3!


Compiled by | Tu Min

Produced by | CSDN (ID: CSDNnews)

OpenAI, which is at the forefront of AI, has raised the bar again today by releasing a preview of DALL-E 3, the third version of its image generation tool. DALL-E 3 also integrates with ChatGPT, letting users create prompts with ChatGPT, and includes more safety options.

d1b34af59099fbfdd30e59faffcbe210.png

OpenAI CEO Sam Altman personally endorsed the product: "DALL-E 3 is quite amazing in my opinion."

96f1f761ebc6585232df7233c67f3b29.jpeg

Overnight, some designers were happy and some were sad.

a63e7069efdcee1cbf6109a505ef77b5.png

Subverting AI painting, DALL-E 3 is here!

Like its predecessor, DALL-E 3 is a text-to-image tool that creates novel images from natural language prompts.

"DALL-E 3 understands significantly more nuance and detail than our previous systems, allowing you to easily translate your ideas into exceptionally accurate images," OpenAI wrote on its official website when introducing DALL-E 3.

In its official example, OpenAI compared the results of giving the same prompt to DALL-E 3 and DALL-E 2:

1edfd11eb6bf2d99285b96f02339758c.png

After seeing the comparison, many netizens joked that DALL-E 2 is abstract while DALL-E 3 is impressionistic. DALL-E 3 stays closer to reality and renders details more effectively, making the generated pictures more realistic and attractive.

If that is DALL-E 3's first improvement, the second is that it is described as far more capable than other existing image synthesis models and better at understanding context, without users needing to learn prompt engineering.

In the officially released example, users only need to imagine a scene and supply a few simple phrases, such as full moon, pedestrians enjoying the nightlife, young woman, red hair, grumpy old man haggling, tall and experienced person...

DALL-E 3 can then produce a picture like the following, with no prompt-writing skill required:

9d09beb54d8f5ec95a292ed7e65e906c.png

So how is it achieved?

As mentioned at the beginning of the article, DALL-E 3 is "natively built" on ChatGPT and will launch as an integrated feature of ChatGPT Plus, so users can refine images conversationally with the AI assistant acting as a brainstorming partner.

It also means ChatGPT will be able to generate images based on the context of the current conversation, which could lead to novel features.

For example, after opening the ChatGPT dialog window:

Question: "My five-year-old keeps talking about the 'Super Sunflower Hedgehog'. What does it look like?"

ChatGPT responds with images as the conversation goes on:

dc0961fc79b49da0031b13765ec9213c.png

Question: "My daughter says its name is Larry. Can you show me more like this?"

ChatGPT:

41e184afadd6e450cc2e3c2e14268138.png

Question: "She will love these! Can you show me Larry's house?"

ChatGPT:

8c757408163e6993803b93d705dd2d60.png

Question: "Can you tell me Larry is 'friendly'?"

ChatGPT:

2e822bb7cf059eea20f21412e5d70d1d.png

b05373a7deee0e8f1e3924f3c364fc43.png

By comparison, although DALL-E 3's competitor Midjourney renders realistic details very well, users still need to repeatedly modify and tune their prompts to get the image they want.

For OpenAI users, ChatGPT helps designers refine and clarify their ideas, while DALL-E 3 frees up their hands. The combination of the two holds huge potential.

1f2e1e721c8f241ad1abd7b69c39ab77.png

OpenAI guards against various potential risks

It is also worth noting the technology behind these tools. DALL-E came out in January 2021, and OpenAI launched DALL-E 2 in April 2022. The latter is based on diffusion models trained on large sets of images paired with text descriptions. The closely related latent diffusion model (LDM) approach combines the perceptual ability of GANs (Generative Adversarial Networks), the detail preservation of diffusion models, and the semantic ability of Transformers to create better images.

Other developers in the industry use this kind of technology as well, for example in Stable Diffusion.

However, this approach means DALL-E learns image concepts from large datasets of human-made artwork collected for training, which naturally raises a series of disputes about copyright and ethics. Last year, many artists and platforms began protesting against AI-generated artwork, criticizing it for unethically copying their creative styles.

In response to these disputes, OpenAI stated on its official blog that DALL-E 3 is designed to decline requests for images in the style of living artists. OpenAI also provides a form (https://share.hsforms.com/1_OuT5tfFSpic89PqN6r1CQ4sk30) where creators can opt out of having their images used to train future models.

In addition, OpenAI recently announced a global recruitment drive for its "red team" network, aiming to bring in outside experts to uncover the flaws and risks of AI systems in advance.

In developing DALL-E 3, OpenAI stated that it worked with "red team" members to have the system decline requests to generate images of public figures by name, and to implement keyword and image detection filters that restrict users' ability to create violent, sexual, or hateful content, in order to identify and reduce potential risks and improve safety in risk areas.
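To make the idea of a keyword filter concrete, here is a minimal Python sketch of a prompt check that runs before any image is generated. It is only an illustration: the blocklists, the `check_prompt` function, and the placeholder name "jane example" are all hypothetical, and OpenAI has not published how its actual filters work.

```python
import re
from typing import Tuple

# Hypothetical blocklists for illustration only; not OpenAI's actual lists.
BLOCKED_KEYWORDS = {"gore", "bloodbath"}
BLOCKED_NAMES = {"jane example"}  # placeholder standing in for public-figure names


def check_prompt(prompt: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a text-to-image prompt."""
    # Lowercase the prompt and keep only word-like tokens so that
    # punctuation and letter case do not hide a blocked term.
    normalized = " ".join(re.findall(r"[a-z0-9']+", prompt.lower()))
    words = set(normalized.split())

    # Single-word keyword check.
    hits = words & BLOCKED_KEYWORDS
    if hits:
        return False, f"blocked keyword: {sorted(hits)[0]}"

    # Multi-word name check on the normalized text.
    for name in sorted(BLOCKED_NAMES):
        if re.search(rf"\b{re.escape(name)}\b", normalized):
            return False, f"blocked name: {name}"

    return True, "ok"


if __name__ == "__main__":
    print(check_prompt("An ink sketch of a little hedgehog eating watermelon"))
    print(check_prompt("A portrait of Jane Example"))
```

A simple list like this is easy to get around with misspellings or paraphrases, which is presumably why the article mentions image detection filters alongside keyword filters.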

Separately, OpenAI also revealed that it is experimenting with a "provenance classifier" tool to help identify whether an image was generated by DALL-E 3. However, many users who have used AI detection tools believe that it is almost impossible to truly detect AI images.

a6a85dd25a445cc5fb0cf0d0cbb80266.png

DALL-E 3 vs Midjourney

In fact, because regulations, laws, and ethical standards for AIGC tools and content are still lacking, the rollout of these tools naturally brings both benefits and drawbacks.

However, from a technical perspective, does DALL-E 3 represent another leap forward for AIGC tools? A user named MattGarcia.eth took the prompts OpenAI used for DALL-E 3 on its official website and generated Midjourney versions with them. Let's compare the two side by side:

"An illustration of an avocado sitting in a therapist's chair, saying 'I feel so empty inside', with a pit-sized hole in its middle. The therapist is a spoon and is scribbling away."

890067dac396f2e8665e86e2f384b4a6.jpeg

b21bf7783b46f7dd710eacf8861c60ca.jpeg

"The illustration depicts a human heart made of translucent glass, standing on a pedestal amid stormy waves. Wisps of sunlight pierce the clouds, illuminating the heart and revealing the tiny universe within. Inscribed on the horizon is a line of eye-catching text: 'Find the universe in your heart'."

b6868c2bb6d29ebca9ec01d66832e217.png

"The cozy living room features a vibrant yellow banana-shaped sofa, its curves supporting a stack of colorful cushions. A patterned rug on the wooden floor adds a touch of eclectic charm, and a potted plant sits in the corner, reaching toward the window. The sun shines through the window."

497c96bb85791e636d4f967644cadcf8.png

"A detailed oil painting of an old captain piloting his ship through a storm. Salt water splashes on his weathered face, his eyes filled with determination. Swirling clouds are seen overhead, and crashing waves threaten to submerge..."

d39d5a83c650a363bc2bc7d8ce591bf0.jpeg

3f53876c3f5dd7235fcc759d3e0d604d.jpeg

"An ink sketch style illustration of a little hedgehog holding a piece of watermelon with its little paws and happily taking a bite with its eyes closed."

54039978a3172bb33f4ddc01fca77711.jpeg

5a5ab8ebc67edbd379f27eab2d99c3cb.jpeg

"An old botanical illustration, drawn with fine lines and a hint of watercolor whimsy, depicts a strange lily crossed with a Venus flytrap, its petals ready to catch any unsuspecting insect."

a5fb58fbcc67a21cc4552e5966631a27.png

"A vast landscape composed entirely of various types of meat opens before the viewer. Mountains of tender and juicy roast beef, drumstick trees, rivers of bacon, and boulders of prosciutto create a surreal and mouth-watering scene. The sky is adorned with a pepperoni sun and salami clouds."

9e6686bc4da7a77b8e09fa2e5f6f2382.png

"Photograph of a lychee-inspired spherical chair with a bumpy white exterior and luxurious interior set against tropical wallpaper."

b85b66687efd8eaca2de1f4fa45ee6a5.png

"An expressive oil painting of a basketball player dunking, depicted as an explosion of nebula."

cca3d7c6e344d1845a7bd80d68d93d1a.jpeg

e90b144adb677ecbbd095d048e6d5bcc.jpeg

"A close-up shot of a hermit crab sitting in wet sand, with sea foam nearby, the detail of its shell and the texture of the sand enhanced."

1840364bcb10dd9a14f2e15bcf70e9ef.png

"A 2D animation of a folk band composed of anthropomorphic autumn leaves, each playing traditional bluegrass music, in a rustic forest setting lit by the soft light of a full moon."

a71dbed860f5e04caee6777d79d11916.png

Which of the two tools do you think is better?

Finally, DALL-E 3 is not yet open to the public. OpenAI stated that "DALL-E 3 is currently in the preview stage and will be available to ChatGPT Plus and Enterprise customers in early October."

For more details, please see the official announcement: https://openai.com/dall-e-3


Origin blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/133152332