Stable Diffusion XL 0.9

Although its CEO has been caught up in various controversies, that has not kept Stability AI off Time magazine's list. The company recently released Stable Diffusion XL 0.9, which combines a 3.5-billion-parameter base model with a 6.6-billion-parameter ensemble pipeline and uses one of the largest OpenCLIP models, bringing a new leap in the quality of AI image generation.

Stable Diffusion has been upgraded again!

Stability AI recently released the latest version, Stable Diffusion XL 0.9 (SDXL 0.9).

Compared with the previous model, this update brings a marked improvement in image and composition detail.

In terms of parameters, SDXL 0.9 combines a 3.5-billion-parameter base model with a 6.6-billion-parameter model ensemble pipeline. By contrast, the beta version used only a single 3.1-billion-parameter model.
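
To put those parameter counts in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy. It assumes 16-bit (2-byte) weights, which the article does not state, and it ignores activations and other runtime overhead, so the figures are illustrative only. The function and variable names are made up for this example.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption (not from the article): weights are stored as 16-bit floats.
BYTES_PER_PARAM_FP16 = 2


def weight_memory_gib(num_params: float,
                      bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Return the memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Parameter counts quoted in the article.
models = {
    "SDXL Beta (single model)": 3.1e9,
    "SDXL 0.9 base model": 3.5e9,
    "SDXL 0.9 ensemble pipeline": 6.6e9,
}

for name, params in models.items():
    print(f"{name}: about {weight_memory_gib(params):.1f} GiB of weights")
```

Passing a different `bytes_per_param` (for example 4 for 32-bit floats) shows how strongly the storage precision affects the estimate.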

To generate more realistic images with greater depth and a higher resolution (1024x1024), SDXL 0.9 uses two CLIP models, including one of the largest OpenCLIP models trained to date (OpenCLIP ViT-G/14).

What's more, SDXL 0.9 runs on consumer graphics cards. All you need is Windows 10/11 or Linux, 16GB of RAM, and an NVIDIA RTX 20 series (or better) graphics card with at least 8GB of video memory.
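
The requirements above can be written down as a simple checklist. The following is a minimal sketch only: the `Machine` class, its fields, and the numeric `gpu_series` encoding are hypothetical names made up for illustration, and the code does not detect any hardware by itself.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the requirements quoted in the article.
SUPPORTED_OS = {"windows 10", "windows 11", "linux"}
MIN_RAM_GB = 16
MIN_VRAM_GB = 8
MIN_RTX_SERIES = 20  # hypothetical encoding: 20 for RTX 20xx, 30 for RTX 30xx


@dataclass
class Machine:
    """Hypothetical description of a user's machine, filled in by hand."""
    os_name: str
    ram_gb: float
    vram_gb: float
    gpu_series: int


def unmet_requirements(machine: Machine) -> List[str]:
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    if machine.os_name.lower() not in SUPPORTED_OS:
        problems.append(f"unsupported operating system: {machine.os_name}")
    if machine.ram_gb < MIN_RAM_GB:
        problems.append(f"needs {MIN_RAM_GB}GB of RAM, found {machine.ram_gb}GB")
    if machine.vram_gb < MIN_VRAM_GB:
        problems.append(f"needs {MIN_VRAM_GB}GB of VRAM, found {machine.vram_gb}GB")
    if machine.gpu_series < MIN_RTX_SERIES:
        problems.append("needs an RTX 20 series card or newer")
    return problems


if __name__ == "__main__":
    desktop = Machine(os_name="Linux", ram_gb=32, vram_gb=12, gpu_series=30)
    print(unmet_requirements(desktop) or "all listed requirements are met")
```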

How different is SDXL 0.9 from SDXL Beta?

Let the pictures do the talking.

Hands-on results

Let's see how the new SDXL 0.9 differs in image detail.

Prompt: Aliens roam Las Vegas

SDXL Beta

SDXL 0.9

Prompt: A wolf in Yosemite National Park

Negative prompt: 3d rendering, glossy, plastic, blurry, grainy, low res, anime, oversaturated

SDXL Beta

 

SDXL 0.9

Prompt: coffee in hand

Negative prompt: 3d rendering, glossy, plastic, blurry, grainy, low res, anime

SDXL Beta

SDXL 0.9
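
The negative prompts in the examples above list qualities to steer away from. The article does not describe how SDXL 0.9 applies them. In many Stable Diffusion implementations, though, the negative prompt takes the place of the empty prompt in classifier-free guidance. Below is a minimal sketch of that idea, using plain lists of numbers in place of real noise predictions; the function name and the default scale of 7.5 are assumptions for illustration.

```python
from typing import List


def guided_noise(pred_positive: List[float],
                 pred_negative: List[float],
                 guidance_scale: float = 7.5) -> List[float]:
    """Classifier-free guidance with a negative prompt (illustrative only).

    Starts from the prediction for the negative prompt and moves toward the
    prediction for the positive prompt, scaled by guidance_scale.
    """
    if len(pred_positive) != len(pred_negative):
        raise ValueError("predictions must have the same length")
    return [
        neg + guidance_scale * (pos - neg)
        for pos, neg in zip(pred_positive, pred_negative)
    ]


if __name__ == "__main__":
    # Toy numbers standing in for the model's two noise predictions.
    positive = [1.0, 2.0]   # e.g. conditioned on "A wolf in Yosemite National Park"
    negative = [0.0, 1.0]   # e.g. conditioned on "blurry, grainy, low res"
    print(guided_noise(positive, negative, guidance_scale=2.0))
```

With a scale of 1.0 the result equals the positive prediction; larger scales push the output further away from whatever the negative prompt describes.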

Stability AI says the SDXL series will also offer a range of features beyond basic text prompting.

These include image-to-image prompting (feed in an image to get variations of that image), inpainting (reconstructing missing parts of an image), and outpainting (building a seamless extension of an existing image).
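
The difference between inpainting and outpainting comes down to which pixels the model is asked to fill in. The sketch below builds the two kinds of mask with plain nested lists. It assumes the convention that 1 marks pixels to generate and 0 marks pixels to keep; the function names are hypothetical, and this is not how any particular SDXL tool exposes these features.

```python
from typing import List, Tuple

Grid = List[List[int]]


def inpaint_mask(width: int, height: int,
                 box: Tuple[int, int, int, int]) -> Grid:
    """Inpainting: regenerate a region inside the existing image.

    box is (left, top, right, bottom) with right and bottom exclusive.
    """
    left, top, right, bottom = box
    return [
        [1 if left <= x < right and top <= y < bottom else 0
         for x in range(width)]
        for y in range(height)
    ]


def outpaint_canvas(image: Grid, pad: int, fill: int = 0) -> Tuple[Grid, Grid]:
    """Outpainting: centre the image on a larger canvas and mark the new border.

    Returns (canvas, mask); the mask is 1 on the added border, 0 on the original.
    """
    height, width = len(image), len(image[0])
    new_w, new_h = width + 2 * pad, height + 2 * pad
    canvas = [[fill] * new_w for _ in range(new_h)]
    mask = [[1] * new_w for _ in range(new_h)]
    for y in range(height):
        for x in range(width):
            canvas[y + pad][x + pad] = image[y][x]
            mask[y + pad][x + pad] = 0
    return canvas, mask


if __name__ == "__main__":
    for row in inpaint_mask(4, 3, (1, 1, 3, 2)):
        print(row)
    canvas, mask = outpaint_canvas([[5, 6], [7, 8]], pad=1)
    for row in mask:
        print(row)
```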

SDXL 0.9 runs on two CLIP models, including one of the largest OpenCLIP models ever trained (OpenCLIP ViT-G/14), which boosts its processing power and its ability to create realistic images.

The SDXL team will soon publish a research blog post covering the model's specifications and testing in more detail.

Named one of Time's most influential companies

Stability AI was recently selected by Time magazine as one of the 100 most influential companies.

Time magazine introduced Stability AI like this:

If you can describe it in words, Stability AI can turn it into a picture.

Stable Diffusion, the free and open-source text-to-image generator the company helped train, changed the world's understanding of AI's potential when it launched in August.

However, Stability AI quickly became embroiled in controversy over how its tools were trained and in copyright lawsuits over data scraped from the Internet.

Still, the company says that within a month of launching Stable Diffusion 2.0, four of the top 10 apps in the App Store were powered by the model.

The company's CEO, Emad Mostaque, has also reportedly often exaggerated the company's achievements. He had previously claimed that Stability AI's "true open source" approach paved the way for a "breakthrough."

SDXL Beta

In fact, the SDXL Beta was itself released not long ago, which shows how quickly AI image generation is iterating.

At the time, Stability AI stated that Stable Diffusion XL is not the name of the final release and that it is not v3, because the SD-XL architecture is very similar to that of the SD-v2 series.

Below are a few official SD-XL sample images; the quality is already very good.

 

The improvements of SD-XL compared to the previous version are as follows:

  • Generate high-quality images with short, descriptive prompts

  • Generated images match the prompt more closely

  • Human anatomy in images is more plausible

  • Compared with v2.1 and, to a lesser extent, v1.5, SD-XL produces images that are more in line with popular taste

  • Negative prompts are optional

  • Generated portraits are more realistic

  • Text in images is clearer

Legible text

The v1 series and v2.1 of Stable Diffusion cannot generate readable text in images.

While the text SD-XL generates isn't always accurate, it is a huge improvement.

A young woman holds a sign that says "Stable Diffusion", has highlighted hair, sits outside a restaurant, brown eyes, wears a skirt, side lights

Better body structure

Stable Diffusion has always struggled with human anatomy; extra legs and missing arms are all too common.

For example, yoga images generated by SD-v1.5 often show distorted bodies.

Although the images generated by SD-XL are not perfect, they show significant improvement in human poses.

More aesthetic

For example, given the same house theme, SD-XL generates more symmetrical and more visually appealing photos.

SD-XL also shows a notable improvement in portrait photos.

 

References:

https://stability.ai/blog/sdxl-09-stable-diffusion

Origin blog.csdn.net/qq_29788741/article/details/131354953