Stable Diffusion: AI Image Synthesis

AI image generation has a lot to offer. A newly released open-source image synthesis model called Stable Diffusion allows anyone with a PC and a decent GPU to conjure almost any visual reality they can imagine. It can mimic almost any visual style, and if you feed it a descriptive phrase, the results appear on your screen like magic.

Some artists are delighted by the prospect, others are not, and society at large seems largely unaware of the fast-moving technological revolution taking place via communities on Twitter, Discord, and GitHub. Arguably, image synthesis brings implications as large as the invention of the camera, or perhaps even the creation of visual art itself. Even our sense of history could be threatened, depending on how events unfold. Either way, Stable Diffusion is leading a new wave of deep learning creative tools that promise to revolutionize the creation of visual media.

The Rise of Deep Learning Image Synthesis

Stable Diffusion is the brainchild of Emad Mostaque, a former London-based hedge fund manager who aims to bring novel applications of deep learning to the masses through his company Stability AI. But the roots of modern image synthesis go back to 2014, and Stable Diffusion isn't the first image synthesis model (ISM) to make waves this year.

In April 2022, OpenAI released DALL-E 2, which stunned social media with its ability to translate scenes written in words (called "prompts") into myriad visual styles, whether dreamy, realistic, or even mundane. People with access to the closed tool produced astronauts on horseback, teddy bears buying bread in ancient Egypt, novel sculptures in the style of famous artists, and more.

Shortly after DALL-E 2, Google and Meta announced their own text-to-image AI models. Available as a Discord server since March 2022 and open to the public a few months later, Midjourney charges for access and achieves a similar effect, but with a more painterly and illustrative quality by default.

Then came Stable Diffusion. On August 22, Stability AI released its open-source image generation model, with output quality arguably comparable to DALL-E 2's. It also launched its own commercial website, DreamStudio, which sells compute time for generating images with Stable Diffusion. Unlike DALL-E 2, anyone can use it, and because the Stable Diffusion code is open source, projects can build on it with almost no restrictions.
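
Because the weights are public, running the model locally takes only a few lines. Here is a minimal sketch using Hugging Face's diffusers library, one popular community wrapper; the model ID and parameter values shown are typical choices, not the only options.

```python
# A minimal local text-to-image run using the open-source weights via the
# community diffusers library (one of several ways to run Stable Diffusion).
import torch
from diffusers import StableDiffusionPipeline

# Load the publicly released v1.4 weights; half precision keeps VRAM use
# within reach of a consumer GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# One descriptive phrase in, one 512x512 image out.
image = pipe(
    "an astronaut riding a horse, oil painting",
    guidance_scale=7.5,      # how strongly the image should follow the prompt
    num_inference_steps=50,  # denoising steps; more steps, more refinement
).images[0]
image.save("astronaut.png")
```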

In the past week alone, dozens of projects have popped up that push Stable Diffusion in radical new directions. People have achieved remarkable results with a technique called "img2img" that has "upgraded" MS-DOS game art, converted Minecraft graphics into realistic renderings, transformed scenes from Aladdin into 3D, turned childlike doodles into rich illustrations, and more (a sketch of the technique follows below). Image synthesis could bring rich idea-visualization capabilities to the masses, lowering the barrier to entry while also accelerating the capabilities of artists who embrace the technology, much as Adobe Photoshop did in the 1990s.
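
Those img2img experiments all follow the same pattern: supply a starting image plus a text prompt, and let the model re-render the image to match. Below is a hedged sketch using diffusers' StableDiffusionImg2ImgPipeline; the input file name and prompt are placeholders, not taken from any of the projects named above.

```python
# A sketch of the img2img technique via the community diffusers library;
# the file names and prompt here are illustrative.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Start from an existing picture (a doodle, a screenshot of game art, etc.).
init_image = Image.open("doodle.png").convert("RGB").resize((512, 512))

# `strength` controls how far the model may drift from the original:
# low values preserve the source layout, high values reimagine it.
result = pipe(
    prompt="a lush fantasy landscape, detailed digital illustration",
    image=init_image,
    strength=0.75,
    guidance_scale=7.5,
).images[0]
result.save("illustration.png")
```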

How Stable Diffusion Works

Broadly speaking, most of the recent wave of ISMs use a technique called latent diffusion. In essence, the model learns to recognize familiar shapes in a field of pure noise, then gradually brings those elements into focus if they match the words in the prompt.
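
In code form, that sampling loop looks roughly like the following. This is a conceptual sketch, not Stability AI's actual implementation: it assumes four pretrained components (a text encoder, a denoising U-Net, a noise scheduler, and a VAE decoder) with interfaces modeled on common diffusion libraries.

```python
# A conceptual sketch of latent-diffusion sampling, not Stability AI's exact
# code. The text_encoder, unet, scheduler, and vae_decoder arguments stand in
# for pretrained components.
import torch

def sample(text_encoder, unet, scheduler, vae_decoder, prompt_tokens, steps=50):
    # Encode the prompt once; the U-Net conditions on it at every step.
    text_embeddings = text_encoder(prompt_tokens)

    # Begin with pure Gaussian noise in the small latent space
    # (64x64x4 for Stable Diffusion), rather than in full pixel space.
    latents = torch.randn(1, 4, 64, 64)

    scheduler.set_timesteps(steps)
    for t in scheduler.timesteps:
        # Predict which part of the current latents is noise, guided by the text.
        noise_pred = unet(latents, t, encoder_hidden_states=text_embeddings).sample
        # Remove a fraction of that noise; shapes matching the prompt
        # gradually come into focus.
        latents = scheduler.step(noise_pred, t, latents).prev_sample

    # Decode the finished latents back into a full-resolution image.
    return vae_decoder(latents)
```

Working in the compressed latent space, rather than directly on pixels, is what makes the process fast enough to run on a single consumer GPU.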

First, the person or group training the model gathers images with metadata (such as alt tags and captions found on the web) and forms a large dataset. In the case of Stable Diffusion, Stability AI used a subset of LAION-5B, essentially a giant scrape of 5 billion publicly accessible images on the internet. A recent analysis of the dataset revealed that many of the images came from sites such as Pinterest, DeviantArt, and even Getty Images. As a result, Stable Diffusion absorbed the styles of many living artists, some of whom have vehemently objected to the practice. More on that below.

 
