Deep Learning Introductory Tutorial (2): Generating Images from Text with a Pre-trained Model (TextToImageGenerationWithNetwork)

This introductory deep learning tutorial was made possible by the inspiration and resources of PolyU HPCStudio. I would also like to thank PolyU and the teachers who supported it.

Contents of this article: using a pre-trained model on the Google Colab platform to generate images from text (Text To Image Generation With Network)

(1) What you will learn:

What text-to-image generation is and how to use it; how to create your own art with pre-trained models, and how to make the results even better.

(2) Outline:

1: What is text-to-image generation?
2: What is stable diffusion?
3: What is prompt engineering?
4: Example code for generating images using pre-trained models

1: What is text-to-image generation?

A text-to-image model is a machine learning model that takes a natural language description as input and generates an image that matches that description. Such models began to be developed in the mid-2010s thanks to advances in deep neural networks. In 2022, the output of state-of-the-art text-to-image models such as OpenAI's DALL-E 2, Google Brain's Imagen, Midjourney, and StabilityAI's Stable Diffusion began to approach the quality of real photographs and human-drawn art.

Text-to-image models typically combine two parts. A language model converts the input text into a latent representation. A generative image model then produces an image conditioned on that representation. The most effective models are usually trained on large amounts of image and text data scraped from the web.

2: What is Stable Diffusion?

Stable Diffusion is a deep learning text-to-image model released in 2022. It is mainly used to generate detailed images from text descriptions. It can also be applied to other tasks such as inpainting, outpainting, and text-guided image-to-image translation.

  • Super-resolution: denoising an input image
  • Latent diffusion models: denoising again and again
  • Synthesizing a new image
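The idea of "denoising again and again" can be sketched with a toy deterministic DDIM-style sampler (eta = 0). It runs on a short list of numbers that stands in for a latent. This is a minimal sketch under loud assumptions. The trained U-Net is replaced by a hypothetical `oracle_noise` function that cheats by knowing the clean target. The linear beta schedule (1e-4 to 0.02 over 1000 steps) comes from the original DDPM paper and is not necessarily what Stable Diffusion's own scheduler uses.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = product of (1 - beta_s) for s <= t, with linearly spaced betas."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def oracle_noise(x_t, x0, alpha_bar):
    """Hypothetical stand-in for the trained U-Net: recovers the exact noise
    because it is allowed to see the clean target x0 (a real model cannot)."""
    return [(xt - math.sqrt(alpha_bar) * c) / math.sqrt(1.0 - alpha_bar)
            for xt, c in zip(x_t, x0)]


def ddim_sample(x0, num_steps=1000, seed=0):
    """Start from (almost) pure noise and denoise step by step (DDIM, eta = 0).

    Returns the final estimate and the distance to x0 after every step.
    """
    rng = random.Random(seed)
    alpha_bars = linear_alpha_bars(num_steps)
    ab_last = alpha_bars[-1]
    # Forward process at the noisiest step: x_T = sqrt(ab) * x0 + sqrt(1 - ab) * eps
    x = [math.sqrt(ab_last) * c + math.sqrt(1.0 - ab_last) * rng.gauss(0.0, 1.0)
         for c in x0]
    distances = []
    for t in range(num_steps - 1, -1, -1):
        ab_t = alpha_bars[t]
        eps = oracle_noise(x, x0, ab_t)
        # Predict the clean latent from the current noisy one
        x0_pred = [(xi - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
                   for xi, e in zip(x, eps)]
        if t == 0:
            x = x0_pred
        else:
            ab_prev = alpha_bars[t - 1]
            # Re-noise the prediction to the slightly lower noise level of step t-1
            x = [math.sqrt(ab_prev) * p + math.sqrt(1.0 - ab_prev) * e
                 for p, e in zip(x0_pred, eps)]
        distances.append(math.dist(x, x0))
    return x, distances


final, dists = ddim_sample([1.0] * 16)
print(round(dists[0], 3), dists[-1])
```

Each pass predicts the clean latent and then adds back a slightly smaller amount of noise, so the sample gets closer to the target step by step. In Stable Diffusion, a trained U-Net conditioned on the prompt plays the oracle's role, and the scheduler usually visits far fewer than 1000 steps.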

Structure of Stable Diffusion:
A text encoder that converts your prompt into latent vectors.

A diffusion model that repeatedly "denoises" a 64x64 latent image.

A decoder that converts the final 64x64 latent into a higher-resolution 512x512 image.
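The gap between the 64x64 latent and the 512x512 image can be checked with a little arithmetic. The sketch below assumes the Stable Diffusion v1.x autoencoder settings: a spatial downsampling factor of 8 and 4 latent channels. `sd_shapes` is a hypothetical helper, not part of any library.

```python
def sd_shapes(image_size=512, vae_factor=8, latent_channels=4, image_channels=3):
    """Compare the latent the diffusion model works on with the decoded RGB image."""
    latent_size = image_size // vae_factor                       # 512 // 8 = 64
    latent_values = latent_channels * latent_size * latent_size  # 4 * 64 * 64
    pixel_values = image_channels * image_size * image_size      # 3 * 512 * 512
    return latent_size, latent_values, pixel_values, pixel_values // latent_values


print(sd_shapes())  # → (64, 16384, 786432, 48)
```

Under these assumptions the denoising loop handles about 48 times fewer numbers than it would in pixel space. That reduction is the main reason latent diffusion is cheap enough to run on a single Colab GPU.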

3: What is prompt engineering?

Prompt engineering is the skill of choosing specific words to get a good piece of art or, more generally, to make an artificial intelligence produce a desired output.

  1. Core prompt - the main subject, for example a boy, a girl, an old man, or an animal. The more descriptive detail and adjectives, the better.

  2. Style - such as pencil drawing, oil painting, photograph, etc.

  3. Artists - such as Vincent Van Gogh, Leonardo da Vinci, Greg Rutkowski, etc.

  4. Finishing touches - such as "trending on artstation", "unreal engine", etc. (see https://beta.dreamstudio.ai/prompt-guide)

  5. You can also ask an AI to help refine your prompts.

Below is a model Microsoft built for optimizing prompts:

https://huggingface.co/spaces/microsoft/Promptist
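The prompt parts listed above (core prompt, style, artists, finishing touches) can be assembled mechanically. `build_prompt` is a hypothetical helper for illustration. The comma-separated layout and the "by <artist>" phrasing are common community conventions, not a rule of any particular model. For the Chinese model used in section 4, the parts themselves would be written in Chinese.

```python
def build_prompt(core, style="", artists=(), finishing=()):
    """Join the prompt parts into one comma-separated prompt string."""
    parts = [core.strip()]            # 1. core prompt: the main subject, with details
    if style:
        parts.append(style.strip())   # 2. style, e.g. "oil painting"
    if artists:
        parts.append("by " + ", ".join(artists))  # 3. artists
    parts.extend(finishing)           # 4. finishing touches
    return ", ".join(part for part in parts if part)


print(build_prompt(
    "an old fisherman in a small boat",
    style="oil painting",
    artists=["Vincent Van Gogh"],
    finishing=["trending on artstation"],
))
# → an old fisherman in a small boat, oil painting, by Vincent Van Gogh, trending on artstation
```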

4: Example code for generating images using pre-trained models

4.1
Search for Google Colab (e.g., on Baidu), sign in to your account, and create a new notebook. Then switch the runtime to a GPU (Runtime > Change runtime type) and click Connect.
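The code later in this section moves the pipeline to `"cuda"`, which fails if the notebook is not actually connected to a GPU. A quick check, sketched here with PyTorch's `torch.cuda.is_available()`, can confirm the runtime before downloading the model. `pick_device` is a hypothetical helper name.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # PyTorch comes preinstalled on Colab
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Using device:", pick_device())
```

If this prints `cpu`, change the runtime type and reconnect. Running Stable Diffusion on a CPU works in principle but is very slow.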
4.2
Install the required libraries one by one:

!pip install diffusers

!pip install setuptools-rust

!pip install transformers
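To confirm the installs succeeded before importing anything, the standard library's `importlib.metadata.version` can report each package's installed version. `installed_versions` is a hypothetical helper. Missing packages map to `None` instead of raising an error.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions(["diffusers", "transformers", "setuptools-rust"]))
```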

from diffusers import StableDiffusionPipeline
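# Set up the pipeline from the pre-trained model (the weights are downloaded on first run)
pipe = StableDiffusionPipeline.from_pretrained("IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1").to("cuda")
# Test: this is a Chinese model, so the prompt stays in Chinese.
# Prompt: "Far up the cold mountain, a stone path slants; deep in the white clouds, there are homes."
prompt = '远上寒山石径斜,白云深处有人家。'
image = pipe(prompt, guidance_scale=7.5).images[0]
image.save("白云深处.png")  # file name: "Deep in the white clouds"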
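The `guidance_scale=7.5` argument controls classifier-free guidance. At each denoising step, the U-Net predicts the noise twice: once without the prompt and once with it. The pipeline then pushes the result away from the unconditional prediction toward the prompt-conditioned one. The sketch below shows that blending on plain lists. `classifier_free_guidance` is a hypothetical name, and a real pipeline applies the same formula to whole latent tensors.

```python
def classifier_free_guidance(noise_uncond, noise_text, guidance_scale=7.5):
    """guided = uncond + scale * (text - uncond), applied element by element."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_text)]


print(classifier_free_guidance([0.0, 1.0], [1.0, 1.0], 7.5))  # → [7.5, 1.0]
```

With a scale of 1.0 the result is just the prompt-conditioned prediction. Larger values follow the prompt more strictly, usually at the cost of variety, and very large values tend to oversaturate the image.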


# Prompt: "On the clear, picturesque stream, a lone small boat, straw cape and hat, oil painting"
prompt = '罨畫清溪上, 蓑笠扁舟一隻, 油畫'
image = pipe(prompt, guidance_scale=7.5).images[0]
image.save("罨畫清溪上.png")



Source: blog.csdn.net/qq_40514113/article/details/131968084