Using Stable Diffusion and related resources

A brief introduction to SD

Stable Diffusion is a deep-learning text-to-image generation model released in 2022. It is mainly used to generate detailed images from text descriptions, although it can also be applied to other tasks, such as inpainting, outpainting, and image-to-image translation guided by prompt words.

The technique behind SD is called a diffusion algorithm, and the specific algorithm design is a very complicated subject. The general idea is to start from a noisy, blurred picture and, guided by the keywords, make the content concrete bit by bit, until we end up with what we described.

I have known about this technology since September 2022, but had never looked into how to actually operate it in practice.
This year I intermittently read about running it on Alibaba Cloud and followed related content in some public accounts. The popular hidden-QR-code images and light-and-shadow street scenes with hidden text also caught my attention, but I never actually tried it myself.
This weekend I dug out and re-read the materials I had seen before and practiced with them. Below is my summary of this technique.

Installing SD

Running SD locally requires a decent discrete graphics card. If your graphics card does not meet the requirements, you can use an Alibaba Cloud server or a cloud desktop instead.

  • Local installation takes three steps. First, install the Python environment and add it to the PATH environment variable.
  • Second, install the CUDA support you need; running nvidia-smi shows the CUDA version your device's driver supports. You can download the corresponding version from the CUDA download page.
  • Third, install git, because the webui project is cloned from GitHub (some plug-ins are installed via git as well): clone the webui project from its GitHub address.

  • Besides this fully manual method, you can also find launcher installation packages made by expert users on Bilibili. These bundle the entire runtime environment and are very convenient to use.
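The manual steps above can be sketched as the following shell commands. This assumes a Linux machine and the AUTOMATIC1111 stable-diffusion-webui repository, which is the webui project most guides refer to; adjust for your own setup.

```shell
# Check which CUDA version the driver supports (shown at the top of the output)
nvidia-smi

# Verify python and git are installed and on the PATH
python --version
git --version

# Clone the webui project (AUTOMATIC1111 repo assumed here)
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui

# The first launch downloads dependencies into a venv, then starts the web UI
./webui.sh
```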

Model download

After the project starts, you need to download some necessary large models; SD uses several kinds of model.

  • checkpoint: the data source SD runs on. These large models have large file sizes, generally above 2 GB; think of one as the big dictionary SD paints from.
  • VAE: colors the output; it can be understood as a filter that makes the picture more vivid and eye-catching.
  • Embedding: SD scenes are described with text, i.e. keywords. An Embedding effectively packages a set of keywords, so instead of writing a long description you put the model file into the corresponding folder and reference the Embedding in your prompt.
  • Hypernetwork: acts like a bookmark into the very large dictionary, making the pictures SD produces closer to our requirements.
  • Lora: an upgraded version of Hypernetwork. It adds a more detailed description on top of the large model, fine-tuning the large model's results.
  • LyCORIS: an upgrade of Lora, with a more sophisticated generation algorithm and finer adjustment.

All of these models exist to give our descriptions a basis and to constrain the range of what SD paints: the dataset used to train a large model is huge, so at generation time we need the other, smaller models for precise control. That is why these model types exist.
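The model types above each go in a different folder of a webui install. A minimal sketch, assuming the AUTOMATIC1111 folder layout (LyCORIS placement depends on the extension you use):

```python
# Where each model type goes in a stable-diffusion-webui install
# (AUTOMATIC1111 layout assumed; check your own install).
MODEL_DIRS = {
    "checkpoint":   "models/Stable-diffusion",
    "VAE":          "models/VAE",
    "Embedding":    "embeddings",            # note: top level, not under models/
    "Hypernetwork": "models/hypernetworks",
    "Lora":         "models/Lora",
    "LyCORIS":      "models/LyCORIS",
}

def install_path(webui_root: str, model_type: str, filename: str) -> str:
    """Return the path a downloaded model file should be copied to."""
    return f"{webui_root}/{MODEL_DIRS[model_type]}/{filename}"

print(install_path("stable-diffusion-webui", "Lora", "myStyle.safetensors"))
```

After copying a file into its folder, press the refresh button next to the model dropdown in the UI (or restart webui) so it is picked up.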
The following are some commonly used model download addresses:

  1. civitai
  2. huggingface
  3. esheep (more stable to connect to than the first two)
  4. liblibai (more stable to connect to than the first two)
  5. openart

Keywords and descriptive sentences

Prompts come in two kinds: positive text prompts and negative text prompts.
The most important thing is to decide what you want to express, that is, the theme, because the content serves the theme:

  1. Characters and subject characteristics
  2. Scene features
  3. Ambient light
  4. Supplementary details: framing and camera angle

In addition to describing the content, you also need some keywords that describe image quality:

  1. High definition
  2. A specific high resolution
  3. Painting-style hints (illustration, anime, realistic)

Prompt word weight: you can increase a keyword's weight by wrapping it in parentheses, or control it precisely by appending a specific multiplier to the keyword.
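As a concrete illustration of the weighting rules, here is a small sketch assuming the AUTOMATIC1111 webui conventions: each pair of round brackets multiplies a keyword's attention weight by 1.1, and the explicit `(keyword:1.5)` form sets the multiplier directly.

```python
def effective_weight(prompt_token: str) -> float:
    """Attention weight that webui-style prompt syntax gives a keyword.

    Assumes AUTOMATIC1111 conventions: each pair of round brackets
    multiplies the weight by 1.1, and an explicit "(word:1.5)" form
    sets the multiplier directly.
    """
    token = prompt_token.strip()
    # Explicit multiplier, e.g. "(blue sky:1.5)"
    if token.startswith("(") and token.endswith(")") and ":" in token:
        return float(token.rsplit(":", 1)[1].rstrip(")"))
    # Nested parentheses, e.g. "((blue sky))" -> 1.1 ** 2
    depth = 0
    while token.startswith("(") and token.endswith(")"):
        depth += 1
        token = token[1:-1]
    return round(1.1 ** depth, 4)

print(effective_weight("blue sky"))        # 1.0
print(effective_weight("((blue sky))"))    # 1.21
print(effective_weight("(blue sky:1.5)"))  # 1.5
```

In practice the explicit `(keyword:1.3)` form is easier to read than stacking many brackets.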
Prompt word website:

  1. dawnmark
  2. atoolbox

Plug-in management

After installing SD from GitHub, you can start drawing. But to make the software more usable, we can install some plug-ins:

  1. Chinese localization (language) plug-in
  2. Gallery browser plug-in (image browser)
  3. Prompt word plug-in
  4. Ultimate Upscale (enlarges the picture; appears as a script in txt2img)
  5. Local Latent Couple (LLuL) (local detail optimization plug-in)
  6. After Detailer (helps beautify face or hand details)
  7. ControlNet
Plug-ins are basically hosted on GitHub, and can also be downloaded and installed through other code platforms such as Gitee.
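Installing a plug-in generally means cloning its repository into the webui's extensions folder. A sketch, using the ControlNet extension as the example (the repo URL is the commonly used Mikubill extension; substitute the address of whichever plug-in you want):

```shell
# Plug-ins live under the webui's extensions/ folder
cd stable-diffusion-webui/extensions

# Clone the plug-in's repository (ControlNet extension shown as an example)
git clone https://github.com/Mikubill/sd-webui-controlnet.git

# Restart the webui afterwards so the new extension is picked up
```

Many plug-ins can also be installed from the webui's own Extensions tab, which performs the same git clone for you.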

ControlNet

ControlNet is a plug-in for fine-grained control of the image. It needs some metadata about the picture, for example channel information, which can be obtained through Photoshop.

  1. Control the character's pose
  2. Control the depth of the picture
  3. Control the edges of the image
  4. Control light and shadow
  5. Generate artistic QR codes
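Pose control (item 1) can also be driven through the webui's HTTP API. A sketch of the request body, assuming the sd-webui-controlnet extension's API format (field names may differ between versions; `control_v11p_sd15_openpose` stands in for whichever pose model you actually downloaded):

```python
import json

# A txt2img request asking ControlNet to lock the character's pose.
# Field names follow the sd-webui-controlnet extension's API (an assumption;
# check the version you have installed).
payload = {
    "prompt": "1girl, dancing, high definition",
    "negative_prompt": "lowres, bad hands",
    "steps": 20,
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "enabled": True,
                "module": "openpose",                   # preprocessor: extract a pose skeleton
                "model": "control_v11p_sd15_openpose",  # pose-control model (example name)
                "weight": 1.0,                          # how strongly the pose constrains the image
            }]
        }
    },
}

# This JSON body would be POSTed to http://127.0.0.1:7860/sdapi/v1/txt2img
print(json.dumps(payload)[:60])
```

The depth, edge, and lighting controls in the list work the same way, just with a different `module`/`model` pair.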

Training a model yourself

If the above is not enough to meet our requirements, we can train a model ourselves:

  1. Find a data source: a set of photos of one subject
  2. Tag the pictures with SD, i.e. label what is in each image, as the basis for learning
  3. Hand it over to SD to learn
  • The source images should preferably share a uniform size
  • If your local graphics card is not powerful enough, you can get three months of free PAI cloud service from Aliyun
  • Use the Lora_Trainer project on GitHub, run on Colab, for training
  • After training completes, select the best-performing model with SD's xyz script
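On the "uniform size" point: a common choice for SD 1.5 LoRA training is to center-crop every photo to one square resolution such as 512×512. A small stdlib-only sketch that computes the crop box (the box would then be passed to an image library, e.g. Pillow's `Image.crop()`):

```python
def center_crop_box(width: int, height: int, size: int):
    """Coordinates (left, top, right, bottom) of a centered size x size crop.

    Training images should share one resolution (512x512 is common for
    SD 1.5 LoRA training); this computes the crop box for a source image
    of the given width and height.
    """
    if size > min(width, height):
        raise ValueError("image smaller than target size; upscale first")
    left = (width - size) // 2
    top = (height - size) // 2
    return (left, top, left + size, top + size)

# A 768x1024 portrait photo cropped to a centered 512x512 square
print(center_crop_box(768, 1024, 512))  # (128, 256, 640, 768)
```

Cropping to the subject (rather than blindly to the center) usually gives better training data, so treat this as the automated fallback.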


Origin blog.csdn.net/u013795102/article/details/132397823