Getting Started Guide to Stable Diffusion AI Painting

The most commonly used AI painting tools on the market are Stable Diffusion (SD) and Midjourney (Mid). SD runs locally, is open source, and has a high barrier to entry, but it offers an extremely high degree of control. Mid is an online service with a low barrier to entry and good results, but it is not as controllable.

Stable Diffusion (referred to as SD) is the most popular free and open source AI drawing model today, and many paid AI drawing services use Stable Diffusion under the hood. It can be run offline on a local computer. SD users can train their own models and LoRA as they wish, introduce ControlNet to control the content of AI drawings with various tools, and even specify areas of an image for redrawing. Stable Diffusion WebUI (referred to as SDW) is a friendly web graphical interface based on Stable Diffusion, which removes the trouble of operating Stable Diffusion from the command line.

Recommended Geek Time course: http://gk.link/a/1276o

Online experience

Stable Diffusion Demo is a simple, officially released trial version. No login is required: enter your prompt words, then click the Generate button.

Local installation

To run stable-diffusion-webui and its models smoothly, sufficient video memory is required. The minimum configuration is 4GB of video memory, the basic configuration is 6GB, and the recommended configuration is 12GB. System memory should not be too small either, preferably more than 16GB.
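
The thresholds above can be written down as a small check. This is only a sketch: the function names and the tier labels are made up for illustration, and the numbers are simply the ones quoted in this guide, not an official requirement.

```python
def vram_tier(vram_gb: float) -> str:
    """Classify a GPU by the video-memory thresholds quoted in this guide."""
    if vram_gb < 4:
        return "below minimum"
    if vram_gb < 6:
        return "minimum"
    if vram_gb < 12:
        return "basic"
    return "recommended"


def ram_ok(ram_gb: float) -> bool:
    """The guide suggests more than 16GB of system memory."""
    return ram_gb > 16


# Print the tier for a few hypothetical cards.
for gb in (2, 4, 8, 12):
    print(f"{gb}GB VRAM -> {vram_tier(gb)}")
```

A card that lands in the "minimum" tier may still run, but larger image sizes are likely to be slow or to run out of memory.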

Integrated package

Baidu Netdisk download address

Autumn Leaves Stable Diffusion Integration Package v4.2 Tutorial

Install from source

  1. Install Python 3.10.6.
  2. Download the WebUI source code: git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
  3. Modify the startup parameters in webui-user.bat:
     set COMMANDLINE_ARGS=--xformers
  4. Copy the main model and any fine-tuned models to their specified directories.
  5. Run webui-user.bat. A Python virtual environment is created automatically and the dependency packages are downloaded and installed, which takes about 30 minutes. If it fails, run it again.
  6. Open the address shown in the console (http://127.0.0.1:8960 here) in a browser. In the interface you can select the base model, set parameters such as prompt words, and click the Generate button.

Text-to-image parameters

Parameter | Description
Prompt | Prompt words (positive): what the image should contain.
Negative prompt | Negative prompt words: what the image should avoid.
Width & Height | The size of the image to generate. The larger the size, the more resources are consumed and the longer generation takes.
CFG scale | How strongly the AI follows the prompt. The smaller the value, the more the generated picture deviates from your description, but the more coherent it tends to be; the larger the value, the more closely the picture fits your description, but it may be less coherent.
Sampling method | The sampling algorithm. There are many kinds, and they differ only in the algorithm used. None is strictly better or worse; choose one that suits the model and the image.
Sampling steps | The number of sampling (denoising) iterations. With too few steps the image is rough and unfinished; more steps add detail but take longer, with diminishing returns.
Seed | Random number seed. A random seed is generated for each image and determines the initial state of the diffusion process. If in doubt, leave it random.
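
The same parameters can also be sent to a running WebUI from a script. The sketch below assumes the WebUI was started with the --api flag added to COMMANDLINE_ARGS, and that its /sdapi/v1/txt2img endpoint accepts the field names shown; both should be checked against the installed version. The helper names and the default values are illustrative, not taken from this guide.

```python
import base64
import json
import urllib.request


def build_txt2img_payload(prompt, negative_prompt="", width=512, height=512,
                          cfg_scale=7.0, sampler_name="Euler a", steps=20, seed=-1):
    """Map the parameters from the table onto txt2img request fields."""
    return {
        "prompt": prompt,                    # Prompt
        "negative_prompt": negative_prompt,  # Negative prompt
        "width": width,                      # Width & Height
        "height": height,
        "cfg_scale": cfg_scale,              # CFG scale
        "sampler_name": sampler_name,        # Sampling method
        "steps": steps,                      # Sampling steps
        "seed": seed,                        # Seed; -1 asks for a random seed
    }


def txt2img(base_url, payload, timeout=600):
    """POST the payload to a running WebUI and return the images as bytes."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: images come back as plain base64-encoded strings.
    return [base64.b64decode(image) for image in body["images"]]


# Usage (requires a running WebUI, so it is left commented out):
# payload = build_txt2img_payload("solo, 1girl, portrait", steps=20)
# images = txt2img("http://127.0.0.1:8960", payload)
```

Keeping the payload builder separate from the network call makes it easy to inspect or log the exact settings before an image is generated.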

Model

Different models produce different painting styles and understand different concepts (characters, objects, actions, and so on), which is why there are so many of them. Common models fall into two categories: main models, and small models used to fine-tune a main model. Common model file suffixes are: 1. ckpt, 2. pt, 3. pth, 4. safetensors. All of these are standard model file formats, and the specific type of a model cannot be determined from its suffix.

Since training and fine-tuning a main model is very difficult and requires good graphics cards and a lot of computing power, most people choose to train small models instead. These small models act on different parts of the large model, making it easy to modify the large model for customization purposes. The common small models used for fine-tuning are Textual Inversion (often called an Embedding model), Hypernetwork, and LoRA.

Main model

The main model has the greatest impact on AI painting. The SD series, such as sd-v1-4, sd-v1-5, and sd-v2 (abbreviated as SD1.5, SD2.0, and so on), are the large models that come with Stable Diffusion. Almost no one uses these bundled models directly, because their results are poor. If you want to train a large model yourself, however, the SD series are good base models: they are broad, cover all styles, and are stylistically neutral.

Currently, the more popular and common checkpoint models include the Anything series, ChilloutMix, Deliberate, the GuoFeng series, and others. These checkpoint models are trained from the Stable Diffusion base model, in most cases from v1.4 or v1.5, using additional data so that they generate images of a specific style or subject.

In a model file name, pruned on its own refers to the full version, and emaonly refers to the pruned version. The pruned version is usually much smaller than the full version, making it easier to download. If you only generate images, there is not much difference between the two. If you want to train the model yourself, download the full version.

  • The Anything series consists of models specializing in anime (2D) illustration.
  • Cetus-Mix is an anime-style merged model.
  • Chilloutmix is a famous model for realistic Asian portraits. Most of the AI-generated beauties you see were produced with this model.
  • The latest version of the Deliberate series is deliberate_v2. It is a general-purpose model that can draw almost anything, with a style that favors oil painting and digital painting. Note that prompts for this model need to be written in great detail.
  • The Realistic Vision series is a realistic-style model. It is best suited to people and animals, but it is also fairly versatile.
  • PerfectWorld is the European and American counterpart of Chilloutmix. It mainly draws European and American-style beauties, with a 2.5D look between anime and realism.
  • GuoFeng is a gorgeous ancient Chinese style model, which can also be described as an ancient-style game character model, with a 2.5D texture.

LoRA

  • File suffix: .ckpt, .safetensors, .pt
  • Storage path: models/Lora

LoRA is the most popular type of fine-tuning model at the moment. It fixes the style of a certain kind of person or thing: when a LoRA is applied, the output moves toward its style. LoRA files are usually 10-200 MB and must be used together with a checkpoint model. The popular Korean Doll Likeness, Taiwan Doll Likeness, and Cute Girl mix are all realistic portrait LoRA models, and their results are impressive. Some style-specific LoRA models are also very popular, the best known being Mo Xin.

Model trainer: https://github.com/Akegarasu/lora-scripts
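
In the WebUI, a LoRA is applied from the prompt with a tag of the form <lora:name:weight>, as in the Counterfeit example later in this guide. The sketch below pulls such tags out of a prompt. The regular expression and the function name are illustrative, and treating a missing weight as 1.0 is an assumption rather than something stated here.

```python
import re

# Matches <lora:name> or <lora:name:weight>.
LORA_TAG = re.compile(r"<lora:([^:>]+)(?::([0-9.]+))?>")


def extract_loras(prompt):
    """Return (prompt without LoRA tags, [(name, weight), ...])."""
    loras = [
        (match.group(1), float(match.group(2) or 1.0))  # assumed default weight: 1.0
        for match in LORA_TAG.finditer(prompt)
    ]
    cleaned = LORA_TAG.sub("", prompt)
    return cleaned, loras


cleaned, loras = extract_loras(
    "pink hair, long hair,<lora:sangonomiyaKokomi_v10:0.5>, sangonomiya kokomi"
)
print(loras)
```

Listing the tags this way is a quick check that every LoRA a prompt refers to has actually been copied into models/Lora.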

VAE beautification model/variational autoencoder

  • File suffix: .pt
  • Storage path: models/VAE

VAE stands for variational autoencoder. Its effect is roughly that of a filter plus fine-tuning.

Some large models come with a built-in VAE, such as Chilloutmix. Adding another VAE to such a model may not improve the picture and can even be counterproductive. The default VAE is animevae, which gives average results. It is recommended to use kl-f8-anime2 or vae-ft-mse-840000-ema-pruned instead: anime2 is suited to anime characters, and 840000 is suited to realistic characters.

Embedding/Textual Inversion

  • File suffix: .safetensors, .pt
  • Storage path: embeddings

Textual Inversion can teach a model new concepts from only a few images, for personalized image generation. Embeddings are small files that define new keywords for generating new characters or image styles. They are small, usually 10-100 KB, and must be used with a checkpoint model. For example, the EasyNegative embedding bundles a large number of negative prompt words, which saves typing a long list of negative words every time.
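
The storage paths listed in the sections above can be collected in one small helper. This is a sketch only: the function name is made up, and the checkpoint directory models/Stable-diffusion is the WebUI default as far as I know, not a path given in this guide. Because the file suffix does not reveal the model type, the caller has to state the kind explicitly.

```python
from pathlib import Path

# Directories relative to the WebUI root folder.
MODEL_DIRS = {
    "checkpoint": Path("models") / "Stable-diffusion",  # assumption: WebUI default
    "lora": Path("models") / "Lora",
    "vae": Path("models") / "VAE",
    "embedding": Path("embeddings"),
}


def install_path(webui_root, kind, filename):
    """Return where a downloaded model file of the given kind should be copied."""
    if kind not in MODEL_DIRS:
        raise ValueError(f"unknown model kind: {kind!r}")
    return Path(webui_root) / MODEL_DIRS[kind] / filename


print(install_path("stable-diffusion-webui", "lora", "example.safetensors"))
```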

Model download

ControlNet

ControlNet is a plug-in for controlling AI image generation. Before ControlNet, there was no way to know what the AI would produce until the picture had been generated; it was like drawing cards and relying on luck. With ControlNet, image generation can be controlled precisely through a model, for example by controlling coloring or the pose of a character. The prompt lays out the general content of the picture, a LoRA makes the main subject match our needs, and ControlNet finely controls the elements of the overall picture: subject, background, style, form, and so on.

For example, given a reference picture, you can extract the skeleton of the person in it to generate a person with the same pose in a new picture; you can extract its line art to generate a new picture with the same line drawing; or you can extract its style to generate a new picture in the same style.

Reference: 15 ControlNet models

Prompt words (Prompt)

Prompt word example

  • Prompt:
    solo, 1girl, portrait, looking at viewer, masterpiece, best quality, 8k,
  • Negative prompt:
    (worst quality, low quality:1.4), (bad-image-v2-39000:0.75), (bad_prompt_v2:0.85), (censored, bar censor), cropped, mature,

Universal prompt words

These are widely applicable to the anime (2D) style and can be used with different models.

  • Add after the positive prompt word:

(masterpiece:1.2), best quality, masterpiece, highres, original, extremely detailed wallpaper, perfect lighting,(extremely detailed CG:1.2), drawing, paintbrush,

  • Add after the negative prompt word:

NSFW, (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, (ugly:1.331), (duplicate:1.331), (morbid:1.21), (mutilated:1.21), (tranny:1.331), mutated hands, (poorly drawn hands:1.5), blurry, (bad anatomy:1.21), (bad proportions:1.331), extra limbs, (disfigured:1.331), (missing arms:1.331), (extra legs:1.331), (fused fingers:1.61051), (too many fingers:1.61051), (unclear eyes:1.331), lowers, bad hands, missing fingers, extra digit,bad hands, missing fingers, (((extra arms and legs))),
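
The unusual weights in this prompt, such as 1.21, 1.331, and 1.61051, are powers of 1.1. In the WebUI's prompt syntax, as far as I know, each pair of parentheses multiplies a term's attention by 1.1, so ((monochrome)) is equivalent to (monochrome:1.21). The sketch below reproduces that arithmetic. The function names are made up, and the factor 1.1 is taken from the WebUI's documented behavior rather than from this guide.

```python
def nested_weight(depth, base=1.1):
    """Attention weight produced by `depth` nested pairs of parentheses."""
    return round(base ** depth, 5)


def explicit_form(term, depth):
    """Rewrite depth-nested parentheses as an explicit (term:weight) tag."""
    return f"({term}:{nested_weight(depth)})"


# Depths 1 to 5 give the weights that appear in the prompt above.
for depth in range(1, 6):
    print(depth, nested_weight(depth))

print(explicit_form("monochrome", 2))
```

Writing the weight explicitly is usually easier to read and adjust than counting parentheses.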

Counterfeit-V2.5 anime example

Prompt word:

(((masterpiece))),(((best quality))), ((ultra-detailed)), (best illustration), 1girl, solo,  blush, smug, smile, purple eyes, choker, gradient eyes, no pupils, multicolored_hair, pink hair, blue hair, long hair,<lora:sangonomiyaKokomi_v10:0.5>, sangonomiya kokomi, ((kimono)), outdoors, sakura trees, sakura, facing towards viewer, front view

Negative prompt:

EasyNegative,extra fingers, fewer fingers, extreme fingers,wrong hand,wrong tail, missing male, extra legs, extra arms, missing legs, missing arms, weird legs, weird arms, watermark, logo, long hand, (poorly drawn hands:1.331), (bad anatomy:1.21), (bad proportions:1.331), (fused fingers:1.61051), (too many fingers:1.61051), extra digit, fewer digits,(mutated hands and fingers:1.5 ), fused fingers, one hand with more than 5 fingers, one hand with less than 5 fingers, one hand with morethan 5 digit, one hand with less than 5 digit, extra digit, fewer digits, fused digit, missing digit,text,watermark,

Parameters:

Size: 512x512, Seed: 1396898128, Model: CounterfeitV25_25, Steps: 20, Sampler: DPM++ 2S a Karras, CFG scale: 7, Model hash: a074b8864e, Hires steps: 20, Hires upscale: 2, Hires upscaler: Latent (nearest-exact), Denoising strength: 0.7
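
A parameter line like the one above is easy to turn back into settings. The sketch below is a naive parser that splits on ", " and ": ". It works for this line, but it assumes no value contains a comma followed by a space, which may not hold for every line the WebUI writes. The function name is illustrative.

```python
def parse_params(line):
    """Split a 'Key: value, Key: value' parameter line into a dict of strings."""
    params = {}
    for item in line.split(", "):
        key, separator, value = item.partition(": ")
        if separator:  # skip fragments that are not key/value pairs
            params[key.strip()] = value.strip()
    return params


line = (
    "Size: 512x512, Seed: 1396898128, Model: CounterfeitV25_25, Steps: 20, "
    "Sampler: DPM++ 2S a Karras, CFG scale: 7, Model hash: a074b8864e, "
    "Hires steps: 20, Hires upscale: 2, Hires upscaler: Latent (nearest-exact), "
    "Denoising strength: 0.7"
)
params = parse_params(line)
width, height = (int(n) for n in params["Size"].split("x"))
print(width, height, params["Sampler"], params["Seed"])
```

Reusing the same seed, model, sampler, steps, and CFG scale is what makes it possible to reproduce a shared image.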

Tools

  • Parser: https://spell.novelai.dev/, which can be used to view model file types and the generation parameters stored in image files.
  • Prompter: https://prompt.qpipi.com/, which helps with writing prompt words.

Tutorials

Akiba aaaki

  • For training character models, LoRA is currently the no-brainer choice: https://www.bilibili.com/video/BV1fs4y1x7p2/
  • For training style models, training a large model is recommended: https://www.bilibili.com/video/BV1SR4y1y7Lv/

Origin blog.csdn.net/LifeRiver/article/details/131629249