Getting Started Guide to Stable Diffusion AI Painting
The most commonly used AI painting tools on the market are Stable Diffusion (SD) and Midjourney. SD runs locally and is open source; it has a higher barrier to entry but offers extremely fine control. Midjourney is an online service with a low barrier to entry and good results, but it is less controllable.
Stable Diffusion (SD) is the most popular free and open-source AI drawing model today, and it can run offline on a local computer. Users can train their own models and LoRAs as they wish, introduce ControlNet to control the content of AI drawings with various tools, and even specify areas for redrawing. Many paid AI drawing services use Stable Diffusion under the hood. Stable Diffusion WebUI (SDW) is a friendly web graphical interface based on Stable Diffusion that removes the hassle of operating it from the command line.
Recommended Geek Time course: http://gk.link/a/1276o
Online Experience
Stable Diffusion Demo is a simple trial version released officially. No login is required: just enter a prompt and click the Generate button.
Local Installation
To run stable-diffusion-webui and its models smoothly, sufficient video memory is required: 4 GB of VRAM is the minimum, 6 GB is the basic configuration, and 12 GB is recommended. System memory should not be too small either, preferably more than 16 GB.
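As a rough back-of-the-envelope check on these requirements, the sketch below estimates how much memory the model weights alone occupy at different precisions. The figure of roughly 1000M parameters for SD 1.5 (UNet + CLIP text encoder + VAE combined) is an approximation; actual VRAM use is higher because it also includes activations and grows with image resolution.

```python
# Rough estimate of memory needed just to hold the model weights,
# excluding activations and intermediate tensors (so real usage is higher).
def estimate_model_vram_gb(params_millions: float, bytes_per_param: int) -> float:
    return params_millions * 1e6 * bytes_per_param / 1024**3

# SD 1.5 totals roughly 1000M parameters (UNet + CLIP text encoder + VAE).
print(f"fp32 weights: ~{estimate_model_vram_gb(1000, 4):.1f} GB")  # full precision
print(f"fp16 weights: ~{estimate_model_vram_gb(1000, 2):.1f} GB")  # common half precision
```

This is why half-precision (fp16) checkpoints fit comfortably on 4-6 GB cards while training, which needs gradients and optimizer state on top of the weights, demands far more.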
Integrated package
Baidu network disk download address
Autumn Leaves Stable Diffusion Integration Package v4.2 Tutorial
Install from source
- Install Python 3.10.6
- Download WebUI source code:
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
- Modify startup parameters in webui-user.bat
set COMMANDLINE_ARGS= --xformers
- Copy the relevant main model and fine-tuned model to the specified directory.
- Run webui-user.bat. A Python virtual environment will be created automatically and dependency packages downloaded and installed, which takes about 30 minutes; if it fails, simply run it again.
- Open http://127.0.0.1:8960 in a browser. The interface will appear as shown below; select the base model, fill in the prompt and other parameters, and click the Generate button:
Text-to-Image (txt2img) Parameters
Model
Different models bring different painting styles and understand different concepts (characters, objects, actions...), which is why so many models exist. They fall into two categories: main models, and small models used to fine-tune a main model. Common model file suffixes are .ckpt, .pt, .pth, and .safetensors. All of these are standard model formats, and the suffix alone cannot tell you what kind of model a file is.
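Since the suffix alone cannot identify a model, inspecting the file itself helps. A .safetensors file conveniently begins with an 8-byte little-endian length followed by a JSON header that lists every tensor's name, dtype, and shape. The stdlib-only sketch below, demonstrated on a fake in-memory file, reads that header without loading any weights:

```python
import io
import json
import struct

def read_safetensors_header(f) -> dict:
    """Read only the JSON header of a .safetensors file (no weights loaded)."""
    (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian length
    return json.loads(f.read(header_len))

# Demo on a tiny in-memory file containing one fake tensor entry.
header = {"model.weight": {"dtype": "F16", "shape": [2, 2], "data_offsets": [0, 8]}}
raw = json.dumps(header).encode()
fake = io.BytesIO(struct.pack("<Q", len(raw)) + raw + b"\x00" * 8)
print(read_safetensors_header(fake))
```

On a real checkpoint, tensor names in the header (e.g. whether they look like UNet layers, LoRA up/down matrices, or embedding vectors) reveal what kind of model you downloaded. Note that .ckpt files are Python pickles and can execute code when loaded, which is one reason .safetensors is preferred.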
Since refining and fine-tuning a main model is very difficult and requires powerful graphics cards and computing power, most people choose to train small models instead. These small models act on different parts of the large model and can modify it cheaply to achieve customization. Common fine-tuning models come in the following types: Textual Inversion (often called an Embedding), Hypernetwork, and LoRA.
Main Models
The main model has the greatest impact on AI painting. The SD series, such as sd-v1-4, sd-v1-5, and sd-v2 (abbreviated SD 1.5, SD 2.0), are the large models that ship with Stable Diffusion. Almost no one uses them directly because their results are poor. However, if you want to train large models yourself, the SD series are good base models: they are relatively comprehensive, cover all styles, and are stylistically neutral.
Currently, the more popular checkpoint models include the Anything series, ChilloutMix, Deliberate, the GuoFeng series, etc. These checkpoints are trained from the Stable Diffusion base model, most of them from v1.4 or v1.5, using additional data to generate images of a specific style or subject.
In model file names, pruned on its own usually marks the full version, while emaonly marks a trimmed version that keeps only the EMA weights. The emaonly version is much smaller than the full version and easier to download. For ordinary image generation there is little difference between the two; if you want to train on top of the model yourself, download the full version.
- The Anything series features an anime/manga ("two-dimensional") style.
- Cetus-Mix is an anime-style merged model.
- Chilloutmix is a famous Asian beauty model. The large number of AI beauties you see are basically generated by this model.
- The latest version of the Deliberate series is deliberate_v2. It is a general-purpose model that can draw anything you want, with a style leaning toward oil and digital painting. Note that prompts for this model must be written in great detail.
- The Realistic Vision series is a realistic style model, which is more suitable for people and animals, but it is also relatively versatile.
- PerfectWorld is the European and American version of Chilloutmix, which mainly draws European and American-style beauties, with a 2.5D bias between animation and realism.
- GuoFeng is a gorgeous Chinese ancient style model, which can also be said to be an ancient style game character model, with a 2.5D texture.
LoRA
- File suffix: .ckpt, .safetensors, .pt
- Storage path: models/Lora
LoRA is the most popular fine-tuning model at the moment. It can lock in the style of a certain type of person or thing: if a LoRA is used, the output style will be close to it. LoRA files are usually 10-200 MB and must be used together with a checkpoint model. The popular Korean Doll Likeness, Taiwan Doll Likeness, and Cute Girl Mix are all photorealistic-beauty LoRA models with striking results. Some style-specific LoRAs are also very popular, the most famous being MoXin and others.
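Conceptually, a LoRA stores two small low-rank matrices per targeted layer, and at load time the runtime applies W' = W + scale·(B·A) on top of the frozen checkpoint weights, which is why LoRA files are so much smaller than checkpoints. A toy pure-Python sketch (matrix sizes far smaller than real layers, values invented for illustration):

```python
# LoRA update: W' = W + scale * (B @ A), where A is r x n and B is m x r
# with rank r much smaller than m and n. Real SD layers are thousands of
# dims wide; here we use a 2x2 toy weight and rank 1.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def apply_lora(W, B, A, scale):
    delta = matmul(B, A)  # low-rank weight delta learned during training
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2x2)
B = [[1.0], [0.0]]            # 2x1, rank r = 1
A = [[0.5, 0.5]]              # 1x2
print(apply_lora(W, B, A, scale=0.8))
```

The `scale` factor corresponds to the LoRA weight you set in the prompt (e.g. `<lora:name:0.8>`): it controls how strongly the learned delta nudges the base model.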
Model trainer: https://github.com/Akegarasu/lora-scripts
VAE Beautification Model (Variational Autoencoder)
- File suffix: .pt
- Storage path: models/VAE
VAE, full name Variational Autoencoder, is known in Chinese as 变分自编码器. Its function can be summarized as: filter + fine-tuning.
Some large models ship with a built-in VAE, such as ChilloutMix; adding an external VAE to them may not improve the image and can even be counterproductive. The default VAE is animevae, whose results are mediocre. kl-f8-anime2 or vae-ft-mse-840000-ema-pruned are recommended instead: anime2 suits anime-style characters, while 840000 suits realistic characters.
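The VAE is also why image sizes should be multiples of 8: SD's diffusion runs in a compressed latent space at 1/8 of the pixel resolution with 4 channels (the standard SD 1.x values, assumed below), and the VAE decodes that latent back into pixels at the end. A small helper illustrating the arithmetic:

```python
# SD 1.x works on a 4-channel latent at 1/8 of the image resolution;
# the VAE encodes pixels into this space and decodes the result back.
def latent_shape(width: int, height: int, channels: int = 4, factor: int = 8):
    assert width % factor == 0 and height % factor == 0, "use multiples of 8"
    return (channels, height // factor, width // factor)

print(latent_shape(512, 512))  # (4, 64, 64)
print(latent_shape(768, 512))  # (4, 64, 96)
```

Denoising a 64x64x4 latent instead of a 512x512x3 image is what makes generation feasible on consumer GPUs.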
Embedding / Textual Inversion
- File suffix: .safetensors, .pt
- Storage path: embeddings
Textual Inversion (text inversion) can teach the model new concepts from only a few images, enabling personalized image generation. Embeddings are small files that define new keywords to generate new characters or image styles. They are tiny, usually 10-100 KB, and must be used with a checkpoint model. For example, the EasyNegative embedding packs in a large number of negative terms, saving you the pain of typing a long negative prompt every time.
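The mechanics can be sketched as follows: the trigger word is not an ordinary vocabulary token but expands into one or more learned vectors spliced into the prompt's embedding sequence. The toy vocabulary and 3-dimensional vectors below are invented for illustration (real SD 1.x token embeddings are 768-dimensional):

```python
# Toy token embeddings (3-dim; real SD 1.x uses 768-dim vectors per token).
vocab = {"a": [0.1, 0.0, 0.0], "photo": [0.0, 0.2, 0.0], "of": [0.0, 0.0, 0.3]}

# A Textual Inversion file stores learned vectors for a new trigger word;
# this concept happens to span two vectors (multi-vector embeddings exist).
learned = {"easynegative": [[0.9, 0.1, 0.2], [0.4, 0.4, 0.4]]}

def embed_prompt(tokens, vocab, learned):
    out = []
    for t in tokens:
        if t in learned:
            out.extend(learned[t])  # trigger word expands to its learned vectors
        else:
            out.append(vocab[t])
    return out

seq = embed_prompt(["a", "photo", "of", "easynegative"], vocab, learned)
print(len(seq))  # 5 vectors: 3 ordinary tokens + 2 learned vectors
```

This is why embeddings are so small: they store only a handful of vectors, not model weights, and why they only work well with checkpoints similar to the one they were trained against.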
Model download
- Civitai ("C station") mirror: https://civitai.space/
- https://www.liblibai.com/
- http://www.i-desai.com/
- https://www.ai016.com/
- https://tusi.art
ControlNet
ControlNet is a plug-in for controlling AI image generation. Before ControlNet appeared, you never knew what the AI would produce until the picture was generated; it was like drawing lottery cards and relying on luck. With ControlNet, image generation can be controlled precisely, for example rendering colors or fixing a character's pose. The prompt lays out the overall picture, a LoRA makes the subject match your needs, and ControlNet finely controls the elements of the whole picture: subject, background, style, form, and so on.
For example, given a reference picture, you can extract the skeleton of the person in it to generate a person with the same pose in a new picture; extract its line drawing to generate a new picture with the same line art; or extract its style to generate a new picture in the same style.
Reference: 15 ControlNet models
Prompts
Prompt Example
- Prompt:
solo, 1girl, portrait, looking at viewer, masterpiece, best quality, 8k,
- Negative prompt:
(worst quality, low quality:1.4), (bad-image-v2-39000:0.75), (bad_prompt_v2:0.85), (censored, bar censor), cropped, mature,
Universal Prompts
These are widely applicable to anime-style images and can be used with different models.
- Append to the positive prompt:
(masterpiece:1.2), best quality, masterpiece, highres, original, extremely detailed wallpaper, perfect lighting, (extremely detailed CG:1.2), drawing, paintbrush,
- Append to the negative prompt:
NSFW, (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, (ugly:1.331), (duplicate:1.331), (morbid:1.21), (mutilated:1.21), (tranny:1.331), mutated hands, (poorly drawn hands:1.5), blurry, (bad anatomy:1.21), (bad proportions:1.331), extra limbs, (disfigured:1.331), (missing arms:1.331), (extra legs:1.331), (fused fingers:1.61051), (too many fingers:1.61051), (unclear eyes:1.331), lowers, bad hands, missing fingers, extra digit,bad hands, missing fingers, (((extra arms and legs))),
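Weights like (masterpiece:1.2) multiply the attention given to a phrase. The simplified parser below extracts explicit (phrase:weight) pairs from such a prompt; the real WebUI parser also handles nested parentheses, [] de-emphasis, and escaping, which this sketch deliberately ignores:

```python
import re

# Simplified parser for the "(text:weight)" attention syntax used above.
# Anything outside an explicit "(phrase:weight)" pair gets weight 1.0.
def parse_weights(prompt: str):
    parts = []
    pos = 0
    for m in re.finditer(r"\(([^():]+):([\d.]+)\)", prompt):
        before = prompt[pos:m.start()].strip(" ,")
        if before:
            parts.append((before, 1.0))
        parts.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, 1.0))
    return parts

print(parse_weights("(masterpiece:1.2), best quality, (lowres:0.8)"))
# → [('masterpiece', 1.2), ('best quality', 1.0), ('lowres', 0.8)]
```

Note that in the WebUI, plain parentheses without a number also carry meaning (each pair multiplies attention by about 1.1), so phrases like ((monochrome)) in the list above are emphasis, not grouping.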
Counterfeit-V2.5 Anime-Style Example
Prompt:
(((masterpiece))),(((best quality))), ((ultra-detailed)), (best illustration), 1girl, solo, blush, smug, smile, purple eyes, choker, gradient eyes, no pupils, multicolored_hair, pink hair, blue hair, long hair,<lora:sangonomiyaKokomi_v10:0.5>, sangonomiya kokomi, ((kimono)), outdoors, sakura trees, sakura, facing towards viewer, front view
Negative prompt:
EasyNegative, extra fingers, fewer fingers, extreme fingers, wrong hand, wrong tail, missing male, extra legs, extra arms, missing legs, missing arms, weird legs, weird arms, watermark, logo, long hand, (poorly drawn hands:1.331), (bad anatomy:1.21), (bad proportions:1.331), (fused fingers:1.61051), (too many fingers:1.61051), extra digit, fewer digits, (mutated hands and fingers:1.5), fused fingers, one hand with more than 5 fingers, one hand with less than 5 fingers, one hand with more than 5 digit, one hand with less than 5 digit, extra digit, fewer digits, fused digit, missing digit, text, watermark,
Parameters:
Size: 512x512, Seed: 1396898128, Model: CounterfeitV25_25, Steps: 20, Sampler: DPM++ 2S a Karras, CFG scale: 7, Model hash: a074b8864e, Hires steps: 20, Hires upscale: 2, Hires upscaler: Latent (nearest-exact), Denoising strength: 0.7
Tools
- Parser: https://spell.novelai.dev/ — inspect the model and generation parameters embedded in an image file.
- Prompt helper: https://prompt.qpipi.com/ — helps you write prompts.
Courses
- CuteCat: https://space.bilibili.com/3493136342977164/channel/collectiondetail?sid=1261907
- Nenly Doujin: BV1Fu4y1o7F1
- Ouyang River: BV1ms4y1y7Mx
- SD prompt skills: BV1Fu4y1o7F1
- Use GPT smoothly: BV13s4y1v7BE
- SD HD upscaling: BV1Ch4y147WE
- Chinese character art poster: BV1fh4y1u7x9
- No-frills LoRA character model training: https://www.bilibili.com/video/BV1fs4y1x7p2/
- For style models, training a full (large) model is recommended: https://www.bilibili.com/video/BV1SR4y1y7Lv/