Stable Diffusion course series: installation, introduction to prompts, common models (checkpoint, embedding, LoRA), upscaling algorithms, inpainting, and common plug-ins

1. Introduction and installation of Stable Diffusion

Recommended installation:

2. Text-to-Image (Prompt Analysis)

  Prompts must be written in English; they tend to be long and full of symbols, like an inscrutable spell, so people jokingly call the process of writing prompts "chanting the mantra". The model often does not know what we want, and the prompt is what instructs and guides it. In Stable Diffusion, both text-to-image and image-to-image require a prompt; it is the foundation of everything.


2.1 Getting Started with Prompt Words

  Prompts must be in English. If your English is weak, you can rely on translation software or plug-ins. A prompt does not need to follow the grammar of a complete sentence; piling up phrases also works, and often works better. For example, to draw "a long, wide noodle in a big, round bowl", writing (noodle, long, wide), (bowl, big, round) is fine, and the result may even be better.

  1. Use delimiters: separate prompt words and phrases with delimiters. The underlying code expects English text, and the most common delimiter is the half-width English comma.
  2. Line breaks: prompts may span multiple lines, but it is best to end each line with a comma
  3. High-quality pictures require detailed content plus clear quality-standard prompt words
    "A girl is walking in the forest" can be written as 1girl, walking, forest, path, sun, sunshine, shining on body. But the picture generated this way is usually far from what we expect. Image generation with Stable Diffusion is random, and "a girl walking in the forest" is too general: it says nothing about the girl's figure, clothing, or viewing angle, or what the forest looks like, so the model can only guess blindly and the result suffers. We can refine, fine-tune, and supplement the prompt step by step.
    Classification of prompt words: prompts can be grouped into the following categories, which makes them easier to assemble and supplement while writing.
  • Content prompt words: modify these to match your own needs
    • Main character features: the more specific, the clearer the AI's "thinking". Adjectives such as beautiful or happy also affect the overall mood
    • Scene features: add outdoor for outdoor scenes and indoor for indoor ones; this significantly affects the atmosphere of the picture
    • Ambient light
    • Framing and viewpoint: close up works for close-ups, full body for medium distances
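    Putting these categories together, a content prompt might read like this (a hypothetical combination; adjust it to your needs): 1girl, smiling, long silver hair, white dress, walking, forest, path, sunlight, outdoor, full body.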


  • Standard prompt words: relatively fixed; you can "copy homework" (reuse other people's).
    With only content prompt words, the generated picture will most likely be unsatisfactory, so standard prompt words are added to pull the output toward a fixed quality standard. (Different picture styles also depend on the pre-trained model.)

  • Framing and viewpoint

    • Distance: close-up, distant
    • Character proportion: full body, upper body
    • Observation angle: from above, view of back
    • Lens type: wide angle, Sony A7I
  • Style prompt words

    • illustration, painting, paintbrush
    • Two-dimensional: anime, comic, game CG
    • Photorealistic: realistic, photograph
  • Universal HD

    • best quality, ultra-detailed, masterpiece, hires, 8k
  • Specific high-resolution types

    • extremely detailed CG unity 8k wallpaper (ultra-fine 8K Unity game CG), unreal engine rendered (Unreal Engine rendering)

2.2 Weight

  Earlier we added white flower to the prompt, but it did not appear in the generated picture. With so many words in the prompt, the model may not pick out what you care about. In that case you can adjust weights, in two ways:

  • Brackets: each layer of [] multiplies the weight by 0.9 (decrease), {} by 1.05, and () by 1.1. So (((white flower))) gives white flower a weight of 1.1³ ≈ 1.331.
  • Brackets + number: (white flower:1.5) sets the weight of white flower to 1.5.
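The bracket rules are just repeated multiplication; a tiny Python sketch (not webui source code) makes the arithmetic explicit:

```python
# Minimal sketch of the bracket-weight arithmetic described above (not webui code):
# each "()" layer multiplies by 1.1, each "[]" layer by 0.9, "{}" by 1.05,
# and (word:w) sets the weight directly.
def nested_weight(layers: int, factor: float) -> float:
    """Effective weight of a tag wrapped in `layers` nested brackets."""
    return factor ** layers

print(round(nested_weight(3, 1.1), 3))  # (((white flower))) -> 1.331
print(round(nested_weight(2, 0.9), 3))  # [[white flower]]   -> 0.81
```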

  The safe range for a prompt weight is generally around 1 ± 0.5; values that are too high easily distort the picture. If we want more flower elements, we can combine several related tags for a synergistic effect. Advanced prompt rules (blending, migration, iteration, and so on) will be covered later.


There are also some additional rules:

  • The earlier a prompt word appears, the greater its weight; for example, putting the scenery first makes the character small, while putting the character first makes it larger or half-length
  • The larger the picture, the more prompt words are needed, otherwise the prompt words will contaminate each other
  • Prompts support emoji, which can be quite expressive.

2.3 Negative prompts

  The negative prompt specifies content you do not want generated. It can eliminate common Stable Diffusion deformities, such as extra limbs. The sampler compares the prediction from the prompt with the prediction from the negative prompt, and steers the final result to be close to the former and far from the latter. An example:

  • Original image: foggy and grainy (low quality)
  • Negative prompt fog: the fog is gone but a strange purple tint appears
  • Negative prompt grainy: no fog or purple, but the colors are monotonous
  • Negative prompt fog, grainy, purple: no fog or purple, high image quality, high color saturation.
[Image grid: results with negative prompt = None / fog / grainy / fog, grainy, purple]
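Mechanically, this "close to the former, far from the latter" behavior is classifier-free guidance. The sketch below is a simplified assumption about the internals, not actual webui code:

```python
import torch

def guided_noise(pred_pos: torch.Tensor, pred_neg: torch.Tensor, cfg_scale: float) -> torch.Tensor:
    # At each denoising step, start from the negative-prompt prediction and push
    # the result toward the positive-prompt prediction by the guidance scale.
    return pred_neg + cfg_scale * (pred_pos - pred_neg)
```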

A general negative prompt template for character drawing:

2.4 Generation parameter settings

  • Sampling steps, 20-50:
      Stable Diffusion generates images by adding noise to an image and then denoising it. Adding noise gives the model room to work; the denoising process can be understood as the model painting the picture you need bit by bit. While the picture is being generated, each flash of the preview means the model has iterated one step.
      In theory, more steps means a clearer image. In practice the improvement is marginal beyond about 20 steps, while computation time and memory use keep growing. The default is 20; for high-definition needs use 30-50, and below 10 the quality is scary.
  • Sampling method: the algorithm the model uses while generating; choose one with a "+" in its name or one recommended for the model
    • Euler and Euler a: illustration style, relatively simple
    • DPM++ 2M and DPM++ 2M Karras: faster rendering
    • DPM++ SDE Karras: rich in detail

  In actual use, the sampling methods whose names carry a "+" are recommended, because they are improved versions. Many model authors also recommend a sampling method, usually whatever tested best for them.

  • Resolution: around 1024×1024 is recommended; for higher definition, use hi-res fix.
      Too low and the details and quality suffer; too high and you may run out of VRAM, and you may get pictures of many people with extra hands and feet. Models are trained at limited resolutions, so an over-large canvas is treated by the AI as a collage of several images. If you really need high quality, generate at low resolution first and then enlarge with hi-res fix; in essence that is image-to-image, discussed later.
  • Face restoration: generally enabled
  • Tiling: not recommended
  • CFG Scale (prompt relevance): how strongly the prompt influences the generated image. Higher values make the image follow the prompt more closely; lower values give the prompt less weight, and the generated image becomes more random
    • For character prompts, keep the CFG scale around 7-12; if it is too high, the picture easily deforms
    • For large scenes such as architecture, keep it around 3-7. This leaves room for some randomness without compromising the readability of the generated image.
  • Random seed: the dice button means a new random seed (-1) every time; the recycle button copies the last random seed value
  • Batch count and batch size: for generating several images at once. Larger batches consume more VRAM; avoid raising them when generating high-definition images.
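All of these parameters can also be driven from a script. If the webui is launched with --api, the same settings map onto the /sdapi/v1/txt2img endpoint. The sketch below uses field names from recent AUTOMATIC1111 versions; treat it as an assumption and verify against your install:

```python
import requests

payload = {
    "prompt": "1girl, walking, forest, path, sun, masterpiece, best quality",
    "negative_prompt": "lowres, bad anatomy, extra fingers",
    "steps": 20,                        # sampling steps: 20-50
    "sampler_name": "DPM++ 2M Karras",  # sampling method
    "cfg_scale": 7,                     # prompt relevance: 7-12 for characters
    "width": 512,
    "height": 768,
    "seed": -1,                         # -1 = a new random seed each time
    "batch_size": 1,                    # batch size
    "n_iter": 1,                        # batch count
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
print(len(r.json()["images"]))  # base64-encoded result images
```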

2.5 How beginners can "chant spells"

For more methods, please refer to AI Art Paradise

  1. Natural language: write in Chinese and translate into English with translation software or a translation plug-in
  2. Prompt word tools:
  3. Copy homework: Civitai (requires a VPN in China), liblibai, Alchemy Pavilion, DesAi, openart (mostly European/American styles), and arthub (mostly Asian styles) have many excellent pictures for reference. There is also an interior-design tag reference table.
  4. Have ChatGPT generate prompts: first paste the following passage into ChatGPT:
- Reference guide of what is Stable Diffusion and how to Prompt -

Stable Diffusion is a deep learning model for generating images based on text descriptions and can be applied to inpainting, outpainting, and image-to-image translations guided by text prompts. Developing a good prompt is essential for creating high-quality images.

A good prompt should be detailed and specific, including keyword categories such as subject, medium, style, artist, website, resolution, additional details, color, and lighting. Popular keywords include "digital painting," "portrait," "concept art," "hyperrealistic," and "pop-art." Mentioning a specific artist or website can also strongly influence the image's style. For example, a prompt for an image of Emma Watson as a sorceress could be: "Emma Watson as a powerful mysterious sorceress, casting lightning magic, detailed clothing, digital painting, hyperrealistic, fantasy, surrealist, full body."

Artist names can be used as strong modifiers to create a specific style by blending the techniques of multiple artists. Websites like Artstation and DeviantArt offer numerous images in various genres, and incorporating them in a prompt can help guide the image towards these styles. Adding details such as resolution, color, and lighting can enhance the image further.

Building a good prompt is an iterative process. Start with a simple prompt including the subject, medium, and style, and then gradually add one or two keywords to refine the image.

Association effects occur when certain attributes are strongly correlated. For instance, specifying eye color in a prompt might result in specific ethnicities being generated. Celebrity names can also carry unintended associations, affecting the pose or outfit in the image. Artist names, too, can influence the generated images.

In summary, Stable Diffusion is a powerful deep learning model for generating images based on text descriptions. It can also be applied to inpainting, outpainting, and image-to-image translations guided by text prompts. Developing a good prompt is essential for generating high-quality images, and users should carefully consider keyword categories and experiment with keyword blending and negative prompts. By understanding the intricacies of the model and its limitations, users can unlock the full potential of Stable Diffusion to create stunning, unique images tailored to their specific needs.

--

Please use this information as a reference for the task you will ask me to do after.

--

Below is a list of prompts that can be used to generate images with Stable Diffusion.

- Examples -

"masterpiece, best quality, high quality, extremely detailed CG unity 8k wallpaper, The vast and quiet taiga stretches to the horizon, with dense green trees grouped in deep harmony, as the fresh breeze whispers through their leaves and crystal snow lies on the frozen ground, creating a stunning and peaceful landscape, Bokeh, Depth of Field, HDR, bloom, Chromatic Aberration, Photorealistic, extremely detailed, trending on artstation, trending on CGsociety, Intricate, High Detail, dramatic, art by midjourney"

"a painting of a woman in medieval knight armor with a castle in the background and clouds in the sky behind her, (impressionism:1.1), ('rough painting style':1.5), ('large brush texture':1.2), ('palette knife':1.2), (dabbing:1.4), ('highly detailed':1.5), professional majestic painting by Vasily Surikov, Victor Vasnetsov, (Konstantin Makovsky:1.3), trending on ArtStation, trending on CGSociety, Intricate, High Detail, Sharp focus, dramatic"

"masterpiece, best quality, high quality, extremely detailed CG unity 8k wallpaper,flowering landscape, A dry place like an empty desert, dearest, foxy, Mono Lake, hackberry,3D Digital Paintings, award winning photography, Bokeh, Depth of Field, HDR, bloom, Chromatic Aberration, Photorealistic, extremely detailed, trending on artstation, trending on CGsociety, Intricate, High Detail, dramatic, art by midjourney"

"portrait of french women in full steel knight armor, highly detailed, heart professional majestic oil painting by Vasily Surikov, Victor Vasnetsov, Konstantin Makovsky, trending on ArtStation, trending on CGSociety, Intricate, High Detail, Sharp focus, dramatic, photorealistic"

"(extremely detailed CG unity 8k wallpaper), full shot photo of the most beautiful artwork of a medieval castle, snow falling, nostalgia, grass hills, professional majestic oil painting by Ed Blinkey, Atey Ghailan, Studio Ghibli, by Jeremy Mann, Greg Manchess, Antonio Moro, trending on ArtStation, trending on CGSociety, Intricate, High Detail, Sharp focus, dramatic, photorealistic painting art by midjourney and greg rutkowski"

"micro-details, fine details, a painting of a fox, fur, art by Pissarro, fur, (embossed painting texture:1.3), (large brush strokes:1.6), (fur:1.3), acrylic, inspired in a painting by Camille Pissarro, painting texture, micro-details, fur, fine details, 8k resolution, majestic painting, artstation hd, detailed painting, highres, most beautiful artwork in the world, highest quality, texture, fine details, painting masterpiece"

"(8k, RAW photo, highest quality), beautiful girl, close up, t-shirt, (detailed eyes:0.8), (looking at the camera:1.4), (highest quality), (best shadow), intricate details, interior, (ponytail, ginger hair:1.3), dark studio, muted colors, freckles"

"(dark shot:1.1), epic realistic, broken old boat in big storm, illustrated by herg, style of tin tin comics, pen and ink, female pilot, art by greg rutkowski and artgerm, soft cinematic light, adobe lightroom, photolab, hdr, intricate, highly detailed, (depth of field:1.4), faded, (neutral colors:1.2), (hdr:1.4), (muted colors:1.2), hyperdetailed, (artstation:1.4), cinematic, warm lights, dramatic light, (intricate details:1.1), complex background, (rutkowski:0.66), (teal and orange:0.4), (intricate details:1.12), hdr, (intricate details, hyperdetailed:1.15)"

"Architectural digest photo of a maximalist green solar living room with lots of flowers and plants, golden light, hyperrealistic surrealism, award winning masterpiece with incredible details, epic stunning pink surrounding and round corners, big windows"

- Explanation -

The following elements are a description of the prompt structure. You should not include the label of a section like "Scene description:".

Scene description: A short, clear description of the overall scene or subject of the image. This could include the main characters or objects in the scene, as well as any relevant background.

Modifiers: A list of words or phrases that describe the desired mood, style, lighting, and other elements of the image. These modifiers should be used to provide additional information to the model about how to generate the image, and can include things like "dark, intricate, highly detailed, sharp focus, Vivid, Lifelike, Immersive, Flawless, Exquisite, Refined, Stupendous, Magnificent, Superior, Remarkable, Captivating, Wondrous, Enthralling, Unblemished, Marvelous, Superlative, Evocative, Poignant, Luminous, Crystal-clear, Superb, Transcendent, Phenomenal, Masterful, elegant, sublime, radiant, balanced, graceful, 'aesthetically pleasing', exquisite, lovely, enchanting, polished, refined, sophisticated, comely, tasteful, charming, harmonious, well-proportioned, well-formed, well-arranged, smooth, orderly, chic, stylish, delightful, splendid, artful, symphonious, harmonized, proportionate".

Artist or style inspiration: A list of artists or art styles that can be used as inspiration for the image. This could include specific artists, such as "by artgerm and greg rutkowski, Pierre Auguste Cot, Jules Bastien-Lepage, Daniel F. Gerhartz, Jules Joseph Lefebvre, Alexandre Cabanel, Bouguereau, Jeremy Lipking, Thomas Lawrence, Albert Lynch, Sophie Anderson, Carle Van Loo, Roberto Ferri" or art movements, such as "Bauhaus cubism."

Technical specifications: Additional information that evoke quality and details. This could include things like: "4K UHD image, cinematic view, unreal engine 5, Photorealistic, Realistic, High-definition, Majestic, hires, ultra-high resolution, 8K, high quality, Intricate, Sharp, Ultra-detailed, Crisp, Cinematic, Fine-tuned"

- Prompt Structure -

The structure sequence can vary. However, the following is a good reference:

[Scene description]. [Modifiers], [Artist or style inspiration], [Technical specifications]

- Special Modifiers -

In the examples you can notice that some terms are enclosed between (). That instructs the Generative Model to pay more attention to these words. More (()) means more attention.

Similarly, you can find a structure like this (word:1.4). That means this word will receive more attention from the Generative Model. The number "1.4" means 140%. Therefore, if a word without modifiers has a weight of 100%, a word written as (word:1.4) will have a weight of 140%.

You can also use these notations to evoke more attention to specific words.

- Your Task -

Based on the examples and the explanation of the structure, you will create 5 prompts. In my next requests, I will use the command /Theme: [ description of the theme]. Then, execute your task based on the description of the theme.

--

Acknowledge that you understood the instructions

or it could be:

# Stable Diffusion prompt assistant

You will act as an artistically inclined Stable Diffusion prompt assistant.

## Task

I will tell you the theme of the prompt in natural language. Your task is to imagine a complete picture based on that theme and turn it into a detailed, high-quality prompt that lets Stable Diffusion generate a high-quality image.

## Background

Stable Diffusion is a deep-learning text-to-image model that generates new images from a prompt describing the elements to include or omit.

## Prompt concepts

- A complete prompt consists of two parts: "**Prompt:**" and "**Negative Prompt:**".
- The prompt describes the image. It is built from common, everyday words and uses the half-width English comma "," as the separator.
- The negative prompt describes content you do not want to appear in the generated image.
- Each word or phrase separated by "," is called a tag, so the prompt and negative prompt are each a series of comma-separated tags.

## () and [] syntax

An equivalent way to adjust keyword strength is to use () and []. (keyword) increases the tag's strength by a factor of 1.1, the same as (keyword:1.1), and can be nested up to three levels. [keyword] reduces the strength by a factor of 0.9, the same as (keyword:0.9).

## Prompt format requirements

Below I explain the steps for generating a prompt. The prompt may describe people, landscapes, objects, or abstract digital artwork. You may add reasonable picture details as needed, but no fewer than 5 of them.

### 1. Prompt requirements

- The Stable Diffusion prompt you output must begin with "**Prompt:**".
- The prompt should cover the subject, medium, additional details, image quality, art style, color tone, lighting, and so on, but it must not be split into sections; labels such as "medium:" are not needed, and the prompt must not contain ":" or ".".
- Subject: a not-too-brief English description of the subject, e.g. A girl in a garden, summarizing the subject's details (the subject can be a person, event, object, or scene) as the core content of the picture. Generate this part from the theme I give you each time; you may add more reasonable theme-related details.
- For character themes, you must describe the character's eyes, nose, and lips, e.g. 'beautiful detailed eyes,beautiful detailed lips,extremely detailed eyes and face,long eyelashes', to keep Stable Diffusion from randomly generating deformed facial features; this is very important. You may also describe the character's appearance, mood, clothing, pose, viewpoint, actions, background, and so on. Among character attributes, 1girl means one girl and 2girls means two girls.
- Medium: the material the artwork is made with, e.g. illustration, oil painting, 3D rendering, photography. Medium has a strong effect, because a single keyword can drastically change the style.
- Additional details: details of the scene or the character that make the image fuller and more plausible. This part is optional; keep the overall picture harmonious and do not conflict with the theme.
- Image quality: this part must always begin with "(best quality,4k,8k,highres,masterpiece:1.2),ultra-detailed,(realistic,photorealistic,photo-realistic:1.37)" as the mark of high quality. Other common quality tags you may add as the theme requires: HDR,UHD,studio lighting,ultra-fine painting,sharp focus,physically-based rendering,extreme detail description,professional,vivid colors,bokeh.
- Art style: describes the style of the image. An appropriate art style improves the generated result. Common styles include portraits,landscape,horror,anime,sci-fi,photography,concept artists, etc.
- Color tone: control the overall color of the picture by adding colors.
- Lighting: the lighting effect of the overall picture.

### 2. Negative prompt requirements
- The negative prompt part must begin with "**Negative Prompt:**"; anything you want kept out of the image can be added after it.
- In all cases, the negative prompt must include: "nsfw,(low quality,normal quality,worst quality,jpeg artifacts),cropped,monochrome,lowres,low saturation,((watermark)),(white letters)"
- For character-related themes, your output must add another character-specific negative prompt: "skin spots,acnes,skin blemishes,age spot,mutated hands,mutated fingers,deformed,bad anatomy,disfigured,poorly drawn face,extra limb,ugly,poorly drawn hands,missing limb,floating limbs,disconnected limbs,out of focus,long neck,long body,extra fingers,fewer fingers,(multi nipples),bad hands,signature,username,bad feet,blurry,bad body".

### 3. Constraints:
- Describe tags with English words or phrases; you are not limited to the words I give you. Note that tags may only be keywords or phrases.
- Do not output sentences and do not add any explanation.
- Limit the number of tags to 40 and the number of words to 60.
- Do not wrap tags in quotation marks ("").
- Use the half-width English "," as the separator.
- Order tags from most to least important.
- The theme I give you may be described in Chinese; your prompt and negative prompt must use English only.

My first theme is: a beautiful Chinese girl

3. Image-to-Image

3.1 Introduction to image-to-image

  In text-to-image, we tell the model what to generate through prompt words alone, but AI painting is random and may not produce exactly what you want. If you give the model a reference picture, it can extract more information from the image and grasp your idea more directly.

  1. Prompt words:
      image-to-image still needs specific and accurate prompt words. If you enter no prompt at all, the result will usually be a mess. Adding content prompts that describe details (short hair, blue eyes, beard, woolen hat, plaid shirt), plus some standard positive and negative prompts, makes the output much better:

  2. Denoising strength: a parameter unique to image-to-image; the higher it is, the more the original picture is redrawn.
    In the following example, we take a real-life photo and use the AbyssOrange model to generate the corresponding comic character. A denoising strength of 0.6-0.8 is recommended here: too high and the character may deform, too low and the effect will not show. The output looks like this:
  3. Resolution: generally keep the same resolution as the original image
    • If the original is too large (e.g. 3000×3000), scale it down proportionally, as in the sketch below;
    • If you want a different aspect ratio, crop the original before generating.
      If the configured aspect ratio differs from the original, the image will be distorted and stretched. There are also several scaling options below the picture for cropping; the last one (direct rescaling) uses a lot of VRAM and is not recommended.
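A few lines of Pillow do the proportional downscaling described above (a small helper sketch; assumes Pillow is installed and the file names are hypothetical):

```python
from PIL import Image

def downscale(path: str, max_side: int = 1024) -> Image.Image:
    """Proportionally shrink an oversized image before feeding it to img2img."""
    img = Image.open(path)
    scale = max_side / max(img.size)
    if scale < 1:  # only shrink, never enlarge
        img = img.resize((round(img.width * scale), round(img.height * scale)))
    return img

downscale("photo_3000x3000.png").save("photo_for_img2img.png")
```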

3.2 Random seed analysis

  In the example above, after adding more detailed descriptions, the model generated a more similar image. But we then notice that the generated picture is indoors while the original is outdoors. Scene words (xx in the background) can be added as constraints, such as field, forest, travel, depth of field (which blurs the background), and so on.

[Image: result after adding "depth of field"]

  After adding scene words, we find the whole character's appearance has changed too, due to the randomness of AI painting. To keep the previous character unchanged and alter only the background, we just need to fix the random seed.

  Image generation is a random process: each run samples randomly, represented here as a set of random numbers (the random seed). If you use the same seed, the generated pictures are bound to share many similarities, because they come from the same random draw.

  • Dice: sets the random seed to -1, meaning a different random draw each time
  • Recycle: keeps the random seed the same as last time

This time we reuse the random seed of the picture we liked before and add the scene word: the background changes while the character stays basically the same.

3.3 More image-to-image applications

  1. Object personification
    Import a picture of an object and enter personifying prompt words to anthropomorphize objects or even landscapes

  2. 3D conversion of 2D characters
    Import a 2D character image, use a 2.5D model, and constrain it with realistic-texture standard prompt words to get an approximate 3D render. For a more specific and accurate conversion, you can use a LoRA.

  3. Abstract doodles
    Sometimes we just scribble, and the model can still produce amazing results (the sketch mode shown above).

4. Model

| type | suffix | size | storage path |
| --- | --- | --- | --- |
| checkpoint | .ckpt or .safetensors | 2-7 GB / 1-2 GB | stable-diffusion-webui/models/Stable-diffusion |
| VAE | .pt or .safetensors | hundreds of MB | stable-diffusion-webui/models/VAE |
| embeddings | .pt or .safetensors | hundreds of KB | stable-diffusion-webui/embeddings |
| hypernetwork | .pt or .safetensors | hundreds of MB | stable-diffusion-webui/models/hypernetworks |
| LoRA | .pt or .safetensors | hundreds of MB | stable-diffusion-webui/models/lora |

4.1 Checkpoint

4.1.1 Introduction to Checkpoint

  For model authors, training a model usually means generating a Checkpoint file. These files contain information such as model parameters and optimizer status, and are state snapshots saved periodically during training.

  For users, a Checkpoint file can be understood as a style filter: oil painting, comics, photorealistic, and so on. By selecting the corresponding Checkpoint file, you steer Stable Diffusion's output toward that style. Note that some Checkpoint files may need to be paired with a specific low-rank adaptation model (LoRA) for best results.

  When downloading a Checkpoint file, read the model introduction; the author usually provides files and notes to help you use and understand the model

  If you add a new model file while the webui is open, just click refresh. Generating without the model loaded may cause an error.

4.1.2 Checkpoint classification and download

Checkpoints can be divided into three categories by art style:

  The officially released Stable Diffusion 1.4/1.5/2.0/2.1 models give fairly generic results, partly because of copyright constraints. What everyone actually uses now are mostly privately fine-tuned models (people often call training AI models "alchemy", because so much of it is uncontrollable).
The current mainstream model download sites are:

4.2 VAE (Variational Autoencoder)

  The VAE is responsible for converting the noised latent-space data back into a normal image. It can be loosely understood as a color-correction filter for the model, mainly affecting the color and texture of the picture. Most new models already have a VAE baked into the file, and some authors recommend a suitable VAE on the model card.

  The VAE file suffix is generally .pt or .safetensors, and the storage path is stable-diffusion-webui/models/VAE. There is also a way to auto-load a specific VAE with a model: put the VAE file in the models/Stable-diffusion folder, rename it to match the model name, and insert .vae before the suffix. That VAE will then load automatically with the model.
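As a concrete sketch of that renaming trick (hypothetical file names; paths are the webui defaults):

```python
import shutil

# Pair the VAE "anything-vae.pt" with the model "myModel.safetensors" so it
# auto-loads: same folder, same base name, ".vae" inserted before the suffix.
shutil.copy(
    "models/VAE/anything-vae.pt",
    "models/Stable-diffusion/myModel.vae.pt",
)
```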

4.3 embeddings

On Civitai or liblibai, embeddings can be found by filtering with the Textual Inversion tag.


  In Stable Diffusion, an embedding can be understood as a component that converts input data into a vector representation the model can process and generate from. If a checkpoint is a thick dictionary whose many entries (keywords) can be queried to generate images, then an embedding is an efficient index pointing at specific content; and a LoRA is like an illustrated insert in the dictionary, pointing at specific content even more concretely (it carries more information).

  For example, to generate a happy Pikachu we would normally need many descriptors: yellow fur, mouse, long ears, blush, and so on. With a Pikachu embedding, we only need two words: Pikachu and happy. The embedding packs all of Pikachu's feature descriptions, so we no longer have to type a pile of words to control the output.

  In daily use, embeddings are usually used to control character actions and traits, or to produce a specific art style. Compared with other models (such as LoRA), an embedding is only tens of KB, not hundreds of MB or several GB. Its fidelity is lower than a LoRA's, but it is more convenient to store and use.

  In short, with embeddings we can generate the samples we expect without manually entering a large descriptive vocabulary. Here are some commonly used embeddings.

4.3.1 Specific character images (with tag interrogation)

For example, Corneo's D.va on liblibai is trained on the popular Overwatch character D.va. Download it into the stable-diffusion-webui/embeddings folder, then activate it with its specific trigger words in the prompt.

  In this embedding's model card the author states that the activation word is corneo_dva and the recommended weight is 0.9 to 0.95, so we can write (corneo_dva:0.95). Adding a character description to the prompt makes generation more accurate, so we upload one of the author's showcase images, interrogate its tags first, and then fill in the prompt.

  In image-to-image there are two algorithms for interrogating a photo's tags: CLIP and DeepBooru. The latter is faster and more accurate. After interrogation, filter the output and keep only the accurate descriptors.
  If manual filtering is too troublesome, you can open the tagger tab and use the tagger plug-in to interrogate. After uploading the picture, each inferred prompt word is shown with its confidence, and the sensitive entry indicates a safety score. We can set the confidence threshold to 0.8 and interrogate again, keeping only the prompt words with confidence > 0.8.

4.3.2 Character turnaround

  A while ago there were many very delicate 3D character sheets online, realized through the CharTurner embedding. It was trained on images showing the same person from several orientations side by side. The author gives the basic activation phrase in the model card: A character turnaround of a (X) wearing (Y). The closer a tag is to the front, the higher its weight; you can also add Multiple views of the same character in the same outfit to get multiple views of the same character and clothing. Multiple embeddings can be enabled in one prompt; the balance takes some practice.
  The CharTurner embedding still has many shortcomings; its inability to precisely control character details and turning poses can be partly resolved with a LoRA. For example, CharTurnerBeta - Lora turns characters better and can be used together with the embedding.

4.3.3 Solving bad cases

  Even now, pictures generated by Stable Diffusion still tend to have badly drawn hands and feet, or too many of them. Several highly ranked embeddings on Civitai address this. They encode a collection of the AI's typical drawing mistakes; placed in the negative prompt box and activated, they avoid those mistakes to a certain extent.

  • badhandv4: trigger word badhandv4
  • EasyNegativeV2: trained for anime models; solves limb confusion, color bleeding, abnormal grayscale, and other issues; trigger word easynegative.
  • Deep Negative V1.x: trained for photorealistic models; targets incorrect human anatomy, jarring color schemes, upside-down spatial structure, and more; trigger word NG_DeepNegative_V1_75T.

4.4 LoRA

Here are a few of my favorite loras:

4.4.1 LoRA introduction

  LoRA is similar in nature to an embedding, but because it carries much more training data, it reproduces characters and fine features more faithfully.

  LoRA activation: <lora:lora_filename:weight>, where lora_filename is the LoRA file name without its suffix; for example, <lora:CharTurnBetaLora:0.3> enables the character-turnaround LoRA CharTurnerBeta. Some LoRAs also have trigger words, meaning the author trained intensively on that tag; enabling both at once strengthens the effect.
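  A combined prompt might therefore look like this (a hypothetical example using the file name above): masterpiece, best quality, 1girl, turnaround, <lora:CharTurnBetaLora:0.3>.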


   The base model and trigger words for a LoRA can be found in the model information provided by its author.

  Note that the image sources used to train a LoRA are varied, so a LoRA generally has a slight effect on the overall art style; this can be suppressed by lowering its weight. The higher the weight, the stronger the influence on the style. Normally keep the weight between 0.3 and 1.

  To get the best results, choose appropriate prompt and negative words for each LoRA model, tune the weight, and consult other authors' experience and tips to make better use of LoRA.

4.4.2 Extra networks loading

We click the small red button below the Generate button in text-to-image to display the extra networks panel.
  These models have no preview by default. After loading one, generate a picture and click to set the current image as its preview; a picture with the same name is then created next to the model file (delete it if unwanted). You can also pick any picture, rename it to match the model, and after a refresh it becomes the model's cover image.

  In Settings - Extra Networks you can tune the details: display models as cards or thumbnails, set card width and height, the default loading weight of LoRA models, and so on.

4.4.3 Additional Networks loading

github address: sd-webui-additional-networks

  A LoRA loaded the previous way is written plainly in the prompt, so it is visible when sharing pictures. There is another loading method, used when several LoRAs are applied at once: click Additional Networks at the bottom of the page to enable up to five LoRAs and configure their weights separately.
  Additional Networks reads LoRA files from the extensions/sd-webui-additional-networks/models/lora folder by default. In the first line of Settings - Additional Networks, we can point its path at the default LoRA installation path instead.
  Additional Networks loads LoRA completely independently of the prompt, which makes the prompt more readable. The drawback is that LoRA loaded this way is not recorded when sharing, so some nice pictures online cannot be perfectly reproduced even after copying their parameters; the author may have loaded LoRA files this way.

  Additional Networks also adds a masking feature so a LoRA can act on a specific region of the picture; this will be covered later.

4.4.4 Practical applications of LoRA

  The practical applications of LoRA fall into the following types:

  1. Character LoRA: recommended weight 0.7-0.8.
      Take the Lucy LoRA as an example. We first interrogate the cover image with the tagger, then generate directly with text-to-image. Even with such a detailed description, the AI cannot reproduce Lucy accurately; adding the Lucy LoRA immediately makes it look right, because the LoRA was trained on many pictures of Lucy and transmits the information far more precisely.
      For example, our interrogated prompt contains white jacket, but there are countless white jackets in the world and the AI does not know which to draw. With the LoRA, the AI can extract the key information and draw it accurately.
      Using a photorealistic base model + an anime-character LoRA, you can get a realistic coser image. Combined with ControlNet (discussed later), you can also design the character's pose and composition and customize your own works.

  2. Style LoRA (art style): weight 0.2-0.3; too high and a character LoRA will lose some of its original features.
      Take the popular fashion girl on Civitai as an example: the author trained it on 100 photos of fashionable girls matching his own aesthetic, so it makes generated female characters more aesthetically pleasing; the trigger words are fashi-girl, red lips, mature female, makeup. Similarly, FilmGirl film style, Hua Xiangrong / Chinese style, and others let you apply a style you like.

  3. Painting-style LoRA: recommended weight 0.2-0.4.
      For example, Ghibli Style LoRA with the trigger word ghibli style reproduces the Studio Ghibli (Hayao Miyazaki) look: picture-book-like character design, rich moist colors, and exquisitely detailed background scenes.

  4. Concept LoRA
    Take the Gacha splash LORA as an example: it is trained on the splash art shown when pulling cards in gacha games, so generated pictures take on that card-draw style. Concept LoRAs demand more of the prompt; before using one, study the model card and refer to the author's example image parameters. Similar concept LoRAs include tarot cards, mugshot lora (archive photos), Guofeng future, and so on.

  5. Clothing LoRA: if the weight is too high, the human body is easily lost, because this kind of LoRA is trained on clothes.
    For example, to create mecha-style works, search mecha to find many such LoRAs: mechanical storm lora, laser suit holographic clothing, Hanfu Tang style lora, and so on.

  Finally, to strengthen a particular aspect of a work, you can stack several LoRAs of the same type, for example several mecha LoRAs for a mecha-style picture. In practice, adjust the weights of the different LoRAs to push the work toward the one you prefer, as in the example below.
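  For example (hypothetical file names): mecha, armor, <lora:mechaStormA:0.6>, <lora:mechaStormB:0.4> leans the result toward the first LoRA while still blending in the second.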

  6. Object LoRA: a specific element; useful for product design, product renderings, and the like.

4.4.5 Inpainting + LoRA

  An advanced technique is to introduce a LoRA through inpainting. For example, to add a helmet to a tech-styled picture of a girl, we inpaint the generated image: mask the head plus a little area outside it to give the AI some creative room, then select the helmet LoRA and redraw. There are two ways to redraw:

  • Whole-picture redraw: the entire image is redrawn; keep most prompt words and parameters unchanged and only add the helmet LoRA's prompt and trigger words
  • Masked-only redraw: remove the previous content words and keep only the LoRA-related prompt and trigger words, which makes the redrawn result more accurate.

  Why do this? The helmet LoRA concerns only a small part of the picture; forcing it onto the whole image risks interfering with the base model's otherwise excellent output. Inpainting becomes an excellent stand-alone solution: draw the overall picture without the LoRA, then draw the part with it. The same approach works for clothing, products, and other LoRA applications.

4.5 hypernetwork

  A hypernetwork has nearly the same effect as a LoRA. The difference is that hypernetworks are generally used to adjust the overall art style, more subtly than the gap between checkpoints: not as big as the difference between a photorealistic model and an anime model, more like the small difference between Van Gogh and Monet.

  Take WavenChibiStyle, a cute chibi style, as an example. To use it, open the Settings tab, select the hypernetwork model in the hypernetwork drop-down on the left, and save the settings. We keep the character settings unchanged, delete all scene words, use a solid-color background, and generate at a square resolution.

   Today the role of hypernetworks can be partly replaced by LoRA, and their results are generally not as good, but they remain an option when a specific style of picture is needed.

  Finally, after installing the tag complete plug-in, typing <e, <l, or <h automatically recognizes and inserts installed embeddings, LoRAs, and hypernetworks respectively, which is very convenient.

5. Hi-Res Fix & Upscaling Algorithms


5.1 Hi-Res Fix

  Hi-res fix is a built-in function of text-to-image. The following uses an example image from the author of the LOFI model for illustration. We copy the sample image's parameters, add some prompts, and set the initial resolution to 500×750. The generated image lacks detail and is still a bit blurry (the resolution is too low, leaving the model too little room to work).

  Simply raising the resolution and generating a large image in one pass easily exhausts VRAM, and easily yields images with multiple heads and hands. This is where hi-res fix helps:

  • Size: set a multiple of the original image, or an exact target resolution for the enlarged image
  • Hi-res steps: the hi-res pass redraws the image once, so it needs its own sampling steps; the default 0 reuses the text-to-image step count (default 20).
  • Denoising strength: equivalent to denoising strength in image-to-image. To keep the image roughly unchanged, generally do not exceed 0.5
  • Upscaler: R-ESRGAN 4x+ is recommended, or R-ESRGAN 4x+ Anime6B for anime; model authors also often recommend an upscaling algorithm.
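Through the API, the same hi-res fix switches appear as fields of the /sdapi/v1/txt2img payload (names from recent AUTOMATIC1111 versions; treat this as a sketch and verify against your install):

```python
hires_payload = {
    "enable_hr": True,                 # turn hi-res fix on
    "hr_scale": 2,                     # enlarge 2x
    "hr_upscaler": "R-ESRGAN 4x+",     # upscale algorithm
    "hr_second_pass_steps": 0,         # 0 = reuse the txt2img step count
    "denoising_strength": 0.4,         # keep <= 0.5 to preserve composition
}
```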

With hi-res fix enabled, a low-resolution image is generated first, then noise is added in latent space and the image is redrawn at the larger size. The comparison:

  • Advantages of hi-res fix:
    • With the random seed fixed, the picture's structure is not changed
    • Reliably overcomes multi-person and multi-head problems
    • Simple to operate, direct results
  • Disadvantages of hi-res fix:
    • Slow (one picture costs two or three generations' work) and cannot break the VRAM limit. It is better to "pull cards" repeatedly at low resolution first, then fix the seed of a good result and apply hi-res fix
    • Inexplicable extra elements can appear, which can be suppressed by lowering the denoising strength: too low blurs edges, too high deforms, and around 0.9 extra things start to grow. For pure enlargement, 0.3-0.5 is recommended.

5.2 SD Upscale script (tiled upscaling)

  Image-to-image has no hi-res fix option, but loading a low-resolution photo and generating at higher-resolution settings is itself a kind of hi-res fix. In the Upscaling section of the Settings tab, you can set the upscaler used for generated images.

  To reach even higher resolutions, use the SD Upscale script. Load an image from the gallery browser into image-to-image, and the generation parameters are copied automatically. Set the denoising strength to 0.5 and select SD Upscale in the script drop-down at the bottom. Choose scale factor 2, keep the default 64 overlapping pixels, choose the R-ESRGAN 4x+ Anime6B upscaler, change the image size from 600×600 to 664×664, click Generate, and you get a 1200×1200 image.

Final image size = (set image size − overlapping pixels) × scale factor
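Working backwards from this formula, a quick sanity check in Python:

```python
def preupscale_side(target: int, scale: int, overlap: int = 64) -> int:
    """Side length to set in img2img so SD Upscale lands on `target` pixels."""
    return target // scale + overlap

print(preupscale_side(1200, 2))  # -> 664, matching the 664x664 setting above
```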


  SD Upscale divides the image evenly into four tiles, enlarges each, and stitches them back together. Without overlapping pixels there is no transition between tiles, so the result looks stiff and seam lines appear.


  • Advantages of SD Upscale:
    • It can break the VRAM limitation and reach a larger scale (up to 4×).
    • High precision, better control of detail
  • Disadvantages of SD Upscale:
    • The split-and-redraw process is uncontrollable
      • At high magnifications the prompt no longer matches each tile, easily producing a chaotic picture.
      • If key parts such as the face or body sit on a tile boundary, the result is easily inconsistent; this can be suppressed by lowering the denoising strength
    • Inexplicable extra elements can appear

5.3 Upscaling in the Extras tab

  In the Extras tab you can upscale with an AI algorithm. Open a picture, set the scale factor and algorithm (you can pick two), and click Generate to get an image enlarged 2×. This works on the same principle as AI restoration of old films, with a denoising strength of 0.

  Extras upscaling is a simple enlargement, so it is very fast and computationally light, but less refined than the first two methods; consider using them in combination.

5.4 Ultimate upscale

See section 7.5 of this article.

6. Inpainting (partial redrawing)

6.1 Simple inpainting

  Sometimes we finally generate a satisfying picture but a few details need adjusting, say changing the clothes to sleeveless. Changing the prompt directly would change the whole picture. Inpainting handles this: it regenerates only the redraw area according to the changed prompt while keeping the rest unchanged.

  Inpainting can also be used for AI face swaps, changing clothes, changing backgrounds, removing graffiti, and so on.

  1. Click "Picture", select "Layout Redraw"
  2. Select an image to redraw
  3. Click the small brush next to the image, and use the brush to blacken the place you want to modify. The blue point on the right side of the picture can be dragged to change the size of the brush. Use a small brush for the edge, and a thick brush for the middle area;
  4. Write the words you want to regenerate in the positive keywords, such as off shoulder, because you only want to show your shoulders, so you don’t need to add other main prompt words, and then click generate.

insert image description here

Here is an analysis of the main inpainting parameters:

  • Mask blur: similar to feathering in Photoshop, it blurs and softens the seam between the masked and unmasked areas so the redrawn part blends more closely with the original. Generally keep it below 10, adjusted to the size of the redraw area: the larger the area, the higher the mask blur value.
  • Mask mode: a mask is simply the area painted with the brush. The default is to redraw the masked area; you can also invert this and redraw the unmasked area;
  • Masked content: the latter two options process the image more aggressively, and in theory redraw more strongly
    • Original: the default, no extra processing
    • Fill: blur the masked content first, then denoise it step by step into a new picture; the AI has more freedom here
    • Latent noise: replace the masked part with noise first, then regenerate the image
    • Latent nothing: similar to fill; it also blurs the masked colors and then denoises

  • Inpaint area:

    • Whole picture: the AI redraws the entire image according to the changed prompt but pastes only the masked area back into the original. Because it sees the whole picture, the result is generally better.
    • Only masked: the AI redraws the masked area as a stand-alone picture, then pastes it back. It is faster because the redraw area is smaller, but since the AI cannot see the whole picture, the stitched result often looks strange.

    In general, Whole picture is recommended. Only masked suits highly targeted cases such as redrawing just the hands; there you should lower the denoising strength to avoid deformation and purify the prompt (5 fingers, hand open, high five). In Only masked mode there is an extra padding-pixels parameter on the right, which works like the overlapping tile pixels of SD Upscale in section 5.2: it smooths the transition at the seams when pasting back, and should be adjusted to the mask size.

  • Denoising strength: how far the result may drift from the original, from 0 to 1, default 0.75. The larger the value, the greater the difference between the generated image and the original; adjust it to your needs.
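For reference, these parameters map onto fields of the /sdapi/v1/img2img API payload. This is a sketch based on recent AUTOMATIC1111 versions, and the fill-mode indices are my reading of the UI order, so verify them against your install:

```python
inpaint_payload = {
    "init_images": ["<base64 image>"],   # the picture to redraw
    "mask": "<base64 mask>",             # white = area to redraw
    "mask_blur": 4,
    "inpainting_mask_invert": 0,         # 0 = inpaint masked, 1 = inpaint not masked
    "inpainting_fill": 1,                # 0 fill, 1 original, 2 latent noise, 3 latent nothing
    "inpaint_full_res": False,           # False = whole picture, True = only masked
    "inpaint_full_res_padding": 32,      # padding pixels in only-masked mode
    "denoising_strength": 0.75,
}
```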

6.2 Inpaint sketch (hand-painted mask)

  Plain inpainting alone is sometimes not strong enough; redrawing a hand, for example, still fails easily. In that case you can use inpaint sketch and paint colors onto the mask.

  In the following example, open a picture under Inpaint sketch; there is a palette button in the upper right corner, and clicking it lets you choose the brush color. We paint a mask with a blue brush, draw a heart on it with a white brush, add the corresponding prompt (blue face mask with heart sign:1.2), set the denoising strength to 0.8, and click Generate. The effect:

  A mouse is not precise enough for this kind of drawing; an external graphics tablet works much better.

  Here is how to fix hands. Upload a picture, click the eyedropper next to the palette, pick up the wall's background color to paint over the bad hand, then pick up the face's skin color to paint the hand shape, and enter the prompt (5 fingers,detailed hand,high five:1.2), with the EasyNegativeV2 embedding mentioned earlier as the negative prompt. When drawing hands, a denoising strength of 0.5 is recommended; higher values blur the hand's outline. Mask blur should also stay small; the default of 4 is fine. The result:

  Inpaint sketch has one extra parameter: mask transparency. If the colors you hand-paint are too saturated, raise it slightly to weaken them and keep them consistent with the painting style of the image.

6.3 Sketch

  Sketch can also be used on its own, for example to add a tie to the figure in the picture. The difference is that without inpainting enabled, the whole picture changes somewhat.

6.4 Inpaint upload (upload a mask)

  Inpaint upload gives precise control over the mask's extent. The essence is to cut out the part to redraw in Photoshop, upload that mask to Inpaint upload, and then inpaint as usual.
  Open PS. If there is only one main subject in the picture, click "Select" - "Subject": the figure is selected automatically, producing a selection outlined by "marching ants". If there are several figures, use the object selection tool in the toolbar to drag over one figure's area, and that figure is selected automatically. For extra or missing parts, use the lasso tool: circling with the lasso adds to the selection, while holding down Alt and circling removes from it.

insert image description here
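
If you export the selection from PS as a cutout on a transparent background, a few lines of PIL (a sketch; the file names are placeholders) turn it into the black-and-white mask that inpaint upload expects:

```python
from PIL import Image

# Convert a transparent-background cutout exported from PS into a mask:
# white = area to redraw, black = area to keep.
cutout = Image.open("subject_cutout.png").convert("RGBA")
alpha = cutout.split()[-1]                       # the alpha channel
mask = alpha.point(lambda a: 255 if a > 0 else 0).convert("L")
mask.save("mask.png")
```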

For the rest, watch the video "Teach you how to play with partial redrawing step by step" from the 12th minute.

7. Plug-ins

  The initial webui interface is very bare. All plug-ins live in the Extensions tab, which has three sub-tabs: Installed, Available, and Install from URL.
insert image description here

7.1 Plug-in installation method

  1. Automatic installation.
    Open the Available tab and click "Load from:" to list all installable plug-ins. Search for the plug-in you want in the search box, then click install on the right; it will be installed automatically.
    insert image description here

insert image description here

If this URL is accidentally cleared, here are the addresses:

  • Extension reference address:
    https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui-extensions/master/index.json
  • Backup mirror address:
    https://gitee.com/akegarasu/sd-webui-extensions/raw/master/index.json
    https://gitgud.io/AUTOMATIC1111/stable-diffusion-webui/-/wikis/Extensions-index.md
  2. Install from URL.
    Open the Install from URL tab, copy the plug-in's GitHub URL, and paste it into the URL field.
    insert image description here

insert image description here

  3. Download and install manually.
    The first two methods may fail when the network is poor. The most thorough method is to download the plug-in package directly. For example, run git clone https://github.com/Physton/sd-webui-prompt-all-in-one.git inside the extensions folder to fetch the tag auto-translation plug-in (you can also download it manually). After installation, click Restart webui to refresh; if installed correctly, the plug-in appears in the Installed list.
    insert image description here
    insert image description here
      If a plug-in stops working, click the check-update button on the right; updating to the latest version usually solves the problem. If a plug-in still misbehaves, uncheck the box in front of it to disable it temporarily. Below we introduce 8 plug-ins:
    insert image description here

  1. Chinese language pack : search zh (uncheck the localization filter)

  2. Gallery browser : search image browser

  3. Prompt word completion : search tag complete; for the Chinese thesaurus see the net disk

  4. Prompt reverse inference : search tagger

  5. Ultimate Upscale script : an upgraded SD upscale for image enlargement; search ultimate upscale

  6. Local Latent Couple : local detail redraw; search llul

  7. Cutoff : precise color control that prevents colors from bleeding into each other; search cutoff

  8. prompt-all-in-one : a comprehensive prompt plug-in with many functions; see the video "Prompt word completion plug-in (automatic translation) plug-in" for details.

@Nenly

7.2 Chinese language pack

  In the Available tab, uncheck the localization filter and search zh. Qiuye's cloud Chinese localization package (the first result) is recommended. After installation, go to Settings - User interface, switch the language to zh-Hans, and restart the webui.
insert image description here
insert image description here

7.3 Gallery browser

  By default, webui can only view previously generated picture information in the output folder. After installing image browser, you can view each picture's generation info in the gallery browser, send a picture to txt2img with one click, and sort, score, filter, delete and favorite pictures.

  • filename keyword search : searches the pictures' file names, which generally include the path, file number and a short prompt keyword;
  • EXIF keyword search : searches all of a picture's generation metadata. For example, searching smile shows every picture whose prompt contains smile.
    insert image description here
  • Favorites : For pictures you are fairly satisfied with, click Add to Favorites below, and they are placed in a separate directory
    insert image description here
  • Scoring : Give obviously unwanted pictures one point across the board, and score the rest by satisfaction. For example, in the picture below: score in the red area, filter to show only one-point pictures in the blue area, then click the first one-point picture and enter N in the yellow area to delete it and the following N pictures.
    insert image description here

7.4 Tag plug-ins

7.4.1 tag complete

   tag complete : auto-completes prompt words as you type, and also suggests tags that better match the AI's logic. For the "taking off the hat" example below, hat_removed expresses the idea more accurately than hat_off in AI image generation.

  Tag completion is implemented on top of a local booru thesaurus, and the completion content also covers popular anime IPs and so on. In addition, typing <e, <l or <h makes it automatically recognize and load your installed embeddings, LoRAs, hypernetworks, etc., which is very convenient.

insert image description here
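
The idea behind this kind of local-thesaurus completion is simple enough to sketch in a few lines of Python (a toy illustration only; the real plug-in's matching is much richer, and the CSV name and layout here are assumptions):

```python
import csv

# Load tag names from a local thesaurus CSV (assumed: first column = tag).
with open("danbooru.csv", encoding="utf-8") as f:
    tags = [row[0] for row in csv.reader(f) if row]

def complete(prefix, limit=5):
    # Return up to `limit` tags that start with the typed prefix.
    return [t for t in tags if t.startswith(prefix)][:limit]

print(complete("hat_"))  # e.g. ['hat_removed', ...]
```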


  Chinese tag translation: follow the Bilibili video "The Most Complete Tag Thesaurus on the Internet".

  • Click here to download the a1111-sd-webui-tagcomplete.zip archive
  • Put the archive into the extensions folder and extract it (delete the previous tag-completion plug-in first)
  • In Settings - tag completion, set the first filename entry to zh_cn.csv; then scroll down and set the translation filename to zh_cn_tr.csv. The configuration is now complete; save the settings.
    insert image description here

insert image description here
After configuration, you can directly input Chinese in the prompt word box.
insert image description here

7.4.2 tagger

  tagger is used to reverse-infer prompt words from an image. The lower right shows each reversed tag's confidence; we can set a confidence threshold to filter the reversed tags, then click the interrogate button again.

  At the tagger's lower left you can also add and exclude prompt words, and the sensitive field at the upper right shows the image's sensitivity rating.

insert image description here
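
The threshold amounts to a one-line filter over the tag/confidence pairs; a minimal sketch with made-up values:

```python
# Made-up example of reversed tags with confidences.
reversed_tags = {"1girl": 0.99, "smile": 0.87, "hat": 0.42, "outdoors": 0.35}

threshold = 0.5  # keep only tags at or above this confidence
prompt = ", ".join(t for t, c in reversed_tags.items() if c >= threshold)
print(prompt)  # 1girl, smile
```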

  In addition, Bilibili also has a Chinese-English bilingual version and a local Chinese thesaurus file for another tag-completion plug-in; see the video "Chinese-English bilingual, TAG prompt word automatic completion".

7.4.3 prompt-all-in-one

Reference video "Prompt word completion plug-in (automatic translation) plug-in"

  The previous tag-completion plug-in essentially translated by lookup in a local thesaurus (two CSV files), so words not in the thesaurus could not be translated; this plug-in translates online through third-party interfaces. Type Chinese into the box on the right and press Enter, and the English tag is automatically appended to the prompt box. The plug-in's buttons are introduced one by one below:
insert image description here

  • The first button: set language;

  • The second button: settings;

    • The first button inside, api : click it to choose the translation interface; the upper section lists free interfaces, the lower section paid ones such as ChatGPT. You can enter a passage of English and click Test to see the result. As the note at the bottom says, you can also use local files for exact translation.
      insert image description here
      insert image description here
    • A|B button: automatically translate the English tags in the prompt box into Chinese
    • En button: Chinese prompt words are automatically translated into English as you enter them
    • T button: when checked, an explanatory tooltip is displayed below the plug-in
  • The third button: history. Click it to see the prompt-word history; click the favorite button on the right to add an entry to your favorites, and click the rightmost button to send it straight into the prompt box
    insert image description here

  • The fourth button: the prompt-word favorites list

  • A|B button: translates the English prompt words in the prompt box into Chinese, so no other software is needed for Chinese-English translation.
    insert image description here

  • En button: one click translates all the Chinese in the prompt box into English

  • The last two buttons copy and delete the prompt words

  Generally, after entering Chinese in the prompt box we can just click translate. To add prompt words later, keep typing Chinese in the box on the right and press Enter; it is translated into English and appended automatically.
insert image description here

  In addition, hovering over a prompt word below lets you quickly drag it to reorder and quickly adjust its weight (click the ± signs or the brackets), which is very convenient; a toy sketch of the weight syntax follows the figure below.

insert image description here
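
The ± buttons simply rewrite webui's (tag:weight) syntax in the text box. A toy helper (hypothetical, for illustration only) that nudges a tag's weight the same way:

```python
import re

def nudge(tag_text, delta=0.1):
    # "(blue hair:1.1)" -> name "blue hair", weight 1.1; bare tags weigh 1.0
    m = re.fullmatch(r"\((.+):([\d.]+)\)", tag_text)
    name, w = (m.group(1), float(m.group(2))) if m else (tag_text, 1.0)
    return f"({name}:{round(w + delta, 2)})"

print(nudge("blue hair"))        # (blue hair:1.1)
print(nudge("(blue hair:1.1)"))  # (blue hair:1.2)
```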

7.5 ultimate upscale

   ultimate upscale is a better image-enlargement script that can be regarded as a fully upgraded SD upscale: seams blend more naturally and the redraw quality is better. After installation it appears in the script drop-down menu. Several parameters are introduced below.

  1. Target size
    SD upscale can only enlarge relative to the original image's size, but here, after selecting custom size, the target size can be set freely. The other upscaling algorithms, mask edge blur (section 6.1) and reserved pixels (section 5.2) have all been covered before.

insert image description here

  2. Tile size
      ultimate upscale splits the picture into several tiles and enlarges them separately, and the tile size can be customized. As shown in the figure below, setting one of width/height to 0 means the tile is square, so by default ultimate upscale also enlarges in 512×512 tiles (a toy sketch of the splitting follows this list).

insert image description here

  3. Drawing order: the blue box above selects the drawing order. Linear draws the tiles one by one in order; chess alternates like a chessboard, drawing the black and white squares in turn, which reportedly stitches better.
  4. Seams fix: the method used to join the tiles at the seams; the third option works better.
    insert image description here
  5. Denoise: ultimate upscale redraws more stably than SD upscale, so the denoising strength can be set a little higher, around 0.5.
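
The splitting idea mentioned above is easy to picture in code; here is a toy sketch of overlapping 512×512 tile boxes (not the script's actual implementation; the file name is a placeholder):

```python
from PIL import Image

def tile_boxes(width, height, tile=512, overlap=64):
    # Overlapping tile rectangles, so seams can be blended when stitching.
    step = tile - overlap
    return [(left, top, min(left + tile, width), min(top + tile, height))
            for top in range(0, height, step)
            for left in range(0, width, step)]

img = Image.open("upscaled.png")           # placeholder file name
tiles = [img.crop(box) for box in tile_boxes(*img.size)]
```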

7.6 Local Latent Couple

  Local Latent Couple is used to enrich local picture detail. After installation, select LLuL in the script drop-down box and a framing interface appears. The shaded box marks the area whose detail will be enhanced, a quarter of the whole image by default. Drag the box over the area you want to enhance, keep the other parameters, especially the random seed, unchanged, and regenerate: the new picture will show richer detail, for example on the clothes.
insert image description here

insert image description here

7.7 cutoff Precise color control

  cutoff solves the problem of prompt words interfering with each other. For example, with the input a cute girl, white shirt with green tie, red shoes, blue hair, yellow eyes, pink skirt, the colors of the various parts may bleed into one another in the resulting picture. With cutoff enabled, this is effectively suppressed:
insert image description here

  Implementation principle: each color word is processed separately. When generating the word vector for one color word, the other color words are cut out, and the result is used as that color word's vector; in other words, separation is bought with extra computation. Originally a text prompt passes through the text encoder once; now it must pass through the text encoder n times to obtain the final text representation (where n is the number of colors in the prompt).
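
Here is a rough sketch of that idea with the CLIP text encoder from the transformers library. It is a simplification of the plug-in, with several assumptions: each color word is a single CLIP token, "_" serves as the cut-out placeholder, and token positions line up between the passes:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a cute girl, white shirt with green tie, red shoes, blue hair"
colors = ["white", "green", "red", "blue"]

def encode(text):
    toks = tokenizer(text, padding="max_length", truncation=True,
                     return_tensors="pt")
    with torch.no_grad():
        return encoder(**toks).last_hidden_state[0]   # one vector per token

final = encode(prompt).clone()
ids = tokenizer(prompt, padding="max_length", truncation=True).input_ids
for color in colors:
    # Encode the prompt with every *other* color word cut out.
    masked = prompt
    for other in colors:
        if other != color:
            masked = masked.replace(other, "_")
    emb = encode(masked)
    # Splice this color's token positions back into the final representation.
    cid = tokenizer(color, add_special_tokens=False).input_ids[0]
    for pos, t in enumerate(ids):
        if t == cid:
            final[pos] = emb[pos]
# `final` is the text representation assembled from the extra encoder passes.
```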

After enabling cutoff, enter the target color words and the cutoff weight, as shown in the figure below:
insert image description here
Other parameters:

  • Enabled : Check to enable
  • Target tokens (comma separated) : the tokens to separate; these can be color words or a comma-separated tag list such as white shirt with green tie, red shoes, blue hair, yellow eyes, pink skirt.
  • Weight : Strength of Cutoff
  • Details: detailed settings
    • Disable for Negative prompt. : disable this plug-in for the negative prompt
    • Cutoff strongly. : a stronger cutoff; only one target can be placed, and the separation effect is strongest.
    • Padding token (ID or single token) : the padding token (an ID or a single token), usually left blank.
    • Interpolation method : Lerp or SLerp (default Lerp)

Origin blog.csdn.net/qq_56591814/article/details/131478164