Stable Diffusion Tutorial (5) - Text-to-Image (txt2img)

Supporting video tutorial: https://v.douyin.com/UyHNfYG/ 

The txt2img interface is annotated as follows

1 Prompts and negative prompts

What you enter in the prompt box is what you want to draw; what you enter in the negative prompt box is what you do not want to draw

The prompt box only accepts English; all symbols must be half-width, and entries should be separated with half-width commas

1.1 General principles

Generally speaking, the earlier a word appears in the prompt, the higher its weight. For example:

  • car, 1girl: you may get a whole car with a girl standing next to it
  • 1girl, car: you may get a girl portrait with half a car in the background

So in most cases the prompt follows this format:

  • quality tags, medium tags, subject, subject description, background, background description, art style and artist

An example would be

  • masterpiece, best quality, sketch, 1girl, stand, black jacket, wall background, full of posters, by token,
  • i.e. a high-quality sketch, in the style of the artist "token", of a girl in a black jacket standing in front of a wall covered in posters

In practice, however, the text encoder used by SD responds to all text, and its sensitivity differs completely from word to word, and even between different phrasings of the same meaning. There are no fixed rules, so you still have to experiment repeatedly yourself to learn how sensitive SD is to various word choices and arrangements, and to build a rough intuition.
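As a concrete illustration, here is a minimal text-to-image sketch using the diffusers library rather than the webui (an illustrative assumption: the model name below is just a placeholder, and the parameters map directly onto the webui fields this tutorial describes; note the vanilla diffusers pipeline does not parse webui weight syntax such as (word:1.3)):

```python
# Minimal txt2img sketch with diffusers (illustrative assumption; the
# tutorial itself uses the webui GUI, and the model id is a placeholder).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Word order matters: earlier words carry more weight.
prompt = "masterpiece, best quality, sketch, 1girl, black jacket, wall background"
negative_prompt = "nsfw, worst quality, bad quality, lowres"

image = pipe(prompt, negative_prompt=negative_prompt).images[0]
image.save("txt2img_example.png")
```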

1.2 Weight adjustment

The most direct weight adjustment is word order: the earlier a word appears, the greater its weight, and the later, the lower

The following syntax sets the weight of a keyword; weights are generally kept between 0.5 and 2. You can adjust a weight quickly by selecting the word and pressing Ctrl+↑/↓, which steps it by 0.1:

  • (best quality:1.3)

The following is also a common weight-adjustment style found online, but it is inconvenient to fine-tune, so it is not recommended:

  • (best quality) = (best quality:1.1)
  • ((best quality)) = (best quality:1.21), i.e. 1.1 × 1.1
  • [best quality] = (best quality:0.91)
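The bracket shorthand is just repeated multiplication or division by 1.1, which a couple of lines of Python make explicit:

```python
# Each pair of parentheses multiplies the weight by 1.1;
# each pair of square brackets divides it by 1.1.
print(1.1 ** 1)           # (best quality)   -> 1.1
print(1.1 ** 2)           # ((best quality)) -> 1.2100...
print(round(1 / 1.1, 2))  # [best quality]   -> 0.91
```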

1.3 Starter prompts

My current suggestion is to use as concise a starter prompt as possible, rather than the especially long starters of the early days: the more prompt words you enter, the longer the AI takes to draw, the less attention each word receives, and the harder the prompt is to debug

Compared with early models, current models have improved greatly in word sensitivity, so don't worry that a short prompt will hurt image quality

A simple starter (positive prompt, then negative):

  • masterpiece, best quality, 1boy
  • nsfw, (worst quality, bad quality:1.3)

A slightly longer pair:

  • masterpiece, best quality, highres, highly detailed, 1girl,
  • nsfw, bad anatomy, long neck, (worst quality, bad quality, normal quality:1.3), lowres

1.4 Combining terms

Wrapping several words in one set of parentheses does not make the AI treat them as a single unit, even when weighted. For example, the following two are exactly equivalent:

  • (car, rocket, gun:1.3)
  • (car:1.3), (rocket:1.3), (gun:1.3)

Entries are combined the way natural language is, using prepositions and conjunctions such as and, with, of, etc. For example:

  • (car with guns and rockets)

2 Sampling methods

There are many sampling methods, but only a few are commonly used

2.1 Euler a

The fastest sampler, with very low requirements on step count. However, raising the step count does not add detail, and past a certain number of steps the composition changes abruptly, so don't use it at high step counts

2.2 DPM++ 2S a Karras and DPM++ SDE Karras

These two are not very different (SDE seems slightly better). In short, their main feature is that, compared with Euler a, they produce more detail at the same resolution.

2.3 DDIM

It is rarely used, but it is worth trying if you want an extremely high step count, since its detail keeps accumulating as the steps increase

3 Sampling steps

Generally speaking, the sampling steps can stay between 20 and 30 most of the time. Too few steps may leave the image incompletely computed, while the detail gained from more steps is marginal; only very weak evidence suggests that high step counts occasionally fix limb errors. So use higher step counts only when you want an image with exhaustive detail
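To make the sampler and step-count choices concrete, here is a sketch using diffusers schedulers (an assumption for illustration; the webui exposes the same choices in its Sampling method menu, and DPMSolverMultistepScheduler with Karras sigmas is a close relative of the DPM++ samplers named above rather than an exact match):

```python
import torch
from diffusers import (
    StableDiffusionPipeline,
    EulerAncestralDiscreteScheduler,  # "Euler a" in the webui
    DPMSolverMultistepScheduler,      # DPM++ family; Karras sigmas optional
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
prompt = "masterpiece, best quality, 1girl"

# Euler a: fast, fine at low step counts; more steps do not add detail.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
img_euler = pipe(prompt, num_inference_steps=20).images[0]

# DPM++ with Karras sigmas: more detail at the same resolution.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
img_dpm = pipe(prompt, num_inference_steps=25).images[0]
```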

4 Batch count and batch size

  • The batch count is how many batches of images the graphics card generates in total
  • The batch size is how many images the graphics card generates in each batch

That is, each click of the Generate button produces batch count × batch size images

Note that the batch size is the number of images the graphics card computes at once, which is slightly faster than increasing the batch count, but setting it too high may exhaust video memory and make generation fail. The batch count, by contrast, never runs out of video memory: given enough time, generation simply continues until all output is complete
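In script form (a diffusers sketch under the same illustrative assumption as above), the batch size maps to num_images_per_prompt and the batch count is just an outer loop:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

batch_count, batch_size = 4, 2   # total images = 4 * 2 = 8
for b in range(batch_count):     # more batches only costs time
    # batch_size images are computed at once, so VRAM bounds this number
    images = pipe("masterpiece, best quality, 1girl",
                  num_images_per_prompt=batch_size).images
    for i, img in enumerate(images):
        img.save(f"batch{b}_img{i}.png")
```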

5 Output resolution (width and height)

Image resolution is very important: it directly determines the composition of your image and the quality of its details

5.1 Output size

The output size determines how much information the image content can hold. Many details, such as the face, accessories, and complex patterns in a full-body composition, only have enough room to be expressed on a large image; if the image is too small, a face, for example, shrinks into a blob and cannot be rendered properly

But the bigger the image, the more the AI tends to stuff into it. Most models are trained at 512*512 resolution, and a few at 768*768, so when the output size is much larger, say 1024*1024, the AI will try to fit the content of two or three images into one frame, producing spliced limbs, extra people that no prompt word controls, and mixed viewpoints. Adding prompt words brings partial relief, but the most important thing is to control the frame size: render small or medium images first, then upscale them to large ones

Approximate relationship between output size and content:

  • About 300k pixels, e.g. 512*512: mainly headshots and busts
  • About 600k pixels, e.g. 768*768: mainly a single person, standing or lying down
  • Over 1 million pixels, e.g. 1024*1024: one to three full-body figures, mostly standing
  • Higher resolutions: group portraits, or an outright broken image

5.2 Aspect Ratio

The width-to-height ratio directly determines the content of the frame. Using the same 1girl example:

  • Square 512*512: tends toward faces and busts
  • Tall 512*768: tends toward standing and seated full-body portraits
  • Wide 768*512: tends toward half-reclining figures with diagonal composition

So adjust the output ratio according to what you want
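In the diffusers sketch (same illustrative assumption as earlier), width and height are plain parameters; SD 1.x models want multiples of 8:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
prompt = "masterpiece, best quality, 1girl"

square = pipe(prompt, width=512, height=512).images[0]  # faces and busts
tall   = pipe(prompt, width=512, height=768).images[0]  # standing/seated full body
wide   = pipe(prompt, width=768, height=512).images[0]  # reclining, diagonal composition
```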

6 Prompt relevance (CFG Scale)

CFG is hard to pin down in words. Roughly speaking, it applies a coefficient to all of your positive and negative prompt words: the lower the CFG, the plainer the image and the fewer its details; the higher the CFG, the more saturated the image and the more details, relatively speaking

  • For anime styles, CFG can be set higher for richer color and texture, generally 7~12, and 12~20 is also worth trying
  • Realistic styles mostly use a very low CFG, generally 4~7. Realistic models are very sensitive to CFG: turn it up slightly too far and eldritch horrors may appear. Fine-tune in steps of 0.5
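In diffusers terms (same illustrative assumption), CFG is the guidance_scale parameter; a small sweep makes its effect easy to compare:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Low CFG: flatter image, fewer details; high CFG: saturated, more details.
for cfg in (4.0, 7.0, 12.0):
    img = pipe("masterpiece, best quality, 1girl", guidance_scale=cfg).images[0]
    img.save(f"cfg_{cfg}.png")
```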

7 Random seeds

The random seed locks the initial latent-space state of the image, which means that with all other parameters unchanged, the same random seed should produce exactly the same picture. You can lock the seed to observe how each parameter affects the image, and also use it to reproduce your own or other people's results

  • Click the dice button to set the random seed to -1, i.e. random
  • Click the recycle button to set the random seed to that of the image currently shown in the gallery on the right

Note that even with identical parameters, seed included, your image is not guaranteed to match someone else's exactly. As the graphics driver, GPU model, webui version, and other factors change, output from the same parameters can vary, from minor detail differences to radical changes in composition
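A seed-locking sketch in diffusers (same illustrative assumption); a torch.Generator plays the role of the webui's seed field:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
prompt = "masterpiece, best quality, 1girl"

gen = torch.Generator(device="cuda").manual_seed(1234)
img_a = pipe(prompt, generator=gen).images[0]

gen = torch.Generator(device="cuda").manual_seed(1234)  # re-seed before each run
img_b = pipe(prompt, generator=gen).images[0]
# img_a and img_b should match on the same machine; across different GPUs,
# drivers, or library versions the results can still drift, as noted above.
```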

8 Face restoration

Face restoration had some value when early models could only produce realistic images at low resolution, where it could fix broken realistic faces. But the facial accuracy of current models far exceeds the early ones, and the face restoration function alters the face's appearance, so simply ignore this feature

9 Other

9.1 VAE settings

The VAE's job is to correct the colors of the final output image; if no VAE is loaded, the image may look noticeably gray. Where to set it:

  • Settings - Stable Diffusion - SD VAE

After setting it, remember to click Apply settings at the top. VAEs are universal and can be combined with any model

The integration package already ships with final-pruned.vae.pt, which is generally used to correct anime models, but this VAE may raise an error after the image finishes computing:

  • modules.devices.NansException: A tensor with all NaNs was produced in VAE. This could be because there's not enough precision to represent the picture. Try adding --no-half-vae commandline argument to fix this.

If this happens, fill in --no-half-vae in the launcher's extra arguments field to fix it

Besides this VAE, others are available. If you find the colors gray, you can switch to them:

VAE placement path: *\models\VAE

If you use those VAEs and find that anime images come out with very thick lines and red-purple fringing, switch back to final-pruned.vae.pt to fix it
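For reference, swapping a VAE in diffusers looks like this (a sketch under the same illustrative assumption; stabilityai/sd-vae-ft-mse is just an example repo):

```python
import torch
from diffusers import StableDiffusionPipeline, AutoencoderKL

# Load a standalone VAE and plug it into the pipeline.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse",
                                    torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")
# If the VAE produces NaNs (the error quoted above), loading it in
# torch.float32 instead is the rough equivalent of webui's --no-half-vae.
```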

9.2 Image information

Every image generated by SD automatically embeds its parameter information, including the positive and negative prompts, sampling steps, sampler, CFG, random seed, size, model hash, model name, Clip skip, upscaling parameters, and so on

You can read the parameters by dragging your own or someone else's unmodified original image into the PNG Info tab, then clicking the corresponding button, such as "Send to txt2img", to copy the image and its parameters into that module. Note that this may quietly change some settings that are easy to miss, such as the settings of plug-ins like ControlNet, Clip skip, ENSD, and so on. If you later generate with your own parameters and something looks wrong, check those parts.
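If you want to read that embedded text outside the webui, a few lines of Pillow suffice (a sketch; the webui stores the parameters in a PNG text chunk named "parameters", and the filename below is a placeholder):

```python
from PIL import Image

img = Image.open("some_webui_output.png")
params = img.info.get("parameters")  # None if the PNG carries no webui metadata
print(params)  # prompt, negative prompt, Steps, Sampler, CFG scale, Seed, Size, ...
```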

9.3 Save and browse pictures

All output images are automatically saved under the following path, with each module's images placed in their own folder:

  • *\outputs

The webui has a built-in gallery browser that is adequate for small-scale browsing and makes it convenient to recall parameters, but it is a web app after all; for managing images at scale, the system file explorer is more efficient

9.4 Notes for 40-series graphics cards

If you have a 40-series graphics card, you may need to replace the cuDNN files bundled with the integration package to unlock its full compute performance, which can roughly double the speed or more.

Search for "cudnn" in the webui folder to find the path where the cuDNN files live, unpack the downloaded archive, copy all the files from the bin folder inside it into webui's cuDNN path, and choose to replace the existing files
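You can confirm which cuDNN build PyTorch actually picked up before and after the swap with a quick check (a sketch; the version numbers in the comments are examples):

```python
import torch

print(torch.backends.cudnn.version())   # e.g. 8700 means cuDNN 8.7
print(torch.cuda.get_device_name(0))    # confirm the 40-series card is in use
```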

 

Source: blog.csdn.net/u011936655/article/details/130942627