Amount of data used in Stable Diffusion model training

References: Stable Diffusion text-to-image model notes, https://zhuanlan.zhihu.com/p/642496862 and https://zhuanlan.zhihu.com/p/617134893

1. Datasets

1.1 LAION dataset

The laion2B-en dataset is a subset of laion-5B; specifically, it is the English-language portion. laion-5B consists of image-text pairs filtered from Common Crawl web data and contains 5.85B pairs in total; the portion whose text is in English amounts to 2.32B pairs, and this subset is laion2B-en.

Within laion2B-en, 1324M samples have both width and height above 256, 488M are above 512, and 76M are above 1024; the average caption length is 67.
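Subset counts like these are typically derived by filtering the released metadata parquet files rather than the images themselves. A minimal sketch of that tally, assuming the laion2B-en metadata exposes WIDTH, HEIGHT, and TEXT columns (the column names are an assumption; check the actual schema):

```python
import pandas as pd

def summarize_laion_metadata(parquet_path: str) -> None:
    """Count samples by minimum side length and report mean caption length."""
    df = pd.read_parquet(parquet_path, columns=["WIDTH", "HEIGHT", "TEXT"])
    for side in (256, 512, 1024):
        n = int(((df["WIDTH"] >= side) & (df["HEIGHT"] >= side)).sum())
        print(f">= {side}x{side}: {n} samples")
    print(f"mean caption length: {df['TEXT'].str.len().mean():.1f}")
```

Run over all metadata shards, this is the kind of count that yields the 1324M / 488M / 76M figures above.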

1.2 Wukong dataset

The Wukong dataset contains 100 million Chinese image-text pairs.

2. Model training

2.1 RunwayML SD 1.5

Trained on laion-2B-en samples with an aesthetic score of 5 or above, first at 256x256 resolution and then at 512x512, on 32 nodes of 8 A100 40G GPUs each (256 GPUs) with a global batch size of 2048 (per-GPU micro-batches plus gradient accumulation). Training took about 150,000 A100 GPU-hours, roughly 25 days of wall-clock time.
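As a sanity check, the wall-clock estimate follows directly from the GPU-hour count. A quick back-of-the-envelope calculation, assuming only that the 150,000 hours are per-GPU A100 hours:

```python
# 32 nodes x 8 A100 40G GPUs each
gpus = 32 * 8            # 256 GPUs
gpu_hours = 150_000      # reported total A100 GPU-hours
wall_clock_days = gpu_hours / gpus / 24
print(f"{wall_clock_days:.1f} days")  # ~24.4 days, matching "about 25 days"
```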

2.2 Stability AI SD 2.0

Trained on laion-2B-en samples with an aesthetic score of 4.5 or above.

2.3 Stability AI SD 2.1

SD 2.1 continues training from SD 2.0 on data re-admitted by a relaxed NSFW filter, i.e. it adds back some of the samples that SD 2.0's stricter filter had removed (the punsafe threshold is loosened from 0.1 to 0.98).
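The filter in question is LAION's NSFW-probability score. A minimal sketch of the threshold change, assuming the metadata carries a punsafe column with the NSFW-detector probability (the column name is an assumption; the 0.1 and 0.98 thresholds come from the SD 2.0 / 2.1 model cards):

```python
import pandas as pd

def apply_nsfw_filter(meta: pd.DataFrame, punsafe_threshold: float) -> pd.DataFrame:
    """Keep rows whose NSFW probability is at or below the threshold."""
    return meta[meta["punsafe"] <= punsafe_threshold]

# sd20_subset = apply_nsfw_filter(meta, 0.1)   # strict filter used for SD 2.0
# sd21_subset = apply_nsfw_filter(meta, 0.98)  # relaxed filter used for SD 2.1
```

Raising the threshold keeps far more borderline samples; that re-admitted data is what SD 2.1 is fine-tuned on.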

2.4 MosaicML SD 2

Trained on a subset of laion-5B restricted to samples with English-only captions and aesthetic scores of 4.5+. The first stage uses 0.79B samples at resolutions above 256x256; the second stage uses 0.3B samples above 512x512. On 128 A100 GPUs, the first stage took 1.6 days and 550,000 iterations, and the second stage took 4.9 days and 850,000 iterations.
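One way to read these iteration counts is as passes over the unique data. A sketch of the arithmetic, assuming a global batch size of 2048 (the figure the original SD 2 run used; MosaicML's actual batch size may differ):

```python
def implied_epochs(iterations: int, unique_samples: float, batch_size: int = 2048) -> float:
    """How many passes over the unique data the iteration count implies."""
    return iterations * batch_size / unique_samples

print(f"stage 1: {implied_epochs(550_000, 0.79e9):.1f} epochs")  # ~1.4
print(f"stage 2: {implied_epochs(850_000, 0.30e9):.1f} epochs")  # ~5.8
```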

2.5 PAI-Diffusion

Pre-trained for about 20 days on 20 million Chinese image-text pairs from the Wukong dataset.

2.6 Chinese-CLIP

The Chinese-language portion of laion-5B is about 110 million pairs and Wukong contributes about 70 million; together with Chinese-CLIP's own data, the total is about 200 million image-text pairs.

Source: blog.csdn.net/u012193416/article/details/133232661