Alibaba's Chinese text-to-image models (PAI + Tongyi)

Reference articles and demos:

The PAI-Diffusion model is here! Alibaba Cloud's machine learning team takes you exploring the ocean of Chinese art (Zhihu; authors: Wang Chengyu, Duan Zhongjie, Zhu Xiangru, Huang Jun): https://zhuanlan.zhihu.com/p/590020134

EasyNLP's Chinese text-to-image model turns you into an artist in seconds (Zhihu; authors: Wang Chengyu, Liu Tingting): https://zhuanlan.zhihu.com/p/547063102

ModelScope community studio: https://modelscope.cn/studios/damo/ai_artist/summary

ModelScope community model page: https://modelscope.cn/models/damo/cv_diffusion_text-to-image-synthesis/summary

PAI Diffusion (Food), a Hugging Face Space by alibaba-pai: https://huggingface.co/spaces/alibaba-pai/pai-diffusion-artist-xlarge-zh

When the hugely popular text-to-image model meets the knowledge graph, AI portraits approach the real world (Zhihu; authors: Zhu Xiangru, Duan Zhongjie, Wang Chengyu, Huang Jun): https://zhuanlan.zhihu.com/p/581870071

Compared with English text-to-image generation, we should pay more attention to Chinese text-to-image generation. Of the currently known open-source options, Taiyi and AltDiffusion both perform poorly. Closed-source options include Baidu's Wenxin Yige and Alibaba's Tongyi; Tongyi may be open-sourced later. In addition, Alibaba's PAI platform is also building text-to-image models on top of EasyNLP, and these are basically open source.

1. PAI-Diffusion

Text encoder: uses the Chinese CLIP from EasyNLP. Alibaba also has a general-purpose Chinese CLIP model, which works very well too. Here, the text Transformer of the cross-modal alignment model trained with EasyNLP serves as the text encoder.
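The text encoder comes from a CLIP-style cross-modal alignment model. As a rough illustration of what "alignment" training optimizes, the sketch below computes CLIP's symmetric contrastive (InfoNCE) loss on toy embeddings. It is a pure-Python sketch, not EasyNLP's code; the function names and the fixed temperature of 0.07 are assumptions (CLIP itself learns the temperature during training).

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_logits(text_embs, image_embs, temperature=0.07):
    """Pairwise cosine similarities between every text and every image, divided by a temperature."""
    t = [l2_normalize(v) for v in text_embs]
    i = [l2_normalize(v) for v in image_embs]
    return [[sum(a * b for a, b in zip(tv, iv)) / temperature for iv in i] for tv in t]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row k is column k (the matching pair)."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)

def clip_loss(text_embs, image_embs, temperature=0.07):
    """Average of the text-to-image and image-to-text contrastive losses."""
    logits = cosine_logits(text_embs, image_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))

texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[1.0, 0.0], [0.0, 1.0]]   # image k matches text k
swapped_images = [[0.0, 1.0], [1.0, 0.0]]   # pairs are mismatched
print(clip_loss(texts, aligned_images), clip_loss(texts, swapped_images))
```

Matched pairs give a loss near zero and mismatched pairs a large loss; minimizing this over millions of image-text pairs is what pulls a caption's text embedding toward its image embedding.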

Latent diffusion: same as Stable Diffusion (SD)
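"Same as SD" means denoising runs in the autoencoder's latent space using the standard DDPM forward process. A minimal sketch of that forward (noising) step follows, assuming the Stable Diffusion v1 hyperparameters (1000 steps, betas from 0.00085 to 0.012 on a "scaled linear" schedule); whether PAI-Diffusion uses exactly these values is an assumption. Latents are flat Python lists to keep the sketch dependency-free.

```python
import math
import random

def scaled_linear_betas(n_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Betas spaced linearly in sqrt-space, then squared (SD v1-style schedule)."""
    s0, s1 = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(s0 + (s1 - s0) * t / (n_steps - 1)) ** 2 for t in range(n_steps)]

def alphas_cumprod(betas):
    """abar_t = prod_{s<=t} (1 - beta_s): how much of the clean signal survives at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(z0, t, abar, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(abar_t) * z_0, (1 - abar_t) * I)."""
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(z0, eps)]
    return zt, eps  # during training the U-Net learns to predict eps from (zt, t, text)

betas = scaled_linear_betas()
abar = alphas_cumprod(betas)
rng = random.Random(0)
z0 = [0.5, -1.0, 2.0]
z_early, _ = q_sample(z0, 0, abar, rng)    # almost the clean latent
z_late, _ = q_sample(z0, 999, abar, rng)   # almost pure noise
```

The closed form for q(z_t | z_0) is what makes training cheap: any timestep can be sampled directly without running the chain step by step.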

Autoencoder: same as SD
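Reusing SD's autoencoder fixes the latent geometry. The sketch below assumes SD v1's KL autoencoder settings (8x spatial downsampling, 4 latent channels); the function names are hypothetical. It shows how much smaller the space is in which the diffusion model actually operates.

```python
def sd_latent_shape(height, width, downsample=8, latent_channels=4):
    """(channels, height, width) of the latent produced by an SD v1-style autoencoder."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)

def compression_ratio(height, width, image_channels=3, **kw):
    """Number of image values per latent value."""
    c, h, w = sd_latent_shape(height, width, **kw)
    return (image_channels * height * width) / (c * h * w)

print(sd_latent_shape(512, 512), compression_ratio(512, 512))
```

A 512x512 RGB image becomes a 4x64x64 latent, 48 times fewer values, which is why latent diffusion is much cheaper than pixel-space diffusion. SD v1 additionally multiplies encoder latents by a scaling factor of 0.18215 before diffusion.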

Super-resolution (SR): ESRGAN

The latent diffusion model was pre-trained for 20 days on 20 million Chinese image-text pairs from the Wukong dataset, then fine-tuned on multiple downstream tasks. It has about 1B parameters.
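To put "about 1B parameters" in perspective, here is a back-of-the-envelope memory estimate. It counts weights only (no activations). The 16 bytes per parameter for mixed-precision Adam training is a common rule of thumb (fp16 weights and gradients plus fp32 master weights and two Adam moments), not a figure reported for PAI-Diffusion.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Memory in GiB needed to store n_params values of the given size."""
    return n_params * bytes_per_param / 2**30

N_PARAMS = 1e9  # "about 1B" from the text
for name, nbytes in [("fp32 weights", 4), ("fp16/bf16 weights", 2), ("mixed-precision Adam state", 16)]:
    print(f"{name}: {weight_memory_gib(N_PARAMS, nbytes):.2f} GiB")
```

Inference in half precision fits comfortably on a single consumer GPU, while full training state already needs roughly 15 GiB before activations.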

2. VQ-VAE

2.1 ARTIST

The ARTIST model is built on the Transformer and splits text-to-image generation into two stages. In the first stage, a VQGAN model vectorizes images: the encoder maps an input image to a fixed-length discrete sequence, and the decoder takes a discrete sequence as input and outputs a reconstructed image. In the second stage, the text sequence and the encoded image sequence are taken as input, and a GPT model learns to generate the image sequence conditioned on the text sequence. To strengthen the model's prior, the ARTIST authors designed a Word Lattice Fusion Layer, which injects entity knowledge from a knowledge graph into the model and helps generate the corresponding entities in the image, so that the entity information in the generated image is more accurate.
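The two stages can be sketched in a few lines: stage one replaces each encoder output vector with the index of its nearest codebook entry, and stage two concatenates text and image tokens into one sequence for a decoder-only (GPT-style) model. This is a minimal pure-Python sketch under stated assumptions: the separator token, the shifting of image indices past the text vocabulary, and computing the loss only on image tokens are common choices for this kind of model, not details confirmed for ARTIST. The Word Lattice Fusion Layer is not modeled, and all function names are hypothetical.

```python
def quantize(vectors, codebook):
    """VQ step: map each encoder output vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k])) for v in vectors]

def build_gpt_example(text_ids, image_ids, text_vocab_size, sep_id):
    """Build (inputs, targets, loss_mask) for next-token training on [text, SEP, image].

    Image indices are shifted past the text vocabulary so both share one embedding table.
    Position j predicts token j+1; only image-token predictions count toward the loss,
    so the model learns p(image tokens | text tokens).
    """
    seq = list(text_ids) + [sep_id] + [text_vocab_size + i for i in image_ids]
    inputs, targets = seq[:-1], seq[1:]
    first_image_pos = len(text_ids) + 1  # index of the first image token in seq
    loss_mask = [1 if j + 1 >= first_image_pos else 0 for j in range(len(targets))]
    return inputs, targets, loss_mask

codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
encoder_out = [[0.1, 0.2], [4.8, 5.1], [0.9, 1.2]]  # toy "image patches" after the VQGAN encoder
image_ids = quantize(encoder_out, codebook)
inputs, targets, mask = build_gpt_example([3, 7], image_ids, text_vocab_size=10, sep_id=1)
```

At generation time the GPT model samples image tokens one by one after the text and separator, and the VQGAN decoder turns the sampled index sequence back into pixels.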

3. Tongyi

Total parameters: 50B

4. Evaluation

 


Origin blog.csdn.net/u012193416/article/details/130910182