How to quickly generate images from Chinese/English text with PaddleHub (Stable Diffusion)

Recently, Diffusion-based text-to-image generation models have become popular: the user inputs a sentence and the model generates a matching image, which is great fun. This article records how to set this up quickly with PaddleHub, for reference.

1. Install PaddlePaddle

PaddleHub is built on top of PaddlePaddle, an open-source deep learning framework developed by Baidu, which can be installed quickly by following the official instructions. The current documentation is quite comprehensive.

The official link is as follows: https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html

Before installation, you need to determine the system-related environment. The following is the installation command I chose:

python -m pip install paddlepaddle-gpu==2.3.2.post111 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
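After installation, a quick sanity check can confirm that PaddlePaddle works and sees the GPU. This is a minimal sketch; it assumes a CUDA-capable GPU is present, otherwise the device check will report cpu:

import paddle

# Built-in installation check: runs a small computation and reports whether the install is OK
paddle.utils.run_check()

# Show the device that will be used by default, e.g. "gpu:0" or "cpu"
print(paddle.device.get_device())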

2. Install PaddleHub

PaddleHub open-sources a rich collection of pre-trained models: 360+ models across six categories, including large models, CV, NLP, Audio, Video, and mainstream industrial applications. Here we mainly use its text-to-image generation models, which include Stable Diffusion and Disco Diffusion.

https://github.com/PaddlePaddle/PaddleHub/blob/develop/README_ch.md

!pip install --upgrade paddlehub -i https://mirror.baidu.com/pypi/simple
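To confirm that PaddleHub itself installed correctly, its version can be printed (a trivial sketch):

import paddlehub as hub

# Print the installed PaddleHub version, e.g. 2.3.0
print(hub.__version__)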

3. Model inference

PaddleHub wraps the models cleanly, so the text-to-image interface can be called with just a few lines of code. The following is based on the introduction on PaddleHub; let's look at the effect of the example Stable Diffusion model.

The official sample link is as follows:

https://aistudio.baidu.com/aistudio/projectdetail/4512600

Stable Diffusion is a latent diffusion model, a type of generative model that produces the image of interest by iteratively denoising randomly sampled noise step by step, and it has achieved impressive results. Compared with Disco Diffusion, Stable Diffusion iterates in a lower-dimensional latent space instead of the original pixel space, which greatly reduces memory and compute requirements; on a V100, the desired image can be generated in under a minute.

from PIL import Image
import paddlehub as hub

# Load the model
module = hub.Module(name='stable_diffusion')

# Generate the image
result = module.generate_image(text_prompts="A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation.", output_dir='stable_diffusion_out')

# Save the generation process as a gif
result[0].chunks[-1].chunks.save_gif('beautiful_painting.gif')
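According to the official example, the returned result is a DocumentArray from the docarray package: result[0] corresponds to the first prompt, result[0].chunks to the images generated for that prompt, and result[0].chunks[-1].chunks to the intermediate steps of the last image. As a sketch (assuming the docarray version installed alongside PaddleHub provides plot_image_sprites), the intermediate results can also be shown as a single sprite image:

# Show the intermediate steps of the last generated image as one sprite sheet
result[0].chunks[-1].chunks.plot_image_sprites()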

Input: A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation.

The default output image size is 512×512, as follows:

4. More tests

At this point we can start testing more inputs, so experiment boldly. If the seed is not fixed, a different image will be generated each time you run it.
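Conversely, fixing the seed makes a run reproducible. The sketch below assumes the stable_diffusion module's generate_image accepts seed and width_height parameters, as the Disco Diffusion example in the next section does:

import paddlehub as hub

module = hub.Module(name='stable_diffusion')

# Fix the random seed so repeated runs give the same image (seed/width_height assumed supported here)
result = module.generate_image(
    text_prompts="A tree on the hilltop in autumn.",
    width_height=[512, 512],
    seed=42,
    output_dir='stable_diffusion_out')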

Input: A tree on the hilltop in autumn.

 

The generation process is as follows:

Input: A man's face.

Input: A woman's face.

5. Try Chinese input

The Stable Diffusion model above does not yet support Chinese input. At present there are two modules that support Chinese input, both of which are Disco Diffusion models. They generate images much more slowly than Stable Diffusion, so you need to wait patiently.

The official sample link is as follows:

https://aistudio.baidu.com/aistudio/projectdetail/4444998

The sample code is as follows:

from PIL import Image
import paddlehub as hub

# Load the model
module = hub.Module(name='disco_diffusion_ernievil_base')

# Generate the image
result = module.generate_image(text_prompts="孤舟蓑笠翁,独钓寒江雪。", style='油画', width_height= [1280, 768], output_dir='孤舟蓑笠翁_油画', seed=1853109922)

# Save the generation process as a gif
result[0].chunks.save_gif('孤舟蓑笠翁.gif')

Input: 孤舟蓑笠翁，独钓寒江雪。(An old man in a straw cape and hat, in a solitary boat, fishing alone in the cold river snow.)

6. Troubleshooting

Some problems may come up during use; they are recorded here as well.

Problem 1: Unable to use GPU

Solution: The installed PaddlePaddle may be the CPU version rather than the GPU version; uninstall it and install the GPU build.

First use the following command to view the version of the currently installed paddle and related packages:

pip list | grep paddle

The query information is as follows:

paddle-bfloat                      0.1.7
paddle2onnx                        1.0.0
paddlefsl                          1.1.0
paddlehub                          2.3.0
paddlenlp                          2.4.0
paddlepaddle-gpu                   2.3.2.post111

If a CPU version is present, it can be uninstalled with the following command:

pip uninstall paddlepaddle

Then install the appropriate GPU version.
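After reinstalling, you can confirm from Python that the GPU build is active (a minimal sketch):

import paddle

# True only if this PaddlePaddle build was compiled with CUDA support
print(paddle.is_compiled_with_cuda())

# Number of GPUs visible to PaddlePaddle; 0 means it will fall back to CPU
print(paddle.device.cuda.device_count())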

References: https://github.com/PaddlePaddle/PaddleHub/issues/1301

Problem 2: Segmentation fault (core dumped)

Solution: This may be caused by a cuDNN version mismatch. It is recommended to check whether the versions of CUDA, cuDNN, and paddlepaddle-gpu match.
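To see which CUDA and cuDNN versions the installed paddlepaddle-gpu was built against, so they can be compared with the versions on the machine, the following sketch can help:

import paddle

# CUDA version the wheel was built against, e.g. "11.1"
print(paddle.version.cuda())

# cuDNN version the wheel was built against, e.g. "8.1.1"
print(paddle.version.cudnn())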

References: https://github.com/PaddlePaddle/PaddleHub/issues/1301


With a bit of experience, you will find that the text_prompts parameter is very important: not every input produces a good image. In some cases the results are clearly poor, and this is a problem that follow-up research still needs to address.


Original article: https://blog.csdn.net/u013685264/article/details/126870337