
Foreword

Text-to-image generation has become a very hot topic recently, and text-to-video is starting to take off as well. I recently came across the open-source project FlagAI, which in my opinion is one of the better ones: it is simple to install and supports both Chinese and English prompts, which is very nice.

Open-source repository: github address

Below is a result I generated with the project's demo.

Input text:

Anime portrait of natalie portman as an anime girl by stanley artgerm lau, wlop, rossdraws, james jean, andrei riabovitchev, marc simonetti, and sakimichan, trending on artstation

Roughly translated: an anime-style portrait of a girl, followed by a long list of artist names.

Project structure

The README not only provides ready-to-use Chinese and English pre-trained models, but also usage instructions for the tokenizers and predictors.

Let's start with the sample code the author provides.

Usage is very simple: images are generated by calling predictor.predict_generate_images. The code is short and easy to adapt. The author's installation instructions are as follows:

Next, I'll turn the sample code into an interactive web page so it is easier to use.

Adding an interactive web page

The code is as follows:

import torch
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor
from PIL import Image
import gradio as gr
import os
import shutil

# Initialize
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

loader = AutoLoader(task_name="text2img",  # text-to-image task
                    model_name="AltDiffusion",
                    model_dir="./checkpoints")

model = loader.get_model()
model.eval()
model.to(device)
predictor = Predictor(model)


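OUTPUT_DIR = "./AltDiffusionOutputs"


def handle(text: str):
    # Remove images from the previous request and start with an empty folder
    shutil.rmtree(OUTPUT_DIR, ignore_errors=True)
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    # The predictor saves its results under ./AltDiffusionOutputs/samples
    predictor.predict_generate_images(text)
    sample_dir = os.path.join(OUTPUT_DIR, "samples")
    # Sort so the images come back in a stable order (os.listdir order is arbitrary)
    return [Image.open(os.path.join(sample_dir, s)) for s in sorted(os.listdir(sample_dir))]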


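if __name__ == '__main__':
    # One text box in, four image slots out (the demo generates 4 images by default)
    demo = gr.Interface(fn=handle, inputs=gr.Text(),
                        outputs=[gr.Image(type="pil"), gr.Image(type="pil"),
                                 gr.Image(type="pil"), gr.Image(type="pil")])
    # Listen on all network interfaces, port 12003
    demo.launch(server_name="0.0.0.0", server_port=12003)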
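The handle function clears the output folder, runs the predictor, and then opens whatever it finds in the samples subfolder. Because the interface declares exactly four gr.Image outputs, handle must also return exactly four values; Gradio is likely to raise an error on a mismatch. The file handling and the slot count can be pulled out and tested without a GPU. This is a minimal stdlib sketch, assuming (as the page code does) that images land in <output dir>/samples; the helper names reset_output_dir, list_sample_files and fit_to_slots are hypothetical, not FlagAI APIs, and padding with None assumes Gradio shows an empty slot for it.

```python
import os
import shutil

# Extensions treated as generated images (assumption: the predictor saves PNG/JPEG)
IMAGE_EXTS = (".png", ".jpg", ".jpeg")


def reset_output_dir(out_dir):
    """Delete any previous run's results and recreate an empty output folder."""
    shutil.rmtree(out_dir, ignore_errors=True)
    os.makedirs(out_dir, exist_ok=True)


def list_sample_files(out_dir):
    """Return sorted image paths from <out_dir>/samples, or [] if that folder is missing."""
    sample_dir = os.path.join(out_dir, "samples")
    if not os.path.isdir(sample_dir):
        return []
    names = sorted(n for n in os.listdir(sample_dir) if n.lower().endswith(IMAGE_EXTS))
    return [os.path.join(sample_dir, n) for n in names]


def fit_to_slots(items, n_slots=4, filler=None):
    """Return exactly n_slots values: extras are dropped, missing ones become filler."""
    items = list(items)[:n_slots]
    return items + [filler] * (n_slots - len(items))
```

Inside handle, the last lines could then read imgs = [Image.open(p) for p in list_sample_files("./AltDiffusionOutputs")] followed by return fit_to_slots(imgs, 4), so a run that produces more or fewer samples still fills the four slots.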

Besides the dependencies the project itself requires, gradio must also be installed.

The installation command is as follows:

pip install gradio -i https://pypi.douban.com/simple
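Before the first launch it can save time to confirm that every package the script imports is actually installed. A small stdlib sketch; the module list is simply the top-level imports of the page code above (note that PIL is the import name of the Pillow package), and missing_modules is a hypothetical helper name.

```python
import importlib.util

# Top-level modules imported by the web page script above
REQUIRED = ["torch", "flagai", "PIL", "gradio"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the current Python path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

Running it in the same environment as the page script lists anything that still needs a pip install.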

On the first run, the script creates a checkpoints folder in the current directory and downloads the pre-trained models, which takes a long time.
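Because the download is slow, it helps to check whether it finished before restarting. The startup log of the second run lists the files FlagAI found for the AltCLIP-XLMR-L text encoder in ./checkpoints/AltCLIP-XLMR-L; this sketch treats that list as the expected set. That is an assumption drawn from the log, it does not cover the AltDiffusion checkpoint itself (whose file names the log does not show), and treating a zero-byte file as incomplete is only a heuristic. The function name is hypothetical.

```python
import os

# File list printed in the startup log for ./checkpoints/AltCLIP-XLMR-L
EXPECTED_FILES = [
    "config.json",
    "pytorch_model.bin",
    "tokenizer.json",
    "tokenizer_config.json",
    "preprocessor_config.json",
    "special_tokens_map.json",
]


def missing_checkpoint_files(model_dir, expected=EXPECTED_FILES):
    """Return expected files that are absent or empty (e.g. after an interrupted download)."""
    missing = []
    for name in expected:
        path = os.path.join(model_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_checkpoint_files("./checkpoints/AltCLIP-XLMR-L"))
```

An empty list means every file from the log is present and non-empty; anything else suggests deleting the folder and letting the script download it again.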

On the second run, the output looks like this:

******************** text2img altdiffusion
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 64, 64) = 16384 dimensions.
making attention of type 'vanilla' with 512 in_channels
******************** txt_img_matching altclip-xlmr-l
model files:['config.json', 'pytorch_model.bin', 'tokenizer.json', 'tokenizer_config.json', 'preprocessor_config.json', 'special_tokens_map.json']
./checkpoints/AltCLIP-XLMR-L
Global Step: 143310
Running on local URL: http://0.0.0.0:12003

To create a public link, set `share=True` in `launch()`.
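Two numbers in this log can be checked by hand. "Working with z of shape (1, 4, 64, 64) = 16384 dimensions" describes the latent the diffusion model works on: assuming a 512×512 output and the 8× downsampling autoencoder used by Stable-Diffusion-style models (the log states neither), each side is 512 / 8 = 64, and 1 × 4 × 64 × 64 = 16384. Likewise, 859.52 M parameters stored as 32-bit floats need about 3.44 GB for the weights alone, before any activations. A short sketch of that arithmetic; the function names are illustrative only.

```python
from math import prod


def latent_shape(height, width, channels=4, factor=8, batch=1):
    """Latent tensor shape for a height x width image (factor must divide both sides)."""
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (batch, channels, height // factor, width // factor)


def weight_bytes(n_params, bytes_per_param=4):
    """Bytes needed just to hold the weights (4 per fp32 value, 2 per fp16 value)."""
    return n_params * bytes_per_param


shape = latent_shape(512, 512)
print(shape, prod(shape))                   # (1, 4, 64, 64) 16384
print(weight_bytes(859.52e6) / 1e9, "GB")   # roughly 3.44 GB in fp32
```

Halving bytes_per_param to 2 shows why running the model in fp16 roughly halves the memory taken by its weights.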

Open the page in a browser: http://localhost:12003

Enter the text you want to turn into an image and click Submit.

Now let's try a Chinese prompt. This time the input (translated here) is: a temple in a psychedelic forest, surrounded by maple trees covered in red leaves.

Here is the result:

By default the demo generates 4 images; let's look at them together.

At first glance I thought it was a photo. From here on I'll skip the steps and just show the generated results.

Text: An anime girl with angel wings

Text: Chinese Painting Style

Summary

Overall, I think the project works well, and English prompts still give slightly better results than Chinese ones. Give it a try yourself.

I've been too busy to write articles lately. Many people have contacted me on WeChat with questions about installing and deploying the projects from various articles, and I haven't had time to reply to everyone individually. I don't write these articles for income; I mainly want to share, and I hope everyone keeps learning and gets something out of them.

Bend and you stay whole; be crooked and you become straight; be hollow and you become full; be worn out and you become new; have little and you gain; have much and you are confused. Therefore the sage embraces the One and becomes a model for the world. He does not display himself, so he shines; he does not justify himself, so he is distinguished; he does not boast, so he has merit; he does not brag, so he endures. Because he does not contend, no one in the world can contend with him. The old saying "bend and you stay whole", is it an empty phrase? Truly, wholeness returns to him. ("Tao Te Ching")

Generate images from a single sentence with FlagAI (with web page code) | machine learning
