Paper reading: Making good use of Midjourney

Paper information

name_en: Grimm in Wonderland: Prompt Engineering with Midjourney to Illustrate Fairytales
name_ch: Using Midjourney to Generate Illustrations of Grimm's Fairytales
paper_addr: http://arxiv.org/abs/2302.08961
date_publish: 2023-02-17
author: Martin Ruskov, University of Milan

reading notes

The paper runs a series of experiments on prompt engineering for image generation with Midjourney v4. It derives a four-stage process for producing prompts: initial prompt, composition adjustment, style refinement, and variation selection. It also discusses three reasons for poorly generated images: difficulty with counting, difficulty generating hypothetical scenarios, and inability to depict overly fantastic situations. The author argues that these findings are not limited to generating pictures and should apply more generally to future generative models.

introduction

Earlier studies of prompt engineering broke prompts down into subject, verb, context, and style; later work proposed a taxonomy of prompt modifiers: subject terms, style modifiers, image prompts, quality boosters, repetition, and magic terms.
Midjourney is one of the most popular tools in practice, although it is commercial and very little is known about its architecture. The current Midjourney v4 is more sophisticated than earlier versions: it covers more knowledge, generates more detail, accepts more complex prompts, and handles scenes with multiple entities.

method

Current image generators not only take text as input to generate images, but also accept modifiers in the input to steer the result. VQGAN+CLIP and Stable Diffusion are known to have very different architectures, while little is known about the architectures of DALL-E and Midjourney. For that reason the paper does not discuss Midjourney-specific magic terms or quality parameters, and focuses instead on general methods such as subject and style.

subject

In the first step, the subject of the prompt is taken from the original text, then simplified and adjusted (for example, by replacing pronouns with specific nouns) to improve the results.
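
The pronoun replacement mentioned above can be sketched in a few lines of Python. This is only an illustration, not the paper's tooling: the function name `replace_pronouns`, the example sentence, and the pronoun-to-noun mapping are all hypothetical, and the mapping has to be supplied by hand because the code does no coreference resolution.

```python
import re


def replace_pronouns(sentence: str, referents: dict[str, str]) -> str:
    """Replace each pronoun listed in `referents` with its noun phrase.

    `referents` maps lowercase pronouns to the nouns they stand for.
    The mapping is written by the prompt author; nothing is inferred.
    """
    if not referents:
        return sentence

    # Match whole words only, so "he" inside "the" is left alone.
    pattern = r"\b(" + "|".join(re.escape(p) for p in referents) + r")\b"

    def swap(match: re.Match) -> str:
        return referents[match.group(0).lower()]

    return re.sub(pattern, swap, sentence, flags=re.IGNORECASE)


# Hypothetical example sentence, not taken from the paper.
prompt = replace_pronouns(
    "She gave him the golden ball",
    {"she": "the princess", "him": "the frog"},
)
print(prompt)  # the princess gave the frog the golden ball
```

The explicit mapping keeps each change small and visible, which suits the incremental adjustments described later in these notes.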

style

Style here means both the medium and the artistic style, as in earlier work from the humanities. Because the goal is fairy-tale illustration, the author did not want the generator to add too much detail (Midjourney's default art style is rich in detail), so style modifiers such as "book illustration" or "minimalist illustration" were tried to constrain the style.

image prompt

The experiments did not upload reference pictures, but did use the image fine-tuning function that Midjourney provides. Without reference images, consistency between images is a challenge: for example, when different scenes are generated for the same fairy tale, the same character may look completely different from one scene to the next. The paper does not address this issue.

results

The four stages of generating an image

  • Initial prompt: Summarize the original text, using a single simple sentence where possible.
  • Composition adjustments: Adjust the prompt incrementally, preferring small changes so that each iteration gives clear feedback. Pay special attention to ambiguous words that may be misinterpreted. The adjustments fall into three levels:
    • Adjust the wording: simplify words or replace them with synonyms that better represent the context. This may include reducing phrasal verbs to plain action verbs, trading narrative richness and fidelity for expressive accuracy.
    • Add or remove adjectives for entities (subject and object), or add adverbs for verbs.
    • Add objects to better represent the context, and/or force the removal of unnecessary objects.
  • Refine the style: Whenever the generator adds superfluous detail, suppress it by forcing a style with modifiers such as basic, simple, minimal, or flat color (so the fairy-tale illustration is not overloaded with detail).
  • Adjust the existing image: Once the overall content of the image has stabilized, and as long as the generator supports fine-tuning of an existing image (Midjourney does), make further adjustments starting from that image, for example to the number of entities.
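
The first three stages above operate on the prompt text, so they can be sketched as a small Python helper. This is a hedged illustration rather than anything from the paper: the class `PromptDraft`, its methods, the example sentence, and the choice to join style modifiers with commas are all assumptions. The fourth stage works on a generated image, not on text, so it is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class PromptDraft:
    """Hypothetical container for a prompt under refinement."""

    subject: str  # stage 1: one simple sentence summarizing the text
    style_modifiers: list[str] = field(default_factory=list)  # stage 3

    def reword(self, old: str, new: str) -> "PromptDraft":
        """Stage 2: apply one small wording change and return a new draft."""
        return PromptDraft(self.subject.replace(old, new),
                           list(self.style_modifiers))

    def with_style(self, *modifiers: str) -> "PromptDraft":
        """Stage 3: append style modifiers that suppress excess detail."""
        return PromptDraft(self.subject,
                           self.style_modifiers + list(modifiers))

    def text(self) -> str:
        """Render the prompt as subject followed by modifiers (assumed format)."""
        return ", ".join([self.subject, *self.style_modifiers])


# Hypothetical example: reduce a phrasal verb, then constrain the style.
draft = PromptDraft("a wolf comes across a girl in the woods")
draft = draft.reword("comes across", "meets")
draft = draft.with_style("book illustration", "minimal", "flat color")
print(draft.text())
# a wolf meets a girl in the woods, book illustration, minimal, flat color
```

Each method returns a new draft instead of mutating the old one, so earlier versions of the prompt stay available for comparison between iterations.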

Figure-1 shows the original text, the adjusted prompt, and the final picture that was judged satisfactory.

Current generator issues

  • Difficulty with counting: For example, a prompt that asks for three crows may produce five, and the number of fingers on a hand is often wrong. This can be corrected by trying a few more times or by fine-tuning the image.
  • Difficulty generating hypothetical scenarios: the model has no prior knowledge of them, as shown in item 1 of Figure-2.
  • Inability to depict overly bizarre situations: for unconventional situations taken from non-realistic texts (also called impossible scenarios), the results are poor, as shown in items 2 and 3 of Figure-2.

Midjourney usage

url

https://www.midjourney.com/

registration

  • Access the site through a VPN or proxy if it cannot be reached directly
  • Click Sign in on the main page, choose the option for users without an account to create one, and then activate it by email
  • Registration requires receiving a text message on a mobile phone; domestic (mainland China) numbers are supported

Open Midjourney

Click Join the Beta on the main page to enter the painting chat room, where you can see other people's paintings

Origin blog.csdn.net/xieyan0811/article/details/129343597