Advantages of AI: Generating Images Using GPT and Diffusion Models


The world is fascinated by artificial intelligence (AI), especially recent advances in natural language processing (NLP) and generative AI, and for good reason. These breakthrough technologies have the potential to improve everyday productivity for a variety of tasks. For example, GitHub Copilot helps developers quickly write entire algorithms, OtterPilot automatically generates executive meeting minutes, and Mixo allows entrepreneurs to quickly launch websites.

This article will provide a brief overview of generative AI, including examples of relevant AI techniques, and then put theory into practice with a generative AI tutorial where we will create artistic renderings using GPT and diffusion models.

Six AI-generated images of the author, created using the techniques in this tutorial.

A Brief Overview of Generative AI

Note: Those familiar with the technical concepts behind generative AI can skip this section and continue with the tutorial.

In 2022, many foundation model implementations entered the market, accelerating AI progress in many fields. With a few key concepts in hand, we can define foundation models more precisely:

  • Artificial intelligence is a general term that describes any software that can intelligently accomplish a specific task.
  • Machine learning is a subset of artificial intelligence that uses algorithms that learn from data.
  • Neural networks are a subset of machine learning that use layers of nodes modeled after the human brain.
  • A deep neural network is a neural network with many layers and learned parameters.

A foundation model is a deep neural network trained on large amounts of raw data. In practical terms, a foundation model is a highly versatile type of AI that can be easily adapted and fine-tuned for a variety of tasks. Foundation models are at the heart of generative AI: Text-generating language models (such as GPT) and image-generating diffusion models are both foundation models.

Text: Natural Language Processing Models

In generative AI, natural language processing (NLP) models are trained to generate text that reads as if it were written by a human. Large language models (LLMs) in particular are highly relevant to today's AI systems. Characterized by their training on vast amounts of data, LLMs can recognize and generate text and other content.

In practice, these models can be used as writing or even coding assistants. NLP applications include restating complex concepts in simple terms, translating text, drafting legal documents, and even creating exercise plans (though such uses come with certain limitations).

Lex is an example of an NLP writing tool that does many things: suggests headlines, completes sentences, and writes entire paragraphs on a given topic. Currently, the most recognizable LLM is GPT. Developed by OpenAI, GPT can respond within seconds to almost any question or command with high accuracy, and OpenAI's various models are available through a single API. Unlike Lex, GPT can make a developer's life easier by manipulating code, programming solutions to functional requirements, and identifying issues in code.
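As a small illustration of the coding-assistant use case, here is a minimal sketch (my own example, not from this tutorial; it uses the same legacy pre-1.0 OpenAI Python library as the tutorial code later in this article, and the buggy function is invented for illustration):

import openai

openai.api_key = "YOUR_API_KEY"

# A deliberately buggy function, invented for illustration.
buggy_code = """
def average(numbers):
    return sum(numbers) / len(numbers) + 1
"""

# Ask GPT to review the code; temperature=0 keeps the answer focused.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=f"Find and explain the bug in this Python function:\n{buggy_code}",
    temperature=0,
    max_tokens=200,
)
print(response["choices"][0].text)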

Images: AI Diffusion Models

Diffusion models are deep neural networks with latent variables that learn the structure of a given image by removing blur (i.e., noise) from it. Once a model's network has been trained to "know" the conceptual abstraction behind an image, it can create new variations of that image. For example, by removing noise from images of cats, a diffusion model "sees" clean images of cats, learns what cats look like, and applies that knowledge to create new variants of cat images.

Diffusion models can be used to denoise or sharpen images (enhancing and perfecting them), manipulate facial expressions, or generate images of facial aging to suggest how a person might look over time. You can browse the Lexica search engine to see how powerful these AI models are at generating new images.
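As a concrete illustration of the idea, the following minimal sketch (an addition for this article, not part of the tutorial's notebook) generates a brand-new image from a text prompt using the Hugging Face diffusers library and one publicly available Stable Diffusion checkpoint; it assumes a CUDA GPU:

import torch
from diffusers import StableDiffusionPipeline

# Load a publicly available pre-trained checkpoint (half precision saves VRAM).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# The model starts from random noise and denoises it, guided by the prompt.
image = pipe("a watercolor painting of a cat").images[0]
image.save("cat.png")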

Tutorial: Diffusion Model and GPT Implementation

To demonstrate how to implement and use these techniques, let's practice generating anime-style images using the Hugging Face diffusion model and GPT, neither of which requires any complex infrastructure or software. We will start with an off-the-shelf model (i.e., one that has already been created and pre-trained); we just need to fine-tune it.

Note: This article shows how to use generative AI image and language models to create your own high-quality images in an interesting style. The information in this article should not be misused to create deepfakes, which would violate Google Colab's Terms of Use.

Setup and Photo Requirements

To prepare for this tutorial, register for accounts at:

  • Google, to use Drive and Colab.
  • OpenAI, to make GPT API calls.

You'll also want to save 20 photos of yourself (or even more, to improve results) on the device you plan to use for this tutorial. For best results, the photos should:

  • Be no smaller than 512 x 512 pixels.
  • Feature you, and only you.
  • Share the same file format (extension).
  • Be shot from various angles.
  • Include at least three to five full-body shots and two to three midbody shots; the rest should be headshots.

That said, the photos don't need to be perfect; it might even be instructive to see how deviating from these requirements affects the output. To sanity-check your photos against these requirements, you can run the short script below.
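This is a quick validation sketch of my own (not part of the tutorial's notebook); it assumes your photos sit in a hypothetical photos folder and uses the Pillow library:

import os
from PIL import Image

PHOTO_DIR = "photos"  # hypothetical folder holding your training photos

extensions = set()
for name in os.listdir(PHOTO_DIR):
    extensions.add(os.path.splitext(name)[1].lower())
    # Flag photos below the minimum resolution.
    with Image.open(os.path.join(PHOTO_DIR, name)) as img:
        if img.width < 512 or img.height < 512:
            print(f"{name}: too small ({img.width}x{img.height})")

# All photos should share a single file format.
if len(extensions) > 1:
    print(f"Mixed file formats found: {extensions}")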

AI Image Generation Using Hugging Face Diffusion Model

To get started, open the tutorial's companion Google Colab notebook, which contains the required code.

  1. Run cell 1 to connect Colab to your Google Drive, which will store the model and save its generated images later.
  2. Run cell 2 to install the required dependencies.
  3. Run cell 3 to download the Hugging Face model.
  4. In cell 4, type "My Look" in the Session_Name field and run the cell. The session name typically identifies the concept the model will learn.
  5. Run cell 5 and upload your photos.
  6. Go to cell 6 to train the model. By checking the Resume_Training option before running the cell, you can retrain the model multiple times. (This step may take about an hour to complete.)
  7. Finally, run cell 7 to test the model and see it in action. The system will output a URL where you will find the interface for generating images. After entering a prompt, press the "Generate" button to render the image.
    The user interface for generating images.

With a working model in place, we can now experiment with various prompts that yield different visual styles (for example, "me as an animated character" or "me as an impressionist painting"). However, using GPT to write character prompts is optimal: It produces more detail than user-written prompts and maximizes the model's potential.
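If you would rather script image generation than use the web interface from cell 7, something like the following sketch could work, assuming the notebook saved your fine-tuned weights to Drive in diffusers format (MODEL_DIR is a hypothetical path; check where cell 6 actually stored your session):

import torch
from diffusers import StableDiffusionPipeline

# Hypothetical location of the fine-tuned weights on your mounted Drive.
MODEL_DIR = "/content/gdrive/MyDrive/My Look"

pipe = StableDiffusionPipeline.from_pretrained(
    MODEL_DIR, torch_dtype=torch.float16
).to("cuda")

# The session name acts as the token the model learned to associate with you.
image = pipe("My Look as an impressionist painting").images[0]
image.save("impressionist_me.png")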

Effective Diffusion Model Prompts Using GPT

We will add GPT to our pipeline via OpenAI, although Cohere and other options would provide similar functionality for our purposes. First, register on the OpenAI platform and create your API key. Then, in the "Generate Good Prompts" section of the Colab notebook, install the OpenAI library:

pip install openai

Next, load the library and set an API key:

import openai
openai.api_key = "YOUR_API_KEY"

We'll have GPT generate an optimized prompt for rendering our image in the style of an anime character, replacing YOUR_SESSION_NAME with the session name "My Look" set in notebook cell 4:

ASKING_TO_GPT = 'Write a prompt to feed a diffusion model to generate beautiful images '\
                'of YOUR_SESSION_NAME styled as an anime character.' 
response = openai.Completion.create(model="text-davinci-003", prompt=ASKING_TO_GPT,
                                    temperature=0, max_tokens=1000)
print(response["choices"][0].text)

The temperature parameter, which ranges between 0 and 2, determines whether the model should strictly adhere to the data it was trained on (values close to 0) or be more creative with its output (values close to 2). The max_tokens parameter sets the amount of text to return, with four tokens equivalent to approximately one English word.
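To see temperature's effect firsthand, you can run the same prompt at both extremes and compare the outputs (a small illustration of my own, reusing the ASKING_TO_GPT prompt defined above):

# Compare a deterministic answer with a more creative one.
for temp in (0, 1.5):
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=ASKING_TO_GPT,
        temperature=temp,
        max_tokens=1000,
    )
    print(f"--- temperature={temp} ---")
    print(resp["choices"][0].text)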

In my case, the GPT model output is as follows:

"Juan is styled as an anime character, with large, expressive eyes and a small, delicate mouth.
His hair is spiked up and back, and he wears a simple, yet stylish, outfit. He is the perfect
example of a hero, and he always manages to look his best, no matter the situation."

Finally, by feeding this text as input into the diffusion model, we achieve the final output:

Six AI-generated author images, optimized using GPT-generated prompts.

Letting GPT write the diffusion model prompts means you don't have to think through the nuances of an anime character's appearance; GPT generates an appropriate description for you. You can always tweak the prompts further to taste. After completing this tutorial, you can create complex, creative renderings of yourself or any concept you like.
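If you scripted the diffusion model as sketched earlier, the final hand-off takes just a few lines, assuming the pipe object from that sketch and the response returned by GPT above:

# Strip whitespace from GPT's output and use it as the diffusion prompt.
prompt = response["choices"][0].text.strip()
image = pipe(prompt).images[0]
image.save("gpt_prompted_me.png")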

The Benefits of AI Are at Your Fingertips

GPT and diffusion models are two foundational modern AI implementations. We've seen how to apply each individually and how to multiply their power by pairing them, using GPT's output as the diffusion model's input. In doing so, we created a pipeline of two machine learning models that maximizes their combined usefulness.

Original Link: Advantages of Artificial Intelligence: Generating Images Using GPT and Diffusion Models (mvrlink.com)
