The Past and Present of Prompt Engineering


In-context learning through prompts

In biology, emergence is an incredible property whereby parts interact and come together to exhibit new behaviors that you cannot see at smaller scales. What is even more incredible is that even though the smaller-scale version may look similar to the larger one, the larger scale is made up of more parts and more interactions, and it ends up exhibiting a completely different set of behaviors.

It all starts with the ability to train these AI models in an unsupervised fashion. In fact, unsupervised learning has been one of the key tenets of this AI revolution, driving AI progress over the past few years.

Prior to 2017, most AI worked using supervised learning on small, structured datasets, which allowed training machine learning models only on very narrow tasks. After 2017, things started to change with the advent of a new architecture called the Transformer.

This new architecture can be used with unsupervised learning methods: machine learning models can be pretrained on very large, unstructured datasets with a very simple objective function, text-to-text (next-token) prediction.

The exciting aspect is that in order to learn how to perform this prediction correctly (which may seem like a very simple task), the model has to pick up a whole series of patterns and heuristics from the data it is trained on.
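To make this concrete, here is a toy PyTorch sketch of that objective. Everything in it is a stand-in: the token IDs are arbitrary and the random logits take the place of a real Transformer's outputs.

```python
import torch
import torch.nn.functional as F

# Toy illustration of the pretraining objective: predict each next token
# from the tokens before it. The token IDs are arbitrary, and the logits
# are random stand-ins for what a real Transformer would output.
vocab_size = 50_000
tokens = torch.tensor([464, 3290, 3332, 319, 262, 2603])  # an encoded sentence

inputs, targets = tokens[:-1], tokens[1:]      # shift by one position
logits = torch.randn(len(inputs), vocab_size)  # stand-in for model(inputs)

# Cross-entropy between the predicted distribution at each position and
# the token that actually comes next; minimizing this is the whole objective.
loss = F.cross_entropy(logits, targets)
print(loss.item())
```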

This enables machine learning models to learn a wide variety of tasks.

Instead of trying to perform a single task, large language models start inferring patterns from data and reusing those patterns when performing new tasks.

This is a revolution in its own right. And with the GPT-3 paper came another tipping point: the ability to prompt these models.

In short, prompting lets these models pick up the user's context from natural language instructions, which can dramatically change the model's output.

This aspect, too, is emergent, because no one explicitly asked for it. That is how we got in-context learning through prompts, a core emergent property of current machine learning models.
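Here is a small, entirely made-up example of what in-context learning looks like in practice: the prompt itself carries a few demonstrations, and the model is expected to infer the pattern without any retraining.

```python
# A made-up illustration of in-context learning: the prompt itself contains
# a few demonstrations, and the model infers the pattern with no weight updates.
prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts two full days.
Sentiment: Positive

Review: It stopped working after a week.
Sentiment: Negative

Review: The screen is gorgeous and setup took minutes.
Sentiment:"""

# Sent to a large language model, the expected completion is "Positive":
# the task was specified entirely through natural language and examples.
print(prompt)
```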

Understanding Prompt Engineering

Prompt engineering is a key emergent property of the current AI paradigm.

One of the most interesting aspects of prompt engineering is that it arose as an emergent property of scaling the Transformer architecture to train large language models.

Just as a poorly worded wish can work against you, when you prompt a machine, the way you express what it needs to do can drastically change the output.

What's the most interesting part? Prompting was not a feature developed by AI experts; it is an emergent one. In short, in building these huge machine learning models, prompting turned out to be the way to get the machine to act on an input. No one asked for this; it just happened!

In a 2021 paper, researchers at Stanford University highlighted how Transformer-based models had become "foundation models".

As explained in the same paper:

The story of AI has been one of increasing emergence and homogenization. With the introduction of machine learning, how a task is performed emerges (is inferred automatically) from examples; with deep learning, the high-level features used for prediction emerge; and with foundation models, even advanced functionalities such as in-context learning emerge. At the same time, machine learning homogenizes learning algorithms (e.g., logistic regression), deep learning homogenizes model architectures (e.g., convolutional neural networks), and foundation models homogenize the model itself (e.g., GPT-3).

Prompt engineering is a process used in artificial intelligence in which one or more tasks are converted into a prompt-based dataset, on which a language model is then trained (so-called "prompt-based learning").

On the surface, the motivation behind prompt engineering can be hard to grasp, so let's describe the idea with an example.

Imagine that you are building an online food delivery platform and you have thousands of images of different vegetables to include on the site. The only problem is that there is no image metadata describing which vegetables appear in which photos. At this point, you could tediously sort through the images, putting the potato photos in the potatoes folder, the broccoli photos in the broccoli folder, and so on.

You could also run all the images through a classifier to sort them more easily, but, as you have discovered, training a classifier model still requires labeled data. Using prompt engineering instead, you can write text-based prompts that you think will produce the best image classification results.

For example, the model can be told to find "an image containing potatoes". The structure of that prompt, the statement that defines how the model recognizes the image, is the essence of prompt engineering. Writing the best prompts often takes trial and error: in practice, "an image containing potatoes" can behave very differently from "a photo of a potato" or "a collection of potatoes".
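Here is a hedged sketch of that workflow using CLIP through the Hugging Face transformers library (CLIP itself is introduced in more detail below). The checkpoint name is OpenAI's public release; the image filename and candidate prompts are placeholders, and the exact scores will vary.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Zero-shot classification with CLIP via Hugging Face transformers.
# The checkpoint is OpenAI's public release; the image path is a placeholder.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("unlabeled_vegetable.jpg")  # placeholder filename

# The prompt engineering happens here: small wording changes in the
# candidate captions can shift the scores noticeably.
prompts = [
    "an image containing potatoes",
    "a photo of a potato",
    "a photo of broccoli",
]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # one score per prompt

for text, p in zip(prompts, probs[0]):
    print(f"{p.item():.3f}  {text}")
```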

Prompt Engineering Best Practices

As with most processes, the quality of the input determines the quality of the output. Designing effective prompts increases the likelihood that the model will return a response that is both favorable and contextually appropriate. Writing good prompts is a matter of understanding what the model "knows" about the world and then applying that knowledge accordingly. Some liken it to a guessing game in which one player gives their partner just enough information to figure out the word or phrase.

Think of the model as the partner in that guessing game: the prompt provides it with just enough information to work out the pattern and complete the task at hand. It makes no sense to overload the model with all of the information at once and interrupt its natural flow.

Prompt Engineering and the CLIP Model

The CLIP (Contrastive Language-Image Pre-training) model was developed in 2021 by the artificial intelligence research lab OpenAI.

According to the researchers, CLIP is "a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3."

Based on a neural network model, CLIP was trained on more than 400 million image-text pairs, each matching an image to its caption. Given this training, people can feed images into the model, and it will generate what it considers the most accurate caption or summary. The quote above also mentions CLIP's zero-shot capabilities, which make it somewhat special among machine learning models.

For example, most classifiers trained to recognize apples and oranges are expected to perform well at classifying apples and oranges, but will typically fail to detect bananas. Certain models, including CLIP, GPT-2, and GPT-3, can recognize bananas anyway; in other words, they can perform tasks they were not explicitly trained on. This ability is known as zero-shot learning.

Prompt engineering example

As of 2022, the evolution of AI models is accelerating, and this makes prompt engineering increasingly important. We first used language models such as GPT-3 and BERT for text-to-text tasks. We then got DALL-E, Imagen, MidJourney, and Stable Diffusion for text-to-image generation. At this stage, we are moving to text-to-video with Meta's Make-A-Video, and Google is working on its own Imagen Video. Today's most effective AI models focus on getting more out of less. One example is DreamFusion: Text-to-3D using 2D Diffusion, built by Google Research.

In short, AI diffusion models are generative models, meaning they produce outputs similar to the data they were trained on. Diffusion models work by gradually adding noise to training data and learning to recover that data by reversing the noising process; generation then runs that reversal from pure noise. Google Research's DreamFusion is able to convert text into 3D imagery without a large dataset of labeled 3D data (which does not currently exist).

That's it! As the research team explained:

"Adapting this approach to 3D synthesis requires large-scale datasets of labeled 3D data and efficient architectures for denoising 3D data, neither of which currently exists. In this work, we These limitations are circumvented by using a text-to-image diffusion model to perform text-to-3D synthesis."

Why is this important? After more than two decades of a web that was primarily text-based or built on 2D images, it is now possible to enable richer formats, such as 3D, that work well in AR environments.

In short, imagine wearing Google's AR glasses while AI models like these dynamically augment the real world with 3D objects, making your AR experiences far more compelling.

At the same time, OpenAI announced the launch of Whisper, its speech-to-text model. Combined, these AI models create a multimodal environment in which one person or a small team can leverage all of these tools for content generation, filmmaking, medicine, and more. This means some previously inaccessible industries become easier to enter as barriers fall, and ideas can be tested, launched, and iterated faster, allowing markets to evolve more quickly.
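As a taste of how accessible this is, here is a minimal sketch using OpenAI's open-source whisper package (pip install openai-whisper); the audio filename is a placeholder.

```python
import whisper  # pip install openai-whisper

# Minimal transcription sketch; the audio filename is a placeholder.
model = whisper.load_model("base")          # small multilingual checkpoint
result = model.transcribe("interview.mp3")  # placeholder audio file
print(result["text"])                       # the recognized transcript
```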

The Internet has evolved for nearly 30 years, yet many industries (from healthcare to education) are still stuck in old models. A decade of artificial intelligence may completely reshuffle the cards. Every AI model is prompted in broadly the same way, but prompting a machine is subtle: small variations in the prompt can lead it to produce very different outputs.

Just in October 2022:

  • Stability AI announced $101 million in funding for open source artificial intelligence.

  • Jasper AI, a startup developing an "artificial intelligence content" platform, has raised $125 million at a valuation of $1.5 billion. Jasper is acquiring AI startup Outwrite, a grammar and style checker with more than 1 million users.

  • OpenAI, valued at nearly $20 billion, is in advanced talks with Microsoft for more funding.

Today, with prompts, you can generate more and more kinds of output.

Many of OpenAI's use cases can be generated from prompts, from question answering to classifiers and code generators, and the number of use cases AI enables via prompts keeps growing rapidly.
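As an illustration, here is a hedged sketch using the pre-1.0 openai Python client; the model name and prompt are illustrative, and the same pattern retargets the model to a different use case simply by swapping the prompt string.

```python
import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment

# The same model handles different use cases purely through the prompt:
# swap the prompt string for classification, code generation, and so on.
response = openai.Completion.create(
    model="text-davinci-003",  # illustrative model choice
    prompt="Q: What causes the seasons on Earth?\nA:",
    max_tokens=64,
)
print(response["choices"][0]["text"].strip())
```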

Another cool application? You can design your own shoes from a prompt:

(Image: DreamStudio AI prompted to generate a custom pair of sneakers.)

Prompt engineering examples and case studies

Here are some prompt engineering examples, with best practices noted along the way.

ChatGPT prompt examples

Common categories of ChatGPT prompts include the following (an illustrative starter prompt for each is sketched after the list):

  • Code generation

  • Content creation

  • Data analysis

  • Education and training

  • Decision making and problem solving
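These starter prompts are entirely made up for illustration; in practice, each would be refined through the trial and error described earlier.

```python
# Hypothetical starter prompts for each category above; all wording is
# invented for illustration and would be tuned through trial and error.
chatgpt_prompt_examples = {
    "code generation": "Write a Python function that merges two sorted lists.",
    "content creation": "Draft a 100-word description for a solar-powered desk lamp.",
    "data analysis": "Given this CSV of monthly sales, summarize the three biggest trends.",
    "education and training": "Explain photosynthesis to a ten-year-old in five sentences.",
    "decision making and problem solving": "List the pros and cons of moving our blog to a static site generator.",
}

for category, prompt in chatgpt_prompt_examples.items():
    print(f"{category}: {prompt}")
```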

Main points:

  • Prompt engineering is a natural language processing (NLP) concept that involves discovering inputs that produce desirable or useful results.

  • As with most processes, the quality of the input in prompt engineering determines the quality of the output. Designing effective prompts increases the likelihood that a model will return a favorable and contextually appropriate response.

  • The CLIP (Contrastive Language-Image Pre-training) model, developed by OpenAI, is an example of a model that uses prompts to match images with captions, having been trained on more than 400 million image-caption pairs.

 


Source: blog.csdn.net/wwlsm_zql/article/details/131609427