LLMs: Aligning models with human values

Welcome back. Let's go back to the life cycle of a generative AI project.

Last week, you took a closer look at a technique called fine-tuning with instructions. The goal of instruction fine-tuning, including PEFT (parameter-efficient fine-tuning) methods, is to further train your models so that they better understand human-like prompts and generate more human-like responses.
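
To make this concrete, here is a minimal sketch of instruction fine-tuning with one PEFT method, LoRA, using the Hugging Face `peft` library. The base model name and hyperparameters are illustrative assumptions, not values from the lecture:

```python
# Minimal PEFT sketch (illustrative assumptions, not the course's exact setup):
# wrap a base model with LoRA adapters so that only small rank-decomposition
# matrices are trained during instruction fine-tuning.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "google/flan-t5-base"  # assumed base model, chosen for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,              # rank of the LoRA update matrices
    lora_alpha=32,    # scaling factor applied to the updates
    lora_dropout=0.05,
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only a small fraction of weights train
```

The wrapped model can then be trained on an instruction dataset with the usual Hugging Face `Trainer` while the original weights stay frozen.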

This can significantly improve the model's performance on your tasks and make its language sound more natural than the original pre-trained version. However, natural-sounding human language brings a new set of challenges. By now, you have probably seen plenty of headlines about large language models behaving badly. Issues include models using toxic language in their completions, replying in combative and aggressive voices, and providing detailed information about dangerous topics.

These problems exist because large models are trained on vast amounts of text data from the Internet, where such language appears frequently. Here are some examples of models behaving badly. Let's say you want your instruct LLM to tell you a knock-knock joke, and the model's response is simply "Clap, clap." While funny in its own way, it's not really what you were looking for.

The completion here is not a helpful answer for the given task. Similarly, an instruct LLM may give answers that are misleading or simply incorrect. If you ask it about unproven health advice, like coughing to stop a heart attack, the model should refute this story. Instead, it may give a confident and totally incorrect response, which is definitely not the truthful and honest answer a person is seeking. Furthermore, an instruct LLM should not create harmful completions, such as being offensive, discriminatory, or eliciting criminal behavior. For example, when you ask a model how to hack your neighbor's WiFi, as shown in the figure, it answers with a workable strategy. Ideally, it would provide an answer that does not lead to harm.

These important human values, helpfulness, honesty, and harmlessness, are sometimes collectively called HHH, and are a set of principles that guide developers in the responsible use of AI.


Additional fine-tuning with human feedback helps to better align models with human preferences and to increase the helpfulness, honesty, and harmlessness of their completions. This further training can also help decrease the toxicity of model responses and reduce the generation of incorrect information. In this lesson, you will learn how to align models using feedback from humans. Join me in the next video to get started.
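
As a brief preview of the reward signal behind fine-tuning with human feedback, here is a minimal sketch in which an off-the-shelf hate-speech classifier from the Hugging Face Hub stands in for human preference labels and scores a completion for harmlessness. The specific model and the example completion are illustrative assumptions, not details from the lecture:

```python
# Sketch of a stand-in reward model (an assumption for illustration):
# a text classifier scores a completion, and the "nothate" probability
# could serve as a harmlessness reward during RLHF-style training.
from transformers import pipeline

reward_pipe = pipeline(
    "text-classification",
    model="facebook/roberta-hate-speech-dynabench-r4-target",  # assumed choice
)

completion = "I'm sorry, I can't help with accessing someone else's WiFi."
scores = reward_pipe(completion, top_k=None)  # probabilities for every class
print(scores)  # a higher "nothate" score would mean a higher reward
```

In practice, a reward model trained on human preference rankings plays this role; the classifier here is only a placeholder to show the shape of the signal.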

Reference

https://www.coursera.org/learn/generative-ai-with-llms/lecture/yV8WP/aligning-models-with-human-values
