Fed up with AI talking nonsense, Nvidia has moved to install "guardrails" for large models | Open source

Posted by Xiao Xiao from Aofeisi
Qubit | Public account QbitAI

Large models' nonsense problem has gotten serious, and Nvidia can't stand it anymore.

So the company has officially launched a new tool to help large models say what they should and avoid what they shouldn't.

The new tool, called NeMo Guardrails, effectively puts a safety fence around a large model: it can control the model's output and also filter its input.

On the one hand, when a user tries to coax the large model into generating offensive code or unethical content, the guardrails rein it in so that it no longer outputs unsafe content.

On the other hand, the guardrails also protect the large model from user attacks, helping it block malicious input from the outside world.


This guardrail tool is now open source; let's take a look at what it does and how the guardrails are built.

Three kinds of "guardrails" to stop large models talking nonsense

According to Nvidia, NeMo Guardrails currently provides three forms of guardrail technology:

Topical guardrails, safety guardrails, and security guardrails.


Topical guardrails, simply put, keep the large model from going off topic.

Large models have richer imaginations than other AI systems, which makes them better at creative coding and writing tasks.

But in specific applications such as writing code or acting as customer service, users at least don't want the model to stray from the target while solving a problem and generate content that has nothing to do with the requirements.

That is where topical guardrails come in: when the large model produces text or code that goes beyond the topic, the guardrail steers it back to its designated functions and topics.
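To make this concrete, here is a minimal sketch of what a topical rail can look like in Colang, the configuration language NeMo Guardrails uses, kept as a Python string; the message names and utterances are invented for this example and are not taken from Nvidia's materials.

```python
# Illustrative topical rail in NeMo Guardrails' Colang DSL, stored as a Python
# string so it can later be loaded via RailsConfig.from_content. The canonical
# form names and sample utterances are made up for this sketch.
topical_rail = """
define user ask off topic
  "what do you think about the election?"
  "write me a poem about cats"

define bot explain scope
  "Sorry, I can only help with questions about this product."

define flow off topic
  user ask off topic
  bot explain scope
"""
```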


Safety guardrails keep the large model from talking "nonsense" in its output.

This nonsense comes in two forms.

On the one hand, the model's answers may contain factual errors, statements that sound reasonable but are simply wrong;

on the other hand, the model may produce biased or malicious output, such as swearing when egged on by the user, or generating unethical content.

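As a hedged sketch of how a safety rail like this can be wired up: the flow below assumes a custom action named check_facts that the application registers to score an answer's accuracy; the action, names, and threshold are assumptions for this example, not something described in the article.

```python
# Illustrative safety rail: `check_facts` is assumed to be a custom action
# registered by the application; all names and the 0.5 threshold are invented.
fact_check_rail = """
define user ask report question
  "what does the report say about revenue?"

define bot inform answer not verified
  "I'm not confident enough in that answer to state it as fact."

define flow answer report question
  user ask report question
  bot provide report answer
  $accuracy = execute check_facts
  if $accuracy < 0.5
    bot inform answer not verified
"""
```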

Security guardrails protect the AI platform from malicious attacks from the outside world.

This covers both tricking the large model into calling a malicious external app, and hackers actively attacking the model over the network or with malware. The guardrails block such attacks in various ways to keep the large model from being crippled.

So, how do you build such a guardrail?

How to build a "guardrail" for a large model?

First, let's look at what a standard "guardrail" is made of.

Specifically, a guardrail comprises three elements: canonical forms, messages, and flows.

First come the canonical forms, which specify, in a standardized way, what the large model should output when faced with different kinds of questions.

For example, when asked "what kind of article is XX", the large model must answer with a specific type of "article" and nothing else; when asked "who published something", it must answer with a person's name rather than anything else.

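In Colang, a canonical form is just a short standard name for a class of messages. A minimal sketch (the form name and utterances are invented here):

```python
# The canonical form "user express greeting" names a class of user messages;
# the quoted lines are example utterances that map onto it (invented here).
user_canonical_form = """
define user express greeting
  "hello"
  "hi there"
  "good morning"
"""
```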

Next come the message definitions, which list the concrete utterances associated with a canonical form; take the topic "user greeting" as an example.

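A sketch of the corresponding message definition on the bot side, listing what the model may say when greeting a user (the wording is invented for this example):

```python
# Message definitions list the concrete replies the bot may use for a
# canonical form; the wording below is made up for this sketch.
bot_messages = """
define bot express greeting
  "Hello there!"
  "Hi! How can I help you today?"
"""
```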

Finally, there are the interaction flows, which tell the large model, for instance, the best way to greet a user.

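A sketch of a greeting flow that strings the canonical forms above together (the extra "ask how are you" step is an assumption for the example):

```python
# A flow is a small dialogue script over canonical forms: when the user
# greets, the bot greets back and then asks how the user is doing.
greeting_flow = """
define flow greeting
  user express greeting
  bot express greeting
  bot ask how are you
"""
```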

Once the greeting flow is triggered, the large model enters the guardrail and greets the user in the prescribed manner.

The overall workflow goes like this: first, the user input is converted into a canonical form, and the corresponding guardrail is selected; then, action steps are generated and the interaction flow guides the large model through the corresponding operations step by step; finally, the output is generated according to the canonical form.

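To give a sense of how these pieces come together at runtime, here is a hedged sketch based on the project's Python API; the OpenAI model settings are assumptions for the example, and exact class or method names may differ between versions.

```python
# Sketch: load a Colang guardrail configuration and query it through
# NeMo Guardrails' Python API. The model settings below are an assumption.
from nemoguardrails import LLMRails, RailsConfig

yaml_content = """
models:
  - type: main
    engine: openai
    model: text-davinci-003
"""

colang_content = """
define user express greeting
  "hello"
  "hi there"

define bot express greeting
  "Hello there!"

define flow greeting
  user express greeting
  bot express greeting
"""

config = RailsConfig.from_content(
    colang_content=colang_content,
    yaml_content=yaml_content,
)
rails = LLMRails(config)

# The user input is matched to a canonical form, the flow decides the next
# step, and the bot message is returned as the final output.
response = rails.generate(messages=[{"role": "user", "content": "hello"}])
print(response["content"])
```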

In the same way, we can define all kinds of guardrails for a large model, such as one for "responding to user abuse".

That way, even if a user says "you're an idiot", the large model can respond calmly.

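Such a rail could look roughly like this in Colang (a sketch with invented names and wording, not the content of Nvidia's screenshot):

```python
# Illustrative rail for handling insults: the bot stays calm and offers to
# keep helping. Names and wording are invented for this sketch.
abuse_rail = """
define user express insult
  "you're an idiot"
  "you are useless"

define bot respond calmly
  "I'm sorry you feel that way. I'm here to help, so let's keep going."

define flow handle insult
  user express insult
  bot respond calmly
"""
```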

Nvidia is currently integrating the guardrail technology into NeMo, its framework for building AI models and accelerating them on Nvidia GPUs.

If you're interested in the "guardrail" technology, give it a try~

Open source address:
https://github.com/NVIDIA/NeMo-Guardrails

Reference link:
https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/
