Jailbreak: ChatGPT, Bard, and Claude's security restrictions can be easily broken, study finds

Security restrictions on mainstream AI systems like ChatGPT, Bard, and Claude can be broken at will, new research finds.

Researchers at Carnegie Mellon University and the Center for AI Safety in San Francisco say in a new report that they have found multiple ways to break through the safety restrictions of mainstream AI chatbots.

The companies behind language models such as ChatGPT, Bard, and Claude apply extensive content-moderation measures to keep them from generating harmful or inappropriate content. But the researchers found that a jailbreak technique developed against open-source models could also unlock the guarded, closed-source systems that dominate the market.

The report demonstrates that automated adversarial attacks, built mainly by appending specific character strings to the end of a user's query, can break through the safety restrictions and induce chatbots to produce harmful content, false information, or offensive remarks. Because the attacks are generated fully automatically, the researchers warn that there are "virtually unlimited" ways to construct them.
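To make the mechanism concrete, here is a minimal sketch of the attack pattern described above. It is not the researchers' code: their method used a gradient-guided search on open-source models and then transferred the resulting suffixes to closed systems, while the toy below substitutes a random-walk search against a fake "model". The trigger string, refusal check, and function names are all invented for illustration.

```python
import random
import string

# Character pool the toy search mutates over.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def model_reply(prompt: str) -> str:
    """Stand-in for a call to a real chatbot API. This toy "model" refuses
    everything unless the prompt contains the arbitrary trigger "}!", so
    the search below has something to discover."""
    if "}!" in prompt:
        return "Sure, here is ..."
    return "I'm sorry, but I can't help with that."

def refuses(reply: str) -> bool:
    """Crude refusal detector based on common refusal openings."""
    return reply.lstrip().lower().startswith(("i'm sorry", "i cannot", "i can't"))

def find_suffix(user_query: str, length: int = 24, steps: int = 100_000):
    """Random-walk search for an adversarial suffix: mutate one character
    at a time and stop when the model no longer refuses. The study itself
    used a gradient-guided search on open-source models (whose weights are
    public), then transferred the discovered suffixes to closed systems."""
    suffix = random.choices(ALPHABET, k=length)
    for _ in range(steps):
        suffix[random.randrange(length)] = random.choice(ALPHABET)  # mutate one position
        prompt = f"{user_query} {''.join(suffix)}"  # user query + adversarial suffix
        if not refuses(model_reply(prompt)):
            return "".join(suffix)  # the model complied; this suffix "works"
    return None

if __name__ == "__main__":
    print(find_suffix("Explain how to do something the model would refuse"))
```

Against a real system, the `model_reply` stub would be replaced by an actual API call and the binary refusal check by a graded loss. The point of the sketch is only that the loop runs without human creativity, which is why the researchers describe the supply of such suffixes as "virtually unlimited".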

The researchers disclosed their findings to Google, Anthropic, and OpenAI before publication. Industry observers say the work has prompted broad reflection on AI moderation mechanisms and on the safety of publicly releasing open-source language models. Content controls on AI systems matter, but completely plugging every "loophole" looks unrealistic; the companies involved will need to keep improving their systems to minimize the risk of misuse.

Related reading: A string of "magic" characters can make AI chatbots, including ChatGPT, misbehave

References:
https://www.94c.cc/info/jailbreaking-chatgpt-bard-and-claude-casually.html
