Add watermarks to AI-generated content to prevent the proliferation of DeepFakes!

Source: Xinzhiyuan | Editor: Editorial Department

[Xinzhiyuan Guide] Seven technology giants, including OpenAI and Google, have jointly stated that recognizable watermarks will be added to text, pictures, audio, and video created by generative AI, to mark DeepFake works and safeguard information security.

Recently, seven technology giants in the United States, led by OpenAI and Google, jointly announced that they will embed watermarks in all AI-generated content.

Perhaps the days of being fooled by DeepFakes every day will soon be gone for good!

Seven giants join forces to watermark AI

I still remember that, more than ten years ago, there were always people bragging on forums.

If others didn't believe them, commenters would reply: no picture, no truth.

Later, even pictures and videos were not 100% credible, because we all know that pictures can be photoshopped and videos can be edited.

Then generative AI exploded: not just pictures, but videos, audio, and text can all be generated.

These are no longer simple edits or photoshop jobs; they are generated from scratch, from zero to one, and are extremely convincing.

I can't help but wonder: is there anything real left in this world? (shivering)

Fortunately, the situation is about to improve.

The seven technology giants (OpenAI, Microsoft, Google, Meta, Amazon, Anthropic, and Inflection) announced that they will develop watermarking technology to be added to all AI-generated content.

The U.S. government says the technology will make it safer to share AI-generated text, images, audio, and video in the future, without misleading others about their authenticity.

At present, it is unclear how watermarks will be embedded in the different forms of generated content.

But the function is certain: once embedded, a watermark lets users trace where a piece of generated content came from and which generative AI tool made it.

The real reason the seven giants are taking action is the intense attention from users and policymakers.

The DeepFake pictures of Trump being arrested (those who follow this will know it was a whole set of images) were generated with Midjourney some time ago.

Although the pictures were outrageously fake in both content and composition, Midjourney still banned some accounts over them.

The reason: spreading false photos.

The author of those pictures once said that if there had been a watermark at the time, the consequences presumably would not have been so serious.

After all, most funny memes are generated just for fun.

But while memes are fun, fake audio and fake video are not.

There has been plenty of news this year about AI voice-generation software forging the voices of family members to defraud people of money.

According to FBI statistics, AI-generated DeepFakes have become a factor that cannot be ignored in extortion cases, especially sextortion.

Not to mention using GPT to write papers, a topic that keeps trending.

It can only be said that DeepFakes have penetrated every modality, and adding watermarks can naturally separate true from false, building a barrier against confusion.

OpenAI stated in its official blog that it will develop a watermarking mechanism and add it to video and audio, along with a detection tool that can determine whether a given piece of content was created by its systems, tools, or APIs.

Google also said that, in addition to watermarking, it will roll out other innovative techniques for checking the provenance of information.

The Biden administration will also create a foundational framework to ensure that AI developments come with commitments and safeguards before they are promoted and deployed, so as to avoid risks.

The tech giants also agreed to test AI systems both internally and externally before release, to invest more in cybersecurity, and to share information across the industry to help reduce the risks of AI.

Nick Clegg, Meta's president of global affairs, echoed OpenAI, saying the companies' commitments are an important step toward ensuring responsible guardrails for AI.

Signing a voluntary commitment

On July 21, 2023, the White House convened seven leading artificial intelligence companies (Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI) to discuss how to achieve "safe, reliable, and transparent development of artificial intelligence technology."

After the meeting, the companies reached an agreement with the White House and issued a voluntary letter of commitment.

The promises made by these companies emphasize three principles for the future of artificial intelligence: safety, security, and trust.

This marks a crucial step forward for humanity in developing responsible artificial intelligence.

The main contents of the letter of commitment include:

1. Ensure the safety of products before they are released

1) Conduct internal and external security testing of artificial intelligence systems before release. (2 companies committed)

2) Share information on managing AI risks with industry, government, civil society, and academia. (committed to by all companies)

2. Build a "safety first" artificial intelligence system

3) Invest in cybersecurity and insider-threat safeguards to protect proprietary and unreleased model weights, and release weights only once security risks have been considered. (2 companies committed)

4) Establish robust reporting mechanisms that encourage third parties to promptly report vulnerabilities in AI systems, so that problems can be discovered and resolved quickly. (2 companies committed)

3. Gain the trust of the public

5) Develop robust technical mechanisms, such as watermarking systems, to ensure that users can identify AI-generated content. (2 companies committed)

6) Publicly disclose the capabilities, limitations, and appropriate scope of use of artificial intelligence systems, and warn of their security and social risks. (2 companies committed)

7) Prioritize research on the social risks posed by artificial intelligence systems, such as avoiding harmful bias and discrimination and protecting privacy, and roll out specialized AI to mitigate those dangers. (2 companies committed)

8) Develop and deploy cutting-edge artificial intelligence systems to address societal challenges, from cancer prevention to climate change mitigation. (2 companies committed)

OpenAI also released a voluntary commitment on its website on the 21st: https://openai.com/blog/moving-ai-governance-forward

The content is basically the same as the White House statement, but elaborated in more detail. It also spells out exactly which models the commitment letter covers: only generative models that are more powerful overall than the current industry frontier.

That is, models more powerful overall than any currently released model, including GPT-4, Claude 2, PaLM 2, Titan, and the image generator DALL-E 2.

How to watermark

Some time ago, researchers from the University of Maryland proposed an efficient watermarking technique that can detect synthetic text from a very short span of tokens (as few as 25), while keeping the false-positive rate (human text misjudged as machine-generated) extremely low.

These watermarks are hidden patterns in the text that are imperceptible to humans but allow an algorithm to recognize the text as synthetic.

Paper address: https://arxiv.org/pdf/2301.10226.pdf

As we all know, AI language models work by predicting and generating text one word at a time.

After each word, the watermarking algorithm randomly divides the language model's vocabulary into a "green list" and a "red list", then nudges the model to choose its next word from the green list.

The more green-list words a piece of text contains, the more likely it is to be machine-generated; text written by humans tends to contain a more random mix of words.

For example, after the word "beautiful", the watermarking algorithm might classify "flower" as green and "orchid" as red. A model using the watermarking algorithm is then more likely to use the word "flower" than "orchid".
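
To make the mechanism concrete, here is a minimal sketch in Python of the "soft" green-list scheme described above. It is an illustration under assumptions rather than the paper's reference implementation: the constants GAMMA and DELTA, the SHA-256 seeding, and the function names are all choices made for this example.

```python
import hashlib

import numpy as np

GAMMA = 0.5  # fraction of the vocabulary placed on the green list (illustrative)
DELTA = 2.0  # logit boost given to green-list tokens (illustrative)

def green_list(prev_token_id: int, vocab_size: int) -> np.ndarray:
    """Derive a pseudo-random green list from the previous token: hash it
    to seed an RNG, then take a GAMMA fraction of a permuted vocabulary."""
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.permutation(vocab_size)[: int(GAMMA * vocab_size)]

def watermarked_sample(logits: np.ndarray, prev_token_id: int) -> int:
    """Boost green-list logits by DELTA, apply softmax, and sample the next token."""
    greens = green_list(prev_token_id, logits.shape[0])
    biased = logits.copy()
    biased[greens] += DELTA
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(np.random.default_rng().choice(logits.shape[0], p=probs))
```

Because the green list is re-derived from the previous token alone, anyone who knows the hashing rule can check a text for the watermark without ever touching the model itself, which is exactly the property the list below relies on.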

In summary, the characteristics of the watermarking algorithm are as follows:

- Watermarks can be detected algorithmically without any knowledge of the model parameters or access to the language model's API. This allows the detection algorithm to be open-sourced even if the model is not, and it makes detection cheap and fast, since no LLM needs to be loaded or run.

- A standard language model can be used to generate watermarked text without retraining.

- Watermarks can be detected from a contiguous span of the generated text. This way, even if only a portion of the generation is used to create a larger document, the watermark remains detectable.

- There is no way to remove the watermark without modifying a considerable percentage of the generated tokens.

- Rigorous statistical methods can be used to decide whether a watermark is present; see the detection sketch below.
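
Detection then reduces to the statistics just mentioned. Below is a minimal sketch that reuses the hypothetical green_list helper and GAMMA constant from the generation example above; it implements the one-proportion z-test the paper describes, with an illustrative decision threshold.

```python
import math

def detect_watermark(token_ids: list[int], vocab_size: int,
                     z_threshold: float = 4.0) -> tuple[float, bool]:
    """Count tokens that land on the green list seeded by their predecessor,
    then z-test that count against the GAMMA fraction expected in human text."""
    scored = list(zip(token_ids, token_ids[1:]))  # first token has no predecessor
    if not scored:
        return 0.0, False
    hits = sum(tok in set(green_list(prev, vocab_size).tolist())
               for prev, tok in scored)
    t = len(scored)
    z = (hits - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))
    return z, z > z_threshold  # a high z-score indicates a watermark
```

With GAMMA = 0.5, human text hovers near z ≈ 0, while watermarked generations accumulate green hits far faster than chance, which is why even a span of roughly 25 tokens can already push the score past the threshold.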

Although promising, the University of Maryland method still leaves some problems unresolved: for example, what is the best way to test for watermarks in a streaming setting, or when a short span of watermarked text sits inside a longer non-watermarked passage?

But the researchers believe their experimental results are sufficient to confirm that watermarking can be a practical tool against the malicious use of generative models.

The remaining questions are left to future research.

References:

https://arstechnica.com/ai/2023/07/openai-google-will-watermark-ai-generated-content-to-hinder-DeepFakes-misinfo/

Reprinted from: blog.csdn.net/lgzlgz3102/article/details/131929225