A new era of the Internet is coming (2): What is AIGC?


Recently, another term, "AIGC", has become popular, and 2022 has been called the first year of AIGC. Look forward to it: AIGC will usher in the next era of artificial intelligence.

Note: The content is drawn from Baidu Encyclopedia, Zhihu, Tencent, the "AIGC White Paper" and other web sources.

1. What is AIGC?

AIGC, or AI-Generated Content, refers to content produced with artificial intelligence technology. It is considered a new mode of content creation following PGC (professionally generated content) and UGC (user-generated content); AI painting and AI writing are branches of AIGC. AIGC developed rapidly in 2022, driven by the continuous improvement of deep learning models, the spread of open-source models, and the emerging commercial viability of large models, which together became the "accelerator" of its development.

AIGC represents a new direction in the development of AI technology. Traditional AI leaned toward analytical capability: by analyzing a set of data, it discovered the laws and patterns within it and applied them to the analysis of existing things. AIGC marks the leap from AI that perceives and understands the world to AI that generates and creates it.

In this broad sense, AIGC can therefore be regarded as AI technology with human-like generative and creative capability, that is, generative AI. It can autonomously generate and create new content and data in various forms, such as text, images, music, video, and 3D interactive content (virtual avatars, virtual items, and virtual environments), as well as support new scientific discoveries and create new value and meaning.

AIGC has thus quickly become a new frontier of AI, pushing artificial intelligence toward its next era. Gartner listed generative AI as one of the most impactful technologies of 2022. MIT Technology Review likewise listed AI-synthesized data among the ten breakthrough technologies of 2022 and called generative AI the most promising advance in the field of AI in the past decade. In the future, large multimodal AIGC models are expected to become a new technology platform.

On January 10, 2023, at the Baidu Create AI developer conference, Baidu founder, chairman and CEO Robin Li said that AI has moved from understanding content to automatically generating it, including AIGC for multiple types of content creation such as painting, graphics, and video.

The artificial intelligence editorial department of CCTV.com serves as the intelligent innovation base of China Central Radio and Television; its core capabilities promote deep media integration, support digital transformation across fields, and accelerate the intelligent upgrading of industries.


In December 2022, OpenAI's large language model ChatGPT took the Internet by storm. Capable of emotionally attuned conversation, code generation, script and novel writing, and other scenarios, it pushed human-computer dialogue to a new level and left netizens wondering whether ChatGPT already possesses human-level intelligence.
Major technology companies around the world are actively embracing AIGC and continuously launching related technologies, platforms and applications.


2. AIGC development trends

2.1 The convergence of AI technologies has given rise to the explosion of AIGC

First, basic generative algorithms and models keep achieving breakthroughs and innovations.
In 2014, the Generative Adversarial Network (GAN) proposed by Ian Goodfellow became the best-known early generative model. A GAN is trained in an adversarial zero-sum game framework and is widely used to generate images, videos, speech, and 3D object models. GANs have also spawned many popular architectures and variants, such as DCGAN, StyleGAN, BigGAN, StackGAN, Pix2Pix, Age-cGAN, CycleGAN, Adversarial Autoencoders (AAE), and Adversarially Learned Inference (ALI).

Subsequently, deep-learning generative algorithms such as the Transformer, flow-based models, and diffusion models emerged one after another.
Among them, the Transformer is a deep learning model built on a self-attention mechanism, which assigns different weights to each part of the input according to its importance; it is applied in both natural language processing (NLP) and computer vision (CV).

The pre-trained models that appeared later, such as BERT, GPT-3, and LaMDA, are all based on the Transformer. The diffusion model, inspired by non-equilibrium thermodynamics, defines a Markov chain of diffusion steps that gradually adds random noise to the data, then learns the reverse diffusion process to construct the desired data samples from noise.

Diffusion models were originally designed to remove noise from images. As the denoising system is trained longer and better, it can eventually generate realistic images from pure noise as its only input.
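To make the forward "noising" process concrete, here is a minimal PyTorch sketch of the closed-form diffusion step with a simple linear noise schedule; the schedule values, tensor shapes and step count are illustrative assumptions, not a full DDPM implementation.

```python
# Minimal sketch of the diffusion forward (noising) process: a Markov chain that
# gradually corrupts data with Gaussian noise. Illustrative values only.
import torch

T = 1000                                   # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule beta_t (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product alpha_bar_t

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0): add noise to x0 in closed form at step t."""
    noise = torch.randn_like(x0)
    ab = alpha_bars[t]
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

# Example: noise a dummy "image" at an early and a late step.
x0 = torch.rand(3, 64, 64)     # hypothetical clean image in [0, 1]
x_early = q_sample(x0, t=10)   # still close to the original
x_late = q_sample(x0, t=999)   # almost pure Gaussian noise
```

A generative diffusion model then learns the reverse of this chain, predicting the noise so it can turn pure noise back into a realistic sample.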


Second, pre-trained models triggered a qualitative change in AIGC's technical capability. Although generative models had emerged in an endless stream, high barriers to use, high training costs, and simple, low-quality output kept them far from the flexible, high-precision, high-quality demands of real content-consumption scenarios. The arrival of pre-trained models changed AIGC's capabilities qualitatively and resolved many of these problems.

Third, multimodal technology expands the diversity of AIGC content and makes AIGC more general-purpose. Pre-trained models have become versatile, general-purpose AI models largely thanks to multimodal technology, that is, multimodal machine-learning representations that integrate images, sound, and language.

In 2021, the OpenAI team open-sourced the cross-modal deep learning model CLIP (Contrastive Language-Image Pre-Training). CLIP can associate text with images, for example linking the word "dog" with pictures of dogs, and the associated features are very rich.

CLIP therefore has two advantages. On the one hand, it performs natural language understanding and computer vision analysis simultaneously to match images with text. On the other hand, to obtain enough labeled "text-image" pairs for training, CLIP makes extensive use of pictures on the Internet, which usually come with textual descriptions and thus serve as natural training samples. According to OpenAI, CLIP was trained on roughly 400 million "text-image" pairs collected from the Internet, which laid the foundation for later AIGC work, especially applications that generate images and video from text input.
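As a hedged illustration of this kind of image-text matching, the sketch below scores one image against candidate captions using the openly released CLIP weights through the Hugging Face transformers wrappers; the checkpoint name, image path and captions are placeholders for illustration.

```python
# Sketch of CLIP-style image-text matching; model name and image path are illustrative.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("dog.jpg")                      # hypothetical local image
texts = ["a photo of a dog", "a photo of a cat"]   # candidate captions

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the similarity of the image to each caption.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```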

2.2 The AIGC industrial ecosystem is forming and developing rapidly, moving toward a Model-as-a-Service (MaaS) future

At present, a prototype of the AIGC industrial ecosystem has taken shape, presenting a three-tier structure of upstream, middle, and application layers.


The first tier is the upstream foundation layer, the AIGC technical infrastructure built on pre-trained models. Because pre-trained models require heavy investment in cost and technology, the barrier to entry is high. Taking the GPT-3 model released in 2020 as an example, AlchemyAPI founder Elliot Turner speculated that training GPT-3 may have cost close to 12 million US dollars. The players currently building pre-trained models are therefore mainly leading technology companies and research institutions.

The second tier is the middle layer: vertical, scenario-specific, and personalized models and application tools. Pre-trained large models can be deployed like an industrial assembly line across basic domains and functional scenarios, offering on-demand use, efficiency and economy. As large multimodal AIGC models accelerate toward becoming a new technology platform, Model-as-a-Service (MaaS) is starting to become a reality and is expected to have a huge commercial impact. After Stable Diffusion was open-sourced, many secondary developments were built on the open-source model, and vertical models trained for specific styles became popular, such as NovelAI, famous for its anime-style image generation, and character generators in various styles.

The third tier is the application layer: content generation services such as text, images, audio and video for consumer (C-end) users. The application layer focuses on user needs and connects AIGC models seamlessly with them to achieve industrial adoption. Take the open-sourcing of Stable Diffusion as an example: it released not only the code but also the trained model, so later entrepreneurs could build on this open-source tool and cultivate a richer content ecosystem, which played a vital role in making AIGC popular among a wider range of C-end users. There are now more and more tools for C-end users, including web pages, locally installed programs, mobile mini-programs and group-chat bots, and even content-consumption services that use AIGC tools to generate customized images.

Future market:
With the accumulation of labeled data, improvements in technical architecture, and the content industry's rising requirements for richness, factuality and personalization, the AIGC industry will soon be pushed to the foreground.

In the next two to three years, AIGC startups and business cases will keep increasing. At present, AI-generated data accounts for less than 1% of all data; Gartner forecasts that by 2025 it will reach 10%. According to the analysis in "Generative AI: A Creative New World", AIGC has the potential to generate trillions of dollars in economic value.

China's AIGC industry is still in its infancy. There are few representative AIGC companies, and many gaps remain upstream.

Few domestic AIGC scenarios: In China, due to insufficient technological development and the investment environment, AIGC is mostly developed as part of a company's existing business, or even as a relatively marginal function, and the number of independent startups is significantly lower than abroad. Most sub-tracks have fewer than five startup players, which indirectly limits the development of domestic AIGC scenarios.

Insufficient depth of AIGC application scenarios: The most crowded domestic tracks are writing and speech synthesis, and the virtual-human track has only just begun to rise and mostly stays at the content level. Overseas, adjacent fields have been explored more fully: personalized text generation, synthetic data and similar tracks are key areas of investment. Expanding into such businesses places high demands on the client's degree of digitalization and on a thorough understanding of the corresponding industry.

AIGC will be a productivity tool in the Web3 era.

3. AIGC technology

AIGC technology mainly involves two aspects: natural language processing (NLP) and AIGC generative algorithms.

3.1 Natural language processing (NLP) technology

Natural language processing is the means by which humans and computers interact through natural language. Drawing on linguistics, computer science, and mathematics, it enables computers to understand natural language and to extract information, translate, analyze and process it automatically.

Natural language processing has two core tasks:
Natural language understanding (NLU): the goal is for computers to understand language the way humans do. In the past, computers could only process structured data; NLU lets them recognize and extract the intent in language so as to understand it. Because natural language is diverse, ambiguous, knowledge-dependent and context-dependent, computers face many difficulties in understanding it, and NLU still lags far behind human performance. Like artificial intelligence as a whole, natural language understanding has gone through three generations of methods: rule-based, statistics-based, and deep-learning-based.

Natural language generation (NLG): transforms data in non-linguistic formats into language humans can understand, such as articles and reports. NLG has developed through three stages, from early simple data merging, to template-driven generation, to today's advanced NLG, which lets computers grasp intent like humans, take context into account, and present results in a narrative form that users can easily read and understand. Natural language generation can be divided into six steps: content determination, text structuring, sentence aggregation, lexicalization, referring-expression generation, and linguistic realization.
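As a toy illustration of the "template-driven" stage mentioned above, the sketch below slots structured data into a fixed sentence pattern; the field names and wording are invented for illustration, and modern NLG systems replace such templates with learned language models.

```python
# Toy template-driven NLG: structured data is slotted into a fixed sentence pattern.
weather = {"city": "Beijing", "condition": "light rain", "high_c": 12, "low_c": 4}

template = (
    "Today in {city}, expect {condition} with a high of {high_c} degrees C "
    "and a low of {low_c} degrees C."
)

report = template.format(**weather)
print(report)
# Today in Beijing, expect light rain with a high of 12 degrees C and a low of 4 degrees C.
```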


NLP is mainly applied in four aspects:

Sentiment analysis: The Internet carries a huge amount of information expressing all kinds of content, but the sentiment expressed can be roughly divided into positive and negative, which makes it possible to quickly gauge public opinion.

Chatbots: The growth and popularity of smart homes in recent years has expanded the value of chatbots.

Speech recognition: WeChat can take voice input or convert voice directly into text, and car navigation systems let drivers simply say the destination, which greatly improves convenience.

Machine translation: Accuracy has improved greatly in recent years; YouTube and Netflix can even machine-translate video subtitles.

Commercially, NLP is mainly used in the following areas:
Processing handwritten or machine-created documents in finance, healthcare, retail, government and other sectors, for example named entity recognition (NER), classification, summarization and relation extraction, which automates the capture, identification and analysis of document information.
Semantic search, information extraction and knowledge graphs, used to build interactive AI systems for customers in retail, finance, tourism and other industries.

Neural networks, in particular recurrent neural networks (RNNs), have underpinned the mainstream approaches to NLP. Since 2017, the Transformer model developed by Google has gradually replaced RNNs such as the long short-term memory (LSTM) network as the preferred model for NLP problems. The Transformer's parallelism allows it to be trained on much larger datasets, which in turn enabled pre-trained models such as BERT and GPT. These systems are trained on large corpora such as Wikipedia and Common Crawl and can then be fine-tuned for specific tasks.
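As a hedged sketch of how such pre-trained models are used downstream, the snippet below runs one concrete task (sentiment analysis) through the Hugging Face transformers pipeline API, which downloads a default fine-tuned checkpoint; the library choice and example sentence are illustrative rather than anything prescribed above.

```python
# Using a pre-trained Transformer for a downstream task via the transformers pipeline.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # loads a default fine-tuned checkpoint
print(classifier("AIGC tools make content production dramatically faster."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```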

The Transformer is a deep learning model built on a self-attention mechanism that weights each part of the input according to its importance. Besides NLP, it is also used in computer vision. Like RNNs, Transformers are designed to process sequential input such as natural language and can be applied to tasks such as translation and text summarization. Unlike RNNs, however, a Transformer processes the entire input at once: the attention mechanism provides context for every position in the sequence, so the model does not have to handle one word at a time. This architecture allows more parallel computation and therefore reduces training time.
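To make the self-attention mechanism concrete, here is a minimal single-head, unmasked scaled dot-product attention sketch in PyTorch; the dimensions and random weights are illustrative only.

```python
# Minimal scaled dot-product self-attention (single head, no masking).
import torch
import torch.nn.functional as F

def self_attention(x: torch.Tensor, w_q, w_k, w_v) -> torch.Tensor:
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v      # queries, keys, values
    scores = q @ k.T / (k.shape[-1] ** 0.5)  # similarity of every position to every other
    weights = F.softmax(scores, dim=-1)      # attention weights sum to 1 per position
    return weights @ v                       # context vector for each position

# Example: a 5-"token" sequence processed in one shot, which is why Transformers
# parallelize better than step-by-step RNNs.
d_model, d_k, seq_len = 16, 8, 5
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)       # shape: (5, 8)
```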


3.2 AIGC generative models

The rapid development of AIGC in recent years rests on accumulated progress in generative algorithms, including generative adversarial networks (GANs), variational autoencoders (VAEs), normalizing flow models (NFs), autoregressive models (AR), energy-based models, and diffusion models. It is clear that large models, big data, and large-scale computing power are the direction of future development.

Generative Adversarial Networks (GANs)

In 2014, Ian J. Goodfellow proposed the GAN, a deep neural network architecture consisting of a generator network and a discriminator network. The generator produces "fake" data and tries to deceive the discriminator; the discriminator checks the authenticity of the generated data and tries to identify all the "fakes" correctly. Through training iterations, the two networks keep evolving and competing until they reach an equilibrium in which the discriminator can no longer tell "fake" data apart, at which point training ends.
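The following is a minimal, illustrative PyTorch sketch of the generator-versus-discriminator loop described above, using a toy 1-D Gaussian as the "real" data; the network sizes, learning rates and data distribution are arbitrary choices for demonstration, not a production GAN.

```python
# Toy GAN: the generator tries to fool the discriminator; the discriminator
# tries to tell real samples from generated ones.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))                # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0   # "real" samples from N(2, 0.5)
    fake = G(torch.randn(64, 8))            # "fake" samples from random noise

    # Discriminator update: label real as 1, fake as 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: try to make the discriminator output 1 for fakes.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```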

Diffusion Models

Diffusion models are a newer class of generative models that can produce diverse, high-resolution images. They attracted wide attention after OpenAI, Nvidia and Google succeeded in training large-scale diffusion models; example architectures include GLIDE, DALL·E 2, Imagen, and the fully open-source Stable Diffusion. Diffusion models have the potential to represent the next generation of image generation models. Taking DALL·E as an example, it can generate images directly from text descriptions, giving computers a form of human-like creativity.
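As a hedged example of how the open-sourced Stable Diffusion weights are typically used, the sketch below generates an image from a text prompt with the diffusers library; the checkpoint name, prompt and GPU assumption are illustrative.

```python
# Text-to-image with an open-source Stable Diffusion checkpoint via diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # illustrative checkpoint
).to("cuda")                                                      # assumes a CUDA GPU

image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```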

Besides the natural language processing techniques and AIGC generative models discussed above, infrastructure such as supercomputing hardware and computing power is indispensable. Machine learning requires enormous amounts of training to reach accurate results, far beyond what ordinary computers can provide. Today this is mainly done on computing clusters built from Nvidia A100 GPUs, and startups at home and abroad also obtain such capacity through the cloud.

4. What is the application value of AIGC?

AIGC is expected to become a new engine for the innovative development of digital content.
1) With production capability and knowledge breadth beyond those of individual humans, AIGC can take over basic mechanical work such as information mining, material retrieval and copy editing, and, at the technical level, meet massive personalized demand with low marginal cost and high efficiency.

2) AIGC can foster new business formats and models by supporting multi-dimensional interaction and integration between digital content and other industries.

3) It helps the development of the "metaverse". AIGC can accelerate the reproduction of the physical world and create unlimited content, enabling spontaneous, organic growth.


Application scenarios:
1) AIGC + media: writing robots, interview assistants, video subtitle generation, voice broadcasting, video highlight compilation, and AI-synthesized news anchors.

2) AIGC + e-commerce: 3D product models, virtual livestream hosts, and virtual warehouses.

3) AIGC + film and television: AI script creation, AI synthesis of faces and voices, AI creation of characters and scenes, and AI-generated trailers.

4) AIGC + entertainment: AI face-swapping apps (such as FaceApp and ZAO), AI composition (such as the virtual singer Hatsune Miku), and AI-synthesized audio, video and animation.

5) AIGC + education: AI-synthesized virtual teachers, AI recreations of historical figures from textbooks, and AI conversion of 2D textbooks into 3D.

6) AIGC + finance: automatic production of financial news and product-introduction videos, and virtual digital-human customer service built with AIGC.

7) AIGC + healthcare: synthesized speech for people who have lost their voices, synthesized limb projections for people with disabilities, and synthesized companionship for patients with mental illness.

8) AIGC + industry: completing repetitive low-level tasks in engineering design, and producing generative (derivative) designs that give engineers inspiration.

5. Changes brought about by AIGC

1) The Mengzi ("Mencius") model developed by Langboat Technology has been applied in marketing copy generation, literary co-creation, research report generation, thesis-writing assistance, digital face generation, news writing, and intelligent customer service.
With the Mengzi model, a piece of marketing copy can be generated in a few seconds at a cost of about 2 yuan, whereas writing it entirely by hand costs about 60 yuan. Zhou Ming said the knowledge the model has learned far exceeds what any individual holds, and the copy it "writes" has advantages in diversity and novelty. "Overall, using AI to assist creation, improve content production efficiency and reduce costs is the general trend."

2) The performance of tasks such as natural language processing, speech recognition and computer vision has improved significantly. These advances have made AI increasingly "intelligent" and even "empathetic": after extensive training, it can show creative ability beyond humans in many specialized fields while communicating smoothly with people.

Standardized, formulaic creation and occupations will be increasingly replaced, while content and work that require independent thinking and rich creativity will become more important.

3) AIGC applications will improve production efficiency and accelerate content production and product development; they will change how information is obtained and optimize the search experience; they will also lower the barrier to producing Internet content.

AIGC helps expand the imagination of artistic creation. Creators are shaped by their own habits, styles and preferences, so their imagination tends to stay within a certain subspace, whereas artificial intelligence has no such shackles and can better stimulate artistic creativity.

6. Challenges faced by AIGC

While AIGC attracts global attention, it faces many challenges and risks around intellectual property and technology ethics, and there is still a big gap between AIGC and artificial general intelligence.

1) Intellectual property disputes. The rapid development and commercial application of AIGC affects not only creators but also the many companies whose main revenue depends on copyright.

2) Key technical difficulties. AIGC is still far from artificial general intelligence. Although today's popular AIGC systems can generate images quickly, they may not truly understand the meaning of what they paint, let alone reason and make decisions based on that meaning.

3) Ethical issues in creation. Some open-source AIGC projects exercise little oversight over the images they generate. Some dataset builders use private user photos for AI training, and training on portrait images without permission has persisted despite repeated bans. Some users use AIGC to generate prohibited images such as fake celebrity photos, and even violent or sexual imagery. Since AI itself cannot yet make value judgments, some platforms have begun to impose ethical restrictions and interventions, but the relevant laws and regulations remain a vacuum.

4) Environmental challenges. AIGC based on pre-trained models needs massive computing power not only for training but also for inference, which quietly increases energy consumption. Its rapid development poses real challenges for environmental protection and climate change through high carbon emissions.

5) Security challenges. Security issues are unavoidable in the development and application of AI, and AIGC is no exception: it faces challenges in content security, technology abuse, user privacy and identity, and the model's own (endogenous) security.
First, the content itself. The online information space has long struggled with disinformation and content security, and content platforms at home and abroad, such as Facebook, Twitter, WeChat and Weibo, keep improving their ability to govern false content. As AIGC content keeps growing, these challenges will only intensify.
Second, malicious use or abuse of AIGC, which enables new types of illegal and criminal activity such as deep-synthesis fraud, pornography, defamation and identity impersonation. With open-source AIGC models or tools, criminals can produce rich, hard-to-distinguish fake audio, video, images and text at lower cost and higher efficiency, and can more easily steal user identities to carry out new forms of fraud and other crimes.
Third, user privacy and identity security. AIGC training data comes largely from the Internet and may include personal data, and the strong inference ability of pre-trained models creates a risk of leaking it. GPT-2, for example, was shown to leak personal information contained in its training set.

Fourth, AIGC's endogenous security. In the MaaS mode of industrial application, the generative model itself faces endogenous security problems such as backdoor attacks and data poisoning, and removing poisoned data from an attacked model remains an open question. At the same time, user data is usually submitted to the model service provider in plain text, so protecting user privacy with existing encryption technology is another important security challenge.

All sectors of society need to work together to address these issues and challenges and to build green, sustainable, environmentally friendly AI models that combine intelligence with low carbon emissions.

"The future has come, let us embrace AIGC, embrace the next era of artificial intelligence, and create a better future."


Source: blog.csdn.net/qq_41600018/article/details/129016473