Talking about AIGC from the perspective of cloud storage

As the saying goes: A glass of wine in the spring breeze of peaches and plums, ten years of night rain in the rivers and lakes.

Over the past ten years, artificial intelligence has moved from the laboratory into industrial production and has made major breakthroughs in perception tasks such as speech recognition, text recognition, and video recognition.

These days, if you haven't heard of "Tongyi Qianwen" and "Tongyi Tingwu", you may feel too embarrassed to greet people when you go out. So how has AIGC (AI Generated Content), which these products represent, repeatedly broken out of its niche on the strength of its abilities, and what sparks have flown along the way?

Today, let's talk about it.

Before we start, a quick advertisement: the "Computing Power Empowers AIGC Special Training Camp" is now open. Alibaba Cloud technical experts walk you through, step by step, how to build the hottest AIGC applications of the moment with file storage NAS and the machine learning platform PAI. Get involved now!

1. From imitation to creation, AIGC is "born stronger"

When it comes to AIGC, we have to mention its counterparts, PGC (Professional Generated Content) and UGC (User Generated Content). In both PGC and UGC, people are the main producers of content (the earliest concept of "people-oriented" can even be traced back to Guan Zhong more than 2,000 years ago), whereas AIGC produces content with AI at the core. That is the essential difference between them.

AIGC technology mainly involves natural language processing (NLP, including natural language understanding, NLU, and natural language generation, NLG), generation algorithms, pre-trained models, and multimodality. In essence, these technologies use AI algorithms to learn from large training data sets, find the patterns in existing data, and generalize from them appropriately, so that AI gains the ability to twin, edit, and create digital content.

Traditional AI leans toward analytical capabilities; personalized recommendation, for example, depends on it. But times have changed and so has the version. Driven by the troika of algorithms, computing power, and data, AIGC is the student that has surpassed the master.

To put it simply, there are three major advantages:

The first is "creating something out of nothing": AI moves up from perceiving and understanding the world to generating and creating it. In the traditional model, AI is like a mechanical version of Wang Yuyan, insisting that "every stroke has a source": every answer it gives comes from a database stored in advance. Generative AI, by contrast, creatively recombines information after receiving an instruction. For example, a generative model trained on real faces can generate faces that do not exist in the real world.

The second is "self-training". A cup of tea, a set of practice problems, and full concentration on study. AIGC generation algorithms let the machine carry out unsupervised pre-training on massive data without a homeroom teacher watching over it, which greatly shortens training time. Before this, model training (for autonomous driving, for example) relied heavily on manual data labeling. Whenever the scene changed, the data had to be labeled again, which took a lot of manpower and was inefficient.

The last is "going universal". A general artificial intelligence that can reason by analogy is the ultimate ideal of the field, and AIGC, the hope of the whole village, is gradually approaching it. With the support of multimodal technology, pre-trained models are developing into general models that span text, image, voice, and video. Pre-training and multimodality make a good pair: working together, they let a single AIGC model produce many types of content at high quality.
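The "self-training" point can be illustrated with a toy example. The sketch below is only an illustration, not how any particular model is trained: it shows how raw, unlabeled text can supply its own training targets by asking the model to predict the next word. The function name `next_token_pairs`, the whitespace tokenization, and the context size are all assumptions made for this example.

```python
def next_token_pairs(text, context_size=3):
    """Turn raw, unlabeled text into (context, target) training pairs.

    No human labeling is involved: the "label" for each context is
    simply the word that follows it in the text.
    """
    tokens = text.split()  # whitespace tokenization, a deliberate simplification
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        target = tokens[i]
        pairs.append((context, target))
    return pairs


if __name__ == "__main__":
    for context, target in next_token_pairs("the cat sat on the mat today"):
        print(context, "->", target)
```

Because the targets come from the text itself, changing the domain only requires new raw text, not a new round of manual labeling.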

2. "Three swords combined", driving AIGC to speed up in an all-round way

Before 2021, AIGC mainly generated text and was best known as an automatic writing tool. Today, the new generation of models handles a wide range of formats with ease, whether text, images, code, audio, or video. The "Generative Artificial Intelligence Service Management Measures (Draft for Comment)" recently issued by the Cyberspace Administration of China states clearly that generative artificial intelligence covers technologies that generate text, pictures, sounds, videos, code, and so on based on algorithms, models, and rules.

AIGC's high-quality content output is inseparable from the maturing of large cross-modal pre-trained models. As parameter scale and model performance keep improving, large language models (LLMs, generally more than 10 billion parameters) have shown good scalability in natural language processing, computer vision, and cross-modal tasks, steadily widening the boundaries of where AIGC can be applied.

Alibaba's "Tongyi Qianwen" large model is trained on a huge data set. Data determines the performance, generalization ability, and application effect of machine learning algorithms, and data acquisition, labeling, cleaning, and storage are among the bottlenecks of machine learning. Behind the powerful general language ability of "Tongyi Qianwen" are more than 10 trillion parameters. Tongyi Qianwen also introduces knowledge graph technology to layer, summarize, and correlate various kinds of knowledge so that it can give more accurate and comprehensive answers. Alibaba will open up Tongyi Qianwen's capabilities so that every enterprise can build its own exclusive GPT (a pre-trained language model).

It should be noted that both inference and training of large AI models depend heavily on GPU chips, just as Tang Seng cannot do without his vanguard Sun Wukong. A shortage of chips means insufficient computing power, which means huge models and data volumes cannot be processed. As a result, there is an IQ gap between the AI models of different vendors: some speak fluently, while others are still babbling.
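To make the dependence on GPU capacity concrete, here is a rough back-of-the-envelope sketch. It assumes 2 bytes per parameter (16-bit weights) and a hypothetical accelerator with 80 GB of memory; both figures are assumptions for illustration, and the estimate covers the weights only.

```python
import math


def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory needed just to hold the model weights, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params, gpu_memory_gb=80, bytes_per_param=2):
    """Lower bound on the number of GPUs needed to hold the weights alone."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    for params in (10_000_000_000, 100_000_000_000):
        print(params, weight_memory_gb(params), min_gpus_for_weights(params))
```

Training needs considerably more than this lower bound, because gradients, optimizer state, and activations also have to fit in memory.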

In addition, AIGC needs more than the combination of large models, big data, and high computing power. It also needs a stable, efficient, and secure digital infrastructure to support the whole process of generating, storing, and transmitting content, one that avoids duplicated construction and reduces the work of moving data around.

Cloud computing infrastructure (high-performance chips, storage, computing, networking, and so on) is increasingly important as the foundation of computing power, and it underpins the sustainable development of AIGC applications and the industry around them. Many companies therefore choose to do the "AI alchemy" of model development in the cloud, meeting bursts of computing demand at relatively low cost. Alibaba Cloud, which is actively embracing the AIGC era, has also set up an "alchemy furnace" for its customers.


3. Cloud storage "joins hands" with AIGC, with lower cost and higher performance

Developing a large AI model often involves more than 100 billion parameters, and the difficulty should not be underestimated. At present, players from all walks of life, including giants, returnees from overseas, start-ups, and academics, are competing on the large-model track. Leading foreign companies tend to focus on AIGC capabilities for general scenarios, while domestic AIGC applications are more scenario-focused. Along the way, however, some players have inevitably run into challenges in developing their AIGC business:

● Data runs through the entire AI training process, but storage is fragmented into silos: multiple storage systems are needed, data is frequently moved between them, and storage efficiency is low;
● Training a model requires millions of images and text samples, and keeping this data for the long term brings high storage costs;
● Large-model training tasks often need the computing power of hundreds or even thousands of GPU cards spread across many server nodes, and the huge demand for cross-server communication makes network bandwidth the bottleneck of the GPU cluster.
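The last challenge can be sized with a simple model. The sketch below assumes gradients are synchronized with a ring all-reduce, in which each GPU sends roughly 2 × (N − 1) / N times the payload per synchronization. It considers bandwidth only and ignores latency and any overlap with computation; the payload size, GPU count, and link speed in the example are hypothetical.

```python
def ring_allreduce_bytes_per_gpu(payload_bytes, num_gpus):
    """Bytes each GPU sends during one ring all-reduce of the payload."""
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to exchange
    return 2 * (num_gpus - 1) / num_gpus * payload_bytes


def allreduce_seconds(payload_bytes, num_gpus, link_gbytes_per_s):
    """Bandwidth-only time estimate for one gradient synchronization."""
    sent = ring_allreduce_bytes_per_gpu(payload_bytes, num_gpus)
    return sent / (link_gbytes_per_s * 1e9)


if __name__ == "__main__":
    # Hypothetical: 20 GB of 16-bit gradients, 1,000 GPUs, 25 GB/s per link.
    print(allreduce_seconds(20e9, 1000, 25))
```

Since the per-GPU traffic stays close to twice the payload however many GPUs are added, the speed of each link, rather than the number of cards, sets the floor on synchronization time in this model.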

To clear these obstacles and develop the AIGC business more effectively, a mature solution is needed to carry the massive data required for training and inference.

■ Low cost
Alibaba Cloud Object Storage OSS serves as a unified data storage base, and lifecycle tiering policies reduce the cost of storing cold data. OSS also provides a transfer acceleration solution that shortens waiting times for overseas users. File storage NAS can scale out and in elastically to follow the peaks and valleys of business activity, saving further cost.
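The effect of lifecycle tiering on cost can be sketched as follows. This is not the OSS API: the tier names, age thresholds, and per-GB prices are hypothetical values chosen for illustration, and the model ignores retrieval fees and minimum storage durations.

```python
# Hypothetical prices per GB per month, for illustration only.
HYPOTHETICAL_PRICE_PER_GB_MONTH = {
    "standard": 0.020,
    "infrequent_access": 0.010,
    "archive": 0.005,
}


def tier_for_age(days_since_last_access, ia_after_days=30, archive_after_days=180):
    """Pick a storage tier from how long an object has gone unaccessed."""
    if days_since_last_access >= archive_after_days:
        return "archive"
    if days_since_last_access >= ia_after_days:
        return "infrequent_access"
    return "standard"


def monthly_cost(objects, prices=HYPOTHETICAL_PRICE_PER_GB_MONTH):
    """objects: iterable of (size_gb, days_since_last_access) pairs."""
    return sum(size_gb * prices[tier_for_age(age)] for size_gb, age in objects)


if __name__ == "__main__":
    dataset = [(100, 0), (100, 60), (100, 365)]  # hot, warm, and cold data
    tiered = monthly_cost(dataset)
    all_standard = sum(size for size, _ in dataset) * HYPOTHETICAL_PRICE_PER_GB_MONTH["standard"]
    print(tiered, all_standard)
```

Under these assumed prices, the colder most of the training corpus is, the larger the gap between the tiered bill and keeping everything in the standard tier.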

■ High performance
File storage CPFS provides up to hundreds of GB of access bandwidth, enough to serve hundreds or thousands of nodes accessing data at the same time, and its data flow function accelerates reads and writes during training. Working with the PAI-Lingjun Smart Computing Cluster, CPFS achieves a speed-up of more than 3 times in model training, and the self-developed high-performance network technology stack further removes the bottleneck on performance scaling. In the inference scenario, file storage NAS provides the standard file interface, multi-machine read/write consistency, and high aggregate throughput that multi-machine GPU computing requires.

AI is like a knight-errant who inherited peerless martial arts and never took the ordinary path: it goes straight for the top and always plays at the head of the rankings. AlphaGo, for example, took on top Go players from its very first appearance. AIGC is more like a grandmaster who built up deep reserves over many years, created a unique inner method of his own, and founded a sect. AIGC has now become a battleground for heroes from all walks of life, and it keeps opening new chapters in film and television, entertainment, and the metaverse.

This article is the original content of Alibaba Cloud and may not be reproduced without permission.

Origin blog.csdn.net/yunqiinsight/article/details/131638293