Industry leaders explore opportunities in the AGI era as Tencent Cloud accelerates the large-scale application of large models

Introduction

In 2023, the top trend in the technology world was large models. Since the advent of ChatGPT kicked off the large-model and generative AI industry, domestic large models have quickly followed, completing the leap from technology to product to business and going deep into vertical industries.

The explosion of new technologies has spawned new application scenarios and product forms, driving intelligent transformation across entire industries. Amid this sweeping trend, what opportunities and challenges will practitioners and entrepreneurs face, and how can they break through and usher in a new era of AGI?

Recently, the Tencent Cloud TVP AI Innovation Seminar "Opportunities and Challenges of the Large Model Era" was held at Tengyun Building in Shanghai. Top experts in the AI field were invited to share and discuss hot topics around large models in depth and to jointly explore the future direction of the large model era.

Large models: technology, value, ecology

Mr. Zhang Jiaxing, chair scientist at the Cognitive Computing and Natural Language Research Center of IDEA Research Institute and a Tencent Cloud TVP, gave a talk on the theme "Large Models: Technology, Value, Ecology".

Discussing the birth of the GPT large model, Mr. Zhang Jiaxing drew on more than ten years of research experience in deep learning and used four main threads (model structure, training technology, computing power and systems, and data) to explain the trends behind the technology's development, highlighting several key milestones:

  • Model structure innovation: the rise of deep learning drove innovation in model structure, in which the Transformer architecture played the key role. It broke through the hundred-million-parameter bottleneck, unified the many competing attention-mechanism designs, and also solved the problem of task design;
  • Breakthroughs in training technology: the landmark event was the BERT model in 2018. In Mr. Zhang Jiaxing's view, model structure is the physical foundation, while training technology is what gives the model specific capabilities;
  • Advances in computing power and data: the underlying chips keep improving, with performance increasing by more than 100 times.

Mr. Zhang Jiaxing pointed out that every major shift in technological paradigm is the disappearance of a category, or a process of moving toward unification, and large models are exactly such a paradigm shift. After ChatGPT emerged, model structures first became unified and then rapidly "diverged" again; the entire technical field was redivided, prompting the formation of a new production chain. This change marks that large models will become an industry of their own.

Amid this paradigm shift, the direction of the models developed by Mr. Zhang Jiaxing's team has also changed, from the initial Fengshenbang ("list of gods") model series to the current Ziya (Jiang Ziya) series of expert models. Mr. Zhang Jiaxing explained that building a single large model with every ability poses real challenges: different abilities may conflict or be incompatible. Each ability is therefore split into an independent model, so the team can focus on developing that ability and achieve optimal performance through targeted training strategies.
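The split-by-ability idea can be sketched as a toy dispatcher in Python. Everything here (the expert functions and the keyword rules) is a hypothetical illustration of the pattern, not IDEA's actual architecture or routing logic:

```python
# Toy sketch of the "one expert model per ability" pattern.
# Expert names and routing rules are hypothetical illustrations.

def translate_expert(prompt: str) -> str:
    return f"[translation model] {prompt}"

def code_expert(prompt: str) -> str:
    return f"[code model] {prompt}"

def chat_expert(prompt: str) -> str:
    return f"[general chat model] {prompt}"

# Each ability is served by an independently trained model,
# so each can be optimized with its own training strategy.
EXPERTS = {
    "translate": translate_expert,
    "code": code_expert,
    "chat": chat_expert,
}

def route(prompt: str) -> str:
    """Dispatch a request to one expert. A real system would use a
    trained classifier; simple keyword rules stand in for it here."""
    lowered = prompt.lower()
    if "translate" in lowered:
        return EXPERTS["translate"](prompt)
    if "function" in lowered or "def " in prompt:
        return EXPERTS["code"](prompt)
    return EXPERTS["chat"](prompt)

print(route("Please translate this sentence into English"))
```

Splitting abilities this way lets each expert be trained and served independently, at the cost of maintaining a routing layer in front of the models.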

In the competitive landscape of the "battle of a hundred models", Mr. Zhang Jiaxing believes the exploratory nature of training technology is extremely important. He emphasized that training is itself a process of exploration: exploring good generation methods during training and guiding model development with learning from human feedback.

On large-model application products, Mr. Zhang Jiaxing proposed layer-by-layer encapsulation from the expert model to the client:

The first layer of packaging is tool integration: bundling the code model with fine-tuning, application, and efficient inference tools, and setting up various usage scenarios.

The second layer is the integrated packaging of models and computing power. Mr. Zhang Jiaxing is cooperating with Tencent Cloud on this front, actively promoting the combination of model and computing power into a single large-model product offered to customers so that it can be used "out of the box".

Technological innovation paradigm and thinking in the AGI era

Mr. Li Jianzhong, chief technical expert at Boolan, chairman of the Global Machine Learning Technology Conference, and a Tencent Cloud TVP, gave a keynote speech titled "Technological Innovation Paradigms and Thinking in the AGI Era".

Mr. Li Jianzhong first reviewed the timeline of technology development from an industrial perspective, arguing that both connection and computing have gone through revolutionary shifts from 1.0 to 2.0. The hundred years from 1840 to 1940 were the Connection 1.0 era: after the telegraph came the telephone, radio, and television, the earliest connection technologies. The first electronic computer appeared in 1946, followed by mainframes, minicomputers, microcomputers, and PCs: the Computing 1.0 era. With the emergence of the Internet in 1995 came Web 2.0, the mobile Internet, and cloud services: the Connection 2.0 era, in which connections moved from one-way to two-way. With the Transformer architecture in 2017 and the iteration of GPT came the Computing 2.0 era, which is still ongoing; based on past technology-development curves, Mr. Li Jianzhong believes it will last until around 2035.

Mr. Li Jianzhong also observed that technology development has swung like a pendulum between connection and computing. On the relationship between the two, he believes connection addresses relations of production, while computing addresses productivity. The logic of the connection model is to provide information for users to make decisions, which is the natural soil for advertising; the logic of the computing model is for users to provide data so that the machine helps make decisions, and its business model tends toward charging fees. Under computing logic, efficiency comes first and results are paramount.

Mr. Li Jianzhong proposed a "cube" model of paradigm shift. Its axes cover user demand, the evolution of technology from 1.0 to 2.0, and, on the Z-axis, media interaction: text, images, audio, video, 3D, and so on. He believes the intersection of demand and technology is the key to innovation, while emphasizing the impact of media changes on products and innovation. In the intelligent era, filling different quadrants of this cube represents different directions, such as combining large models with different fields to open new ideas for innovation and product development.

Based on this, Mr. Li Jianzhong summarized four core capabilities of large models:

  • Content generation: the most mature and powerful capability, able to generate all kinds of content;
  • Knowledge abstraction: compressing human knowledge and bringing innovation to knowledge-intensive industries;
  • Language interaction: the core of human-computer dialogue, with huge room for imagination;
  • Logical reasoning: logic, planning, and memory abilities on the path toward embodied intelligence.

What innovation opportunities arise from combining the core capabilities of large models with different fields? Taking the large-model application layer as the starting point, Mr. Li Jianzhong proposed two main directions: AI-Native and AI-Copilot. AI-Native refers to new products or services built entirely around AI, with high risk and high return. AI-Copilot embeds AI capabilities into existing business loops in a progressively enhanced way, staying compatible and extensible with existing infrastructure.

In the software field, Mr. Li Jianzhong likewise shared three major paradigm shifts that large models bring to software development:

  • Development paradigm: large models will change how code is written, from engineers writing most code to AIGC generating most code;
  • Interaction paradigm: from graphical user interfaces (GUI) to natural-language user interfaces (NUI), including NUI+GUI collaboration, changes to the intermediate steps of structured input, and the removal of barriers between isolated applications to achieve seamlessly integrated applications and services;
  • Delivery paradigm: users co-create malleable software, and this openness will give software a much wider range of functions.

Mr. Li Jianzhong believes that in the next three to five years, the maturity of the AGI industry will reach a new height, bringing huge innovation opportunities.

Unlocking generative AI with ubiquitous hardware computing power and open software

Mr. Dai Jinquan, Intel Fellow, global CTO of big data technology, and a Tencent Cloud TVP, spoke on the theme "Unlocking Generative AI with Ubiquitous Hardware Computing Power and Open Software".

Mr. Dai Jinquan first shared the Intel team's work in generative AI. Among the many factors affecting generative AI, he noted, computing power is a crucial supporting factor, and Intel has made targeted optimizations for improving the efficiency of the end-to-end AI pipeline and for accelerating AI.

Through the combination of software and hardware, Intel has improved the speed of AI deep learning and can even serve as a free, software-based AI accelerator. On accelerating generative AI computation, Mr. Dai Jinquan noted that the data-center side is the focus, strongly supporting large-model training and very-large-scale inference.

Intel's recently released Gaudi2 deep learning accelerator is optimized for models in cooperation with Hugging Face. Intel has also added Intel AMX to its server CPUs, consisting of two parts: a 2D register file and matrix acceleration support. Mr. Dai Jinquan noted the advantage of this: hardware acceleration on general-purpose CPU servers, which matters in general-purpose computing scenarios.

In response to the industry's demand to keep user data in the cloud and privately deployed large models secure and free of leaks, Mr. Dai Jinquan shared that hardware protection and software security technologies together enable full-link privacy protection: data and models remain invisible to other users during computation, calculations run only inside a hardware-protected environment, and efficiency stays close to that of plaintext computation.

To realize the vision of ubiquitous AI, Intel recently open-sourced an INT4-based large-model inference library for Intel CPUs, which supports running models with more than 10 billion parameters on Intel hardware. Mr. Dai Jinquan introduced and demonstrated its features:

  • Supports INT3, INT4, NF4, INT8, and other low-bit formats;
  • Easy to use and migrate: it can accelerate any PyTorch-based large model with efficient optimization;
  • Compatible with APIs commonly used in the community, so existing applications can be migrated with only one or two lines of code changed.
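As a rough illustration of what low-bit weight quantization does, here is a generic symmetric INT4 sketch in NumPy. This is a toy, not Intel's actual implementation (a real library quantizes per group and packs two 4-bit values per byte):

```python
import numpy as np

def quantize_int4_symmetric(weights):
    """Map float weights to integers in [-8, 7] (4 bits, signed)
    using a single per-tensor scale."""
    scale = float(np.max(np.abs(weights))) / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for computation."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.51, 0.33, 0.07], dtype=np.float32)
q, s = quantize_int4_symmetric(w)
w_hat = dequantize(q, s)
# The rounding error per weight is bounded by about scale / 2.
print(q, float(np.max(np.abs(w - w_hat))))
```

Storing 4 bits instead of 16 or 32 per weight is what makes 10-billion-parameter models fit in ordinary CPU memory, at the cost of the small reconstruction error shown above.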

Finally, Mr. Dai Jinquan expressed his expectation that large-model applications will in the future extend seamlessly from PC to GPU to cloud, a new scenario worth exploring together.

Building the most powerful computing cluster on the cloud for large models

Mr. Qi Yuanjin, head of high-performance computing R&D at Tencent Cloud, spoke on the theme "How to Build the Most Powerful Computing Cluster on the Cloud for Large Models".

First, Mr. Qi Yuanjin introduced deep learning and distributed AI training. To cope with ever-larger corpus datasets and sharply growing parameter counts in large-model training, he noted, distributed computing is required. He then shared several distributed computing schemes used in current large-model training:

  • Data parallelism: the dataset is partitioned and sent to each GPU; every GPU computes its own gradients, which are then globally synchronized to update the model parameters;
  • Model parallelism (pipeline parallelism): the model is split by layer, and different parts are assigned to different GPUs for computation and for gradient calculation and transmission;
  • Model parallelism (tensor parallelism): the model is split at a finer granularity, with parameter weight matrices partitioned horizontally or vertically.

In addition, there is expert parallelism, in which the model is composed of individual experts and inputs are routed to different experts for computation.
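The data-parallel pattern above can be sketched with a toy simulation, assuming a one-parameter linear model and two simulated "GPUs". This illustrates the gradient all-reduce idea only; a real setup would use torch.distributed or a similar framework:

```python
import numpy as np

# Toy simulation of data parallelism for a linear model y = w * x.
# Each "GPU" gets its own data shard, computes a local gradient,
# and an all-reduce averages the gradients before every replica
# applies the same update.

def local_gradient(w, x_shard, y_shard):
    # Gradient of mean squared error: d/dw mean((w*x - y)^2)
    return float(np.mean(2.0 * (w * x_shard - y_shard) * x_shard))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                                  # ground truth: w = 2
shards = [(x[:2], y[:2]), (x[2:], y[2:])]    # split across 2 "GPUs"

w = 0.0
for _ in range(100):
    grads = [local_gradient(w, xs, ys) for xs, ys in shards]
    avg_grad = sum(grads) / len(shards)      # the all-reduce step
    w -= 0.05 * avg_grad                     # identical update on every replica

print(round(w, 3))  # converges to the true weight, 2.0
```

Because every replica applies the same averaged gradient, all copies of the model stay in sync, which is exactly what makes data parallelism simple but communication-heavy.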

Mr. Qi Yuanjin noted that distributed computing makes full use of the computing resources of multiple GPUs, speeds up training, and solves the problem of insufficient memory on a single GPU. Different methods suit different scenarios and model structures, and choosing an appropriate parallel strategy improves training efficiency and performance.

Distributed training places high demands on network communication, and most of the industry adopts 3D parallelism, in which bandwidth requirements are especially sensitive to throughput. To keep the network from becoming the training bottleneck, inter-machine communication bandwidth needs to reach 1.6 Tbps.
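Some back-of-envelope arithmetic makes the bandwidth requirement concrete. The model size and precision below are illustrative assumptions, not figures from the talk:

```python
# Back-of-envelope arithmetic on gradient synchronization traffic.
# Illustrative numbers, not measurements from the talk.
params = 10e9                    # a 10-billion-parameter model
bytes_per_param = 2              # fp16 gradients
grad_bytes = params * bytes_per_param        # bytes moved per sync
grad_bits = grad_bytes * 8

bandwidth_bps = 1.6e12           # 1.6 Tbps inter-machine bandwidth
seconds_per_sync = grad_bits / bandwidth_bps
print(f"{grad_bytes / 1e9:.0f} GB per sync, {seconds_per_sync:.2f} s at 1.6 Tbps")
# → 20 GB per sync, 0.10 s at 1.6 Tbps
```

Even at 1.6 Tbps, each synchronization step costs on the order of a tenth of a second, which is why slower networks quickly become the dominant cost in large-scale training.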

To meet these challenges, Tencent Cloud has launched an AI computing power base, the high-performance computing cluster HCC, which can be widely used in AI model training scenarios such as large models, autonomous driving, business recommendation systems, and image recognition. Its features and advantages include:

  • High-performance GPUs: provide powerful computing power;
  • Low-latency RDMA network: inter-node latency as low as 2 µs, with bandwidth from 1.6 Tbps to 3.2 Tbps;
  • GPUDirect RDMA: GPU computing data takes no detour and travels point-to-point directly across machines;
  • TACO training acceleration kit: improves AI training performance with one click.

Tencent Cloud's H800 computing cluster adopts a multi-track traffic architecture that greatly reduces unnecessary data transfer and improves network performance, putting it in a leading position in the industry.

Beyond hardware, Tencent Cloud also provides TCCL, its self-developed collective communication library. Thanks to the self-developed switch architecture, TCCL achieves end-network collaboration, solves the problem of uneven traffic load, and improves throughput by about 40% in dual-NIC environments. It also provides topology-aware affinity scheduling to minimize traffic detours, with dynamic awareness that assigns tasks in the optimal order to avoid congestion of communication data.

Mr. Qi Yuanjin mentioned that Tencent Cloud's solutions all adopt a dual-uplink network design, which offers better availability than single-port training. For data storage, Tencent Cloud provides the CFS Turbo file storage solution and the COS object storage solution, improving data access performance through multi-level acceleration.

Meanwhile, to improve users' computing power utilization, Tencent Cloud has launched the TACO Kit acceleration suite, which reduces back-and-forth data movement and speeds up parameter updates through unified management of host memory and GPU memory; TACO Infer additionally provides transparent inference acceleration, giving users a better experience and service.

Mr. Qi Yuanjin concluded that Tencent Cloud's HCC high-performance computing cluster solution helps users complete each training task quickly and reliably across multiple levels, from data reading to training computation to network exchange, providing full support for the cloud training process.

Discussion and debate session

After the talks, the host, Ms. Shen Xin, a technical expert at the Low-Code/No-Code Promotion Center of the China Academy of Information and Communications Technology and a Tencent Cloud TVP, gave an excellent summary. She noted that the core, critical impact of large-model development is the change in relations of production. Take the question "Will programmers disappear?": programmers can be compared to carriage drivers in the horse-drawn era; people still raise horses today, but the drivers were displaced by automobiles. The software development industry will likewise be reshaped by AI, and that iteration and change is the challenge future programmers will face.

Then came a lively discussion and debate session. The host, Ms. Shen Xin, proposed four in-depth open topics and two debate topics; the guests discussed each topic in groups, and many insightful viewpoints emerged from the spirited exchanges and debates.

Topic 1: With the development of large models, what kind of AI ecosystem will be formed in the future, and how will it affect the pattern of the IT industry?

Speaking for the second group, Mr. Su Zhenwei, founder and chief architect of Shengpai Network and a Tencent Cloud TVP, proposed that AI will reshape the ecology and business model of the entire software industry, including today's forms of software applications, modes of Internet operation, and user payment methods. As AI further boosts productivity, it is foreseeable that enterprises' demand for personnel will change greatly, and the number of programmers will shrink to some extent.

Mr. Su Zhenwei further concluded that AI will affect future business and work in three major ways: AI will drive changes in production efficiency, affecting productivity and relations of production; the way knowledge is acquired and used will change and become more efficient; and AI will become part of enterprises' assets, making issues such as data rights confirmation worth attention.

Topic 2: What are the differences and advantages between privatized deployment and cloud deployment of AI computing power, and which scenarios are each more suitable for?

Speaking for the third group, Mr. Ding Xuefeng, a researcher at Meituan's financial services platform and a Tencent Cloud TVP, compared privatized and cloud deployment of AI computing power from three perspectives: cost, security, and flexibility.

  • Cost: for small and medium-sized enterprises, cloud deployment better matches current needs to cut costs and improve efficiency in hardware investment and maintenance;
  • Security: some industries, such as finance, have extremely high security and compliance requirements, for which privatized deployment is more suitable;
  • Flexibility: public clouds can provide not only on-demand computing power but also one-stop solutions for mature scenarios. Users can choose the appropriate approach based on actual needs; where security and compliance requirements can be met, cloud deployment is recommended.

Topic 3: How should enterprises measure the value of AI, how to quantify the cost structure and value, and what are the cases in different businesses?

Speaking for the fourth group, Mr. Xu Wei, a Tencent Cloud TVP, proposed five evaluation dimensions: whether AI creates value for the enterprise, saves costs, improves productivity, improves customer satisfaction, and assists business growth. He added that different companies and industries face different challenges and goals, so evaluating AI's value requires comprehensive consideration of each one's specific circumstances.

Across ToB and ToC business scenarios: in the ToB field, intelligent customer service, digital humans, AI knowledge bases, and corporate training have already been adopted by many enterprises; in the ToC field, AI content generation is currently the mainstream application.

On AI's cost structure, Mr. Xu Wei believes it mainly comprises computing power costs, AI technology development and maintenance costs, and AI product operation and promotion costs.

Topic 4: With the craze of large models, what are the innovation opportunities that large companies and startups can tap into?

Speaking for the first group, Mr. Li Jianzhong, chief technical expert at Boolan, chairman of the Global Machine Learning Technology Conference, and a Tencent Cloud TVP, argued that from the perspective of data advantages, current AI innovation favors large or mature companies, but from the perspective of open source it is friendlier to startups.

Elaborating on product development models, Mr. Li Jianzhong argued that the AI-Native model better suits startups, because they face new things with a fresh starting point and mindset, and some startups' investment is no weaker than that of large companies.

Debate 1: In the future, will open source or closed source be the mainstream for large models?

Speaking for the first group on the "open source" side, Mr. Li Jianzhong, chief technical expert at Boolan, chairman of the Global Machine Learning Technology Conference, and a Tencent Cloud TVP, first defined "mainstream": whatever has the most users is mainstream. He argued that, compared with closed source, open source can standardize the edge layer and model layer well; at the same time, open source gathers the strength of the whole industry to optimize at a single point, bringing more resources and investment.

Then, speaking for the second group as the "closed source" side, Mr. Su Zhenwei, founder and chief architect of Shengpai Network and a Tencent Cloud TVP, first disputed that definition of "mainstream". In his view, the mainstream is what truly drives industry change, forms a lasting business cycle, and sustains a healthier ecosystem, and he cited the closed-source GPT-4 as an example. He emphasized that a large model comprises both the model itself and its data sources, so open-sourcing algorithms and results does not make the large model open source, pointing to the various restrictions on Llama 2. Mr. Su Zhenwei believes some of today's so-called open-source frameworks are used as marketing tools and violate the true spirit of open source.

Mr. Li Jianzhong of the "open source" side then offered a pointed rebuttal. He first pushed back on the "open source as marketing" claim, emphasizing that open source is an ecosystem-level revolution. As for the GPT-4 example, he argued that its technical origins trace back to work open-sourced by Google, and that OpenAI is also preparing open-source moves.

Mr. Su Zhenwei of the "closed source" side then added that he does not deny open source's ecosystem revolution, but in practice much open-sourcing is a commercial move to seize market share under competitive pressure. He also noted that sharing knowledge is not the same as open source.

Debate 2: Are you more optimistic about the general large model track or the vertical large model track?

Speaking for the third group, Mr. Ding Xuefeng, a researcher at Meituan's financial services platform and a Tencent Cloud TVP, is more optimistic about the general track. From a larger, longer historical perspective, he believes, the development of general large models is inevitable; the limitations of vertical models can be avoided at the application layer; and as the learning scope of general models keeps expanding, they will eventually cover all current vertical fields.

Representing the fourth group's optimism about the vertical track, Mr. Xu Wei, a Tencent Cloud TVP, explained his views from three angles. From the business model perspective, vertical large models have rich application scenarios and have been commercially validated; from the cost perspective, the computing power cost of general large models is extremely high, while vertical models' costs are more controllable; from the data perspective, data is an extremely important part of large-model training: general models require huge volumes from highly restricted sources, whereas a vertical knowledge base is more attainable.

Mr. Ding Xuefeng of the "general large model" side further argued that the importance of general large models in today's AI field is self-evident: they provide the technical base that supports all kinds of applications, and developing basic, universal capabilities is an inevitable requirement for autonomy and controllability.

Mr. Xu Wei of the "vertical large model" side made a final addition: from the perspective of track ecology, the vertical track has more players, can better form a flourishing ecosystem, and brings higher commercial and social value.

Conclusion

The discussion and debate topics of this seminar have no definitive answers. The development of large models is in the ascendant and will bring new impacts to every technology practitioner, enterprise, and industry. The event concluded successfully, but Tencent Cloud TVP experts will continue exploring technology, upholding the original vision of "influencing the world with technology", actively embracing the changes and trends of the large model era with an innovative spirit, and meeting future opportunities and challenges rationally and with awe.

Highlights from the scene


Origin blog.csdn.net/QcloudCommunity/article/details/133321251