Special report on artificial intelligence large models: in the ascendant, vying for the lead

Origin of the report: The trend of AI is clear, and large models are the key link.
The development trend of the global AI industry is clear, moving forward amid fluctuations. Since John McCarthy first proposed the concept of "artificial intelligence" in 1956, the global AI industry has gradually moved through stages of academic research and industrial practice. Although the industry has experienced a spiral of "three ups and two downs" under the influence of factors such as computing power, data volume, and hardware costs, the overall direction of global AI development remains clear, and artificial general intelligence (AGI) remains the main line of the industry's development.

ChatGPT accelerates the development of the AI industry, and the explosion of demand forces supply-side innovation. ChatGPT spread rapidly around the world and completed market education for enterprises and users in a short period of time. The explosion of demand has driven the supply side to accelerate innovation, speeding up the development of the global AI industry. IDC predicts that the global AI market will reach US$308.28 billion in 2026, a CAGR of approximately 26.7% from 2023 to 2026. As AI is a key arena in the next generation of global technology competition, China's active participation, from top-level design down to technology companies, is expected to further drive the development of the domestic AI industry. IDC predicts that China's AI market will reach US$26.44 billion in 2026, a CAGR of approximately 21.5% from 2023 to 2026.

With policy focus and support, artificial intelligence is expected to maintain high prosperity. Artificial intelligence is an important field that demonstrates the international competitiveness of major countries. In terms of top-level design, China has always attached great importance to encouraging and guiding the development of the AI industry, introducing policies covering investment in key technologies, talent cultivation, AI empowerment of the real economy, and basic ethical requirements. With active policy support, China's AI industry is expected to maintain high prosperity.

On September 1, the Cyberspace Administration of China released the second batch of domestic deep synthesis service algorithm registrations, with 110 deep synthesis service algorithms passing the filing, including Baidu's Wenxin model, Douyin's Skylark model, JD's Yanxi model, and Tencent's Hunyuan Assistant. As domestic large models gradually open their services to the public, the product implementation process and the model iteration flywheel are expected to accelerate, driving the commercialization of AI.

Industrial structure: Large models are an important part of the battle for the entry point to the AI era. From the perspective of the AI industry structure, large models are the key link between underlying computing power and upper-level applications. Mature large model capabilities and ecosystems are the basis for truly realizing general artificial intelligence and future application-side prosperity. Large model companies with stronger computing and reasoning capabilities and higher versatility are expected to seize the traffic entrance and discourse power of the AI era.

Competitive Situation: It will take time for the pattern to be clear, and we are optimistic about the leading advantages of Internet giants.
Development stage: Domestic large models are in a state of "a hundred schools of thought contending", and it will take time for the pattern to become clear.

Global: China and the United States are leading development, but the industry structures may remain relatively independent. From a global perspective, China and the United States lead the field of large models. The United States, based on its leading advantage in algorithm and model R&D, ranks first in the world in the number of large models: according to the "China Artificial Intelligence Large Model Map Research Report" jointly released by the Institute of Scientific and Technical Information of China and the New Generation Artificial Intelligence Development Research Center of the Ministry of Science and Technology, as of May 2023 the United States had released 100 large models with more than 1 billion parameters. China has actively followed the global trend and accelerated output since 2021: for example, in June 2021 the Beijing Academy of Artificial Intelligence released WuDao 2.0 with 1.75 trillion parameters, and in November 2021 Alibaba's M6 large model reached 10 trillion parameters. As of May 2023, China had released 79 large models, holding an early-mover position globally. However, taking into account factors such as data security, privacy compliance, and technology regulation, we believe the large model markets in China and the United States are likely to form relatively independent industry structures.

Overseas: A pattern of OpenAI and Google as dual leaders + Meta catching up via open source + vertical specialty vendors has become relatively clear. Judging from the overseas large model landscape, a relatively clear pattern has formed of two leaders out front, Meta catching up through open source, and prosperity in vertical categories. At the same time, capabilities based on general large models have become relatively mature and usable, and the application ecosystem built on them has gradually flourished. Thanks to its advanced algorithm models and early productization, OpenAI not only demonstrated GPT's beyond-expectation performance in human-machine dialogue, but has also seen the application ecosystem based on GPT gradually prosper: multiple Microsoft products (Bing, the Windows operating system, Office, browsers, Power Platform, etc.), the code hosting platform GitHub, and the AI marketing creative company Jasper have all connected to GPT. Google has invested continuously in artificial intelligence; the GoogLeNet convolutional neural network, the Transformer architecture, the BERT large language model, and other work it proposed have all played important roles in advancing the global AI industry. However, due to team changes and a more cautious attitude toward productization, Google did not launch large-scale consumer-facing AI products early on. Driven by the rapid popularity of ChatGPT, Google has since launched the chatbot Bard and the PaLM 2 model, which will be connected to Google's collaboration and productivity suite Workspace and integrated with external applications such as Spotify, Walmart, and Uber Eats. Meta is catching up quickly through open source: in July it released its latest open-source large model Llama 2, trained on 2 trillion tokens with double the context length, achieving stronger performance and a wider range of application scenarios.
In addition, Anthropic, Cohere, Hugging Face, etc. also play an important role in the overseas AI market based on their respective vertical characteristics and customized services.

Domestic: Investment in large models is in full swing, and it will still take time for the pattern to become clear. Since ChatGPT received a strong user response and attracted worldwide attention, China's leading technology companies (Alibaba, Baidu, Tencent, Huawei, ByteDance, etc.), emerging startups (Baichuan Intelligence, MiniMax, etc.), established AI companies (iFlytek, SenseTime, etc.), and university research institutes (Fudan University, the Chinese Academy of Sciences, etc.) have all accelerated investment in the large model field. At present, domestic large models are still in the early stages of R&D and iteration, and the performance differences and usability of each large model are still being tested by the market. We expect that it will still take some time for the competitive landscape in the domestic large model field to become clear.

Competition factors: technical investment, core talents and application scenarios constitute core barriers

Technical investment, core talent and application scenarios constitute the core barriers. We believe that large models are a track that emphasizes resource endowment and has high entry barriers, with extremely high requirements for algorithm model effectiveness, high-quality data, and computing power support. The optimization and iteration of models also depend on continuous investment of capital and talent. In addition, the actual implementation and industry application capabilities of large models are important criteria in the market's test.

Model architecture: An effective division between theoretical innovation and engineering practice accelerates AI technology innovation. The emergence of the Transformer model in 2017 and its self-attention mechanism drove rapid progress on language problems (NLP, etc.); the architecture has since expanded to image generation, audio and video generation, computer vision and other fields, gradually becoming the underlying foundation of many AI algorithms. The exponential growth of input data scale and model parameters, together with the better accuracy and generalized problem-solving capabilities brought by model scale, has driven the rapid popularization of large models. As Percy Liang, Rishi Bommasani, Fei-Fei Li and others noted in the 2021 paper "On the Opportunities and Risks of Foundation Models", large models characterized by expressiveness, scalability, multi-modality, memory capacity and compositionality will become a core direction of academic research and the underlying foundation models of the AI industry. The success of ChatGPT shows that the effective combination of algorithm architecture and engineering practice, with foundation models fine-tuned and deployed across application scenarios, can significantly improve the efficiency of AI R&D and lower the threshold of industrialization. We judge that foundation model theoretical innovation will gradually return to scientific research institutions, technology giants, etc., while the differentiated capabilities of many algorithm companies will further migrate to engineering practice, making them close partners of downstream application scenario vendors.
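To make the self-attention mechanism mentioned above concrete, the following is a minimal single-head scaled dot-product attention sketch in NumPy. It is illustrative only: the dimensions, random weights, and the `self_attention` helper are our own assumptions, not code from any model discussed in this report.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X: (seq_len, d_model) token representations. Wq/Wk/Wv project X into
    queries, keys and values. Each output position is a weighted mix of all
    value vectors, with weights derived from query-key similarity -- the
    mechanism that lets a Transformer relate any two positions directly.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (seq_len, seq_len)
    # numerically stable row-wise softmax over the score matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one attended vector per input position
```

In a full Transformer this block is repeated across multiple heads and layers with learned weights; the sketch only shows the core computation.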

Large model training places extremely high demands on computing power and capital. We estimate the cost of a single ChatGPT training run, assuming one pre-training pass with no errors during training; in practice, given the possibility of engineering errors, the actual cost would be higher than our idealized figure. Assuming a parameter scale of 175B and 500B tokens of training data, and following the analysis of "Scaling Laws for Neural Language Models" (Jared Kaplan, Sam McCandlish, Tom Henighan, et al.), with 256 NVIDIA HGX A100 servers (containing 2,048 A100 GPUs) and a model FLOPs utilization (MFU) of 51.04% as in Megatron-LM, we estimate a single training run takes about 30.7 days, corresponding to about 1.51 million GPU-hours. Assuming a training cost of approximately US$1 per GPU-hour, the server-side cost is approximately US$1.51 million.
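The arithmetic above can be reproduced in a few lines. This is a back-of-the-envelope sketch: the parameter count, token count, GPU count, run duration, and price per GPU-hour are the assumptions stated in the text, and the ~6·N·D total-FLOPs rule of thumb follows the scaling-laws literature cited above; real training runs will vary.

```python
# Back-of-envelope reproduction of the single-run training cost estimate.
# All inputs are assumptions quoted in the text, not measured benchmarks.

params = 175e9      # model parameters (N)
tokens = 500e9      # training tokens (D)
n_gpus = 2048       # 256 HGX A100 servers x 8 GPUs each
days = 30.7         # estimated duration of one training run
price = 1.0         # assumed cost in USD per GPU-hour

total_flops = 6 * params * tokens    # ~6*N*D rule of thumb for training FLOPs
gpu_hours = n_gpus * days * 24       # total GPU-hours for the run
cost_usd = gpu_hours * price

print(f"total training FLOPs: {total_flops:.2e}")  # ~5.25e+23
print(f"GPU-hours: {gpu_hours:,.0f}")              # ~1.51 million
print(f"estimated cost: ${cost_usd/1e6:.2f}M")     # ~$1.51M
```

Note that the 30.7-day duration itself depends on the achieved per-GPU throughput at the assumed 51.04% MFU; a different effective throughput shifts both the GPU-hours and the cost proportionally.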

Data: High-quality data has a positive impact on the model's learning and iteration as well as on the training of reasoning capabilities. Under the current LLM technical paradigm, data sets are mainly used in the pre-training and fine-tuning stages. The pre-training stage requires large-scale, multi-category, high-quality training data; in the fine-tuning stage, vertical small data sets and prompt engineering are equally important. In recent years, global data volume has grown explosively: according to IDC, 41ZB of data was generated globally in 2019, with a CAGR of close to 50% over the preceding decade. IDC predicts that global data volume may reach 175ZB by 2025, a compound growth rate of nearly 30% from 2019 to 2025, and that more than 80% of it will be unstructured data such as text, images, audio and video, which is difficult to process. From BERT to GPT-3 and then to Google's PaLM, public language data sources on the Internet (forums, news, Wikipedia, etc.) have been utilized as much as possible, but model optimization still requires more data, which requires model developers to have access to high-quality private data sources in order to gain a differentiated advantage at the data layer of the model.
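The implied growth rate can be checked with the standard CAGR formula, (final/initial)^(1/years) − 1; the zettabyte figures are IDC's as quoted above.

```python
# CAGR implied by IDC's global data volume figures quoted in the text.
start_zb, end_zb = 41.0, 175.0   # global data generated in 2019 and 2025 (ZB)
years = 2025 - 2019

cagr = (end_zb / start_zb) ** (1 / years) - 1
print(f"implied 2019-2025 CAGR: {cagr:.1%}")  # ~27.4%, i.e. "nearly 30%"
```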

Scenario: Application implementation is an important criterion for testing model capabilities. On the one hand, the combination of general large models and industry scenarios can only truly achieve generality after bringing substantial improvements in productivity and efficiency. On the other hand, once a large model is combined with industry applications, more real user feedback is expected to accelerate its optimization and iteration, continuously strengthening model capabilities.

Pattern deduction: A landscape of Internet giants dominating, with vertical vendors coexisting, is expected to form

Historical accumulation: Internet giants have first-mover advantages and complete layouts at the chip, model, and application layers. Internet giants have invested in AI for a long time: Baidu established an artificial intelligence laboratory in 2014, and Alibaba, Tencent, and ByteDance followed in 2016. Since then, each company has continued to explore at the chip, model and application layers, constantly improving its layout, and has accumulated significant first-mover advantages in R&D, models, data, and applications.

Algorithm model: Following overseas technological progress, R&D breakthroughs are the key to competition. From a technical perspective, domestic large models mainly follow overseas progress. Given Google's high influence in artificial intelligence and BERT's open-source code, Chinese companies in the early exploration stage referred more to the BERT route. As ChatGPT's unexpected performance in human-machine dialogue verified the effectiveness of high-quality data plus feedback incentives (large model pre-training + small-data fine-tuning), the domestic technical route has gradually converged toward GPT. We believe that although differences in model architecture affect performance on specific tasks, domestic large model vendors are basically technically homologous, resulting in relatively similar model capabilities at this stage; R&D breakthroughs along the GPT direction will be the key to competition in the next stage.

Computing power: Internet vendors have advantages in computing power resources. As model parameters and complexity increase, the computing power demands of large models are accelerating. Among the large models released in China, only about 10 vendors have parameter scales of 100 billion or more, which to some extent reflects differences in computing power between vendors. We believe Internet vendors have a comparative advantage in computing power resources, mainly because: 1) Internet companies have diversified businesses, huge user bases, and high-frequency updates of massive data, so they themselves have large computing power demands; leading Internet companies such as Alibaba, ByteDance, Baidu, and Tencent are important customers in the global chip and server markets. 2) Alibaba Cloud, Baidu Cloud, Tencent Cloud, etc. are China's leading cloud vendors and are leading the deployment of new high-performance computing infrastructure such as cloud computing centers, AI computing power platforms, and supercomputing centers. For example, Alibaba Cloud launched the PAI Lingjun intelligent computing service, a platform covering the entire AI development process with distributed heterogeneous computing optimization capabilities; Tencent Cloud released a new generation of HCC (High-Performance Computing Cluster), with computing performance up to 3 times that of the previous generation.

Data: High-quality open-source Chinese data sets are scarce, and proprietary data and processing capabilities constitute barriers to model training. Thanks to the Internet's open-source ecosystem, there are a large number of high-quality, structured open-source databases overseas, with text sources spanning rigorous academic writing and encyclopedic knowledge as well as literary works, news media, social networking sites, and popular content; richer corpus data improves a model's conversational capabilities across scenarios. However, due to the high cost of building data sets and an immature open-source ecosystem, domestic open-source data sets still lag overseas counterparts considerably in data scale and corpus quality, with relatively homogeneous data sources and low update frequency, limiting model training results. Therefore, large model vendors' proprietary data and processing capabilities form the core differentiator in training results. Benefiting from the massive users, applications and data accumulated in the mobile Internet era, Internet companies have more distinctive and exclusive proprietary data and more powerful data processing capabilities, which can translate data advantages into differentiated training results. For example, when developing M6, Alibaba built M6-Corpus, the largest Chinese multi-modal pre-training data set, containing more than 1.9TB of images and 292GB of text covering sources such as encyclopedias, web crawls, Q&A, forums, and product descriptions, with a comprehensive cleaning procedure to ensure data quality. The training data of Baidu's ERNIE model likewise uses a large amount of ecosystem data such as Baidu Baike, Baidu Search and the Baidu Knowledge Graph, ensuring training quality through higher-quality data.

Resource investment: Internet vendors invest heavily in R&D, leading in both capital and talent. Training large models requires high and sustained R&D investment, and leading Internet companies combine high capital density with high talent density. In terms of capital, in 2022 Tencent/Alibaba/Baidu's R&D expenses reached RMB 61.4/56.7/23.3 billion, significantly ahead of related companies in the industry. In terms of talent, according to the Maimai talent pool, major Internet companies have the richest talent reserves in the four important technical directions of artificial intelligence: computer vision, deep learning, speech recognition, and natural language processing. Sustained high R&D investment and extremely high talent density are expected to keep leading Internet companies at the forefront of AI and large models.

Scenario: With rich and diverse businesses, Internet vendors naturally have practical scenarios for implementation. Given data privacy and security compliance concerns, a general large model may initially face trust issues when implemented in industry, raising customer acquisition costs. Leading Internet platforms, with their rich business accumulation in e-commerce, search, games, finance and other fields, naturally have practical scenarios for implementation. While improving the efficiency of their own products, they are also expected to be first to form a demonstration effect, helping them expand to external customers and applications.

Pattern deduction: Internet giants are expected to maintain their leading position, while small and medium-sized vendors may face a choice of paths. Based on the above analysis, combined with industry competition factors and with reference to the current overseas competitive landscape, we believe the domestic large model track is likely to evolve similarly to overseas: leading Internet companies with advantages in technology, capital, talent and scenarios are expected to become the key players in the large model field, while small and medium-sized vendors may face a choice of paths. On the one hand, they can leverage their accumulated advantages in vertical scenarios and data to become core players focused on vertical categories; on the other hand, given the surge in computing power demand from training and user calls, and considering resource advantages and economics, they may seek support and cooperation from cloud vendors.

Comparison of Internet giants' large models: technological breakthroughs in the short term, ecological barriers in the long term.
Historical accumulation: Baidu and Alibaba have profound technological accumulation, and large models have achieved good performance.

In this chapter, we review the development history, self-research layout, and external investments of leading domestic Internet companies in the AI field. In terms of timeline, Alibaba, Baidu, Tencent, and ByteDance all established artificial intelligence laboratories between 2014 and 2016, but have since differed in their development ideas and framework layouts. We believe Alibaba and Baidu focus more on investment in underlying technology, combining first-mover advantages with complete self-research layouts; the large model products they have launched have achieved good Chinese conversation capabilities. Tencent has also actively followed up in the AI field, announcing the latest progress of Hunyuan, its trillion-parameter Chinese NLP pre-training model, in December 2022; at the same time, Tencent maintains an open investment style in the large model field, growing together with portfolio companies. ByteDance's previous AI investments were more tied to its own business, such as audio and video recognition, content creation, and AR/VR; it has outstanding algorithm capabilities but relatively weak accumulation in large models, and has launched Volcano Ark to actively participate in industry competition through MaaS.

Alibaba: Long-term investment in self-developed AI, with a leading position in the accumulation of data, algorithms and computing power. As a leading domestic technology company, Alibaba has long invested in cutting-edge technologies such as artificial intelligence. It established the Data Science and Technology Research Institute in 2014, the Artificial Intelligence Laboratory in 2016, and the DAMO Academy in 2017, built a self-developed AI chip team as computing power support, and successively released PLUG, the largest pre-trained language model in the Chinese-language community, and the multi-modal large model M6. At the same time, Alibaba actively applies intelligent technologies such as deep learning to e-commerce, smart city and other businesses, improving business efficiency through technological progress. We believe Alibaba holds a leading position in the domestic AI and large model competition based on its accumulation of data, algorithms and computing power.

External investment: An extensive layout to build a win-win AI ecosystem. In addition to self-research, Alibaba is also actively making external investments across the core AI industry: Cambricon and Shenjian Technology in the chip field, SenseTime, Megvii, etc. in machine vision and deep learning, and XPeng, Xiaoi Robot, etc. in application fields. Through this extensive AI layout, Alibaba is expected to form synergies and strategic cooperation with related companies, further improving efficiency and expanding its business, achieving win-win results by building an AI ecosystem.

Technical architecture: IaaS + PaaS + MaaS redefines the AI architecture. Facing the new AI era, Alibaba Cloud has redefined a three-layer technology system of IaaS + PaaS + MaaS. At the IaaS layer, Alibaba Cloud has designed cloud infrastructure specifically for AI, including heterogeneous computing and efficient high-speed network storage, while providing Lingjun computing clusters for training and elastic computing ECS clusters for inference, laying a solid foundation for AI development through more stable and efficient infrastructure. At the PaaS layer, Alibaba Cloud provides a wealth of big data and machine learning products based on its long-term accumulation of technology and software capabilities, assisting model training in areas such as data cleaning and feature engineering. In addition, Alibaba Cloud launched the ModelScope community in November 2022 and proposed MaaS (Model as a Service), accelerating model development and iteration by building an open-source large model community and co-constructing the ecosystem.

Baidu: A decade of hard work, with full-stack self-research building core barriers. Driven by its own business needs and a strong engineering culture, Baidu has always attached great importance to AI investment: it opened a Silicon Valley office in 2011 and proposed its "All in AI" corporate strategy in 2017. In terms of its AI technology system, Baidu is one of the few domestic companies with a full-stack self-research layout in AI, investing in self-research at the chip, framework, model and application layers, and has formed a certain industrial ecosystem and influence.

External investment: Long-term investment, accelerating the layout of the large model field. Baidu has long followed the artificial intelligence track, insisting on deploying in cutting-edge technology fields and investing in chips, large models, AI + pharmaceuticals, applications and more. Since 2023, as ChatGPT has triggered a new round of AI industry boom, Baidu has accelerated its layout in AIGC and multi-modal large models, successively investing in the text-to-video generation startup Morph Studio, the artificial intelligence company Xihu Xinchen, and the multi-modal large model company Shengshu Technology, which is expected to further improve the company's AI ecosystem layout and accelerate collaborative development.

Chip + framework + model + application: a full-stack self-research layout strengthens internal feedback and iteration. Baidu has a full-stack self-research layout in AI. At the chip layer, Baidu has achieved mass production of two generations of its self-developed Kunlun chips, and the third generation is expected to launch at scale in early 2024. At the framework layer, Baidu's PaddlePaddle (Fei Paddle), after six years of development and gradual maturation, has become China's first fully functional, open-source, end-to-end deep learning platform; as of November 2022, PaddlePaddle had 5.35 million developers, had served 200,000 enterprises and institutions, and had been used to create 670,000 models. At the model layer, Baidu first launched the Wenxin (ERNIE) large model in 2019 and has iterated continuously, releasing the tens-of-billions-parameter Wenxin ERNIE 3.0 and the hundreds-of-billions-parameter Wenxin ERNIE 3.0 Titan in 2021. At the application layer, Baidu has launched the generative AI dialogue product Wenxin Yiyan (ERNIE Bot) and the Wenxin Qianfan large model platform for enterprise customers, actively validating large model capabilities in practical scenarios. We believe the advantage of Baidu's full-stack layout is that feedback between layers is expected to further drive the optimization of technical capabilities and improve iteration efficiency.

Tencent: Attaches great importance to AI development, driven by both internal R&D and external investment. Tencent established its AI Lab in 2016, proposed the strategic vision of "make AI everywhere" in 2017, and in 2018 established two major laboratory matrices covering artificial intelligence and frontier technologies. According to the official WeChat account of Tencent Robotics X, its research explores the combination of AI with industry applications such as content generation, life sciences, medicine, and games. In terms of external investment, according to IT Juzi, as of the end of 2022 Tencent had invested in a total of 53 domestic AI companies, including multiple rounds in the AI computing chip company Suiyuan Technology (Enflame) and the enterprise cognitive intelligence platform Mininglamp Technology, and in 2023 invested in large model companies such as Shenyan Technology, MiniMax, and Lightyear Beyond. In the large model field, Tencent maintains its investment style and is expected to share the fruits of growth with its portfolio companies.

Tencent: Entering the large model track via MaaS, with complete computing power support and application tools. On June 19, Tencent Cloud officially announced its progress on industry large models for the first time and released Tencent Cloud MaaS service solutions for B-end customers. Unlike Alibaba and Baidu, which directly released large model products, Tencent entered the large model track with MaaS first, providing more than 50 solutions across 10 major industries such as finance, culture and tourism, government affairs, media, and education, meeting enterprise needs in a way that better understands each industry and is easier to implement. At the same time, Tencent's TI platform provides a full toolchain for data annotation, training, testing, evaluation, and deployment, and its technical base provides computing support such as HCC high-performance computing clusters and a vector database to ensure the operation of industry large models.

ByteDance: Established a large model team in 2023, led by the search and innovation departments. ByteDance established its Artificial Intelligence Laboratory in 2016, positioning it as an internal research institute and technical service provider that supplies AI technical support for the massive content produced on its platforms. Previously, the company's AI research results were mainly integrated with its business, with R&D focused on machine translation, intelligent speech, video and images, and multi-modality, while its accumulation in large models was relatively weak. According to 36Kr, ByteDance's language large model team was established this year, led by the search department, while the image large model team is led by the intelligent creation team under the Product R&D and Engineering Architecture Department.

ByteDance: starting from MaaS, applying first to build up industry experience. On June 28, Volcano Engine released Volcano Ark, a large model service platform that provides enterprises with full-stack services such as model fine-tuning, evaluation, and inference. It has connected to large models from multiple AI companies and research institutes, including Baichuan Intelligent, Fudan University's MOSS, Lanzhou Technology, MiniMax, and Zhipu AI, and has begun invitation-based testing. We believe that, given ByteDance's relatively weak early accumulation in large models, entering the track through MaaS is the more feasible path. On one hand, the MaaS model gives demand-side customers rich, diverse, flexible, and cost-effective ways to use large models; on the other hand, implementing industry applications and accumulating industry experience are also expected to feed back into ByteDance's own accumulation and iteration in the large model field.

Core talents: Pay attention to talent density and stickiness, and take into account basic R&D and business implementation.

In terms of talent, we believe the differentiated competition among major Internet companies is mainly reflected in two aspects: 1) talent density and quality; 2) talent stickiness. The key to talent stickiness lies in organizational structure and incentive mechanisms. AI R&D has attributes of forward-looking research and academic influence, yet, against the backdrop of industry-wide quality and efficiency improvement, it also faces demands for R&D output and business implementation. Therefore, the keys to organizational design at major Internet companies are balancing basic research with business implementation and ensuring talent stickiness through a reasonable organizational system.

Alibaba: DAMO Academy insists on cutting-edge exploration, with high-density AI talent leading development. In terms of organizational structure, Alibaba's AI research is mainly led by DAMO Academy. Founded in 2017 and committed to exploring the unknowns of science and technology, DAMO Academy carries out basic science and innovative technology research driven by human vision, with laboratories spanning machine intelligence, data computing, robotics, financial technology, and other areas, including its City Brain laboratory. In terms of personnel, Alibaba's large model R&D team is led by Zhou Jingren, CTO of Alibaba Cloud Intelligence, who has deep experience in big data platforms and artificial intelligence and played an important role in developing the M6 series models. Huang Fei and Zhao Deli head the Language Technology Laboratory and the Basic Vision Laboratory respectively, leading research in NLP and CV. Huang Fei has published more than 40 papers in top conferences and journals on natural language processing and artificial intelligence, holds more than 10 U.S. patents, and has held NLP R&D and technical management positions at IBM and Facebook; Zhao Deli spent six years in the Visual Computing Group of Microsoft Research Asia and the Multimedia Laboratory of the Chinese University of Hong Kong working on machine vision and machine learning algorithms. In addition, DAMO Academy continues to recruit top talent: Ye Jieping, former vice president and chief scientist of Beike, and Bo Liefeng, former chief scientist of JD Digits' AI laboratory, joined Alibaba in 2022, which is expected to further advance Alibaba's exploration in large models and AI.

Baidu: equal emphasis on technical research and product implementation, with CTO Wang Haifeng leading AI R&D. According to 36Kr, Baidu's Wenxin Yiyan (ERNIE Bot) team is mainly coordinated by two groups: TPG (Technology Middle Platform Group), responsible for technical research, and MEG (Mobile Ecosystem Group), responsible for search and content products. In terms of team members, Baidu CTO Wang Haifeng has been in charge of TG and AIG since the end of 2018; he is responsible overall for Baidu's AI technology and foundational areas such as algorithms, computing power, data, and security, and serves as general commander of the Wenxin Yiyan project.

Tencent: multiple teams working in parallel, with equal emphasis on basic research and business applications. Multiple teams within Tencent conduct AI-related R&D. Among them, AI Lab and the Robotics X Laboratory are the two basic research departments, both under the Technology Engineering Group. AI Lab focuses on basic research in computer vision, speech recognition, natural language processing, and machine learning, as well as application exploration in content, social networking, games, and other directions; by the end of 2022 it had more than 100 top research scientists and more than 300 application engineers. Meanwhile, the Cloud and Smart Industry Business Group established Tencent Youtu Lab for in-depth research and application exploration of image technology, and the WeChat Business Department internally incubated the WeChat AI team. In February 2023, targeting ChatGPT-like conversational products, Tencent established the Hunyuan Assistant project team, with Tencent chief scientist Zhang Zhengyou as project owner, Yu Dong, Wang Di, and Liu Tian as PMs, and at least 7 team leaders and 7 sponsors.

ByteDance: quickly assembled a team at the beginning of the year, with multiple departments collaborating on development. According to 36Kr, ByteDance's large language model team was established this year, led by the search department, while the image large model team is led by the intelligent creation team under the Product R&D and Engineering Architecture Department. Zhu Wenjia is the first head of ByteDance's large model effort and has deep experience in algorithms and the search business. In addition, Data-AML head Xiang Liang, Artificial Intelligence Laboratory director Li Hang, and former Alibaba M6 core engineer Yang Hongxia are also important members of the team.

Technology investment: Baidu and Alibaba are temporarily in the first echelon, Tencent and Byte are accelerating to catch up, focusing on iteration efficiency

In this chapter, we compare the leading domestic Internet large models from a technical perspective. In the early stage, built on similar algorithm routes, architecture designs, and training corpora, the Internet companies' large models have not yet shown significant capability differences. According to IDC, Alibaba's Tongyi Qianwen and Baidu's Wenxin Yiyan achieved similar scores on algorithm model, general capabilities, and innovation capabilities. Looking forward, we believe the key points of technical competition are: 1) R&D breakthroughs in key GPT technologies; 2) cost and efficiency advantages at similar performance; 3) construction of large-scale, high-quality training corpora.

Algorithm model: previously, the major model architectures and routes mainly referenced open-source models such as BERT and LLaMA. The technical routes are broadly the same, but each company has its own focus in model design and training methods: Alibaba emphasizes multi-modal task capabilities and efficiency, Baidu focuses on improving NLP capabilities, and Tencent pursues both model scale growth and efficiency improvement. As ChatGPT has validated the effectiveness of the GPT route and of high-quality data plus feedback incentives, the technical routes of large models are converging toward GPT. We believe the core of subsequent differentiated competition in algorithm models lies in: 1) R&D breakthroughs in key GPT technologies; 2) if such breakthroughs prove difficult, vendors that achieve better cost and efficiency at similar performance through optimized model design and training methods are expected to hold the greater competitive advantage.

Computing power: with the surge in large model parameters and data volumes driving rapid growth in computing power demand, major Internet companies are accelerating the construction of new computing infrastructure such as AI computing platforms and supercomputing centers. Given their ample computing power reserves and active infrastructure construction, we believe computing power is unlikely to become a short-term bottleneck for large Internet models; over the medium and long term, companies with self-developed chip capabilities are expected to hold the stronger competitive advantage.

Data: high-quality data sources and data processing capabilities are the core of differentiated competition. When Alibaba trained M6 and Baidu trained ERNIE 3.0, both built large-scale TB-level databases whose data sources contained large amounts of unique in-ecosystem data, and designed complete cleaning pipelines to ensure data quality, effectively improving model training results and conversational performance in Chinese contexts.

Algorithm model: The underlying route gradually converges in the direction of GPT, and the model design and training methods have different focuses.

Alibaba: unified learning paradigm + modular design; Tongyi creates a multi-modal unified base. DAMO Academy holds that an all-round model should have three attributes: ① Task-Agnostic: not targeted at specific downstream tasks, but more general; ② Modality-Agnostic: a unified input and output form for all tasks, enabling processing across modalities; ③ Task Comprehensiveness: a sufficiently rich variety of tasks to ensure model robustness. To create a general large model for multi-modal, full-task use, DAMO Academy adopts a unified learning paradigm and modular design, enabling M6-OFA to handle more than 30 cross-modal tasks while flexibly calling modules to achieve both high efficiency and high performance.

M6-OFA achieves unification of architecture, modality, and tasks. ① Unified architecture: M6-OFA uses a single Transformer encoder-decoder + ResNet blocks architecture for both pre-training and fine-tuning, eliminating the need to design task-specific model layers. ② Unified modality: M6-OFA brings NLP, CV, and multi-modal tasks into the same framework and training paradigm, so that different tasks share the same output interface. ③ Unified tasks: M6-OFA uniformly models all multi-modal and single-modal tasks as sequence-to-sequence (seq2seq) tasks; the model can learn multiple tasks simultaneously, acquiring text generation, image generation, cross-modal understanding, and other capabilities through a single pre-training run.
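The task unification idea above can be sketched in a few lines: every task, whether uni- or multi-modal, is serialized into the same (source tokens → target sequence) form, so one encoder-decoder can learn them all. The task markers, field names, and patch-token placeholders below are illustrative assumptions, not M6-OFA's actual serialization.

```python
def to_seq2seq(task, **fields):
    """Map a heterogeneous task instance to a unified (source, target) pair."""
    if task == "caption":
        # an image is represented by placeholder patch tokens in the source
        src = ["<task:caption>"] + fields["image_patches"]
        tgt = fields["caption"]
    elif task == "vqa":
        src = ["<task:vqa>"] + fields["image_patches"] + fields["question"].split()
        tgt = fields["answer"]
    elif task == "infill":
        src = ["<task:infill>"] + fields["masked_text"].split()
        tgt = fields["span"]
    else:
        raise ValueError(f"unknown task: {task}")
    return src, tgt

pairs = [
    to_seq2seq("caption", image_patches=["<img_0>", "<img_1>"], caption="a red bus"),
    to_seq2seq("vqa", image_patches=["<img_0>"], question="what color ?", answer="red"),
    to_seq2seq("infill", masked_text="the <mask> sat", span="cat"),
]
# all three tasks now present one input/output interface to the same model
```

Because every task reduces to the same interface, adding a new task only requires defining its serialization, not new model layers.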

Drawing on the operating mechanism of the human brain, the modular design improves multi-modal task capability and efficiency. The design borrows from how the brain works: the brain has modules that store different kinds of knowledge and process different modalities, and when humans think, only the modules relevant to the task at hand are invoked, keeping the brain running fast. The modular large model uses a modular Transformer encoder-decoder structure to unify multi-modal understanding and generation, dividing the model into independent modules: a basic layer, general layers (e.g., per modality), task layers, and functional modules (such as reasoning). Each module is decoupled and performs its own role, so the large model stays lightweight, and task performance can be improved by flexibly splitting off individual modules for fine-tuning or continued pre-training.

Tongyi-M6: rapid model iteration; two years of investment produced the world's largest pre-trained model. DAMO Academy officially launched the Chinese multi-modal pre-training model M6 project in January 2020, and the model has iterated rapidly since: its parameter scale reached 100 billion in March 2021 and 1 trillion in May 2021, and in November 2021 it reached 10 trillion, becoming the world's largest pre-trained model. The MoE model was built on Whale, the self-developed framework of Alibaba Cloud PAI, with fine-grained CPU offload technology layered on top, completing training of the 10-trillion-parameter model with only 512 GPUs; at the same time, the M6 team designed a Pseudo-to-Real (shared release) mechanism to greatly increase training speed. In September 2022, DAMO Academy released the Tongyi large model series, creating the industry's first unified AI base, and announced that the relevant core models would be open-sourced to developers worldwide.

Tongyi-AliceMind: the deep language model system continues to grow richer, with outstanding NLP (natural language processing) capabilities. After three years of development, the AliceMind system now includes the general language model StructBERT, the multi-lingual VECO, the generative PALM, the multi-modal StructVBERT, the structured StructuralLM, the knowledge-driven LatticeBERT, the machine reading comprehension model UED, the ultra-large model PLUG, and more. AliceMind has successively topped authoritative NLP leaderboards such as GLUE, CLUE, XTREME, VQA Challenge, DocVQA, and MS MARCO, with outstanding capabilities across multi-lingual, generative, multi-modal, structured, and knowledge-driven directions.

Tongyi-Visual large model: focusing on application implementation in the CV (computer vision) field. The Tongyi Visual large model is built on two base capabilities, text-to-visual generation and feature-to-visual generation; supported by mid-level general algorithms such as video processing, visual question answering, visual arithmetic, and knowledge extraction, it can be deployed in industrial applications in e-commerce, city brain, industrial vision, and other fields. For example, the Tongyi-Visual large model enables scene applications such as image search and universal recognition in the e-commerce industry, and plays a role in text-to-image generation, transportation, and autonomous driving.

Model ecosystem: a MaaS pioneer, with the ModelScope community iterating quickly. Alibaba Cloud proposed MaaS in November 2022 and launched the open-source ModelScope (Moda) community. On one hand, it lowers the threshold for using AI by providing a one-stop platform with models as the core element; on the other hand, open source attracts more developers to co-create, accelerating model iteration. The ModelScope community has developed rapidly within months: according to Alibaba's financial report, as of July 2023 the community hosted more than 1,000 models, with cumulative downloads exceeding 45 million. At the same time, core models and capabilities of the Tongyi series, such as the large language model AliceMind-PLUG, the unified multimodal understanding and generation model AliceMind-mPLUG, the multimodal unified base model M6-OFA, and the S4 framework, a key technology for deploying ultra-large models, have also been open-sourced to global developers through ModelScope. We believe that, as a MaaS pioneer in China, the faster iteration and richer application feedback brought by its open-source community are expected to give ModelScope an advantage in mid- to long-term model ecosystem building.

Baidu: ERNIE series models continue to iterate and continue to break through NLP task performance

ERNIE 1.0: adds phrase and entity masking strategies to strengthen the model's knowledge reasoning. Built on the BERT model, ERNIE 1.0 mainly improves the masking strategy: unlike BERT, which masks only basic units (single tokens), ERNIE 1.0 adds phrase-level and entity-level masking, giving the model stronger grammar learning and knowledge reasoning capabilities. ERNIE outperforms the BERT baseline on five types of natural language processing tasks (natural language inference, semantic similarity, named entity recognition, sentiment analysis, and retrieval question answering). On the corpus side, in addition to Chinese Wikipedia, pre-training uses large amounts of data from Baidu Encyclopedia, Baidu News, and Baidu Tieba; the richer training data improves the model's understanding of Chinese semantics. In addition, ERNIE models the query-response dialogue structure in a DLM (dialogue language model) task, learning the implicit relationships in multi-turn dialogue and thereby enhancing the semantic representations the model learns.
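The difference between the three masking granularities can be shown with a minimal sketch. The helper below is illustrative: BERT-style basic masking corresponds to length-1 spans, while ERNIE-style masking passes whole-phrase or whole-entity spans, forcing the model to recover a complete unit from context. The span boundaries are hand-picked here; a real pipeline would obtain them from a chunker or NER tagger.

```python
def mask_spans(tokens, spans, mask_token="[MASK]"):
    """Replace every token inside the given (start, end) spans with mask_token."""
    out = list(tokens)
    for start, end in spans:
        for i in range(start, end):
            out[i] = mask_token
    return out

sent = ["J.", "K.", "Rowling", "wrote", "Harry", "Potter"]
token_level  = mask_spans(sent, [(3, 4)])   # BERT-style: mask one token ("wrote")
entity_level = mask_spans(sent, [(0, 3)])   # ERNIE: mask the whole entity "J. K. Rowling"
phrase_level = mask_spans(sent, [(4, 6)])   # ERNIE: mask the whole phrase "Harry Potter"
```

With entity-level masking, the model cannot rely on partial tokens of the entity as hints; it must use world knowledge ("who wrote Harry Potter?") to fill the span, which is the mechanism behind the stronger knowledge reasoning described above.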

ERNIE 2.0: an improved multi-task learning method achieves SOTA performance on multiple NLP downstream tasks. Multi-task learning usually takes one of two forms: simultaneous learning or sequential learning. Simultaneous learning cannot ensure that adding more tasks keeps improving model performance, while in sequential learning, as the model's parameters are updated task by task, it can catastrophically forget earlier tasks. ERNIE 2.0 adopts an alternating multi-task learning method: when a new task appears, the previously learned parameters initialize the model, and the newly introduced task is trained together with the original tasks, effectively alleviating forgetting and improving training effectiveness. With this optimization, ERNIE 2.0 achieved SOTA (state-of-the-art) performance on multiple Chinese and English NLP downstream tasks.
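The alternating scheme above can be sketched as a toy training loop. This is a hedged illustration of the scheduling idea only, with `train_step` as a stand-in for a real gradient update; it is not ERNIE 2.0's actual implementation.

```python
from itertools import cycle, islice

def train_step(params, task, batch):
    # placeholder update: record which task touched the parameters
    return params + [(task, batch)]

def continual_multitask(tasks_with_data, steps_per_stage=4):
    params = []              # "pretrained" parameters accumulate here
    seen = []
    for task, data in tasks_with_data:
        seen.append((task, data))   # new task joins the pool, params carry over
        # alternate round-robin over ALL tasks seen so far, not just the new one
        for t, d in islice(cycle(seen), steps_per_stage):
            params = train_step(params, t, d)
    return params

history = continual_multitask([("mlm", "d1"), ("sent_order", "d2")])
tasks_trained = [t for t, _ in history]
# stage 1 trains only "mlm"; stage 2 alternates "mlm" and "sent_order",
# so the earlier task keeps receiving updates and is not forgotten
```

The key property is visible in `tasks_trained`: after the second task arrives, updates still interleave the first task, which is what distinguishes this from plain sequential fine-tuning.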

ERNIE 3.0: introducing a large-scale knowledge graph improves the model's knowledge memory and reasoning. Because large models had been trained on plain text without injected knowledge, and traditional autoregressive fine-tuning performs relatively weakly on downstream language understanding tasks, ERNIE 3.0 proposes a unified framework for large-scale knowledge-enhanced pre-training: a 10B-parameter model is pre-trained on a 4TB corpus that incorporates a large-scale knowledge graph alongside large-scale unsupervised text. At the same time, ERNIE 3.0 uses a variety of pre-training tasks spanning word-aware, structure-aware, and knowledge-aware objectives, so the model learns knowledge at different levels more effectively. With these improvements, ERNIE 3.0 achieved SOTA performance on 54 Chinese NLP tasks including sentiment analysis, opinion extraction, reading comprehension, text summarization, dialogue generation, and numerical operations.

ERNIE 3.0 Titan: enhanced controllability and credibility, achieving the strongest Chinese pre-training performance. ERNIE 3.0 Titan retains ERNIE 3.0's parallel pre-training on massive unsupervised text plus a large-scale knowledge graph, and further designs a controllable and credible learning algorithm using a self-supervised adversarial loss and a controllable language modeling loss. It achieves zero-shot generation across different task types, significantly improves the credibility of generated results, and reaches SOTA performance on 68 NLP tasks such as text classification, information extraction, and reading comprehension.

The multi-modal large model layout is complete, and the Wenxin series meets diverse needs. Besides continuing to upgrade the ERNIE series in NLP, Baidu is actively deploying in vision, cross-modal, and biological computing. In the visual field, it builds on leading visual technology and massive image and video data to provide visual base models plus task customization and application capabilities; in the cross-modal field, key technologies for knowledge-enhanced cross-modal semantic understanding enable rapid construction of applications such as cross-modal retrieval, image-text generation, and information extraction from image documents; in biological computing, characteristics of biological research objects are built into the model to create pre-trained models for compound molecules and protein molecules. With this complete multi-modal layout, the Wenxin series can meet the diverse needs of industries across different fields.

Tencent: focusing on efficiency, Hunyuan became China's first low-cost, deployable trillion-parameter NLP model. The Hunyuan AI large model integrates CV, NLP, and multi-modal understanding capabilities; in April 2022 it disclosed its R&D progress for the first time and topped five authoritative datasets including MSR-VTT and MSVD. In December 2022, the Hunyuan team researched and optimized hot start and curriculum learning, the MoE routing algorithm, the model structure, and training acceleration, significantly reducing the training cost of trillion-parameter models, making Hunyuan China's first low-cost, deployable trillion-parameter NLP large model, and again topping the CLUE natural language understanding leaderboard.

Hot start and curriculum learning: Hunyuan first trains a small model to convergence, then transfers the small model's knowledge to a larger model, increasing model size step by step, so that even as model scale grows exponentially, only a few additional iterations are needed to reach a good level.

MoE routing algorithm: unlike a Dense model, which activates all FFN and SA layer parameters during training and therefore incurs higher training costs, MoE introduces routing and activates only some FFN parameters in each computation, reducing training cost. At the same scale, a large model using MoE can also achieve higher training and inference efficiency.
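A minimal sketch of the routing idea (illustrative only, not Hunyuan's implementation): a gate scores all experts for each token, but only the top-k expert FFNs actually run, so most expert parameters stay inactive on any given token, unlike a dense layer where every FFN parameter participates.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_layer(token, experts, gate_weights, k=2):
    """Route one token: score all experts, run only the top-k, mix their outputs."""
    scores = softmax([sum(w * x for w, x in zip(ws, token)) for ws in gate_weights])
    topk = sorted(range(len(experts)), key=lambda i: -scores[i])[:k]
    total = sum(scores[i] for i in topk)
    # only k expert FFNs are evaluated; the other experts are skipped entirely
    return [sum(scores[i] / total * experts[i](token)[d] for i in topk)
            for d in range(len(token))]

# 4 toy "experts" (each a trivial FFN that scales its input); 2 activate per token
experts = [lambda t, s=s: [s * x for x in t] for s in (1.0, 2.0, 3.0, 4.0)]
gate_w = [[0.1, 0.0], [0.9, 0.0], [0.0, 0.2], [0.0, 0.8]]
out = moe_layer([1.0, 1.0], experts, gate_w, k=2)
```

With k=2 of 4 experts active, only half the expert parameters participate per token; at trillion-parameter scale this gap is what drives the training-cost savings described above.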

Attention weight replication: the Hunyuan research team found that attention weights do not differ much between layers, so they changed how the weights are computed: at each layer, the attention weights are recomputed with probability p and reused from the previous layer with probability 1-p. Experiments showed that with p set to 50%, model quality is lossless, the total computation of attention weights is halved, and large model pre-training is accelerated by about 20%.
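The reuse rule above can be sketched as follows. This is an illustrative toy, with `compute_attention` standing in for real scaled dot-product attention: at each layer beyond the first, a coin flip with probability p triggers recomputation, otherwise the previous layer's weights are carried forward and that layer's attention computation is skipped.

```python
import random

def compute_attention(layer_idx):
    return f"attn@{layer_idx}"        # stand-in for an attention-weight matrix

def forward(num_layers, p=0.5, rng=None):
    rng = rng or random.Random(0)     # fixed seed for reproducibility
    prev = compute_attention(0)       # the first layer always computes
    weights, recomputed = [prev], 1
    for i in range(1, num_layers):
        if rng.random() < p:          # recompute with probability p
            prev = compute_attention(i)
            recomputed += 1
        weights.append(prev)          # otherwise reuse the previous layer's weights
    return weights, recomputed

weights, n = forward(num_layers=24, p=0.5)
# with p = 0.5, roughly half of the 24 layers recompute attention weights,
# which is the ~50% reduction in attention computation described above
```

Every layer still has usable attention weights (either fresh or inherited), so the forward pass is unchanged structurally; only the redundant recomputation is skipped.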

Word vector routing mechanism: introduces additional word vectors for expert routing, decoupling routing from the attention layer's output; identical words share the same routing vector, which accelerates convergence while keeping routing stable.

Computing power: Actively deploy new high-performance computing infrastructure to ensure computing power support

Alibaba: PAI × Lingjun intelligent computing supports the development of 10-trillion-parameter models. Built on Lingjun intelligent computing, Alibaba Cloud's PAI platform can scale to 10,000 GPUs, improves training performance by nearly 10 times, and achieves 92% linear scaling efficiency at the thousand-GPU scale, deeply supporting the R&D of general large models.

Baidu: Baidu Baige builds AI-native intelligent computing infrastructure, achieving leading performance under the same configuration. In September 2022, Baidu Cloud released the upgraded Baidu Baige 2.0, which enhanced and enriched the AI computing, AI storage, and AI container modules and released a new AI acceleration suite. ① AI computing: released an elastic RDMA network card integrated with the VPC network, lowering usage cost and reducing communication latency 2-3x compared with traditional TCP networking. ② AI storage: released a bare-metal version of the PFS parallel file system supporting InfiniBand, which can reduce compute-to-data access latency to the hundreds-of-microseconds level; object storage BOS added a native hierarchical namespace, increasing metadata access speed by more than 4x. ③ AI containers: first in the industry to launch a dual-engine GPU container virtualization solution, meeting the requirements of various scenarios and improving GPU resource utilization. ④ AI acceleration suite: integrates storage, training, and inference to accelerate data reading, querying, training, and inference, further improving AI workload speed. On the MLPerf Training v2.0 list, the BERT-Large GPU training results jointly submitted by Baidu Baige 2.0 and Baidu PaddlePaddle ranked first under the same GPU configuration, 5%-11% faster than other submissions.

Tencent: the latest HCC high-performance computing cluster triples computing power, completing trillion-parameter model training in 4 days. In April 2023, Tencent Cloud launched a new HCC high-performance computing cluster built on the latest generation of Tencent Cloud Xinghai self-developed servers equipped with NVIDIA H800 Tensor Core GPUs, providing the industry's highest 3.2T ultra-high interconnect bandwidth. Compared with the previous 1.6T network, overall cluster computing power increases by 20%, shortening the training time of the trillion-parameter Hunyuan NLP large model to 4 days and greatly improving large model training efficiency.

ByteDance: self-developed DPU and a series of cloud products improve large model training efficiency. On April 18, Volcano Engine released a series of cloud products including its self-developed DPU and launched the Intelligent Recommendation High-Speed Training Engine, which uses software-hardware integration, fine-grained operator optimization, and distributed training and inference to deliver faster training and lower training costs. According to the Volcano Engine WeChat public account:

Software-hardware integration: for ultra-large models in key scenarios, the high-speed training engine provides an all-GPU solution supporting high-speed training of 100GB-10TB ultra-large models, with an overall ROI 5 times that of CPU; for models covering more scenarios, it provides a mixed GPU+CPU training solution with an overall ROI 2 times that of CPU.

Fine-grained operator optimization: optimizes fine-grained operators for search, recommendation, and marketing scenarios to achieve better inference performance. In training, operator fusion and fine-tuning improve performance by 20%; in inference, operator optimization improves performance by 40%.

Distributed training and inference: to ensure system stability, the high-speed training engine supports all-round fault tolerance for training and inference, recovering quickly when a node fails; it supports distributed inference, including multi-sharding and multi-replica deployment, ensuring high availability of online services. For businesses such as Douyin and Toutiao running on the Volcano Engine high-speed training engine, model training is 10 to 25 times faster than before, and overall cost is reduced by 25% to 67%.

Data: high-quality data sources and data processing capabilities are the core of competitive differentiation. Since high-quality Chinese open-source corpora are few and small in scale, high-quality data sources and data processing capabilities are the core of competitive differentiation. When Alibaba trained M6 and Baidu trained ERNIE 3.0, both built their own large-scale TB-level databases whose data sources contained large amounts of unique in-ecosystem data, and designed complete cleaning pipelines to ensure data quality, effectively improving model training results and conversational performance in Chinese contexts.

Funding: major Internet companies all attach great importance to R&D investment; Baidu has invested more than 100 billion yuan in AI over 10 years. Major Internet companies all have stable cash flow and attach great importance to R&D investment, with no significant difference in financial strength; however, amid the industry-wide trend of cost reduction and efficiency improvement, they may pay more attention to R&D investment efficiency and output. In 2022, Tencent/Alibaba/Baidu's R&D expenses were 61.4/56.7/23.3 billion yuan respectively, with R&D expense ratios of 11.1%/6.5%/18.9%, and all have repeatedly emphasized R&D investment in artificial intelligence. According to Alibaba's financial report, in FY2022 Alibaba's technology investment exceeded 120 billion yuan, and over the past three years 60% of Alibaba's patent investment has been concentrated in hard-core technology fields such as cloud computing, artificial intelligence, and chips. According to Robin Li's speech at the 2022 World Artificial Intelligence Conference, Baidu has invested more than 100 billion yuan in artificial intelligence over the past 10 years, with core R&D investment accounting for more than 20% of core revenue for many consecutive quarters. At the same time, Baidu continues to provide funds and resources for free computing power and AI talent training.

Application scenarios: Internal core business is the first to be implemented, and industry scenarios are actively explored

Take the lead in applying large models to one's own core business and the B-side, and watch actual implementation progress. From a scenario perspective, each company first applied large model capabilities to its core business, improving business efficiency while creating benchmark cases for industry applications. In external applications, because large models' quality and efficiency gains naturally meet B-side needs, B-side implementation is currently progressing relatively quickly. We believe AI will have greater application space in highly digitalized or labor-intensive industries. According to IDC, China's professional services, government, manufacturing, banking, and communications sectors are expected to become the largest AI application markets, with market sizes projected to reach US$7.74/3.69/2.80/2.06/1.85 billion respectively in 2026. In terms of industry coverage, each company reflects its own business attributes and the resources accumulated in its earlier industrial Internet work. Referencing each company's customer composition in the cloud computing market, we believe that with continuous iterative feedback among industry applications, data, and models, each company is also expected to form comparative advantages in specific industry tracks in the large model field. On the C-side, although there are no hit apps yet, referencing the higher market value growth achieved by application companies in the mobile Internet era, we believe AI's disruptive innovation in C-side applications is also expected to unleash industrial value in the future.

Alibaba: all product lines connected, expected to take the lead in forming a demonstration effect. AI brings significant improvements in production efficiency and has been widely used in fields such as text summarization and generation, creative content generation, and code development. Moreover, once large models are combined with industry applications, more real user feedback is expected to accelerate model optimization and iteration, reinforcing a virtuous cycle. According to the 2023 Alibaba Cloud Summit, all Alibaba products will be fully upgraded to access large models in the future. Beyond improving product efficiency, this is also expected to create an early demonstration effect, helping to attract external customers and applications.

Office: DingTalk fully integrates Tongyi Qianwen to enable intelligent productivity. In April, DingTalk President Ye Jun announced at the 2023 Spring Ding Summit that DingTalk would fully integrate Alibaba's Tongyi Qianwen model. When using DingTalk, users can invoke the Tongyi large model's capabilities with a slash ("/"), greatly improving collaboration efficiency in scenarios such as group-chat work discussions, tweet creation, video conferencing, event planning, and data management. We believe that text work and creative content in office scenarios are naturally suited to AI-driven productivity transformation.

Office: the newly launched Tongyi Tingwu comprehensively improves the efficiency of converting audio and video into text and graphics. On June 1, Alibaba Cloud released Tongyi Tingwu, a new AI product focused on audio and video, which became the first large-model application product in China open for public testing. Tongyi Tingwu is connected to the comprehension and summarization capabilities of the Tongyi Qianwen large model, helping users transcribe, retrieve, summarize, and organize audio and video content for work and study. Tongyi Tingwu can also be embedded into various audio and video platforms to provide real-time subtitles, intelligent summaries, and more. For example, DingTalk's "Ding Flash Notes" already integrates Tongyi Tingwu, and in the future Tongyi Tingwu is also expected to provide services through Quark, Alibaba Cloud Drive, and other portals.

E-commerce: the user side optimizes the shopping experience, while the merchant side improves operating efficiency. Because e-commerce involves a large number of human-computer interaction and content generation scenarios, it is well suited to early AI application. After Alibaba's e-commerce business is combined with AI capabilities: on the user side, intelligent recommendation and assisted decision-making optimize the consumer shopping experience and reduce decision-making costs; on the merchant side, AI-assisted creative generation reduces marketing costs, while intelligent customer service and similar tools reduce operating costs. At the same time, AI brings more accurate user insights and is expected to improve merchants' operating output, opening up potential monetization space in the future. On the platform side, a better experience for consumers and better results for merchants are expected to improve both groups' recognition of and stickiness to the platform, further protecting the platform's market share.

Smart terminals: Tongyi Qianwen empowers Tmall Genie, which is expected to become a one-stop home life service portal. The 2023 Alibaba Cloud Summit demonstrated the experience improvements Tongyi Qianwen brings to the smart home: a demo version of Tmall Genie connected to Tongyi Qianwen showed enhanced language and reasoning capabilities, understanding user needs and successfully placing a takeout order. We believe that in the future, a Tongyi Qianwen-empowered Tmall Genie connected to applications such as Taobao, Tmall, Ele.me, and Fliggy is expected to optimize the interactive experience and become a one-stop family life service portal.

Tongyi Qianwen actively cooperates with enterprises to create enterprise-specific large models that meet individual needs. Beyond fully connecting Alibaba's internal applications to large models, Tongyi Qianwen will also cooperate with companies across industries to build industry- and enterprise-specific large models, meeting enterprises' personalized needs and improving business efficiency. Alibaba Cloud has already begun technical cooperation and co-creation with a number of enterprises in large-model scenarios; the first batch of partners includes OPPO Andes Intelligent Cloud, China Pacific Insurance, Geely Automobile, Chery New Energy, and Bosideng. According to the 2023 Alibaba Cloud Summit, more than 200,000 companies applied for access within two weeks of Tongyi Qianwen's release.

Baidu: widely used internally, expected to reshape the search experience of its main business. The Wenxin large model has been widely deployed across Baidu's internal products such as Search, Feed, Xiaodu smart screens, and Baidu Maps, significantly improving the intelligence of the product experience. Baidu's core search business in particular suffers pain points under the traditional search model, such as redundant, complex information and high user screening costs. With AI capabilities, Baidu Search generates answers conversationally and lists its data sources, which is expected to significantly optimize the user search experience and further increase user scale and usage frequency.

Industry applications: going deep into the real economy, with a continuously enriched industry ecosystem. Building on the general Wenxin large model and combining industry data with knowledge graphs, Baidu has released a total of 11 industry large models covering electric power, gas, finance, aerospace, media, cities, film and television, manufacturing, social sciences, and other fields, continuously empowering the digitalization and intelligentization of thousands of industries. On the first day after Wenxin Yiyan (ERNIE Bot) was released, Baidu completed the first batch of signings with 5 companies, started contract processes with 650 companies, and received deployment applications from more than 65,000 companies, leading the industrialization process.

Wenxin Yige: AI-assisted art and creative generation is expected to unleash AIGC productivity. Wenxin Yige is an AI art and creative assistance platform launched by Baidu based on Wenxin large-model technology. It automatically generates paintings from users' text descriptions and can also edit and re-create works according to user needs. As of the end of May 2023, registered users on Wenxin Yige's official website exceeded 6 million, and more than 900 ecosystem partners had participated in its testing. As Wenxin Yige's model capabilities continue to be optimized and iterated, it is expected to greatly improve production efficiency in fields such as game concept art, advertising and marketing materials, industrial design, and architectural design, achieving breakthroughs in content creation while helping companies reduce costs and improve efficiency.

Tencent: implementation across multiple core businesses, with industry large models promoting ecosystem co-construction

Games: reducing production costs and enriching the player experience. AI technology can be applied across the entire game value chain. On the one hand, AI can assist game production, operation, and the surrounding ecosystem, lowering the threshold and cost of game creation while improving game quality. On the other hand, AI can also extend to more diverse game categories, such as board and card games (Go, Mahjong), sports games (football), and complex strategy genres such as Multiplayer Online Battle Arena (MOBA) and First-Person Shooter (FPS), continuously enriching the player experience.

Advertising: improving understanding and computing capabilities while balancing volume, cost, and stability. The Hunyuan large model helps upgrade Tencent's advertising system, creating solutions in four areas: advertising content understanding, intelligent ad creation, intelligent ad review, and an ad fingerprint system. It greatly improves the system's capabilities in advertising content understanding, industry feature mining, and creative copywriting generation, helping advertisers achieve the three key performance indicators of volume, cost, and stability, and thereby drive business growth.

ByteDance: released the large-model dialogue product "Doubao" in August, accelerating R&D progress. In August, ByteDance released Doubao, a large-model dialogue product with built-in agents such as the chat companion "Xiaoning", an English learning assistant, an English writing polisher, and an all-round writing assistant, offering basic capabilities such as question answering and intelligent creation. According to the SuperCLUE evaluation, Doubao's current capabilities vary widely across dimensions: it performs strongly in logical reasoning and calculation, but still has shortcomings in coding, contextual dialogue, and other areas. However, considering that ByteDance only established its large-model team at the beginning of the year, we believe overall progress has exceeded expectations and is expected to accelerate further with continued R&D.

Business model: MaaS creates a new commercial model, and cloud vendors have high certainty of growth.
Learning from others: commercialization insights from the North American cloud giants

API calls: providing access to closed-source models. Among the three major cloud vendors, Microsoft and Google keep their models closed-source and earn revenue by selling model APIs. Commercialization of the Microsoft and OpenAI models has already begun: according to Reuters, OpenAI's 2022 revenue was only about US$80 million; according to research firm PitchBook, OpenAI is expected to generate about US$200 million in revenue this year, and its revenue may reach US$1 billion by 2024. Although Google also sells the PaLM series of models via API, the product is still in preview and has not been fully commercialized; however, Google Cloud already provides API capabilities for scenarios such as speech-to-text, and according to disclosures at the I/O conference, the company expects to provide a generally available version of the model in the next few months. Amazon, for its part, has launched Amazon Bedrock, a platform that provides access to third-party models.

Deployment and fine-tuning: commercialization based on model training time or data volume. The complete life cycle of a model includes training, deploying the model to an endpoint, and using it for prediction (inference); representative cloud products include Google Vertex AI and OpenAI's fine-tuning service. In terms of business model, Google offers time-based pricing covering training, deployment, inference, and other functions, with training costing significantly more per hour than inference. OpenAI prices by data volume, with four price tiers for different base models. We believe current pricing mainly reflects high computing costs; in the future, as computing costs fall and infrastructure improves, pricing is expected to become more granular.
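As a rough illustration of the two pricing schemes described above, the difference can be sketched as follows (all rates are hypothetical placeholders for this sketch, not Google's or OpenAI's actual list prices):

```python
# Sketch of the two model-lifecycle pricing schemes described above.
# All rates are hypothetical placeholders, not real list prices.

def time_based_cost(train_hours: float, infer_hours: float,
                    train_rate: float = 20.0, infer_rate: float = 2.0) -> float:
    """Time-based billing (Google Vertex AI style): pay per node-hour,
    with training priced far higher per hour than inference."""
    return train_hours * train_rate + infer_hours * infer_rate

def volume_based_cost(tokens: int, rate_per_1k: float = 0.03) -> float:
    """Volume-based billing (OpenAI style): pay per unit of data
    processed, e.g. per 1,000 tokens."""
    return tokens / 1000 * rate_per_1k

# Example: 100 training hours plus 500 inference hours,
# versus processing 2 million tokens.
print(time_based_cost(100, 500))     # 100*20 + 500*2 = 3000.0
print(volume_based_cost(2_000_000))  # 2000 * 0.03 = 60.0
```

Under either scheme, the dominant cost driver differs: time-based pricing makes long training runs expensive, while volume-based pricing scales directly with how much data the model processes.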

Segmented scenarios: Microsoft Copilot, Google Duet AI, and others. Copilot is Microsoft's generative AI assistant, already used in code development scenarios such as GitHub. According to Microsoft's technical documentation, Copilot adopts OpenAI's GPT-4 model and further optimizes the reliability of output content and the privacy of the data used, making it ready for enterprise-grade applications. AI capabilities are integrated directly into applications such as Word, Excel, PowerPoint, Outlook, and Teams; users can prompt the AI to write drafts, create presentations, edit emails, summarize meetings, and more. Copilot works with Microsoft 365 customers in two ways: 1) embedded into Word, Excel, PowerPoint, Outlook, Teams, and other apps; 2) as a chat capability. The Business Chat function works across the LLM, Microsoft 365 applications, and the customer's calendar, emails, chats, documents, meetings, and contacts: given a natural-language prompt (such as "Tell my team how we updated the product strategy"), Business Chat generates a status update based on the morning's meetings, emails, and chat threads. Google, relying on its existing Workspace business, has likewise introduced generative AI into Workspace with Duet AI for Google Workspace, allowing users to collaborate with AI to compose emails, generate illustrated slides, and summarize spreadsheets.


IaaS: in the short term, demand for cloud vendors' computing products is expected to grow rapidly. We believe the rapid development of AI will substantially increase demand for computing power in the short term, and cloud vendors' computing products are expected to benefit first. Alibaba Cloud, Baidu AI Cloud, and Tencent Cloud have all launched GPU cloud servers equipped with NVIDIA A100, which significantly improve AI training performance and high-performance computing speed, supporting the vigorous development of the domestic AI industry with high-performance infrastructure.

PaaS: building standardized cloud products to deliver general AI capabilities. Cloud vendors have packaged general AI capabilities, such as machine learning platforms, visual intelligence, natural language processing, and intelligent speech, into standardized products for sale, charged mainly on a pay-as-you-go or resource-package basis. For example, the DLC model training product on the machine learning platform PAI provides sufficient computing power through dedicated resource groups; customers can pay either via a resource package (59 yuan per 100 CU·h) or on a pay-as-you-go basis (starting at 205.7 yuan per month), giving them flexible consumption options.
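A minimal sketch of the resource-package metering described above, using the quoted rate of 59 yuan per 100 CU·h (the rounding-up-to-whole-packages rule is our assumption for illustration, not Alibaba Cloud's published billing logic):

```python
import math

# Resource-package metering sketch: 59 yuan buys a package of 100 CU·h.
# Assumption (ours, not documented billing logic): partially consumed
# packages are still paid for in full, so usage rounds up.

PACK_PRICE_YUAN = 59.0
PACK_SIZE_CU_H = 100

def resource_package_cost(cu_hours_used: float) -> float:
    """Cost of covering the given CU·h usage with whole resource packages."""
    packs_needed = math.ceil(cu_hours_used / PACK_SIZE_CU_H)
    return packs_needed * PACK_PRICE_YUAN

print(resource_package_cost(250))  # 3 packages -> 177.0 yuan
```

This is the trade-off behind the two billing options: prepaid packages lower the unit price but risk paying for unused capacity, while pay-as-you-go tracks actual consumption.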

MaaS: in the medium to long term, MaaS is expected to become the most important business model at the model layer. Because large AI models usually require powerful computing and resources, many companies and individuals cannot afford the deployment and operation costs themselves. MaaS encapsulates these complex technical problems behind a cloud service platform, so users can access and use AI models easily without attending to the underlying implementation. MaaS is expected to become the model layer's most important business model, spanning subscription and API pay-per-use charging. Under the subscription model, users pay periodic fees based on their needs and enjoy model services for a set period; under API pay-per-use, users pay by actual API call counts or data volume, letting them flexibly adjust spending in line with business volume.
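The two MaaS charging models described above can be compared with a small sketch (both prices are hypothetical placeholders, not any vendor's actual rates):

```python
# Compare the two MaaS charging models: flat subscription vs API pay-per-use.
# Both prices are hypothetical placeholders, not real list prices.

MONTHLY_SUBSCRIPTION_YUAN = 300.0   # flat periodic fee for the service window
PRICE_PER_CALL_YUAN = 0.02          # API pay-per-use rate

def pay_per_use_cost(calls_per_month: int) -> float:
    """Monthly cost under API pay-per-use billing."""
    return calls_per_month * PRICE_PER_CALL_YUAN

def cheaper_plan(calls_per_month: int) -> str:
    """Pick the cheaper plan for a given monthly call volume."""
    usage_cost = pay_per_use_cost(calls_per_month)
    return "subscription" if usage_cost > MONTHLY_SUBSCRIPTION_YUAN else "pay-per-use"

# Break-even volume: 300 / 0.02 = 15,000 calls per month.
print(cheaper_plan(5_000))    # prints "pay-per-use"
print(cheaper_plan(40_000))   # prints "subscription"
```

The break-even point (subscription fee divided by per-call price) is what lets users "flexibly adjust expenditure": low-volume users stay on pay-per-use, while heavy users switch to a subscription.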

The massive parameters and datasets involved in training large AI models, together with rapid growth in user call volume, have driven a surge in computing power requirements. Considering resource advantages and economics, cloud vendors are expected to become the main carriers of AI computing demand. According to IDC, China's AI public cloud market reached 7.97 billion yuan in 2022, up 80.6% year on year, with Baidu AI Cloud, Alibaba Cloud, Huawei Cloud, and Tencent Cloud leading in market share and the industry CR4 reaching 93.7%. In 2022, the AI public cloud revenue of Baidu/Alibaba/Tencent was 2.30/2.18/1.49 billion yuan respectively, up 69.7%/71.2%/124.6% year on year, corresponding to 13.0%/2.8%/4.7% of each company's overall cloud revenue. Domestic Internet cloud vendors have accumulated deep advantages in infrastructure, technical architecture, self-developed chips, and model algorithms, and are expected to benefit from the growing training and application computing demand brought by the booming domestic AI industry.
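The market-share figures above are internally consistent, as a quick back-of-the-envelope check shows (revenues in billions of yuan, taken from the IDC data cited in the text; the implied Huawei Cloud share is our inference from CR4, not a reported figure):

```python
# Back-of-the-envelope check of the 2022 AI public cloud figures quoted above.
# Revenues in billions of yuan, from the IDC data cited in the text.
market_size = 7.97
revenue = {"Baidu": 2.30, "Alibaba": 2.18, "Tencent": 1.49}

shares = {vendor: rev / market_size for vendor, rev in revenue.items()}
cr3 = sum(shares.values())       # combined share of the three vendors above
huawei_implied = 0.937 - cr3     # CR4 (93.7%) minus the other three (inferred)

print({v: round(s, 3) for v, s in shares.items()})
print(round(cr3, 3), round(huawei_implied, 3))  # ~0.749 and ~0.188
```

The three listed vendors account for roughly 75% of the market, leaving an implied share of just under 19% for Huawei Cloud, consistent with the quoted 93.7% CR4.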

(This article is for reference only and does not represent any investment advice on our part. If you need to use relevant information, please refer to the original report.)


Origin blog.csdn.net/WitsMakeMen/article/details/132838404