As involution in AI large models intensifies, why is SenseTime joining the fray?

 

In 2023, China has no shortage of large models.

So far, as many as 20 Chinese companies have launched or are about to launch large models, and nearly every relevant company one can think of has entered the game. They include deep-pocketed giants such as BAT, Huawei, and ByteDance; start-ups led by Internet industry veterans such as Wang Huiwen, Wang Xiaochuan, and Zhou Bowen; and AI companies from vertical fields, such as SenseTime and iFLYTEK.

On April 10, SenseTime launched the "SenseNova" large model system, which includes a series of generative AI models and tools: the large language model "SenseChat" (Shangliang, literally "consultation"), the text-to-image platform "SenseMirage" (Miaohua), the AI digital human video generation platform "SenseAvatar" (Ruying), the 3D content generation platforms "SenseSpace" (Qiongyu) and "SenseThings" (Gewu), and the large model data annotation platform "Mingmou".

In the public debate over large models SenseTime has kept a low profile, yet it rolled out a full suite of large model products at once and quickly opened trial access for enterprises. Judging from the live demonstrations at the launch event, the strength of SenseTime's models should not be underestimated. "For a company focused on CV to produce such a mature LLM product, its development team deserves admiration." This comment from a Zhihu user reflects part of the outside reaction.

Doubts have also surfaced, for example over whether a unicorn AI company focused on computer vision (CV) really needs to join the rush to build large models. After all, large models burn huge sums of money over a long period, and their commercialization paths are still limited, which puts even more pressure on AI companies that are not yet profitable.

So how should we view SenseTime's entry into multimodal large models? In the increasingly crowded race to build a "Chinese version of ChatGPT," what distinctive niche can SenseTime carve out?

 

From the large device to large models, always aiming at the same goal

When ChatGPT brought the AI industry to its "iPhone moment," it also validated the generality of large models.

More importantly, a paper released by Microsoft in March this year argued that "GPT-4 can already be regarded as an early version of artificial general intelligence." Quite a few professionals share this view. Top scientists such as Geoffrey Hinton, the father of deep learning, believe that artificial general intelligence (AGI) is no longer out of reach and may be realized gradually within decades.

With large models now a settled direction, the key question for Chinese and foreign AI companies is how to find the path that suits them.

SenseTime's answer to this question is to firmly follow the path of "large device + large model."

Anyone familiar with SenseTime's strategy will know that it is not launching large models now to follow a trend, but to tackle another key milestone on the road to large-scale AI deployment.

Previously, the core obstacle to deploying AI was the one-model-per-scenario development approach, which led to high deployment costs, low model reuse, and difficulty in scaling and standardizing. GPT-3, released in 2020 with parameters on the hundred-billion scale, had by then already brought a breakthrough in the generality of large models. SenseTime took the same approach: invest in the underlying infrastructure and pursue generality through sheer scale, that is, parameter count × data volume.

In 2021, SenseTime launched SenseCore, its AI large device, and completed a major expansion in 2022. The AI large device can be understood as large-scale computing infrastructure plus large models offered as a service (Model as a Service).

Today, the AI large device runs a parallel computing system of 27,000 GPUs with a computing power output of 5.0 exaFLOPS, making it one of the largest intelligent computing platforms in Asia. This computing power can support 20 ultra-large models with 100 billion parameters each, all training simultaneously at thousand-GPU scale.
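As a rough sanity check on these figures, the sketch below divides the stated compute by the stated GPU count and checks that 20 concurrent thousand-GPU training jobs fit in the cluster. The value of 1,000 GPUs per job is an assumption standing in for "thousand-GPU scale"; the article gives no exact number, nor the numeric precision behind the exaFLOPS figure.

```python
# Figures quoted in the article
TOTAL_GPUS = 27_000
TOTAL_EXAFLOPS = 5.0
CONCURRENT_MODELS = 20

# Assumption (not from the article): "thousand-GPU scale" is taken
# to mean exactly 1,000 GPUs per training job.
GPUS_PER_JOB = 1_000


def per_gpu_teraflops(total_exaflops: float, gpus: int) -> float:
    """Average compute per GPU in teraFLOPS (1 exaFLOPS = 1e6 teraFLOPS)."""
    return total_exaflops * 1e6 / gpus


def gpus_needed(models: int, gpus_per_job: int) -> int:
    """GPUs occupied when every model trains at the same time."""
    return models * gpus_per_job


avg_tflops = per_gpu_teraflops(TOTAL_EXAFLOPS, TOTAL_GPUS)
needed = gpus_needed(CONCURRENT_MODELS, GPUS_PER_JOB)

print(f"average per GPU: {avg_tflops:.0f} TFLOPS")
print(f"GPUs needed for concurrent training: {needed:,} of {TOTAL_GPUS:,}")
```

Under that assumption, the 20 concurrent jobs occupy roughly three quarters of the cluster, so the two claims in the paragraph are at least mutually consistent.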

The AI large device also offers large models as a service, including automated data labeling, parallel training of large models, and inference deployment. Its largest single cluster currently connects 4,000 chips in parallel and can train dense models with more than 500 billion parameters. This year's goal is to train large models with more than one trillion parameters.

 

With so much invested, how is the AI large device actually being used?

In 2022, SenseTime opened the capabilities of the AI large device to industry customers, helping them train large models efficiently by providing high-performance computing resources, a rich library of pre-trained models, easy-to-use development tools, and professional technical support. So far, it has supported more than 10 large model development projects, covering customer-defined large models in vision, language, and multimodality.

Since being opened to enterprises, the AI large device has already generated revenue at scale. According to SenseTime's 2022 annual report, external services of the AI large device accounted for more than 20% of the revenue of smart business (one of SenseTime's four major business segments) in 2022. With 2022 smart business revenue at 1.464 billion yuan, the AI large device brought SenseTime nearly 300 million yuan in revenue.
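The "nearly 300 million yuan" estimate follows directly from the two reported figures. A minimal sketch of the arithmetic, treating "more than 20%" as a lower bound of exactly 20% (the exact share is not given in the article):

```python
# Figures quoted in the article (2022 annual report)
SMART_BUSINESS_REVENUE_YUAN = 1_464_000_000

# "More than 20%" is used here as a lower bound of exactly 20%.
LARGE_DEVICE_SHARE_PERCENT = 20


def lower_bound_revenue(segment_revenue: int, share_percent: int) -> int:
    """Lower bound on revenue attributable to a share of a segment, in yuan."""
    return segment_revenue * share_percent // 100


estimate = lower_bound_revenue(SMART_BUSINESS_REVENUE_YUAN,
                               LARGE_DEVICE_SHARE_PERCENT)
print(f"at least {estimate / 1e6:.1f} million yuan")
```

Because the reported share is "more than" 20%, the actual figure sits somewhat above this lower bound.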

By now it should be clear that SenseTime did not build its large models from scratch. Its ability to roll out so many types of large models in such a short time rests on the foundation of the AI large device. In fact, SenseTime's work on large models predates the AI large device.

 

In CV, its strongest field, SenseTime released a visual model with 1 billion parameters in 2019. By 2022, its large visual model had grown to 32 billion parameters, the largest visual model in the world to date.

Since 2021, SenseTime has been developing its own large NLP and multimodal models. In NLP, its large language model has reached the scale of hundreds of billions of parameters.

In multimodality, in March 2022 SenseTime, together with Shanghai Artificial Intelligence Laboratory, Tsinghua University, the Chinese University of Hong Kong, and Shanghai Jiao Tong University, released "INTERN" (Shusheng, literally "Scholar"), a general-purpose multimodal, multi-task large model with 30 billion parameters. INTERN has been open-sourced on the general vision platform OpenGVLab and is currently the most powerful multimodal large model in the open-source community.

In AIGC, SenseTime has launched a model with 1 billion parameters that supports functions such as text-to-image and image-to-image generation. SenseTime's decision intelligence model, which surpassed DeepMind's AlphaStar in StarCraft, will also be integrated into the multimodal large model in the future.

"In the future, SenseTime's large model system for artificial general intelligence will cover four major areas: visual perception, language understanding, content generation, and reasoning and decision-making," said Wang Xiaogang, co-founder and chief scientist of SenseTime.

At present, SenseTime's large models have been delivered in more than 20 scenarios across its four major segments: smart city, smart business, smart car, and smart life. In autonomous driving, for example, the BEVFormer++ perception algorithm, built on SenseTime's large visual model work, won the main track of the 2022 Waymo Challenge by a wide margin.

To sum up, the goal of SenseTime's large models is not to grab the "Chinese version of ChatGPT" label for individual users, but to use "large device + large model" to accelerate the commercialization of AI.

The dual driving forces of "Daily New"

"AGI has spawned a new research paradigm: starting from a powerful multimodal base model, new capabilities are continuously unlocked through reinforcement learning and human feedback, so that massive numbers of open-ended tasks can be solved more efficiently. AGI will evolve from a 'data flywheel' to a 'wisdom flywheel' and eventually lead to human-machine co-intelligence," said Wang Xiaogang, co-founder and chief scientist of SenseTime.

As for SenseTime's "Daily New" (SenseNova) models, Intelligent Evolution sees at least two driving forces.

The first is to empower external customers with a rich set of AIGC large models, including through open APIs, lowering the threshold for applying large models across industries.

At present, the "Daily New" series is open for trial only to enterprise users. Judging from the live demonstrations at the launch event, however, the first impression is that the models are well-rounded with no weak subject, and the digital human video generation and 3D content generation were impressive beyond expectations.

 

SenseTime's ChatGPT-like large language model, "SenseChat" (Shangliang, literally "consultation"), performs smoothly in multi-turn dialogue and stands out in two sub-fields: medical consultation and programming. The programming assistant helps developers write and debug code more efficiently. For health consultation, SenseChat works like an AI version of a general hospital's triage desk: it can tell users which department to visit for which symptoms and offer personalized medical advice. SenseChat can also read PDF files directly and extract key information, which is very practical.

The "Miaohua SenseMirage" text-to-image creation platform can generate 6K high-definition images and also lets users train their own generation models.

The "SenseAvatar" AI digital human video generation platform needs only 5 minutes of real-person video footage to generate a digital avatar with natural voice and movements, accurate lip sync, and fluency in multiple languages. This should greatly reduce labor costs in high-frequency scenarios such as e-commerce live streaming and online education.

The "Qiongyu SenseSpace" and "Gewu SenseThings" 3D content generation platforms can generate large-scale 3D scenes and finely detailed objects efficiently and at low cost, providing high-quality, low-cost construction technology for scenarios that blend the virtual and the real, such as the metaverse.

The second is to strengthen SenseTime's existing advantages in CV and visual perception and accelerate the deployment of AI technology.

In intelligent driving, SenseTime built on its large visual model to develop a BEV (bird's-eye view) perception algorithm for autonomous driving, which won the Waymo Challenge by a wide margin. Building on this algorithm, SenseTime developed UniAD, the industry's first end-to-end autonomous driving solution integrating perception and decision-making, which gives multimodal autonomous driving models a stronger ability to decode environment, behavior, and motivation.

SenseNova reportedly provides a range of flexible APIs and services for government and enterprise customers, covering image generation, natural language generation, general visual perception tasks, and labeling services. By calling these APIs, enterprise users can fine-tune the base models and build a variety of AI applications with a low barrier to entry, low cost, and high efficiency.

 

From single point to platform, as the transformation accelerates

It is worth noting that while firmly committing to the "large device + large model" route, SenseTime is itself at a critical stage of restructuring its business.

SenseTime's business is no longer confined to CV; the company is becoming a general-purpose AI infrastructure platform. In the process, "large device + large model" has not weakened its original strengths in CV but reinforced them.

SenseTime positions its AI large device as "the leader of infrastructure in the AGI era," which shows that the former leader of the "Four CV Tigers" is no longer the company it used to be. On breaking through industry boundaries, Xu Li, chairman and CEO of SenseTime, once told the media: "When we achieve the integration of the physical world and the digital world, AI will become infrastructure that everyone can use. There will be no need to distinguish between industries."

The change in business mix, however, is stronger proof of the transformation. In the 2022 annual report, SenseTime's four core businesses show a clear "two up, two down" pattern: revenue from the two major segments, smart city and smart business, declined, while the two emerging segments, smart life and smart car, grew significantly, pointing to a more diversified and healthier business mix.

For example, smart life revenue grew 129.9% year-on-year in 2022 to a record high, and its share of total revenue rose from 8.8% in 2021 to 25.1%. The smart life business covers product lines such as AI content generation (AIGC), AI sensors, AI ISP chips, and smart healthcare, all of which achieved commercial breakthroughs. Smart car revenue grew 58.9% year-on-year in 2022, and its share of total revenue rose from 3.9% in 2021 to 7.7%.

"'Ri Ri Xin' ('renewed daily') comes from the 'Great Learning' chapter of the Book of Rites, which records the inscription on King Tang's basin: 'Gou ri xin, ri ri xin, you ri xin.' In other words, renew every day, and keep renewing what is already new. This stands for large AI models: as data is fed in, they can be updated every day and grow more capable every day." This is how Xu Li explained the origin of the name at the launch event.

2023 is the first year of the large model boom in China. From where we stand today, it is hard to predict whether the market for AI large models will end up as an oligopoly or with a hundred flowers blooming.

That is because this is a long-term, all-round competition that tests each entrant's core strengths, such as capital reserves, strategic resolve, and technical capability.

Perhaps it is neither necessary nor possible for every player to be big and comprehensive; the way to break through is to concentrate resources on the large models where one's differentiated advantage is greatest.

This article is the original work of "Intelligent Evolution".

Origin blog.csdn.net/AImatters/article/details/130254370