GPU, AI chip technology market analysis

Market analysis of GPU and AI chip technology
The market is growing rapidly, and the dawn of the GPU era is emerging. The domestic artificial intelligence chip market is estimated to reach 17.2 billion US dollars by 2024. In the two fields that are the main driving force of market growth, AI training and inference, many start-up companies have already released GPU products.
Heterogeneous computing is the result of a comprehensive balance between flexibility and efficiency. Training refers to AI technology that simulates the human ability to receive, learn, and understand external information; inference refers to AI technology that simulates the human logic of deriving conclusions from information through mental activities such as learning, judgment, and analysis.
In terms of deployment flexibility, the CPU is the most flexible, followed by the GPU, with FPGA and ASIC trailing. In terms of computing efficiency, the ASIC is the most efficient, followed by the FPGA, with GPU and CPU trailing.
Heterogeneous computing balances these two considerations: pairing a CPU with GPU, FPGA, or ASIC accelerators is the trend.
Training mainly uses CPUs and GPUs, with some FPGAs and ASICs; inference mainly uses CPUs, FPGAs, and ASICs, with some GPUs.
This article draws on and refines the following WeChat article:
https://mp.weixin.qq.com/s/O5JZ8YdFwNL3NsCG4whB2A
Future AI computing will form a heterogeneous landscape with the CPU as the control center and GPU, FPGA, and ASIC (NPU, VPU, etc.) accelerator cards for specific scenarios.
Heterogeneous computing refers to systems composed of computing units with different instruction sets and architectures. Currently, "CPU+GPU" and "CPU+FPGA" are the heterogeneous computing platforms attracting the most attention.
The biggest advantage of heterogeneous computing is that it delivers more efficient, lower-latency computing performance than traditional CPU parallel computing; as the industry's demand for computing performance keeps rising, heterogeneous computing has become increasingly important.
01. In the era of computing power, GPU opens up new scenarios
Broadly speaking, any chip that can run artificial intelligence algorithms can be called an AI chip. However, AI chips in the usual sense are chips specially designed to accelerate artificial intelligence algorithms.
AI chips are also known as AI accelerators or compute cards: modules dedicated to handling the heavy computing workloads of artificial intelligence applications (other, non-computing tasks remain with the CPU). So far, the development of AI chip computing power has gone through three stages:
The first stage: chip computing power was insufficient, so neural networks were not taken seriously;
The second stage: the computing power of general-purpose CPUs improved greatly, but still could not meet the demands of neural networks;
The third stage: GPUs and AI chips with new architectures have driven the adoption of artificial intelligence.
▲The development stage of AI chip computing power
The GPT-3 model was selected as one of MIT Technology Review's "Top Ten Breakthrough Technologies" of 2021. The largest dataset used to train GPT-3 had a capacity of 45 TB before processing. Measured in OpenAI's computing unit of petaflop/s-days (pfs-days), training AlphaGo Zero required 1800-2000 pfs-days, while GPT-3 used 3640 pfs-days.
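As a quick sanity check on these figures, the pfs-day unit converts directly into raw floating-point operations. A minimal sketch:

```python
# Arithmetic check of the petaflop/s-day (pfs-day) figures quoted above.
# One pfs-day is 10^15 floating-point operations per second, sustained for a day.
PFS_DAY_FLOPS = 1e15 * 24 * 3600   # = 8.64e19 FLOPs

def pfs_days_to_flops(pfs_days: float) -> float:
    return pfs_days * PFS_DAY_FLOPS

gpt3 = pfs_days_to_flops(3640)       # GPT-3 training compute
alphago = pfs_days_to_flops(1800)    # AlphaGo Zero, lower bound of the quoted range
print(f"GPT-3: {gpt3:.2e} FLOPs, about {gpt3 / alphago:.1f}x AlphaGo Zero")
```

So GPT-3's 3640 pfs-days amounts to roughly 3.1 x 10^23 floating-point operations, about double the AlphaGo Zero lower bound.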
▲Natural language model/conversational AI platform
AI computing refers to the neural-network algorithms represented by "deep learning", which require the system to efficiently process large amounts of unstructured data (text, video, images, speech, etc.). It demands hardware with efficient linear algebra capability; the computing tasks are characterized by simple unit computations, low logic-control complexity, a large amount of parallel computation, and many parameters. This places higher requirements on a chip's multi-core parallel computing, on-chip storage, bandwidth, and low-latency memory access.
Since 2012, the computing power demanded by artificial intelligence training tasks has doubled every 3.43 months, far outpacing the chip industry's long-standing Moore's law (chip performance doubling every 18 months). For different application scenarios, AI chips must also satisfy requirements such as compatibility with mainstream AI algorithm frameworks, programmability, scalability, low power consumption, size, and price.
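The gap between the two doubling rates compounds dramatically. A small sketch of the exponential arithmetic:

```python
# Compare the AI-training compute trend (doubling every 3.43 months, per the
# figure quoted above) with Moore's law (doubling every 18 months) over 5 years.
def growth_over(months: float, doubling_period_months: float) -> float:
    """Multiplicative growth after `months`, given a doubling period."""
    return 2 ** (months / doubling_period_months)

five_years = 5 * 12
ai_demand = growth_over(five_years, 3.43)   # ~17.5 doublings
moore = growth_over(five_years, 18)         # ~3.3 doublings
print(f"Over 5 years: AI compute demand x{ai_demand:,.0f} vs Moore's law x{moore:.1f}")
```

Over five years, demand grows by a factor on the order of 100,000, while Moore's law yields only about a 10x improvement, which is why specialized accelerators are needed to close the gap.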
From the perspective of technical architecture, AI chips fall mainly into four classes: graphics processing units (GPU), field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), and brain-like (neuromorphic) chips. The GPU is a relatively mature general-purpose AI chip, while FPGA and ASIC are semi-custom and fully custom chips tailored to AI needs. Brain-like chips subvert the traditional von Neumann architecture by simulating the neuron structure of the human brain; their development is still in its infancy.
▲Comparison of three types of technical architecture AI chips
The global artificial intelligence chip market reached 11 billion US dollars in 2019. As AI technology matures, digital infrastructure improves, and commercial AI applications spread, the AI chip market will grow rapidly, reaching an estimated 72.6 billion US dollars in 2025.
▲2019-2025 global artificial intelligence chip market size and forecast (billion US dollars)
02. Three major application scenarios: AI is king
A GPU is essentially a set of graphics functions implemented in hardware; these functions mainly perform the operations required to draw graphics. Operations related to pixels, lighting and shading, and 3D coordinate transformation are hardware-accelerated by the GPU. Graphics workloads are characterized by intensive computation on large amounts of data of the same type, such as matrix operations on graphics data. The GPU microarchitecture is therefore designed for matrix-style numerical computation, with a large number of replicated computing units. Such computation can be divided into numerous independent numerical calculations carried out by a large number of threads, and the data has no logical dependency of the kind found in ordinary program execution.
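The independence described above is what makes these workloads parallelizable. A minimal illustrative sketch (plain Python standing in for what a GPU would do in parallel):

```python
# Minimal sketch of why GPUs suit graphics and AI workloads: a matrix
# multiply decomposes into many independent dot products. Each output
# element C[i][j] depends only on row i of A and column j of B, so on a
# GPU every (i, j) pair can be computed by its own thread in parallel.
def matmul_cell(A, B, i, j):
    # The work of one hypothetical "thread": a single output element.
    return sum(A[i][k] * B[k][j] for k in range(len(B)))

def matmul(A, B):
    # Serial stand-in for the parallel grid launch a GPU would perform.
    return [[matmul_cell(A, B, i, j) for j in range(len(B[0]))]
            for i in range(len(A))]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Because no output element depends on any other, the replicated computing units of a GPU can each take one cell with no synchronization between them.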
The design and development of the GPU microarchitecture is critical; an advanced, well-designed microarchitecture is crucial to improving actual GPU performance. There is a rich variety of GPU microarchitectures on the market, such as Pascal, Volta, Turing, and Ampere, released in 2016, 2017, 2018, and 2020 respectively, representing the highest level of NVIDIA GPU technology.
The GPU's API (Application Programming Interface) acts as a bridge between the application and the graphics driver. Current GPU APIs can be divided into two major camps and several other categories. The two camps are Microsoft's DirectX standard and the Khronos Group standards. Other categories include Apple's Metal API, AMD's Mantle API, and Intel's oneAPI.
AI chips (GPU/FPGA/ASIC) handle both the "training" and "inference" stages of artificial intelligence in the cloud, and are mainly responsible for "inference" at the terminal. The ASIC is optimal in performance and cost: as a special-purpose chip, it has absolute advantages in computing power and power consumption over a general-purpose GPU, but its development cycle is long, its deployment is slow, and a certain scale is needed before its cost advantage shows. The FPGA can be seen as a key transitional option between GPU and ASIC: compared with the GPU it can be deeply optimized at the hardware level, and compared with the ASIC it is more flexible under constant algorithm iteration and has a shorter development time.
From the perspective of ecosystem and adoption, the GPU has an absolute advantage, with NVIDIA in a monopoly position. Through the NVIDIA CUDA platform, developers can easily program NVIDIA GPUs in common software languages to achieve computing acceleration; CUDA has been widely recognized and popularized and has accumulated a good programming environment. ASICs, represented by the TPU, are currently used mainly within the closed-loop ecosystems of the giants, while FPGAs are developing rapidly in the data center business.
The GPU market was 25.41 billion US dollars in 2020 and is expected to reach 185.31 billion US dollars by 2027, a CAGR of 32.82% from 2021 to 2027. By type, the market is segmented into standalone (discrete), integrated, and hybrid GPUs. The integrated segment dominated GPU market share in 2019, but the hybrid segment is expected to see the highest CAGR going forward, as hybrid processors combine integrated and discrete GPU capabilities.
By device, the market is segmented into computers, tablets, smartphones, game consoles, TVs, and others. In 2019 the smartphone segment dominated global GPU market share, a trend expected to continue over the forecast period. The "others" segment is expected to see the highest CAGR, owing to growing demand for small GPUs in devices such as medical equipment. Among applications, the automotive segment is expected to grow at the highest CAGR during the forecast period, owing to the widespread use of graphics processors in design and engineering.
In general, GPUs have three major application scenarios: games, AI, and autonomous driving.
1. Games
According to IDC data, shipments of gaming PCs and monitors in 2020 increased by 26.8% year-on-year to 55 million units. Gaming laptops grew by a record 26.9% in 2020. Parallel to PCs, gaming monitors also reached new heights in 2020, growing by more than 77% compared to 2019, with shipments reaching 14.3 million units.
IDC expects sales of gaming monitors to surpass gaming desktops for the first time in 2021. Even as gaming desktops gain traction, the increasing rate at which monitors are attached to gaming laptops means the five-year CAGR of the gaming monitor market is expected to exceed 10%. IDC expects global sales to reach 72.9 million units in 2025, a CAGR of 5.8%.
2. AI
The mobile AI chip market is not limited to smartphones; potential markets also include smart bracelets/watches, VR/AR glasses, and similar devices.
In edge computing scenarios, AI chips are mainly responsible for inference: data collected by sensors on the terminal device (microphone arrays, cameras, etc.) is fed into a trained model to obtain inference results. Because edge scenarios are diverse, the considerations for computing hardware differ, and requirements for computing power and energy consumption vary widely. Computing chips for the edge therefore need to be designed for their specific scenarios to achieve the optimal solution.
▲The performance requirements of AI chips in different edge computing scenarios
Security cameras have developed from analog to digital, and from digital high definition to digital intelligence. Beyond simple recording and storage, the latest smart cameras can perform structured data analysis on images. A security camera can generate 20 GB of data a day; if all of it were sent back to the cloud data center, it would heavily occupy network bandwidth and data center resources.
By adding AI chips to the camera terminal and the network edge, camera data can be processed locally in real time. After structured processing and key-information extraction, only data containing key information is sent back, greatly reducing the pressure on network bandwidth. Current mainstream solutions are of two kinds: integrating AI chips into front-end camera devices, or adopting intelligent server-grade products on the edge side. Front-end chips must balance area, power consumption, cost, and reliability, so low-power, low-cost solutions (such as DSP and ASIC) are preferred; the edge side has fewer constraints and can use server-grade products (e.g., GPU, ASIC) for larger-scale data processing tasks.
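The bandwidth saving is easy to estimate from the 20 GB/day figure quoted above. A back-of-envelope sketch, where the metadata ratio and fleet size are assumptions for illustration, not figures from the text:

```python
# Back-of-envelope sketch (assumed numbers): bandwidth saved by sending only
# structured metadata from the edge instead of streaming raw camera video.
GB = 1e9

raw_per_camera_day = 20 * GB   # ~20 GB/day per camera, as quoted in the text
metadata_ratio = 0.01          # assumption: edge AI keeps ~1% as key-event data
cameras = 1000                 # assumption: a mid-sized deployment

raw_total = cameras * raw_per_camera_day      # bytes/day if streamed raw
sent_total = raw_total * metadata_ratio       # bytes/day after edge filtering
raw_mbps = raw_total * 8 / 86400 / 1e6        # sustained uplink, megabits/s

print(f"Raw backhaul: {raw_mbps:.0f} Mbit/s sustained")
print(f"After edge extraction: {sent_total / GB:.0f} GB/day (was {raw_total / GB:.0f})")
```

Under these assumptions, a thousand-camera deployment would need roughly 1.9 Gbit/s of sustained uplink for raw video, versus two orders of magnitude less after edge-side key-information extraction.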
▲Application of AI chips in smart security cameras
Artificial intelligence servers are usually equipped with GPU, FPGA, ASIC, and other acceleration chips. The combination of CPU and accelerators provides powerful, high-throughput computing for natural language processing, computer vision, voice interaction, and other AI application scenarios, and has become an important supporting force for AI development. Compared with a traditional CPU server providing the same computing power, a GPU server's cost, floor space, and energy consumption are roughly 1/8, 1/15, and 1/8 respectively.
Currently, the most widely used AI chips in cloud scenarios are NVIDIA's GPUs, mainly because of their powerful parallel computing capability (compared with CPUs), versatility, and mature development environment. The global AI server market was 12.2 billion US dollars in 2020 and is expected to reach 28.8 billion US dollars by 2025, a five-year CAGR of 18.8%.
▲2020-2025 Global AI server industry market size and growth rate (unit: US$100 million)
In AI development, the development and deployment of deep learning models demand strong computing power, requiring dedicated chips and servers. For developers, purchasing AI servers outright is too costly. Under the cloud service model, renting a supercomputing center's computing resources on demand greatly reduces upfront capital investment and saves hardware operation and maintenance costs during development, maximizing the efficiency of capital allocation.
3. Autonomous driving
Global autonomous driving has entered the commercialization stage, and the outlook is promising. According to IDC's latest Worldwide Autonomous Vehicle Forecast (2020-2024), global shipments of L1-L5 autonomous vehicles are expected to reach about 54.25 million units in 2024, growing at a compound annual rate over 2020-2024. L1 and L2 autonomous driving are expected to hold market shares of 64.4% and 34.0% respectively in 2024. Although current applications of L3-L5 technology are pioneering, L1-L2 will remain the largest segment driving global autonomous vehicle shipment growth over the next five years.
China's auto market continues to grow, and autonomous driving is transitioning from L2 to L3. According to the China Association of Automobile Manufacturers, from January to March 2021 Chinese-brand passenger vehicles sold a total of 2.108 million units, up 81.5% year-on-year, accounting for 41.5% of total passenger vehicle sales, a share 1.4 percentage points higher than the same period last year. From January to September 2020, sales of L2 intelligent connected passenger vehicles reached 1.96 million, 14.7% of total passenger vehicle sales.
Some companies have accelerated R&D of L3-level autonomous vehicles and carried out demonstration applications of autonomous parking, autonomous buses, and unmanned intelligent heavy trucks in many places. By 2025, sales of PA (partially autonomous) and CA (conditionally autonomous) intelligent connected vehicles in China are expected to exceed 50% of total car sales that year, and the assembly rate of C-V2X (cellular vehicle-to-everything) terminals in new cars is expected to reach 50%.
As sensors, in-vehicle processors, and related products further improve, more L3-class models will appear. L4 and L5 autonomous driving is expected to land first on commercial vehicle platforms in closed campuses. High-level autonomous driving on broader passenger vehicle platforms must await further progress in technology, policy, and infrastructure, and is not expected on general roads until at least 2025-2030.
▲Prediction of the penetration rate of autonomous driving in the global auto market from 2016 to 2030
Perceiving the road environment requires processing massive data in a short time. Sensors such as radar collect environmental information during driving and can generate more than 1 GB of data per second; the processor must analyze several gigabytes of data per second in real time, which places high computational demands on it.
Automated planning requires instantaneous response to ensure safety. After processing and analyzing the real-time data, the system must plan the driving path and vehicle speed with millisecond-level timing precision to keep the drive safe, demanding high processing speed.
With both technical and cost advantages, GPUs are the mainstream in the autonomous driving field.
03. Domestic AI GPUs are on the fast track
In 2020, investment and financing in the domestic AI chip industry grew 52.8% year-on-year, and the number and amount of deals from January to April 2021 had already exceeded those of the whole of last year; investment in the integrated circuit field is booming.
Among popular fields, artificial intelligence was one of the sub-segments most favored by capital in 2020. Capital in 2020 flowed mainly to relatively mature AI chip companies that had already completed one, two, or even more rounds of financing.
▲The establishment time, financing history, and valuation of companies in the AI chip industry
Market expectations for the AI chip industry are gradually becoming more rational, and start-ups have entered the market-testing period. A large number of AI chip companies were founded in 2015-2017; over the next one to two years, the market will genuinely test each vendor's products and technologies. The market expects AI chips with higher computing power, lower power consumption, and lower cost.
▲Introduction of chips from different companies
1. Muxi Integrated Circuit: Multi-scenario high-performance GPU
Muxi Integrated Circuit focuses on designing high-performance general-purpose GPU chips with fully independent intellectual property rights for heterogeneous computing and other applications, and is committed to building the strongest commercial GPU chips in China. The main application directions of its products include traditional GPU and mobile applications, artificial intelligence, cloud computing, data centers, and other high-performance heterogeneous computing fields, in which GPUs are important foundational products.
The company plans to adopt the industry's most advanced 5 nm process technology, focusing on domestic high-performance GPU chips fully compatible with the CUDA and ROCm ecosystems to meet the computing needs of HPC, data centers, and AI. It is committed to developing and producing safe, reliable high-performance GPU chips with independent intellectual property rights, serving data centers, cloud gaming, artificial intelligence, and other fields that demand high computing power.
2. Biren Technology: Launched cloud AI chip
Biren Technology was founded in 2019. The company has rich technical reserves in GPU and DSA (dedicated accelerator) fields and focuses on cloud-based general-purpose intelligent computing. It aims to gradually catch up with and surpass existing solutions in AI training and inference, graphics rendering, high-performance general computing, and other fields, achieving a breakthrough for domestic high-end general-purpose intelligent computing chips.
3. Suiyuan Technology: Launching China's largest AI computing chip
During the 2021 World Artificial Intelligence Conference, Shanghai-based Suiyuan Technology launched its second-generation cloud AI training chip Suisi 2.0, the training products Yunsui T20/T21, and the newly upgraded Yusuan TopsRider 2.0 software platform.
Suisi 2.0 is the largest AI computing chip in China to date. It pushes the limits of ASE's 2.5D packaging and is the first chip in China to support TF32 precision, with single-precision tensor (TF32) computing power reaching 160 TFLOPS. Suisi 2.0 is also the first product to support the latest HBM2E memory. The company's main services are one-stop chip customization and semiconductor IP licensing for application markets such as consumer electronics, automotive electronics, computers and peripherals, industry, data processing, and the Internet of Things.
Suiyuan Technology was established on March 19, 2018. It has since completed five rounds of financing totaling nearly 3.2 billion yuan; the latest, a 1.8 billion yuan Series C round completed in January this year, was led by CITIC Industrial Fund, funds under CICC Capital, and Primavera Capital.
4. Horizon: Intelligent driving and AI application services
Based on its innovative AI-specific computing architecture BPU, Horizon successfully taped out and mass-produced China's first edge AI chips: Journey 1, focused on intelligent driving, and Rising Sun 1, focused on AIoT. In 2019, Horizon launched China's first automotive-grade AI chip, Journey 2, and the new-generation AIoT intelligent application acceleration engine Rising Sun 2. In 2020, Horizon further accelerated AI chip iteration, launching the new-generation high-efficiency automotive smart chip Journey 3 and the new-generation AIoT edge AI chip platform Rising Sun 3.
▲Horizon Development History
The intelligent Internet of Things is the future trend, and its demands will multiply the load on cloud computing. Across massive, fragmented scenarios, the powerful edge computing capability of the Rising Sun processors helps devices process local data efficiently.
For AIoT, Horizon launched the Rising Sun series of edge AI chips. Rising Sun 2 adopts the BPU Bernoulli 1.0 architecture and provides 4 TOPS of equivalent computing power; Rising Sun 3 adopts Bernoulli 2.0 and provides 5 TOPS.
Horizon has become the only provider of full-scenario vehicle smart chip solutions covering L2 to L4, from the mass production of China's first automotive-grade AI chip, Journey 2, in 2019 to the launch of the second-generation automotive-grade chip Journey 3 in 2020. Journey 2 and Journey 3 have entered pre-installed mass production on popular models from many self-owned-brand car companies, including Changan, Great Wall, Dongfeng Lantu, GAC, JAC, Ideal, Chery, and SAIC.
Horizon Matrix is a vehicle-grade computing platform accelerated by the Journey 2 architecture. Combined with deep learning perception technology, it provides a stable, reliable, high-performance perception system for high-level autonomous driving.
▲Horizon Journey Series Chips
5. Black Sesame: Intelligent Driving System Solutions
Black Sesame Intelligent Technology is an enterprise focused on visual perception technology and independent IP chip development. Its main areas are embedded image and computer vision: it provides embedded visual perception computing platforms based on optical technology, image processing, computational imaging, and artificial intelligence, and offers complete commercial solutions for ADAS and autonomous driving.
Based on the Huashan No. 2 A1000 chip, Black Sesame offers four intelligent driving solutions. A single A1000L chip suits ADAS assisted driving; a single A1000 chip suits L2+ autonomous driving; two interconnected A1000 chips reach 140 TOPS of computing power, supporting L3 autonomous driving; and four A1000 chips can support L4 and above. In addition, Black Sesame can provide customized services for different customer needs.
Black Sesame's first chip has reached mass production through its cooperation with SAIC. The second chip, the A1000, is moving toward mass production and is expected to ship more than 100,000 units in the commercial vehicle field in the second half of this year, with mass production for vehicles also launched. Black Sesame Intelligence has cooperated with FAW, NIO, SAIC, BYD, Bosch, Didi, Thundersoft, Asia-Pacific Electromechanical, and other companies on L2 and L3 autonomous driving perception system solutions.
Black Sesame's latest Huashan No. 2 (A1000) chip delivers 40-70 TOPS of computing power with less than 8 W of power consumption and excellent compute utilization. Built on a 16 nm process, it meets AEC-Q100, single-chip ASIL-B, and system-level ASIL-D automotive functional safety requirements, and is currently the only domestic chip that can support L3 and above autonomous driving. To address different market demands, Black Sesame simultaneously released the Huashan No. 2 A1000L.
▲Parameter comparison of Black Sesame's latest A1000 series
In addition to the players above, Moore Threads and other companies have also made new progress recently; see the table below.
▲The latest progress of domestic GPUs
In the traditional GPU market, the combined revenue of the top three (Nvidia, AMD, and Intel) almost represents the revenue of the entire GPU industry. After years of exploration and development, domestic CPUs have reached a certain scale, and their industry and ecosystem have gradually improved; the domestic GPU market, though huge in scale and potential, lags far behind domestic CPUs in development. Driven by AI-accelerated computing, independent innovation in domestic chips, and the slowdown of Moore's law, the gap between domestic GPUs and the overseas giants will gradually narrow.

Reference link
https://mp.weixin.qq.com/s/O5JZ8YdFwNL3NsCG4whB2A

Origin blog.csdn.net/wujianing_110117/article/details/123836579