Domestic AI server classification, technology and products (2023)

At present, the main domestic server brands are Inspur, Sugon, Huawei, Super Fusion, H3C, Lenovo, and Fenghu (research servers from Fenghu Information and Fenghu Yunlong), among others; foreign brands such as HP, Dell, and IBM still hold a sizable share of the domestic market. In practice the core components are largely the same across vendors, so domestic brands are often the more cost-effective choice.

Notes:

1. Huawei and Super Fusion are now two separate companies. Super Fusion focuses on x86-architecture servers, while Huawei focuses on servers built around its self-developed processors, chiefly the Kunpeng and Ascend (Shengteng) series;

2. Scientific research servers underpin many applications. They span a wide range of research directions and fields, each with its own software stack and usage environment, so the team supporting them needs considerable domain experience. In this sense, research servers are the precursor and foundation of many application scenarios.

AI servers adopt a GPU-based architecture, which is better suited to large-scale parallel computing than the CPU. General-purpose servers draw their computing power from the CPU, whereas AI servers are heterogeneous machines whose accelerators can be combined in different ways according to the application, such as CPU+GPU, CPU+TPU, or CPU plus other accelerator cards, with GPUs providing most of the compute. Looking at how the ChatGPT model computes, its defining feature is parallelism: compared with the previous generation of RNN-based deep learning models, a Transformer-based model can provide context for any token in the input sequence, so it processes the whole input at once rather than one word at a time, which makes much larger parameter counts computationally feasible. Looking at how a GPU computes, its large number of compute units and very deep pipelines make its architecture better suited than the CPU to high-throughput parallel AI workloads.


Deep learning consists mainly of matrix and vector computation, which AI servers process more efficiently. Structurally, ChatGPT is built on the Transformer architecture: the attention mechanism assigns weights to input tokens and passes the numerical results to a feed-forward neural network, a process that requires a large number of vector and tensor operations. AI servers typically integrate multiple AI GPUs, and these GPUs accelerate the matrix operations used by deep learning algorithms, such as convolution, pooling, and activation functions. In AI scenarios, therefore, AI servers are usually more efficient than general-purpose servers and hold a clear application advantage.
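To make this concrete, the following is a minimal NumPy sketch of scaled dot-product attention, the core Transformer operation described above; the dimensions are hypothetical, and the point is simply that the workload reduces to large matrix multiplications of exactly the kind AI GPUs are built to accelerate.

```python
import numpy as np

# Minimal sketch of scaled dot-product attention. Dimensions are hypothetical;
# the point is that the work is dominated by matrix multiplications, which map
# well onto the many parallel compute units of an AI GPU.
seq_len, d_model = 128, 64                      # hypothetical sizes
rng = np.random.default_rng(0)
Q = rng.standard_normal((seq_len, d_model))     # queries
K = rng.standard_normal((seq_len, d_model))     # keys
V = rng.standard_normal((seq_len, d_model))     # values

scores = Q @ K.T / np.sqrt(d_model)             # (seq_len, seq_len) matmul
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
output = weights @ V                            # second large matmul

print(output.shape)  # (128, 64): every position attends to all others at once
```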


There are two ways to classify AI servers:

1) By application scenario: AI servers can be divided into deep-learning training servers and intelligent-application inference servers. Training tasks place high demands on computing power, so training servers must provide high-density compute; typical products include the Sugon X785-G30 and the Huawei Ascend Atlas 800 (models 9000 and 9010). Inference tasks use an already-trained model to serve requests and are less demanding on compute; typical products include the Sugon X785-G40 and the Huawei Ascend Atlas 800 (models 3000 and 3010).

2) By chip type: an AI server is a heterogeneous server whose compute-module configuration can be adjusted to the application, using combinations such as CPU+GPU, CPU+FPGA, CPU+TPU, CPU+ASIC, or CPU plus multiple accelerator cards. CPU plus multiple GPUs is currently the most common configuration in shipping products.


Common AI servers come in four-way, eight-way, and sixteen-way configurations. Broadly speaking, general-purpose servers use a CPU-centric, largely serial architecture and excel at logic-heavy workloads, whereas AI servers use an accelerator-centric heterogeneous architecture and excel at high-throughput parallel computing. By CPU count, general-purpose servers are classified as two-way, four-way, or eight-way. AI servers usually carry only 1-2 CPUs, but a clearly dominant number of GPUs; by GPU count they are classified as four-way, eight-way, or sixteen-way, of which the eight-way server with 8 GPUs is the most common.


AI servers combine multiple chips, so their compute hardware costs more. Taking one typical product of each type and breaking down its hardware makes the architectural difference clear. The Inspur general-purpose server NF5280M6 uses 1-2 third-generation Intel Xeon Scalable processors; according to Intel's official website, each CPU is priced at roughly 64,000 yuan, so the server's chip cost is roughly 64,000-128,000 yuan. The Inspur AI server NF5688M6 combines 2 third-generation Intel Xeon Scalable processors with 8 Nvidia A800 GPUs; according to Nvidia's official website, each A800 is priced at about 104,000 yuan, so the server's chip cost is roughly 960,000 yuan.
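As a quick sanity check on the figures quoted above (prices are those cited in the text, not current list prices), the chip-cost arithmetic can be reproduced as follows:

```python
# Rough check of the chip-cost comparison cited above (prices in RMB,
# as quoted in the text; actual list prices vary).
cpu_price = 64_000          # per 3rd-gen Intel Xeon Scalable CPU (quoted)
a800_price = 104_000        # per Nvidia A800 GPU (quoted)

general_server_chips = 1 * cpu_price, 2 * cpu_price   # NF5280M6: 1-2 CPUs
ai_server_chips = 2 * cpu_price + 8 * a800_price      # NF5688M6: 2 CPUs + 8 GPUs

print(general_server_chips)   # (64000, 128000) -> roughly 64k-128k RMB
print(ai_server_chips)        # 960000 -> roughly 960k RMB
```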


Training GPT models requires substantial computing power, which may drive demand for AI server build-outs. We believe that as domestic vendors continue to roll out ChatGPT-like products, pre-training, fine-tuning, and day-to-day operation of large GPT models may generate significant compute demand, in turn driving volume growth in the domestic AI server market. Take the pre-training of the GPT-3 175B model as an example: according to OpenAI, one pre-training run requires about 3,640 PFlop/s-day of compute. Assuming the work is done on Inspur Information's most powerful current AI server, the NF5688M6 (about 5 PFlop/s per machine), and that the pre-training period is 3, 5, or 10 days, a single vendor would need to purchase 243, 146, or 73 servers respectively.
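Under the same assumptions (about 3,640 PFlop/s-day of pre-training compute and roughly 5 PFlop/s per NF5688M6), the server counts above can be reproduced with a few lines of arithmetic:

```python
import math

# Reproduce the server-count estimate above. Assumes, as in the text, that one
# GPT-3 175B pre-training run needs ~3640 PFlop/s-day and that a single Inspur
# NF5688M6 delivers ~5 PFlop/s of AI performance.
total_compute = 3640        # PFlop/s-day
server_perf = 5             # PFlop/s per server

for days in (3, 5, 10):
    servers = math.ceil(total_compute / (server_perf * days))
    print(days, servers)    # -> 3: 243, 5: 146, 10: 73
```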


Demand for training large AI models is booming, and growth in intelligent computing power is expected to drive AI server volumes. According to IDC, measured in half-precision (FP16) terms, China's intelligent computing power reached about 155.2 EFLOPS in 2021. As AI models grow more complex, training data grows rapidly, and AI application scenarios deepen, domestic intelligent computing power is expected to keep growing quickly: IDC projects 268.0 EFLOPS in 2022, up 72.7% year on year, and 1,271.4 EFLOPS by 2026, with a CAGR of 69.2% over 2022-2026. We believe that AI servers, as the main infrastructure carrying intelligent computing power, stand to benefit from this downstream demand.


Domestic manufacturers have rich product portfolios and hold leading positions in the global AI server market. Inspur Information, Lenovo, Huawei, and other domestic vendors lead globally: among the top 10 vendors by global AI server market share, domestic manufacturers hold 4 seats with a combined share of over 35%, and Inspur Information ranks first with a 20.2% share. The domestic market is highly concentrated: the top three suppliers are Inspur Information, Ningchang, and Huawei, with a CR3 of 70.40%. We believe domestic vendors have already secured a leading international position on the strength of their products and, as demand for AI computing power is released, are well placed to benefit fully from the industry's growth.


Inspur Information: a rich AI server product matrix with internationally recognized product strength. The company's main AI server models currently include the NF5688M6 and NF5488A5. According to the company's official website, both servers placed on the 2021 MLPerf list, the internationally authoritative AI benchmark, winning 7 training championships across tasks including medical image segmentation, object detection, natural language understanding, and intelligent recommendation, and can therefore meet a variety of AI training needs. In addition, the company's accumulation in the AI field includes an AI resource platform and an AI algorithm platform, along with extensive experience delivering computing power solutions.


Huawei: AI servers integrating self-developed accelerator cards and Intel CPUs. The company's AI server line is the Atlas 800 series, which includes models 3000, 3010, 9000, and 9010. Model 3000 is based on the Ascend 310 chip, model 3010 on Intel processors, model 9000 on Huawei Kunpeng 920 plus Ascend 910 processors, and model 9010 on Intel processors plus the Huawei Ascend 910 chip. Backed by these flagship chips, the product delivers up to 2.24 PFLOPS of FP16 compute density, and with an optimized design the chip-to-chip interconnect latency across servers can be reduced by 10-70%.


H3C: AI servers covering diverse training workloads, combined with a software platform to build a complete AI ecosystem. The company's main models include the R4900 G5, R5300 G5, and R5500 G5, which cover different training workload requirements and support large-scale inference and training tasks. On the software side, H3C's integrated AI/HPC management platform improves overall AI operation efficiency by roughly 32%. In 2022, H3C was recognized by Forrester, an authoritative international analyst firm, as a mature vendor of large-scale AI systems capable of delivering reliable server solutions, and H3C's AI servers took 86 world-first results in the MLPerf evaluation.


Leading manufacturers are expected to benefit fully from the release of computing power demand. We believe that with the wave of large-model training triggered by ChatGPT, demand for intelligent computing power, represented by AI training, will be released step by step and should drive AI server volumes. Breaking down AI server costs, GPUs and other compute chips are the core components; although advanced compute products are affected by U.S. export controls, purchasing the A800 provides a broadly workable substitute. We believe domestic leaders such as Inspur Information, with their rich product portfolios and strong competitiveness, hold a major share of the global AI server market and are positioned to benefit fully from future server demand. In terms of cost breakdown, the main costs of an AI server are compute chips, memory, and storage. According to IDC's 2018 server cost-structure data, chips account for about 32% of total cost in basic servers, while in high-performance or compute-intensive servers chip-related costs can reach 50%-83%. Taking a machine-learning AI server as an example, its cost is dominated by the GPU, CPU, memory, and other components, with the GPU accounting for the largest share at 72.8%.


AI server compute chips are mainly GPUs. According to IDC, GPUs held the dominant share of China's AI chip market in 2022, at 89.0%, mainly because the GPU's parallel computing architecture is better suited to complex mathematical workloads and can support highly parallel tasks such as model training in the data center and inference workloads at the edge and on devices. Other major AI chips include NPUs, ASICs, and FPGAs. In general, the number of compute chips an AI server needs depends on its design performance targets, while the type of chip depends on factors such as cost, power consumption, and the algorithms to be run. Common compute-chip combinations include 8x GPU + 2x CPU, 4x GPU + 2x CPU, 8x FPGA + 1x CPU, and 4x FPGA + 1x CPU.


GPU structure: compute units plus video memory. Compute unit (Streaming Multiprocessor, SM): the compute units perform the actual calculation, and each SM has its own control unit, registers, cache, and instruction pipeline. Video memory (global memory): the DRAM on the GPU board, which has a large capacity but is slow relative to on-chip memory.

1. Underlying architecture of the compute units: a graphics card's cores are made up of different components that specialize in different tasks. Taking Nvidia as an example, its GPUs contain Tensor Cores, CUDA cores, and RT cores. Tensor Cores are dedicated units on Nvidia GPUs designed for AI matrix computation and can significantly improve AI training throughput and inference performance. CUDA cores are the general-purpose units of the Nvidia ecosystem; they handle multiple data types and suit common graphics and compute tasks such as video production, image processing, and 3D rendering.


2. TOPS and TFLOPS are common computing power measurement units:

1) OPS: OPS (Operations Per Second) is the number of integer operations performed per second and is commonly used to measure compute performance at precisions such as INT8 and INT4. TOPS (Tera Operations Per Second) means the processor performs one trillion (10^12) operations per second; GOPS and MOPS likewise denote billions and millions of operations per second.

2) FLOPS: FLOPS (Floating-point Operations Per Second) is the number of floating-point operations performed per second and is commonly used to measure compute performance at single precision (FP32) and half precision (FP16). TFLOPS (Tera Floating-point Operations Per Second) means the processor performs one trillion (10^12) floating-point operations per second. Although TOPS and TFLOPS are of the same order of magnitude, the former counts integer operations and the latter floating-point operations; converting between TOPS and FLOPS requires specifying the data-type precision (INT8, FP16, and so on).

3. Video memory bit width, bandwidth, and capacity: the main video memory metrics are bit width, bandwidth, and capacity. Video memory plays a role similar to CPU system memory, moving data between the GPU cores and storage. Bit width is the number of bits the card can transfer per clock cycle and determines how much data the memory can move in an instant. Bandwidth is the data transfer rate between the GPU chip and its memory; it is determined by the memory frequency and bit width, and reflects the card's speed and performance. Capacity determines how much data the video memory can hold at once. The mainstream AI GPU chips today are the Nvidia H100, A100, and V100; globally, the AI-training GPU market is dominated by Nvidia, whose advanced compute products are mainly the H100, A100, and V100. In double-precision floating-point performance (FP64 Tensor Core), the H100, A100, and V100 deliver 67 TFLOPS, 19.5 TFLOPS, and 8.2 TFLOPS respectively; in memory bandwidth, they deliver 3 TB/s, 2 TB/s, and 900 GB/s respectively.
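As a simple illustration of the frequency and bit-width relationship described in point 3, the sketch below computes bandwidth from a per-pin data rate and a bus width; the numbers are hypothetical and chosen only to show the arithmetic, not the specification of any particular card.

```python
# Illustration of the relationship described above:
# bandwidth (GB/s) = effective data rate per pin (Gbps) * bus width (bits) / 8.
def memory_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    return data_rate_gbps * bus_width_bits / 8

# e.g. a hypothetical HBM configuration: 3.2 Gbps per pin on a 5120-bit bus
print(memory_bandwidth_gbs(3.2, 5120))   # 2048.0 GB/s, i.e. about 2 TB/s
```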


Restricted imports of advanced computing power chips may be one of the bottlenecks for domestic AI servers.

On October 7, 2022, the U.S. Department of Commerce's Bureau of Industry and Security (BIS) announced new export-control rules targeting advanced chips destined for China. Under the rules, chips with a bidirectional input/output (I/O) transfer rate of 600 GB/s or higher and a total processing performance (peak TOPS multiplied by the bit length of each operation) of 4800 or more may not be exported to China. Taking the Nvidia A100 as an example, its TF32 performance gives 156 x 32 = 4992 > 4800 and its interconnect rate is 600 GB/s; from this we can infer that advanced compute chips with performance at or above the A100 fall within the scope of the U.S. export restrictions.
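A small helper reproduces the threshold arithmetic summarized above. This is a simplified sketch of the rule as described in the text (total processing performance = peak tera-operations per second multiplied by the bit length of each operation, alongside a 600 GB/s interconnect threshold), not a legal interpretation.

```python
# Reproduce the export-control arithmetic described above (simplified reading
# of the rule as summarized in the text).
def total_processing_performance(tera_ops_per_s: float, bit_length: int) -> float:
    return tera_ops_per_s * bit_length

def restricted(tpp: float, interconnect_gbs: float,
               tpp_limit: float = 4800, io_limit: float = 600) -> bool:
    return tpp >= tpp_limit and interconnect_gbs >= io_limit

# A100: 156 TFLOPS (TF32) * 32 bits = 4992 >= 4800, NVLink 600 GB/s -> restricted
print(restricted(total_processing_performance(156, 32), 600))   # True
# A800: same compute, but ~400 GB/s interconnect (about two-thirds of the A100,
# see below) -> falls outside the restriction as summarized here
print(restricted(total_processing_performance(156, 32), 400))   # False
# A30 (computed later in the text): 82 * 32 = 2624 < 4800 -> not restricted
print(restricted(total_processing_performance(82, 32), 600))    # False
```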

Using servers based on the Nvidia A800 may be a currently viable alternative. Take the Inspur NF5688M6 as an example: it is an NVLink AI server developed for hyperscale data centers, supporting 2 of Intel's latest Ice Lake CPUs and 8 of Nvidia's A800 GPUs in a fully NVSwitch-interconnected configuration, and a single machine delivers 5 PFlops of AI computing performance. In terms of core hardware, the NF5688M6 uses Nvidia's China-specific chip, the A800, which is essentially identical to the advanced A100 in floating-point performance, memory bandwidth, and memory capacity; the main difference is the chip's data transfer speed, which is about two-thirds that of the A100.


Nvidia's other AI GPU chips are not affected by the export restrictions. Given that the current U.S. restrictions on GPU chips focus on advanced computing power, further tightening could in future put higher-performance chips such as the A800 at risk. Looking at Nvidia's product line, beyond the advanced compute chips discussed above (A100, A800, V100, H100) there are also the A2, A10, A30, A40, and T4. Among these, the A30 has the strongest floating-point performance, and its total processing performance is 82 x 32 = 2624 < 4800, so it is not affected by the export restrictions.


The performance of domestic AI GPUs keeps improving, and domestic substitution can be expected in the future. Domestic AI GPU makers currently include Alibaba, Huawei, Cambricon, Tianshu Zhixin, and others, and as they continue to invest in GPU R&D their products keep improving. Take the Huawei Ascend 910 as an example: built on a 7 nm process and integrating more than 49.6 billion transistors, it delivers 320 TFLOPS of FP16 compute or 640 TOPS of INT8 compute, slightly higher than the FP16 performance of the Nvidia A100 (312 TFLOPS, without Nvidia's sparsity feature). We believe that, purely in terms of chip compute performance, some domestic chips have caught up with mainstream overseas chips; as the domestic software ecosystem matures, improving GPU performance is expected to drive localization.


Summary:

1. What kind of computing power does the GPT model need? ChatGPT follows a single-large-model approach, and its demand for underlying computing power arises at two levels: training and inference. Training means repeatedly iterating over the model with large datasets; inference means using the trained model to process input and produce results.

According to IDC, of China's AI server workloads in 2021, 57.6% went to inference and 42.4% to model training. The specific compute-demand scenarios include pre-training, fine-tuning (Finetune), and daily operation. By our estimates, pre-training the GPT-3 175B model requires about 3,640 PFlop/s-day of compute, running ChatGPT for a single month requires about 7,034.7 PFlop/s-day, and a single month of Finetune requires at least 1,350.4 PFlop/s-day.
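As a rough cross-check of the 3,640 PFlop/s-day figure, the widely used approximation that Transformer training requires about 6 x parameters x training tokens of compute gives a very similar number; the 300-billion-token figure is the commonly cited GPT-3 training set size and is an assumption here.

```python
# Rough sanity check of the ~3640 PFlop/s-day pre-training figure, using the
# common approximation: training FLOPs ~= 6 * parameters * training tokens.
# The 300B-token count is the commonly cited GPT-3 figure (an assumption here).
params = 175e9                     # GPT-3 175B parameters
tokens = 300e9                     # training tokens (assumed)
total_flops = 6 * params * tokens  # ~3.15e23 FLOPs

pflops_day = 1e15 * 86400          # FLOPs in one PFlop/s-day
print(total_flops / pflops_day)    # ~3646 PFlop/s-day, close to the quoted 3640
```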

2. What kind of server does the GPT model need? We believe the driving force behind the continuing evolution of server types is change in computing architecture. Over the history of the server industry, as computing moved from stand-alone machines to client-server (C-S) and then cloud-edge-device (C-E-S) architectures, server types such as PC servers, cloud computing servers, and edge computing servers evolved in turn. In the AI training era, the return of centralized C-S-style computing and the demand for large-scale parallelism are expanding the AI server market. Compared with traditional servers, AI servers use accelerator cards such as GPUs, so they are better at vector and tensor computation and handle AI training and inference workloads more capably; they also adopt a multi-chip architecture, so the chip cost per server is higher.

3. What kind of computing chip does the GPT model need? Training and inference for GPT models are carried out mainly on AI servers, whose underlying compute chips include CPUs, GPUs, FPGAs, and ASICs. Common compute-chip combinations include 8x GPU + 2x CPU, 4x GPU + 2x CPU, 8x FPGA + 1x CPU, and 4x FPGA + 1x CPU. According to IDC, GPUs held the dominant share of China's AI chip market in 2022, at 89.0%. The mainstream overseas AI GPU chips today are the Nvidia H100, A100, and V100.

4. What is the impact of the U.S. export restrictions on advanced computing chips on the GPT industry? Under the U.S. policy restricting exports of advanced compute chips, China can only purchase AI GPUs with performance below the A100, such as the Nvidia A800 series. In addition, lower-performance earlier models in Nvidia's A series and T series remain unaffected. Given that some domestic AI GPUs, such as the Huawei Ascend, have already caught up with the Nvidia A100 in FP16 floating-point performance, domestic substitution of AI GPUs is expected to accelerate as the domestic ecosystem improves.

5. Who are the related companies in the AI server industry chain?

  • 1) Manufacturers who can purchase overseas high-performance chips: Inspur Information, etc.;

  • 2) Manufacturers using Haiguang/Cambrian chips: Sugon;

  • 3) Manufacturers using Huawei Ascend chips: Tuowei Information, etc.;

  • 4) Underlying chip suppliers: Haiguang Information, Cambrian, Jingjiawei, etc.

The above content is excerpted from Intelligent Computing Core World, 2023-03-01, and is reposted with the publisher's permission. For further reposting, please contact the original source.


Origin blog.csdn.net/Ai17316391579/article/details/129937479