Will Nvidia's A800 also be banned? Are domestic GPU makers ready?

Data intelligence industry innovation service media

—— Focused on digital intelligence and business transformation


According to the Wall Street Journal, the United States is considering further tightening export controls on AI chips bound for China, and could act as early as July.

Without a license, the U.S. Department of Commerce would prohibit manufacturers such as Nvidia from shipping AI chips to Chinese customers; even the A800, Nvidia's China-only chip, could no longer be sold without one.

The A800 is a China-specific GPU that Nvidia launched in the third quarter of 2022. In September 2022, the United States barred companies such as Nvidia from exporting high-end GPU chips to China, chiefly by capping graphics cards' computing power at 4800 TOPS and their interconnect bandwidth at 600 GB/s. As soon as the ban landed, the A100 and H100 were shut out of the Chinese market.

Indeed, Nvidia's fiscal 2022 report shows that sales in China account for more than 20% of total revenue, a market it can ill afford to lose. To satisfy the ban while holding on to Chinese customers, Nvidia launched the A800 and H800, graphics cards built specially for China.

According to MyDrivers, the A800 runs at about 70% of the A100's speed. The H800, launched later, halves the H100's interconnect rate, meeting US export rules while targeting the Chinese market.

Since the end of 2022, spurred by ChatGPT, Internet platform companies and AI firms alike have moved into large models. By incomplete statistics, 79 large models with more than 1 billion parameters have been released in China. Every one of them needs computing power to run, and the high-end GPUs that supply it have all sold out. On the back of this demand, Nvidia's market capitalization surged past one trillion US dollars.

To secure more computing power, speed up large-model training, and seize the opportunity, a GPU arms race is playing out at home and abroad.

Under such enormous demand, even the cut-down A800 and H800 are hard to come by. According to earlier reporting by Jiemian News, before the large-model boom an A800 could arrive in two weeks; now it may take 4-8 weeks. The report also notes that Nvidia has taken a page from Hermès: to get priority supply of popular GPUs, buyers must also purchase other products as a bundle. If the export ban is tightened further, the A800 and H800 may be cut off from the Chinese market as well.

Large models need computing power, and the GPU is the engine of computing power for AI training. For China, the AI industry has entered a new phase of competition on the strength of large models. Losing access to high-compute chips would be a serious setback for China's AI industry, making domestic substitution of GPUs imperative.

Viewing domestic GPUs through the rise of Nvidia

The GPU began as a processor for accelerating graphics rendering; the term was coined by Nvidia with the launch of the GeForce 256 graphics chip in 1999. At the time, GPUs were aimed mainly at the gaming and PC markets.

Because GPUs can efficiently process massive amounts of data in parallel for graphics tasks, Nvidia went a step further and recast general computer programs as rendering-style workloads, putting the GPU to work on general-purpose parallel computing and releasing a CUDA-based GPGPU beta in 2007.

Compared with graphics-rendering GPUs, GPGPUs can run many computing tasks at once, greatly improving computing speed and efficiency.

In AI, many algorithms must churn through large amounts of data with an enormous amount of computation. Deep learning, for example, is dominated by large matrix operations. Training a deep learning model can take weeks or even months on a traditional CPU but only hours or days on a GPU, dramatically raising training speed and cutting costs.
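To make the scale concrete, here is a minimal sketch of why a dense layer's matrix multiply rewards parallel hardware. The layer sizes and throughput figures below are hypothetical illustrations, not numbers from this article:

```python
# Illustrative only: a single dense layer's forward pass is one large
# matrix multiply, and counting its operations shows why parallel
# hardware matters for deep learning.

def matmul_flops(m: int, k: int, n: int) -> int:
    """FLOPs to multiply an (m x k) matrix by a (k x n) matrix:
    each of the m*n outputs needs k multiplies and k adds."""
    return 2 * m * k * n

# A hypothetical batch of 64 samples through a 4096 -> 4096 layer:
flops = matmul_flops(64, 4096, 4096)
print(f"{flops / 1e9:.1f} GFLOPs per forward pass")  # ~2.1 GFLOPs

# Rough time = FLOPs / sustained throughput; the GPU's massive
# parallelism is what yields the orders-of-magnitude gap.
cpu_gflops, gpu_gflops = 100, 19_500  # assumed CPU vs A100-class FP32 peak
print(f"speedup ~{gpu_gflops / cpu_gflops:.0f}x")
```

Real speedups depend on memory bandwidth and kernel efficiency, but the operation count alone explains the weeks-versus-days contrast in the text.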

Once Nvidia launched GPGPU, the GPU was no longer confined to graphics computing for gaming and PCs, and it steadily expanded into AI. For over a decade, Nvidia's GPU products have led CPUs by factors of 10 or even 100 in computing power and memory bandwidth. Artificial intelligence and the GPU have developed hand in hand.

GPU development has tracked Moore's Law: the number of transistors that fit on an integrated circuit doubles roughly every 18 months, doubling performance with it. Nvidia has refreshed its GPU hardware about every two years, compounding into performance gains of dozens of times and securing the commanding heights of discrete graphics technology.
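As a quick check of the arithmetic behind that claim, doubling every 18 months compounds rapidly; this small illustration (not tied to any specific product) shows the implied growth factors:

```python
# Moore's-law arithmetic: doubling every 18 months compounds to
# roughly 10x in five years, which is how biennial hardware refreshes
# stack up into gains of "dozens of times" over a decade.

def transistor_growth(months: float, doubling_months: float = 18.0) -> float:
    """Growth factor after `months`, doubling every `doubling_months`."""
    return 2.0 ** (months / doubling_months)

print(f"after 2 years: {transistor_growth(24):.1f}x")   # ~2.5x
print(f"after 5 years: {transistor_growth(60):.1f}x")   # ~10.1x
```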

To date, Nvidia is neck and neck with AMD in graphics rendering; in general-purpose GPUs, however, Nvidia stands alone, with a market share above 80%. With the rise of large-model research, both training and inference depend on GPGPU chips for computing power, and large-model training worldwide basically runs on Nvidia's GPUs.

Back in China: if the export ban is escalated, can domestic GPUs stand on their own?

At present the A100, the Nvidia GPU in greatest demand for AI, is built on a 7nm process, packs 54 billion transistors, and supports FP16, FP32, and FP64 floating-point operations. The H100, launched in March 2022, integrates 80 billion transistors on a 4nm process and is three times faster than the A100 in floating-point throughput.

On the hardware side, in GPU chip design, domestic manufacturers such as Biren, Tianshu Zhixin, and Muxi have all launched GPUs on 7nm processes, and companies such as Xintong Semiconductor, Innosilicon, and Moore Threads have rolled out GPU products of their own.

In addition, established CPU makers such as Loongson and Haiguang are also moving into GPGPUs. Loongson's GPGPU is mainly integrated into its own SoC and is expected to tape out in 2024, while the DCU launched by Haiguang Information is itself a form of GPGPU.

Source: Data Ape, compiled from public information

Among domestic GPU manufacturers, Jingjiawei has developed graphics processing chips with independent intellectual property, represented by its JM5, JM7, and JM9 series, and brought them to large-scale commercial use. It leads the domestic GPU field and is the only Chinese company with GPU revenue above 1 billion yuan, posting a net profit of nearly 300 million yuan in 2022.

By comparison, Nvidia's revenue in the first quarter of fiscal 2024 was US$7.192 billion (roughly RMB 50.782 billion), with net profit of US$2.043 billion (roughly RMB 14.425 billion). Jingjiawei's annual revenue is only about 1% of Nvidia's quarterly revenue, to say nothing of the gap facing other GPU start-ups.

"Chinese GPU makers' design capabilities are not bad, but the overall industry chain is still 5-10 years behind," Yang Xiaodong, director of computing technology at Huayuan, said in an interview. "In recent years some GPU start-ups have managed to design a GPU with decent performance within just 2-3 years of founding. But many links are involved: a tape-out may fail, and even when it succeeds there is still a long road to volume production and cost control."

Take chip fabrication. TSMC, the global foundry leader, has 3nm in mass production and 2nm in trial production. Domestically, SMIC can mass-produce at 14nm, with 7nm still in development, at least three generations behind TSMC.

For now, domestic foundries cannot yet fabricate the 7nm chips that domestic GPU designers have drawn up, so those designers will most likely turn to TSMC, with its advanced and mature process, for tape-out. According to industry insider Chen Fei, a start-up seeking a TSMC tape-out faces not only steep foundry fees (200-300 million yuan) but must also source the materials the tape-out requires, and only then bring them to TSMC...

From design to mass production, tape-out is the critical step. Once a chip is fully designed, it must be etched onto a wafer according to the drawings. The process node, the wafer size, and the chip's complexity all affect the success rate and cost, and many chips do not tape out successfully on the first attempt; it often takes several rounds to reach an acceptable result. Tape-out is extremely expensive, and a few failed runs can sink a company.

For start-up GPU makers, the road from product design to deployment is long, and because Moore's Law doubles performance every two years, they must hurry. Yet once they clear the hardware hurdles, they discover that the "software ecosystem" is the fundamental reason domestic GPUs cannot shake Nvidia.

CUDA is Nvidia's deepest moat

Software, algorithms, and ecosystem are the soft power on which GPU makers compete, and they are what takes a chip from "usable" to "easy to use".

Broadly speaking, a GPU ecosystem is built from software and rests on an algorithm platform, which adapts the various API interfaces, downstream applications, and functions the hardware must serve. Although the hardware gap for domestic GPUs is narrowing, the software-ecosystem gap remains vast. In interviews, several GPU makers called CUDA Nvidia's deepest moat.

What exactly is CUDA?

CUDA (Compute Unified Device Architecture) is a programming model and application programming interface for high-performance computing. It offers a simple, efficient way to tap the GPU's computing power, letting developers write GPU programs with relative ease.

Put simply, CUDA is Nvidia's proprietary parallel computing platform. Applications are developed against the APIs CUDA provides, which invoke the computing power of Nvidia GPUs, so developers can build software around the GPU's parallel processing capability.

"Application vendors downstream of the chip may not care how much memory a GPU carries or what hardware architecture it uses. What they care about is the performance they can get out of that hardware, and that is actually determined at the software level," Chen Fei said.

For GPUs riding Moore's Law, a new product must ship within two years or fall behind in the competition. On hardware alone, then, it is hard even for Nvidia to shake off its rivals.

But after the CUDA platform was released, everything changed. Every Nvidia chip design is compatible with CUDA, and software built on CUDA can extract the full performance of an Nvidia GPU, so every Nvidia GPU user ends up using CUDA, and the habit sticks.

In the past two or three years, domestic GPU substitution has taken off. Each domestic GPU has its strengths, and some can match Nvidia at certain data precisions, but the software ecosystem remains the one glaring weakness. On this, Chen Fei said: "Software development takes time. You need a steadily growing user base to drive software iteration and to shape development around users' real needs, and software and hardware must be developed in concert. This is precisely what domestic GPU start-ups most easily overlook in their first-generation products."

Since launching CUDA in 2006, NVIDIA has spent nearly two decades building the platform. "First, it is not open source; all the IP is in Nvidia's hands. Second, it has gathered 4 million developers, and their daily feedback forms a snowballing virtuous circle: good performance attracts a good ecosystem, and a good ecosystem feeds back into better performance, creating an ecological barrier," Chen Fei said.

Today almost every vendor building AI applications writes code against the CUDA platform. Every AI chip, every flavor of xPU, must first be made compatible with CUDA before it can be deployed.

On building an ecosystem for domestic GPUs, Yang Xiaodong, director of computing technology at Huayuan, argues: "Selling GPU hardware alone is not enough. To make a GPU genuinely useful you must round out a whole chain of ecosystem support: drivers, software, and so on. If the software frameworks cannot support it, nobody can use it even if they want to. Domestic GPUs have not yet reached full marketization; they are at an early stage of market development, for reasons on both the software and hardware sides."

It must be admitted that Nvidia's CUDA ecosystem was not built overnight; it took more than a decade of accumulation, grain by grain and trickle by trickle, and a huge user base can only be grown slowly. On this point, Zou Wei, president of the Tianshu Zhixin product line, believes that domestic GPUs still trail international mainstream products at the flagship level, and that domestic customers may not yet know domestic GPU products well; cultivating them will take time.

Domestic general-purpose GPUs: from 0 to 1, and commercially usable

Still, it is worth affirming that domestic general-purpose GPUs have broken through.

Today the global GPU market is effectively a monopoly. Compared with graphics-rendering GPUs, general-purpose GPUs appear to offer the broader prospect.

"The most important point is that AI is a growing blue-ocean market," Yang Xiaodong said of why domestic general-purpose GPUs are developing faster than rendering GPUs. "The AI market's potential is large enough that Nvidia cannot possibly swallow it all. If Nvidia withdraws from the domestic market, domestic manufacturers can try to take a slice of Nvidia's cake. Judging by the trend, domestic GPGPU is developing faster and more vigorously."

Tianshu Zhixin, which began chip design in 2018, aimed its first product from the outset at the versatility of general-purpose GPUs and the breadth of AI application scenarios. Zou Wei, president of the Tianshu Zhixin product line, told Data Ape: "Overall, domestic GPUs still trail the foreign giants by some distance, but after several years of refinement in the AI market, domestic GPGPUs have achieved the breakthrough 'from 0 to 1' and reached a usable level. Going forward we will stick to the general-purpose GPU strategy, mine the general-purpose GPU market, launch high-quality products tailored to market and user needs, and use our first-mover advantage and user feedback to spread market adoption and accelerate product iteration."

At present Tianshu Zhixin has two general-purpose GPU lines, Tiangai and Zhikai, which can support a portion of current user needs across a wide range of downstream scenarios: training, inference, general computing, and new-algorithm research, serving industries from the Internet, security, and telecom operators to healthcare, education, finance, and autonomous driving.

Beyond Tianshu Zhixin, Biren, founded in 2019, brought its first-generation GPGPU series into mass production by the end of 2022 and has secured initial orders. Denglin Technology's general-purpose GPU line, Goldwasser™, has also entered mass production; it earlier joined the "Hardware Ecosystem Co-creation Plan" initiated by PaddlePaddle, pairing the two sides' software and hardware strengths to jointly push AI toward industrial deployment.

Refining large AI models means paying a "GPU tax"

In 2023, ChatGPT's meteoric rise set off an AI heat wave that is sweeping the world.

One important yardstick for these models is parameter count: once the scale crosses a threshold, task performance jumps markedly. The large-language-model foundation is highly scalable and can iterate on itself repeatedly, and parameters weigh heavily in a large model's performance; the more parameters, the more computing resources consumed.

The arrival of large models has thus created incremental demand for computing power. According to Verified Market Research, the global GPU market was worth US$25.41 billion (about RMB 171.72 billion) in 2020; with demand still climbing, it is projected to reach US$246.51 billion (about RMB 1.67 trillion) by 2028, a compound annual growth rate (CAGR) of 32.82%.
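Those forecast figures can be sanity-checked with a one-line compound-growth calculation (the helper name `project` is ours, not from the report):

```python
# Verifying the quoted forecast: $25.41B in 2020 compounding at a
# 32.82% CAGR for the 8 years to 2028.

def project(base: float, cagr: float, years: int) -> float:
    """Future value of `base` growing at `cagr` per year for `years` years."""
    return base * (1 + cagr) ** years

value = project(25.41, 0.3282, 8)
print(f"${value:.1f}B")  # ~$246B, in line with the quoted $246.51B
```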


A paper co-authored by NVIDIA gives an empirical formula for ChatGPT-style training time; by its account, training the 175B-parameter GPT-3 took 34 days on 1024 A100 GPUs.
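The widely cited approximation from that line of work estimates end-to-end training time as 8TP/(nX), where T is tokens, P parameters, n the GPU count, and X each GPU's sustained throughput. The sketch below plugs in commonly assumed GPT-3 figures (300B training tokens, ~140 sustained TFLOP/s per A100; these are our assumptions, not numbers stated in this article) and recovers the 34-day figure:

```python
# Rough reproduction of the 34-day GPT-3 training estimate using the
# approximation  time ≈ 8*T*P / (n*X).

def training_days(tokens: float, params: float,
                  n_gpus: int, flops_per_gpu: float) -> float:
    """Estimated wall-clock days: 8*T*P FLOPs spread over n GPUs at X FLOP/s."""
    seconds = 8 * tokens * params / (n_gpus * flops_per_gpu)
    return seconds / 86_400

days = training_days(tokens=300e9, params=175e9,
                     n_gpus=1024, flops_per_gpu=140e12)
print(f"~{days:.0f} days")  # ≈ 34 days, matching the figure above
```

The formula makes the "GPU tax" tangible: halving the GPU count, or losing access to faster chips, directly doubles the training time.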

Hoarding A100s is hoarding computing power. The A100 lists at US$10,000, about RMB 72,000, yet cards are now being bid up to 150,000-200,000 yuan apiece. The strength of demand for general-purpose GPUs, and the size of the market, are plain to see.

In recent years, driven by both policy and demand, domestic GPU start-ups ready to "share the country's worries" have sprung up one after another.

By incomplete statistics, domestic GPU financing peaked in 2021, with total funding exceeding 10 billion yuan at 12.635 billion. Even though 2022's total was roughly halved, it still ranks second across the past eight full years; 2020-2022 were the boom years for GPU investment and financing.

Chart: Data Ape, based on public data

Table: Data Ape, based on public data

Many start-ups only 2-3 years old have raised multiple rounds of sizeable financing, yet set against the cost of GPU development itself, it is not much. Chen Fei disclosed that taking a GPU from design to formal launch costs roughly 2 billion yuan in total; building a GPU is a deeply money-burning affair.

Beyond burning money, a GPU product takes roughly one and a half to two years, or longer, to go from development to tape-out, through post-silicon bring-up and debugging, to official release.

For start-ups, the hardware side means crossing the industry-chain and cost chasms; the software side means cultivating customers and co-developing software and hardware... The road for domestic GPUs is destined to be hard, but the hard road is the road uphill.

On the future of domestic GPUs, Zou Wei believes their performance will keep climbing and applications will blossom across the board, with a chance to catch up within 5 to 10 years. Deployed applications are the best "proving ground" for strengthening domestic GPUs: on one hand, moving from usable to easy to use takes technology and time, and word of mouth and brand effect must accumulate; on the other, domestic GPU makers can focus on application deployment, deepen cooperation with customers, and expand their territory step by step once a foothold is won.

(Note: Chen Fei is a pseudonym)

Text: Mu Yang  /  Data Ape


Origin blog.csdn.net/YMPzUELX3AIAp7Q/article/details/131587623