AI Acceleration (6) | Heterogeneous programming - when performance isn't enough, "plug in" a chip to make up for it

This is the sixth article in an easy-to-understand, entry-level series on AI acceleration. It briefly introduces one concept: heterogeneous programming.

The previous article, AI Acceleration (5) | Understanding pipelining through an example - from instructions to algorithms, used a small everyday example to introduce the concept of pipelining: when computing resources are limited, we can improve program performance through software pipelining techniques.

But what if you have deep pockets and would rather throw money at the problem than spend energy on software optimization? Is there a way to simply buy better performance?

Of course there is: when one chip's performance is not enough, add more chips. As the Chinese saying goes, "when everyone gathers firewood, the flames rise high." With enough chips, performance can soar.

Heterogeneous chip programming is one such way.

Heterogeneous Programming

Heterogeneous programming means putting chips from different manufacturers, with different architectures, into a single computer system, and carrying out AI computation through software scheduling.

For example, programming an x86 CPU and an NVIDIA GPU together.

The development of artificial intelligence has fueled the enthusiasm for heterogeneous programming. The main reason is that neural networks are full of dense computation. Training a neural network end to end on a traditional CPU demands enormous computing power; even if you ran the CPU into the ground, the computation might never finish.

In neural networks, two dense operations, matrix multiplication and convolution, account for almost 90% of the runtime of most networks.

Therefore, many companies develop dedicated AI chips (ASICs, Application-Specific Integrated Circuits), designing hardware units around convolution and matrix operations to accelerate exactly those computations.
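
To make that concrete, here is a deliberately naive CUDA matrix-multiplication kernel (a sketch added for illustration, not code from the original article). Each output element is an independent dot product, which is exactly the kind of massively parallel dense arithmetic that GPUs and AI ASICs build hardware around.

```cuda
// Naive matrix multiply on the GPU: C = A * B, square N x N, row-major.
// Each thread computes one element of C - an independent dot product.
__global__ void matmul(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}
```

Real accelerators go much further (tiling, tensor cores, fixed-function convolution units), but the parallel structure they exploit is the one visible here.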

It's like making a plug-in for the CPU.

NVIDIA's GPUs, Google's TPU, Cambricon's MLU, and so on are all accelerators of this kind. They connect to the host over the PCIe bus and serve as co-processing units that accelerate AI.

It is the same idea as expanding a computer's memory:

  • Not enough memory? Buy a RAM stick and plug it in; now there is enough memory.

  • Not enough computing power? Buy a graphics card and plug it in; now there is enough computing power.
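
On the host side, a plugged-in accelerator really does show up as an enumerable PCIe device. A minimal sketch using the standard CUDA runtime API (my illustration, assuming an NVIDIA card and the CUDA toolkit are installed):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// List every CUDA-capable accelerator card plugged into this host.
int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("%d CUDA device(s) found\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s, %.1f GiB memory\n", i, prop.name,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```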

It is precisely AI's appetite for computing power that has produced more and more AI chip companies, and made the technology of heterogeneous programming more and more widely known.

Among these, NVIDIA's CUDA is the best-known approach to heterogeneous programming.

Is Heterogeneous Programming Difficult?

Readers new to this may be wondering at this point: is heterogeneous programming difficult?

For the most part, no.

That is because most ASIC manufacturers ship the drivers and compute libraries that heterogeneous programming requires.

For example, NVIDIA provides a driver library (cuDriver) for managing the graphics card, cuDNN for accelerating common deep learning operations, cuFFT for FFT-related computation (the algorithms are already implemented inside these libraries; we just call them), and TensorRT, which specializes in inference optimization and layer fusion for neural networks.
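
As a taste of what "just calling the library" looks like, here is a minimal cuFFT sketch (my illustration; error checking omitted) that runs a 1D complex-to-complex FFT entirely on the GPU:

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

int main() {
    const int N = 1024;

    // Allocate a complex signal of length N on the GPU.
    cufftComplex* data;
    cudaMalloc(&data, N * sizeof(cufftComplex));
    // ... fill `data`, e.g. cudaMemcpy from a host buffer ...

    // Plan and run a 1D complex-to-complex forward FFT, in place.
    cufftHandle plan;
    cufftPlan1d(&plan, N, CUFFT_C2C, 1);
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
```

The FFT algorithm itself never appears in our code; the library owns it, which is exactly the division of labor described above.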

And in the worst case, if your neural network contains an uncommon algorithm that no library covers, NVIDIA also provides CUDA, a C-like programming language. You only need to learn a few simple identifiers and how to write kernel functions (kernels) to produce code whose performance far exceeds the CPU's.
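
Those "simple identifiers" are most of the learning curve. In the minimal sketch below (mine, not from the article), `__global__` marks a function as a kernel, and the `<<<blocks, threads>>>` syntax launches it across many GPU threads at once:

```cuda
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        c[i] = a[i] + b[i];
}

// Host side: launch one thread per element, 256 threads per block.
// Assumes d_a, d_b, d_c were allocated on the device with cudaMalloc.
void launch_add(const float* d_a, const float* d_b, float* d_c, int n) {
    int blocks = (n + 255) / 256;
    vector_add<<<blocks, 256>>>(d_a, d_b, d_c, n);
    cudaDeviceSynchronize();  // wait for the GPU to finish
}
```

Each thread computes its own global index from the built-in `blockIdx`/`threadIdx` variables; that one idiom carries over to almost every CUDA kernel.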

There is plenty of material on CUDA programming online; interested readers can easily find it.

Beyond NVIDIA, other manufacturers basically follow the same playbook: provide acceleration libraries and drivers that support heterogeneous programming, and thereby deliver accelerated AI computation.

An Interview Related to Heterogeneous Programming

I remember interviewing at a company once. The interviewer told me they built vision-based intelligent transportation solutions. A "solution" here means providing the customer with a complete software-plus-hardware product, sold as a package.

I was curious: do you make your own chips?

The interviewer said: we don't make them ourselves, we buy them. We purchase chips developed by domestic AI chip companies, and GPUs of course, then do secondary development to deploy our own algorithms on that hardware.

Me: in your solution, does the hardware you sell come from a single company, or do you mix chips from several companies?

Interviewer: possibly more than one; it depends on whose chips perform well and are easy to work with.

...

So this solutions company uses chips from different manufacturers, while the core AI algorithms are its own.

This is a typical heterogeneous programming scenario: multiple AI accelerator cards connect to a server host over the PCIe bus to accelerate the AI algorithms, recognizing pedestrians and vehicles in traffic scenes in the cloud.
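
With several cards in one server, the host decides which card runs which job. A hypothetical round-robin dispatch sketch (the `process_frame` kernel is invented here purely for illustration):

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, invented for illustration: analyze one frame.
__global__ void process_frame(const unsigned char* frame, int* result) {
    // ... pedestrian/vehicle detection work would go here ...
}

// Spread incoming frames across all accelerator cards, round-robin.
// Assumes d_frames[f] and d_results[f] were allocated on card f % count.
void dispatch(unsigned char** d_frames, int** d_results, int num_frames) {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int f = 0; f < num_frames; ++f) {
        cudaSetDevice(f % count);  // select the card for this frame
        process_frame<<<64, 256>>>(d_frames[f], d_results[f]);
    }
}
```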

Summary

Heterogeneous programming can be thought of as a plug-in approach that uses dedicated chips to accelerate neural networks: the dedicated accelerator card carries out the heavy operations in the network.

In fact, heterogeneous programming is not a very new concept.

A friend who works in mobile phone development told me that even in phones made long ago, the system contained many different chips. The main processor and co-processors communicated with each other; some algorithms ran on the main processor and others on a co-processor, and together they completed a whole task.

That too is heterogeneous programming; they simply took it for granted at the time. With the rise of artificial intelligence, the concept has become more and more familiar.

It has since become an indispensable programming approach for AI acceleration.

Well, that's all for heterogeneous programming. Feel free to browse the column to see the rest of the series.
