Heterogeneous Computing--CUDA Architecture

1. What is CUDA?

  CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing platform and architecture launched by the graphics card manufacturer NVIDIA that lets GPUs solve complex computational problems. Put simply, we can use the GPU to parallelize programs that run slowly on the CPU, such as neural networks and image processing algorithms. With the GPU's high degree of parallelism, we can greatly increase the speed at which these algorithms run.
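  To make this concrete, here is a minimal CUDA kernel sketch (not from the original article; the name addVectors and the launch configuration are illustrative): each GPU thread computes one element of a vector sum, so work that a CPU would loop over serially is spread across thousands of threads.

// A minimal sketch: each GPU thread handles one element, so the loop
// a CPU would execute serially runs in parallel across many threads.
// Illustrative only; host-side setup and error checking are omitted.
__global__ void addVectors(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overrun
        c[i] = a[i] + b[i];
}

// Launch example: ceil(n / 256) blocks of 256 threads each.
// addVectors<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);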

2. CPU & CUDA architecture

  Two metrics matter constantly in processor design: latency and throughput. Latency is the time between issuing an instruction and getting back the final result; throughput is the number of instructions processed per unit time. Because the CPU's main job is computation and control, its design is latency-oriented; because the GPU's main job is parallel processing, its design is throughput-oriented.

(1)CPU

  The CPU (Central Processing Unit) is a very large-scale integrated circuit. Its main logical components are the control unit (Control), the arithmetic logic unit (ALU), the cache (Cache), and the data, control, and status buses (Bus) that connect them. Simply put: a computing unit, a control unit, and a storage unit.
The architecture diagram is as follows:
[Figure: CPU architecture diagram]

  The CPU follows the von Neumann architecture, whose core idea is stored programs/data and serial execution. As a result, the CPU die devotes a lot of area to storage (Cache) and control (Control), while the compute units (ALU) occupy only a small part. This limits the CPU in large-scale parallel computing but makes it comparatively good at logic control. The CPU cannot deliver large-scale data-parallel computing; the GPU can.

(2)GPU

  The GPU (Graphics Processing Unit) is a massively parallel computing architecture composed of a large number of compute units. It originally split off from the CPU to handle parallel image computation, and it is designed to process many parallel computing tasks at the same time. The GPU also contains compute units, control units, and storage units, but its architecture differs greatly from the CPU's, as shown in the diagram below.
[Figure: GPU architecture diagram]

  Compared with the CPU, where less than 20% of the chip area is ALUs, more than 80% of a GPU's chip area is ALUs. That is, the GPU has far more ALUs devoted to data-parallel processing, which is why it has such powerful parallel computing capability.
From the hardware point of view, CPUs and GPUs look similar: both have memory, caches, ALUs, and control units, and both have many cores. The difference lies in the cores themselves. A CPU core is heavyweight, with very complex control logic: branch prediction, out-of-order execution, deep multi-stage pipelines, and so on. A GPU core, by contrast, is lightweight and optimized for data-parallel tasks with simple control logic, emphasizing the throughput of parallel programs.
  Put simply, CPU cores are good at completing multiple complex tasks, focusing on logic and serial programs; GPU cores are good at tasks with simple control logic, focusing on computation and parallelism, as the sketch below illustrates.
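  The contrast fits in a few lines of code. This hedged sketch (function names are hypothetical) expresses the same scaling computation both ways: as a serial loop on one heavyweight CPU core, and as a CUDA kernel in which each iteration becomes an independent lightweight thread.

// CPU: serial loop, executed iteration by iteration on one core
// (latency-oriented: complex control logic, one result after another).
void scaleCPU(float *x, float s, int n) {
    for (int i = 0; i < n; ++i)
        x[i] *= s;
}

// GPU: the loop body becomes a kernel; thousands of lightweight threads
// each execute it once (throughput-oriented: minimal control logic per thread).
__global__ void scaleGPU(float *x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= s;
}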

3. Heterogeneous Computing

  Heterogeneous computing means cooperative computing by a CPU plus a GPU, or a CPU plus other devices (such as FPGAs). Ordinarily our programs run on the CPU, but when a large amount of data must be processed, the CPU alone falls short. Can we find another way to speed up the computation? That is heterogeneous computing: combining the power of computing devices such as the CPU (Central Processing Unit), the GPU (Graphics Processing Unit), and even the APU (Accelerated Processing Unit, a fusion of CPU and GPU) to increase system speed. Heterogeneous systems are becoming more common, and computing support for such environments is receiving increasing attention.
  At present, the most widely used form of heterogeneous computing is GPU acceleration. All mainstream GPUs now use a unified shader architecture, and with their large arrays of programmable stream processors, GPUs far outstrip CPUs in single-precision floating-point throughput. The figure below is a schematic of CPU+GPU heterogeneous computing, in which the GPU is mainly responsible for the parallel computation.
[Figure: schematic of CPU+GPU heterogeneous computing]
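  To make the division of labor concrete, here is a minimal self-contained host program, a sketch using the standard CUDA runtime API (the square kernel and buffer size are made up for illustration): the CPU prepares the data, hands it to the GPU, launches the parallel kernel, and copies the result back.

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Each thread squares one element in place.
__global__ void square(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= d[i];
}

int main() {
    const int n = 1 << 20;                     // 1M elements (arbitrary)
    size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);         // host (CPU) buffer
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = nullptr;
    cudaMalloc(&d, bytes);                     // device (GPU) buffer
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // CPU -> GPU

    square<<<(n + 255) / 256, 256>>>(d, n);    // GPU does the parallel work
    cudaDeviceSynchronize();                   // wait for the kernel to finish

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // GPU -> CPU
    printf("h[3] = %f\n", h[3]);               // expect 9.0

    cudaFree(d);
    free(h);
    return 0;
}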

4. The relationship between OpenCL and CUDA

  NVIDIA's CUDA architecture does not conflict with Khronos's OpenCL; their relationship is that of an execution architecture to an API. A simple analogy: the familiar X86 architecture is a CPU architecture, and the various programming languages, whether low-level like assembly or high-level like C, are merely programming environments built on top of the X86 computing architecture. The relationship between the CUDA architecture and OpenCL is the same as that between X86 and those languages.
  The CUDA architecture is one of the platforms OpenCL can run on, so neither replaces the other: OpenCL simply provides a programmable API on top of the CUDA architecture.
[Figure: relationship between OpenCL and CUDA]
Reference
[1] http://t.zoukankan.com/liuyufei-p-13259264.html

Origin: blog.csdn.net/qq_44924694/article/details/126202388