Learning computer graphics (1)

I attended my first computer graphics class today and wrote up some study notes. Criticism and corrections are welcome.

1. CPU and GPU

A typical consumer CPU has four cores and eight threads, corresponding to the four ALUs in the figure, while a GPU has a great many arithmetic logic units. Each unit can be understood as a "thread", and these numerous threads give the GPU its advantage in parallel computing. The CPU is better suited to operations with complex logic: it excels at logic control and serial computation, while the GPU excels at high-intensity, highly parallel computation.

GPUs are divided into integrated graphics and discrete graphics. An integrated graphics unit is built into the motherboard and shares main memory, while a discrete graphics card has its own video memory on board, which reduces the time spent on data transfers.
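As a minimal sketch of that data-transfer cost, a typical CUDA program copies its input from main memory into the card's video memory once, before any kernels run (the buffer names here are illustrative):

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);   // host (main) memory
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data;                            // device (video) memory
    cudaMalloc(&d_data, bytes);

    // On a discrete card this transfer crosses the PCIe bus; paying it
    // once lets subsequent kernels read fast on-board memory instead.
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```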

2. Thread architecture

(1) Knowledge points
Host: the host (CPU side)
Kernel: a kernel program (parallelizable code written by the programmer in C)
PTX (Parallel Thread Execution): the parallel-thread-execution program (NVIDIA's intermediate instruction set)
CTA (Cooperative Thread Array): a group of threads that execute the same kernel in parallel or concurrently (can be 1D, 2D, or 3D)
Grid: the CTAs executing the same kernel program are grouped into a Grid
Note: the threads within a CTA execute the same kernel program, and the CTAs within a Grid execute independently.
Why is a Grid needed?
~ Many threads may be invoked by the same kernel program
~ It reduces communication and synchronization between threads (threads in different CTAs cannot communicate or synchronize with each other)
(2) My own understanding
~ A kernel is composed of multiple program statements, i.e., multiple parallel computations
~ One Grid corresponds to one kernel
~ Each CTA (block) in a Grid corresponds to one parallel computation
~ "CTA" is the hardware-level term; "block" is the software-level term in CUDA
~ A CTA consists of multiple threads (which can be understood as ALUs), but the CTA is the basic scheduling unit
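The hierarchy above can be sketched in CUDA: the programmer chooses the CTA (block) shape, 1D, 2D, or 3D, and how many CTAs make up the grid. The kernel name `scale` is made up for illustration:

```cuda
#include <cuda_runtime.h>

// One kernel; every thread of every CTA in the grid runs this code.
__global__ void scale(float *data, float factor) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int idx = y * gridDim.x * blockDim.x + x;   // flatten 2D position
    data[idx] *= factor;
}

int main(void) {
    dim3 block(16, 16);   // a 2D CTA of 16 x 16 = 256 threads
    dim3 grid(4, 4);      // 4 x 4 = 16 independent CTAs form the grid
    float *d;
    cudaMalloc(&d, 64 * 64 * sizeof(float));
    scale<<<grid, block>>>(d, 2.0f);   // one grid per kernel launch
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

Because the 16 CTAs execute independently, the hardware is free to schedule them in any order, on any available multiprocessor.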

3. CUDA and cuDNN

(1) CUDA is a parallel computing framework launched by NVIDIA for its own GPUs. This means CUDA can only run on NVIDIA GPUs, and it only pays off when the problem to be solved can be computed in large parallel batches.
(2) cuDNN (CUDA Deep Neural Network library) is a GPU acceleration library for deep neural networks created by NVIDIA. It optimizes the computations involved in model training and then calls the GPU through CUDA. cuDNN is not strictly required for training a model on the GPU: you can also use CUDA directly without cuDNN, but computing efficiency will be much lower.

4. CUDA programming

Things to understand:

(1) <<<...>>> : CUDA syntax specifying with how many CUDA threads the kernel is launched (the grid and block dimensions)

(2)

~ threadIdx is of type uint3 and gives the index of a thread within its block.

~ blockIdx is of type uint3 and gives the index of a thread block within the grid. A thread block usually contains multiple threads.

~ blockDim is of type dim3 and gives the size (dimensions) of a thread block.

~ gridDim is of type dim3 and gives the size of the grid. A grid usually contains multiple thread blocks.
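A minimal vector-add kernel shows how these built-in variables combine into a global thread index (the launch parameters sketched in the comments are illustrative):

```cuda
#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    // blockIdx.x selects the block, blockDim.x is the block size,
    // and threadIdx.x selects the thread within that block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               // guard: the grid may overshoot n
        c[i] = a[i] + b[i];
}

// Launch with enough blocks to cover all n elements, e.g.:
//   int threads = 256;
//   int blocks  = (n + threads - 1) / threads;  // round up
//   vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
```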

5. Stream processor

  1. GPUs are built around an array of streaming multiprocessors (SMs). A multi-threaded program is divided into thread blocks that execute independently of one another, so a GPU with more multiprocessors executes the same program in less time.
  2. A coprocessor is a processor developed to assist the central processor with tasks it cannot perform or performs inefficiently. Tasks the CPU cannot perform include signal transmission between devices and management of access devices; tasks it performs with poor efficiency include graphics processing and audio processing. Various auxiliary processors were created to handle this work. Note that, since integer and floating-point arithmetic units are now both integrated into the CPU, the floating-point processor is no longer an auxiliary processor; a coprocessor built into the CPU is likewise not an auxiliary processor unless it exists independently.
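The SM count of a given card can be queried at runtime through the CUDA runtime API; a small sketch:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0
    // More SMs means more thread blocks can be resident at once,
    // so the same grid of blocks finishes sooner.
    printf("%s: %d SMs\n", prop.name, prop.multiProcessorCount);
    return 0;
}
```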

6. Storage structure


  1. Each thread has its own private storage; a thread can have multiple private storage locations, which can be understood as registers.
  2. Each CTA has a shared storage visible to all threads in the CTA; it lives only as long as the CTA and is released after the CTA finishes executing.
  3. All threads in the same grid can access the same global storage.
  4. Additional storage for constants, textures, and surfaces is accessible to all threads. (Constant and texture storage are read-only; surface storage is read-write.)
  5. A texture can be understood as a small image that is copied and tiled repeatedly across a larger image. Images are attached to the surfaces of objects to convey the objects' material; the texture is the pattern, i.e., the arrangement of colors and transparency.
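The shared-versus-global distinction above can be sketched with the `__shared__` qualifier: shared memory lives only for the lifetime of one CTA, while global memory is visible to every thread in the grid. The reduction below is illustrative and assumes blocks of exactly 256 threads (a power of two):

```cuda
#include <cuda_runtime.h>

__global__ void blockSum(const float *in, float *out) {
    __shared__ float tile[256];       // per-CTA shared storage
    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;

    tile[t] = in[i];                  // stage global -> shared
    __syncthreads();                  // sync threads within this CTA only

    // Tree reduction inside the block, entirely in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (t < s) tile[t] += tile[t + s];
        __syncthreads();
    }
    if (t == 0) out[blockIdx.x] = tile[0];  // result back to global
    // tile[] is released when this CTA finishes executing.
}
```

Note that `__syncthreads()` only synchronizes threads within one CTA, which matches the point above that CTAs in a grid cannot synchronize with each other.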


The thread, CTA (block), and grid mentioned above are all logical structures in the virtual machine; the actual physical structure is as shown below. Usually one block runs on one SM, and when device memory is sufficient, multiple blocks can run on the same SM.



Origin: blog.csdn.net/m0_46749624/article/details/123194228