CUDA shader

Nvidia CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model. Every CUDA-capable Nvidia GPU contains CUDA cores, and CUDA lets people use the GPU's many compute cores for general-purpose mathematical calculations.
In most cases, unified shader hardware consists of an array of compute units plus some form of dynamic scheduling / load-balancing system that tries to keep all of the compute units busy as much of the time as possible.

The NVIDIA GPU equivalent of a "CPU core" (what OpenCL defines as a compute unit) is the streaming multiprocessor (SM). Each streaming multiprocessor contains a vector unit of 8 stream processors (SPs). NVIDIA calls an SP a "CUDA core", although since the SM is a SIMD architecture, the term is quite misleading.
The GTX 260 has 24 such SMs, each with 8 SPs, for a total of 192 SPs; these 192 are what NVIDIA calls CUDA cores. From OpenCL's point of view, what counts is the number of SMs: 24 compute units.
A "CUDA core" is one ALU lane inside the SM's vector unit.

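The counting above can be checked with a small sketch. The function name `core_counts` is hypothetical and exists only for illustration; the inputs are the GTX 260 figures quoted in the text, and the sketch assumes every SM has the same number of SPs.

```python
def core_counts(num_sms, sps_per_sm):
    """Return (OpenCL compute units, NVIDIA "CUDA cores") for a GPU.

    Assumes every SM holds the same number of SPs, as in the text.
    """
    compute_units = num_sms            # OpenCL counts SMs
    cuda_cores = num_sms * sps_per_sm  # NVIDIA counts SPs
    return compute_units, cuda_cores


# GTX 260 figures from the text: 24 SMs with 8 SPs each.
units, cores = core_counts(24, 8)
print(units, cores)  # 24 192
```

The same GPU is therefore "24 units" or "192 cores" depending on which level of the hardware is being counted.
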
NVIDIA refers to a single SP as a single processing core. It has two ALUs and one FPU in a fully pipelined, single-issue, in-order microprocessor design. An SP does not have any cache; most of its time is spent processing pixel or vertex data, so apart from handling a large number of mathematical operations it is not particularly good at anything else.
A CUDA core is similar to a core in a computer's CPU, which may be a dual-core or quad-core processor. However, an Nvidia GPU may have thousands of cores. The cores are responsible for a variety of tasks, so the number of cores relates directly to the GPU's speed and capability.
Since the CUDA cores handle all the data that passes through the GPU, they are what deal with things like graphics when a game loads characters and scenery.

A compute unit is a GPU core, as opposed to a CUDA core or shader core; the number of compute units can be queried through OpenCL. A so-called shader, or CUDA core, is only a part of a GPU core: it cannot operate independently, only within the GPU core. For example, when one array is multiplied by another, the GPU core receives the work and hands the elements out to its shaders (CUDA cores), each of which performs part of the work. A shader (CUDA core) is therefore only one part of the processor.
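The array multiplication example can be modeled on the CPU as a rough sketch. The function `multiply_on_core` and the strided split of elements across shaders are assumptions made for illustration; real hardware runs the lanes in parallel and its element-to-lane mapping is not described in the text.

```python
def multiply_on_core(a, b, num_shaders=8):
    """Model one GPU core splitting an element-wise multiply across its shaders.

    Shader `lane` handles elements lane, lane + num_shaders, and so on.
    This strided split is an illustrative assumption, not a hardware spec.
    """
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    out = [None] * len(a)
    for lane in range(num_shaders):               # lanes run in parallel on real hardware
        for i in range(lane, len(a), num_shaders):
            out[i] = a[i] * b[i]                  # each shader does only its share
    return out


print(multiply_on_core([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [2] * 10))
# [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```

No single shader produces the whole result; the core as a whole does, which is the sense in which a shader is only part of the processor.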
A GPU is made up of a stack of processor cores, which are referred to as compute units in computing terminology. Each core / compute unit has a pile of shaders, TMUs and ROPs.
In a GPU, the shaders contained in a core are what NVIDIA calls CUDA cores; a CUDA core is a shader, and a shader is a stream processor.
Because GPUs use a unified architecture, a GPU core contains more general-purpose units. These units are often referred to as shaders. Each shader is roughly like one large part of a CPU core (e.g., the SSE unit of an x86 CPU): it can perform certain tasks, but not all.
A GPU has cores, each of which itself has a number of shaders. Each GPU core contains a decoder that decodes the instructions for the core itself. The work is then passed to the different units within the core. Each GPU core contains shaders (NVIDIA calls them CUDA cores, AMD calls them stream processors), TMUs, ROPs, and on certain GPUs other units as well. Each unit contributes to the graphics work, so having more of a particular unit makes the GPU faster at a specific set of graphics tasks.


How are shaders mapped to the actual GPU hardware?
Is there a one-to-one relationship between GPU cores and shader programs? That is, does the vertex shader program run on one core while the fragment shader runs on another, with data passed from the vertex shader core to the fragment shader core? Or is every core on the GPU responsible for all of the shaders and the entire graphics pipeline?
The exact relationship depends on the card and drivers. A shader program is converted from its general form (e.g. DirectX or OpenGL) into a form the card can run directly, using something similar to the just-in-time compilation of bytecode in languages such as Java.

Therefore, the relationship depends on the nature of the program and the card. If the program is large and complex, the card may need to assign it multiple cores; a single core will likely run many instances of a shader across its multiple stream processors.

Modern cards assign work dynamically, and there is rarely a 1:1 relationship. Each core has multiple stream processors, so if the shaders are not too complex, a core can process several of them at the same time.
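Dynamic assignment can be illustrated with a toy scheduler. This is only a sketch under stated assumptions: the function `assign_work`, the per-item cost numbers, and the greedy "least-loaded core first" policy are all hypothetical; actual GPU schedulers are specific to the hardware and driver and are not described here.

```python
import heapq


def assign_work(costs, num_cores):
    """Greedily give each work item to the currently least-loaded core.

    `costs[i]` is an assumed cost for work item i (e.g. one shader instance).
    Returns one list of item indices per core.
    """
    heap = [(0, core) for core in range(num_cores)]  # (current load, core id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_cores)]
    for item, cost in enumerate(costs):
        load, core = heapq.heappop(heap)             # least-loaded core so far
        assignment[core].append(item)
        heapq.heappush(heap, (load + cost, core))
    return assignment


# One expensive item and five cheap ones, shared between two cores.
print(assign_work([5, 1, 1, 1, 1, 1], num_cores=2))
# [[0], [1, 2, 3, 4, 5]]
```

One core ends up with the single complex item while the other handles several simple ones, so nothing like a fixed 1:1 mapping between shaders and cores emerges.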

Origin blog.51cto.com/1960961732/2444607