GPU architecture

Author: Saury Hanzo (original post address)

A graphics processing unit (GPU) is hardware specifically designed to handle graphics and parallel computing. Different GPU manufacturers use different architectures. Here are some major GPU architectures:

1. NVIDIA GPU architecture:

Fermi architecture:

NVIDIA's Fermi architecture is one of the early GPU architectures. It was first launched in 2010 and is mainly used in the GeForce and Tesla series of graphics processing units (GPUs). Here are some of the key features of Fermi architecture:

  • CUDA support: Fermi is built around CUDA (Compute Unified Device Architecture), which allows the GPU to perform general-purpose computing, not just graphics rendering.

  • Double precision support: Fermi greatly strengthened double-precision (FP64) floating-point computing in NVIDIA GPUs, making them far more competitive in scientific computing and high-performance computing.

  • Hardware texture sampling: The Fermi architecture supports hardware texture sampling, improving performance for graphics rendering and compute-intensive tasks.

  • ECC memory support: For the Tesla series, Fermi introduced ECC (Error-Correcting Code) memory support, adding memory error detection and correction and improving system stability.

  • Parallel Thread Execution: Fermi supports the Parallel Thread Execution (PTX) intermediate representation, allowing programmers to write CUDA programs in high-level languages and execute them on the GPU.

  • Multiprocessor structure: The Fermi architecture adopts a multiprocessor structure; each streaming multiprocessor contains multiple CUDA cores to support parallel computing tasks.

  • Shared memory and L1 cache: Fermi introduces a configurable split between shared memory and L1 cache, which improves the efficiency of data access and matters greatly for compute-intensive tasks (a minimal shared-memory kernel appears at the end of this section).

  • Compute Capability: Fermi GPUs have Compute Capability 2.x; specific capabilities vary by GPU model.

Some representative NVIDIA graphics cards use the Fermi architecture, such as the GeForce 400/500 series and the Tesla 20 series. Although Fermi is an early GPU architecture, it laid the foundation for NVIDIA's subsequent architectures and marked important progress in scientific computing and GPU computing.
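
To make the CUDA programming model described above concrete, here is a minimal sketch (not from the original post; names and sizes are illustrative, error checking omitted) of the pattern Fermi-era GPUs established: threads grouped into blocks that cooperate through on-chip shared memory, here computing double-precision partial sums.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block loads 256 doubles into shared memory, reduces them to one
// partial sum, and writes that partial sum to global memory.
__global__ void blockSum(const double* in, double* partial, int n) {
    __shared__ double tile[256];                     // on-chip shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0;
    __syncthreads();

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = tile[0];
}

int main() {
    const int n = 1 << 20, threads = 256, blocks = n / threads;
    double *h_in = new double[n], *h_partial = new double[blocks];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0;

    double *d_in, *d_partial;
    cudaMalloc(&d_in, n * sizeof(double));
    cudaMalloc(&d_partial, blocks * sizeof(double));
    cudaMemcpy(d_in, h_in, n * sizeof(double), cudaMemcpyHostToDevice);

    blockSum<<<blocks, threads>>>(d_in, d_partial, n);
    cudaMemcpy(h_partial, d_partial, blocks * sizeof(double), cudaMemcpyDeviceToHost);

    double total = 0.0;
    for (int b = 0; b < blocks; ++b) total += h_partial[b];
    printf("total = %.1f (expected %d)\n", total, n);

    delete[] h_in; delete[] h_partial;
    cudaFree(d_in); cudaFree(d_partial);
    return 0;
}
```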

Kepler architecture:

NVIDIA's Kepler architecture is a generation of GPU architecture that was first launched in 2012. Kepler is mainly used in GeForce, Quadro and Tesla series graphics processing units (GPUs). The following are some of the key features of the Kepler architecture:

  • GPU Boost: Kepler introduces GPU Boost, which dynamically raises the GPU clock frequency to provide additional performance when the workload needs it.

  • Dynamic Parallelism: The Kepler architecture supports Dynamic Parallelism, meaning code already running on the GPU can launch new kernels (grids of thread blocks) without returning to the CPU, making the GPU more flexible for parallel computing tasks.

  • Hyper-Q: Kepler introduces Hyper-Q, which lets multiple CPU cores or processes submit work to the GPU simultaneously through independent hardware queues, improving parallelism between the CPU and GPU (see the stream sketch at the end of this section).

  • SMX architecture: Kepler adopts the SMX streaming multiprocessor design. Each SMX is more capable than the previous generation's multiprocessor, with higher performance and energy efficiency.

  • More CUDA cores: The Kepler architecture provides more CUDA cores, improving parallel computing performance and helping to process parallel tasks more efficiently.

  • Stronger double-precision performance: For tasks such as scientific computing that require double-precision floating-point arithmetic, the Kepler architecture provides stronger double-precision performance.

  • PCI Express 3.0 support: The Kepler architecture supports the PCI Express 3.0 standard, providing higher bandwidth and faster data transfers between the GPU and the host system.

  • GPU Boost 2.0: Later Kepler GPUs add GPU Boost 2.0, which refines the hardware-level dynamic clock and voltage adjustment for more efficient performance.

Some representative NVIDIA graphics cards use the Kepler architecture, including the GeForce 600/700 series and Tesla K10/K20 series. While improving graphics performance, the Kepler architecture also pays more attention to supporting parallel computing tasks, laying the foundation for subsequent architectures.
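
As a rough illustration of the kind of concurrency Hyper-Q enables, the hedged sketch below (not from the original post; kernel and variable names are made up) submits independent kernels to several CUDA streams. On Kepler and later GPUs, Hyper-Q's hardware work queues let such independent streams be scheduled concurrently instead of being serialized behind one queue.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void busyKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 1000; ++k) x = x * 1.0001f + 0.0001f;  // artificial work
        data[i] = x;
    }
}

int main() {
    const int nStreams = 4, n = 1 << 16;
    cudaStream_t streams[nStreams];
    float* buf[nStreams];

    for (int s = 0; s < nStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buf[s], n * sizeof(float));
        cudaMemsetAsync(buf[s], 0, n * sizeof(float), streams[s]);
        // Each stream is an independent work queue; Hyper-Q allows the GPU to
        // accept and schedule these launches concurrently rather than one by one.
        busyKernel<<<(n + 255) / 256, 256, 0, streams[s]>>>(buf[s], n);
    }
    for (int s = 0; s < nStreams; ++s) {
        cudaStreamSynchronize(streams[s]);
        cudaFree(buf[s]);
        cudaStreamDestroy(streams[s]);
    }
    printf("done\n");
    return 0;
}
```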

Maxwell architecture:

NVIDIA's Maxwell architecture is a generation of GPU architecture that was first launched in 2014. Maxwell is mainly used in GeForce, Quadro and Tesla series graphics processing units (GPUs). The following are some of the key features of the Maxwell architecture:

  • Streaming multiprocessor (SM) improvements: Maxwell introduces a new SM design called SMM (the Maxwell streaming multiprocessor). SMM is more flexible than the earlier SMX and handles parallel computing tasks more efficiently.

  • Dynamic energy efficiency: The Maxwell architecture focuses on energy efficiency, dynamically adjusting voltage and clock frequency to the workload in order to reduce power consumption while maintaining performance.

  • GPU Boost 2.0: Maxwell carries forward GPU Boost 2.0, which intelligently adjusts the GPU clock so the GPU can run at higher speeds when more performance is needed.

  • Unified Memory support: Maxwell-era CUDA supports Unified Memory, which simplifies memory management between GPU and CPU and improves the efficiency of data movement (see the sketch at the end of this section).

  • NVIDIA VXGI (Voxel Global Illumination): Maxwell introduces VXGI for real-time global illumination effects, improving the realism of game graphics.

  • NVIDIA MFAA (Multi-Frame Anti-Aliasing): Introduces MFAA to improve anti-aliasing and provide better image quality.

  • H.265 hardware decoding: Maxwell begins to support H.265 (HEVC) hardware decoding, providing more efficient video decoding.

  • NVIDIA GameWorks: Introduces NVIDIA GameWorks, a set of technologies that enhance game graphics, including physics, rendering, and particle effects.

Some representative NVIDIA graphics cards use the Maxwell architecture, including the GeForce 900 series and Tesla M40/M60 series. While improving graphics performance, the Maxwell architecture also focuses on improving energy efficiency and graphics effects, laying the foundation for subsequent architectures.
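
The Unified Memory bullet is easiest to see in code. The following minimal sketch (illustrative only, not from the original post, error handling omitted) allocates one managed buffer with cudaMallocManaged and touches it from both CPU and GPU without explicit cudaMemcpy calls; the driver migrates the data as needed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float* data = nullptr;

    // One allocation visible to both CPU and GPU; pages migrate on demand.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;      // written on the CPU

    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);  // read/written on the GPU
    cudaDeviceSynchronize();                         // wait before touching data on the CPU again

    printf("data[0] = %f\n", data[0]);               // expect 2.0
    cudaFree(data);
    return 0;
}
```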

Pascal architecture:

NVIDIA's Pascal architecture is a generation of GPU architecture that was first launched in 2016. Pascal is mainly used in GeForce, Quadro and Tesla series graphics processing units (GPUs). Following are some of the main features of Pascal architecture:

  • 16nm FinFET process: Pascal is NVIDIA's first architecture built on a 16nm FinFET process, which improves performance and efficiency and reduces power consumption.

  • NVIDIA NVLink: Introduces NVLink, a high-bandwidth, low-latency interconnect used to link multiple GPUs and provide more efficient inter-GPU communication.

  • New SM (Streaming Multiprocessor) design: Pascal introduces a new SM design that provides higher performance and better efficiency than the previous architecture.

  • GDDR5X memory support: The Pascal architecture is the first to support GDDR5X memory, which raises memory bandwidth and contributes to faster graphics rendering and data movement.

  • Simultaneous Multi-Projection (SMP): Introduces SMP, which projects multiple views in a single rendering pass for VR (virtual reality) and multi-monitor applications.

  • NVIDIA Ansel: Introduces Ansel, an in-game capture tool that provides higher-quality game and virtual reality screenshots.

  • CUDA 8 support: The Pascal architecture is supported by CUDA 8, NVIDIA's parallel computing platform, which allows GPUs to be used for general-purpose computing tasks.

  • Deep learning performance optimization: Pascal optimizes deep learning workloads at the hardware level: GP100 adds native FP16 (half-precision) arithmetic at twice the FP32 rate, and later Pascal chips add INT8 dot-product instructions (DP4A), accelerating training and inference (a small FP16 kernel appears at the end of this section).

Some representative NVIDIA graphics cards use the Pascal architecture, including the GeForce 10 series and Tesla P100 series. While improving graphics performance, the Pascal architecture emphasizes support for emerging application areas such as deep learning and virtual reality.
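
To show what the half-precision support mentioned above looks like in practice, here is a small hedged sketch (not from the original post) that computes y = a*x + y entirely in FP16 using the cuda_fp16.h intrinsics. It requires a GPU with native FP16 arithmetic (compute capability 5.3 or higher, e.g. Pascal GP100) and a matching -arch flag; names are illustrative and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// y = a * x + y, computed in half precision with the FP16 fused multiply-add intrinsic.
__global__ void haxpy(const __half* x, __half* y, __half a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = __hfma(a, x[i], y[i]);
}

int main() {
    const int n = 1 << 10;
    __half *x, *y;
    cudaMallocManaged(&x, n * sizeof(__half));
    cudaMallocManaged(&y, n * sizeof(__half));
    for (int i = 0; i < n; ++i) {
        x[i] = __float2half(1.0f);
        y[i] = __float2half(2.0f);
    }

    haxpy<<<(n + 255) / 256, 256>>>(x, y, __float2half(3.0f), n);
    cudaDeviceSynchronize();
    printf("y[0] = %.1f\n", __half2float(y[0]));   // expect 3*1 + 2 = 5.0

    cudaFree(x); cudaFree(y);
    return 0;
}
```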

Volta architecture:

NVIDIA's Volta architecture is a generation of GPU architecture that was first launched in 2017. Volta is mainly used in the Tesla series of high-performance computing graphics processing units (GPUs). Here are some key features of the Volta architecture:

  • Tensor Cores: Volta introduces Tensor Cores, hardware units designed specifically for deep learning. Tensor Cores accelerate matrix multiply-accumulate operations and greatly improve deep learning performance (see the WMMA sketch at the end of this section).

  • FP64 and FP32 performance: The Volta architecture provides strong 64-bit (double-precision) floating-point performance for high-performance computing while also optimizing 32-bit (single-precision) performance for deep learning workloads.

  • NVLink 2.0: Introduces NVLink 2.0, which provides higher interconnect bandwidth and is well suited to connecting multiple GPUs for high-performance computing.

  • Unified Memory and page migration: Volta continues to support Unified Memory, further simplifying memory management between CPU and GPU, and includes a page migration engine that moves memory pages on demand between host and device.

  • New SM (Streaming Multiprocessor) design: Volta introduces a new SM design, the Volta SM, which provides higher performance and efficiency than the previous architecture.

  • 16-bit floating-point support: Volta supports FP16 computation, delivering high performance in many deep learning tasks.

  • NVWMI and GPU Boost 3.0: Introduces NVWMI for managing virtual GPUs, while GPU Boost 3.0 continues to optimize the GPU clock frequency.

  • CUDA 9 support: The Volta architecture is supported by CUDA 9, NVIDIA's parallel computing platform, which brings new programming capabilities and performance optimizations.

Some representative NVIDIA Tesla graphics cards adopt the Volta architecture, such as the Tesla V100 series. The primary design goal of the Volta architecture is to provide superior high-performance computing and deep learning performance.
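
As a concrete taste of how Tensor Cores are programmed, here is a hedged sketch (not from the original post) using CUDA's WMMA API, in which a single warp performs one 16x16x16 half-precision matrix multiply-accumulate. It requires compute capability 7.0 or higher (Volta onward) and compilation with an appropriate -arch flag; matrix contents and names are purely illustrative.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes C = A * B + C for 16x16 tiles on the Tensor Cores.
__global__ void wmma16x16(const half* A, const half* B, float* C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c;

    wmma::fill_fragment(c, 0.0f);
    wmma::load_matrix_sync(a, A, 16);          // leading dimension 16
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(c, a, b, c);                // matrix multiply-accumulate on Tensor Cores
    wmma::store_matrix_sync(C, c, 16, wmma::mem_row_major);
}

int main() {
    half *A, *B; float *C;
    cudaMallocManaged(&A, 16 * 16 * sizeof(half));
    cudaMallocManaged(&B, 16 * 16 * sizeof(half));
    cudaMallocManaged(&C, 16 * 16 * sizeof(float));
    for (int i = 0; i < 16 * 16; ++i) { A[i] = __float2half(1.0f); B[i] = __float2half(1.0f); }

    wmma16x16<<<1, 32>>>(A, B, C);             // a single warp drives the Tensor Cores
    cudaDeviceSynchronize();
    printf("C[0] = %.1f\n", C[0]);             // expect 16.0 (dot product of 16 ones)

    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```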

Turing architecture:

NVIDIA's Turing architecture is a generation of GPU architecture that was first launched in 2018. Turing is mainly used on GeForce, Quadro and Tesla series graphics processing units (GPUs). The following are some of the key features of the Turing architecture:

  • RT Cores: Turing introduces ray tracing cores (RT Cores) for real-time ray tracing, allowing graphics rendering to achieve far more realistic light and shadow effects.

  • Tensor Cores: As in the Volta architecture, Turing includes Tensor Cores for hardware acceleration of deep learning tasks.

  • SM improvements: Turing introduces a new SM (Streaming Multiprocessor) design that provides higher performance and efficiency and supports concurrent execution of integer and floating-point instructions.

  • GDDR6 memory support: The Turing architecture is the first to support GDDR6 memory, which raises memory bandwidth and contributes to faster graphics rendering and data movement.

  • NVIDIA NVLink: Turing continues to support NVLink for efficiently connecting multiple GPUs in high-performance configurations.

  • Unified Memory and NVLink Bridge: The Turing architecture further improves Unified Memory and introduces the NVLink bridge connector to improve the efficiency of data transfer between GPUs.

  • Variable Rate Shading (VRS): Introduces VRS, which lets game developers apply different shading rates to different screen regions, improving performance with little impact on image quality.

  • NVIDIA NGX: Turing introduces NGX, including DLSS (Deep Learning Super Sampling) and other AI-enhanced graphics features, to provide higher-quality game graphics.

Some representative NVIDIA graphics cards use the Turing architecture, including the GeForce 20 series and Quadro RTX series. The Turing architecture further optimizes graphics rendering and game performance while introducing ray tracing and deep learning technologies.

Ampere architecture:

The Ampere architecture is a GPU architecture released by NVIDIA in 2020. Ampere architecture GPUs are targeted at scenarios such as AI, data analysis, and HPC, and can achieve excellent acceleration effects at various scales. The following are the main features of the Ampere architecture:

  • Tensor Cores: The new SM uses third-generation Tensor Cores, which raise computational throughput, support more data types directly, and add fine-grained structured sparsity.
  • PCIe 4.0: The host-to-GPU link uses PCIe 4.0 and supports SR-IOV virtualization.
  • NVLink: Third-generation NVLink handles GPU-to-GPU communication with 12 links and 600 GB/s of total bandwidth, twice the previous generation. Memory bandwidth also rises: HBM2 bandwidth is about 1.7x that of V100 (roughly a 73% increase). Ampere adds asynchronous copy operations, so data in global memory can be loaded into shared memory directly through L2 (see the sketch after this list).
  • Memory: Global memory grows to 40 GB / 80 GB of HBM2, the L2 cache grows to 40 MB, and shared memory is configurable up to 164 KB per SM.
  • MIG: Introduces Multi-Instance GPU (MIG), which can partition one physical GPU into as many as seven isolated GPU instances.
  • Improved error handling: Errors can be handled locally rather than requiring a whole-card reset, and asynchronous barrier operations are added.
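
The asynchronous copy feature mentioned in the NVLink bullet can be sketched with the cooperative groups memcpy_async API, as below (illustrative only, not from the original post). On Ampere (compute capability 8.0 and newer), the copy from global to shared memory takes the hardware asynchronous path through L2; on older GPUs the same code compiles but falls back to a synchronous copy.

```cuda
#include <cstdio>
#include <cooperative_groups.h>
#include <cooperative_groups/memcpy_async.h>
namespace cg = cooperative_groups;

// Stage a tile of global memory into shared memory asynchronously, then use it.
__global__ void tileScale(const float* in, float* out, int n) {
    extern __shared__ float tile[];
    cg::thread_block block = cg::this_thread_block();
    int base = blockIdx.x * blockDim.x;

    cg::memcpy_async(block, tile, in + base, sizeof(float) * blockDim.x);
    cg::wait(block);                           // wait for the asynchronous copy to land

    int i = base + threadIdx.x;
    if (i < n) out[i] = tile[threadIdx.x] * 2.0f;
}

int main() {
    const int n = 1 << 16, threads = 256;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    tileScale<<<n / threads, threads, threads * sizeof(float)>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);           // expect 2.0

    cudaFree(in); cudaFree(out);
    return 0;
}
```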

Hopper architecture: 

The Hopper architecture is a GPU architecture released by NVIDIA in 2022. It is a powerful and efficient architecture aimed at workloads that demand high-performance computing, especially AI, data analysis, and HPC.

  • New SM design: Hopper adopts a redesigned streaming multiprocessor with a substantially larger CUDA core count than the previous generation, and it is built on an advanced process node with power-saving techniques that cut power consumption at the same performance level.
  • Grace CPU-GPU hybrid architecture: Hopper can be paired with the Grace CPU in a hybrid CPU-GPU design (the Grace Hopper superchip), enabling efficient data transfer and better utilization of compute resources.
  • Arm Neoverse-based Grace CPU: The Grace CPU is built on Arm Neoverse cores, delivering high compute performance and memory bandwidth.
  • Fourth-generation Tensor Cores: Hopper GPUs use fourth-generation Tensor Cores together with a Transformer Engine that supports FP8, delivering higher tensor throughput and better deep learning performance.
  • HBM3 memory: Hopper GPUs move to HBM3 memory, providing higher memory bandwidth and lower latency than the previous generation.
  • Supports multiple precisions: Hopper supports FP64, FP32, FP16, FP8, and INT8 computation, meeting the needs of different application scenarios.
  • Dynamic precision switching: Hopper can also adjust calculation precision dynamically to match the needs of the workload, further improving energy efficiency.
  • NVLink: Hopper uses NVLink high-speed interconnect for fast GPU-to-GPU communication, making it possible to build large GPU clusters for demanding high-performance computing.
  • Optimization for AI workloads: Hopper is specifically optimized for AI scenarios, including large-scale model training, inference, and deployment. It supports the major deep learning frameworks and tools and integrates easily into existing AI ecosystems.

2. AMD GPU architecture:

TeraScale architecture:

The TeraScale architecture is one of AMD's (formerly ATI's) GPU architectures and was used in early Radeon graphics cards. The following are some of the key features of the TeraScale architecture:

  • Unified shader architecture: TeraScale uses a unified shader architecture, meaning its programmable shader units handle vertex, pixel, and other shader work on the same hardware. This increases flexibility and programmability.

  • Full pipeline rendering: The architecture implements the full rendering pipeline, including vertex processing, geometry processing, and pixel processing, to support high-performance graphics rendering.

  • Multi-core design: The TeraScale architecture uses a multi-core design, with each core containing a set of shader units to increase parallel processing capability.

  • GDDR5 memory support: As the architecture evolved, TeraScale began to support GDDR5 memory, providing higher memory bandwidth and helping improve graphics rendering performance.

  • HD 5000 series: Part of the TeraScale family is the HD 5000 series, which includes cards such as the Radeon HD 5870; at launch these represented a notable step forward in performance and graphics features.

  • DirectX 11 compatibility: As the TeraScale architecture evolved, support for Microsoft's DirectX 11 API allowed these graphics cards to deliver advanced graphics effects in games and applications that use DirectX 11.

The TeraScale architecture was an important early effort in AMD GPU design and laid the foundation for subsequent architectures. As technology evolved, however, AMD gradually transitioned to later architectures such as GCN (Graphics Core Next).

GCN Architecture (Graphics Core Next):

Graphics Core Next (GCN) architecture is one of the GPU architectures launched by AMD and is used in Radeon series graphics cards. The following are some of the main features of the GCN architecture:

  • Unified shader architecture: Like TeraScale, the GCN architecture continues to use a unified shader architecture, with programmable shader units handling vertex, pixel, and other shader work.

  • Heterogeneous computing: The GCN architecture emphasizes heterogeneous computing and supports using the graphics card for general-purpose computing tasks, making AMD graphics cards competitive in GPU computing.

  • Superscalar-style issue: Each GCN compute unit can issue multiple instructions per cycle across its scalar and vector (SIMD) units, improving parallel processing capability.

  • Multi-core design: The GCN architecture is built from many compute units, each containing a set of shader units and stream processors. These units work in parallel to improve overall performance.

  • Asynchronous compute: GCN introduces asynchronous compute engines, allowing the graphics card to handle multiple compute tasks at the same time, improving concurrency and performance.

  • Heterogeneous System Architecture (HSA): GCN supports HSA, a heterogeneous system architecture that allows GPUs, CPUs, and other accelerators to work together more closely.

  • Mantle API: Mantle is a graphics API developed by AMD that works closely with the GCN architecture, providing lower-level hardware access to optimize game performance.

  • DirectX 12 and Vulkan support: The GCN architecture supports modern graphics APIs such as Microsoft's DirectX 12 and the Khronos Group's Vulkan, enabling more efficient graphics rendering.

The GCN architecture is widely used in AMD graphics cards, including Radeon HD 7000, R9, R7, RX series, etc. With the continuous advancement of technology, AMD later launched the RDNA (Radeon DNA) architecture as an evolution of GCN to further improve graphics performance and energy efficiency.

RDNA architecture:

Radeon DNA (RDNA) architecture is a GPU architecture launched by AMD as an evolution of the previous generation GCN (Graphics Core Next) architecture. RDNA first debuted in 2019 and is used in AMD Radeon RX 5000 series graphics cards. Here are some key features of RDNA architecture:

  • New compute unit design: RDNA reworks the compute unit (CU) design to improve performance and energy efficiency and to support more parallel computation.

  • Separation of graphics and compute: RDNA separates graphics and compute paths so that graphics tasks and compute tasks can be executed in parallel more efficiently.

  • GDDR6 memory support: RDNA pairs with GDDR6 memory, providing higher memory bandwidth and contributing to faster graphics rendering and data movement.

  • Multi-level cache hierarchy: RDNA introduces a multi-level cache hierarchy, including L0, L1, and L2 caches, designed to improve memory access efficiency.

  • Radeon Image Sharpening (RIS): The RDNA generation introduces RIS for real-time image sharpening, providing clearer game visuals.

  • FidelityFX: RDNA supports FidelityFX, an open-source toolkit of graphics effects that game developers can use to optimize visual quality.

  • Radeon Anti-Lag: Introduces Radeon Anti-Lag to reduce input lag and improve game responsiveness.

  • DirectX 12 and Vulkan support: The RDNA architecture continues to support modern graphics APIs such as Microsoft's DirectX 12 and the Khronos Group's Vulkan for more efficient graphics rendering.

The launch of the RDNA architecture is designed to provide AMD graphics cards with more advanced graphics performance and new graphics features to meet increasingly complex gaming and computing needs.

CDNA architecture:

Compute DNA (CDNA) architecture is AMD's GPU architecture designed for high-performance computing. This architecture debuted in the AMD Instinct MI100 accelerator card, which is focused on data centers and scientific computing. The following are some of the key features of the CDNA architecture:

  • Matrix Core technology: CDNA introduces Matrix Cores, hardware designed specifically for deep learning. Matrix Cores accelerate the training and inference of deep neural networks (DNNs) by providing high-performance 16-bit floating-point matrix operations.

  • Infinity Fabric technology: Infinity Fabric is AMD's interconnect for linking multiple compute devices, supporting high-performance communication and cooperation. In the CDNA architecture, Infinity Fabric connects GPU cores and other processing units for efficient data exchange.

  • PCI Express 4.0 support: The CDNA architecture supports the PCI Express 4.0 standard, providing higher data transfer bandwidth and faster communication with the host system.

  • Infinity Architecture: CDNA adopts AMD's Infinity Architecture, designed to provide higher performance and energy efficiency for large-scale scientific computing and deep learning tasks.

  • Cache hierarchy optimization: CDNA optimizes its cache hierarchy to improve memory access efficiency, which benefits large-scale parallel computing tasks.

  • HBM2 (High Bandwidth Memory 2) support: Like earlier AMD compute GPUs, CDNA uses HBM2 memory, providing high memory bandwidth suited to large-scale computing tasks.

  • Separation of CDNA and RDNA: In AMD's lineup, CDNA targets high-performance computing while RDNA targets graphics. This split is intended to better serve the two sets of needs in their respective areas.

The introduction of the CDNA architecture enables AMD to provide diverse GPU solutions suitable for different fields, while focusing on the needs of areas such as high-performance computing and deep learning.

The continuous evolution of these GPU architectures aims to improve performance and efficiency in aspects such as graphics rendering, parallel computing and deep learning. Different architectures have different characteristics and applicable scenarios, and the choice depends on specific application requirements.
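
For NVIDIA GPUs specifically, the architecture generation can be identified at runtime through the compute capability (Fermi 2.x, Kepler 3.x, Maxwell 5.x, Pascal 6.x, Volta 7.0, Turing 7.5, Ampere 8.x, Hopper 9.0). The short sketch below (illustrative only, error checking omitted) queries this and a few related properties with the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the architecture-level details of each installed NVIDIA GPU.
int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s\n", d, prop.name);
        printf("  Compute capability : %d.%d\n", prop.major, prop.minor);
        printf("  Multiprocessors    : %d\n", prop.multiProcessorCount);
        printf("  Global memory      : %.1f GB\n",
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        printf("  Shared mem per SM  : %zu KB\n", prop.sharedMemPerMultiprocessor / 1024);
        printf("  Memory bus width   : %d-bit\n", prop.memoryBusWidth);
    }
    return 0;
}
```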

RDNA 3 architecture:

AMD unveiled the Radeon RX 7900 XTX and Radeon RX 7900 XT graphics cards, based on the new-generation RDNA 3 architecture, on November 3, 2022. The following are some features of the RDNA 3 architecture:

  • Clock frequency and energy efficiency: The RDNA 3 architecture uses an advanced process and design techniques to raise clock frequency and energy efficiency, delivering excellent graphics performance and compute capability.
  • Infinity Cache: RDNA 3 uses a second-generation Infinity Cache, an on-die cache that significantly increases effective memory bandwidth and thereby improves GPU performance.
  • DisplayPort 2.1 support: The RDNA 3 architecture supports DisplayPort 2.1, which provides a higher data transmission rate for higher display resolutions and refresh rates.
  • Dual-issue SIMD: The compute units of the RDNA 3 architecture have been upgraded with dual-issue SIMD units, providing higher computing throughput.
  • Ray tracing support: The RDNA 3 architecture includes improved hardware ray tracing, enabling more realistic game graphics.
  • Multiple API support: The RDNA 3 architecture supports multiple APIs, including DirectX and Vulkan, and can be integrated easily into existing games and applications.

RDNA 3 is AMD's latest GPU architecture as of this writing. It offers excellent performance and energy efficiency and is suited to a wide range of high-performance computing and graphics rendering applications.


Origin: blog.csdn.net/Alaskan_Husky/article/details/134884751