1 Introduction
CUDA
CUDA is a general-purpose parallel computing architecture that NVIDIA introduced in 2006 for solving parallel computing problems on GPUs. It makes GPU programming accessible to developers and lets programs exploit the GPU's parallelism to improve performance substantially.
Since its release, the CUDA ecosystem has grown rapidly and now includes a large number of software development tools, services, and solutions. The CUDA Toolkit itself includes GPU-accelerated libraries, debugging and optimization tools, a compiler, and runtime libraries.
ROCm
ROCm stands for Radeon Open Compute platform. It is an open-source software development platform that AMD introduced in 2015 as a counterpart to the CUDA ecosystem for HPC and hyperscale GPU computing. At the time of writing, ROCm supports only Linux.
Like CUDA, ROCm includes development tools, software frameworks, libraries, compilers, and programming models.
The figure below compares NVIDIA's CUDA ecosystem with AMD's ROCm ecosystem.
2 Comparison of the CUDA and ROCm ecosystems
2.1 Programming models and APIs

| NVIDIA | AMD | Description |
| --- | --- | --- |
| CUDA | HIP | A complete environment for developing GPU-accelerated C/C++ programs: API, runtime, compiler, debugging tools, and more. |
| OpenCL | OpenCL | An open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. It gives developers a unified programming environment for writing efficient, portable code for high-performance computing servers, desktop systems, and handheld devices. |
| OpenACC | | Directive-based parallel programming; a GPU parallel programming model widely used by researchers and technical programmers. |
| | OpenMP | A specification of compiler directives, library routines, and environment variables for expressing high-level parallelism in Fortran and C/C++ programs. |

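
To make the directive-based style of OpenMP more concrete, here is a minimal C++ sketch. It is an illustration under stated assumptions, not code from either vendor: the function name `dot` is hypothetical, and the loop is parallelized on the host CPU rather than offloaded to a GPU.

```cpp
#include <cstddef>
#include <vector>

// Dot product of two vectors, parallelized with an OpenMP directive.
// `dot` is a hypothetical helper used only for illustration.
// If the compiler is not run with OpenMP enabled (for example -fopenmp),
// the pragma is ignored and the loop runs serially with the same result.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    double sum = 0.0;
    // Use the shorter length so the loop never reads out of bounds.
    const long n = static_cast<long>(x.size() < y.size() ? x.size() : y.size());
    // Each thread accumulates a private partial sum; OpenMP combines them.
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The parallelism is expressed entirely by the `#pragma` line, so the same source still compiles as ordinary serial C++ when OpenMP is switched off.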
2.2 Compilation and toolchain

| NVIDIA | AMD | Description |
| --- | --- | --- |
| NVCC | ROCmCC / HCC | Compiler |
| CUDA-GDB | ROCgdb | Debugger |
| | HIPify | Converts CUDA source code to HIP C++ source code |
| NVIDIA Nsight | ROCm Profiling Tools | Performance analysis tools |
| nvidia-smi | rocm-smi | System management interface and command-line tool |

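
Source-to-source conversion with HIPify is feasible largely because many HIP runtime functions mirror their CUDA counterparts, with a `hip` prefix in place of `cuda`. The sketch below is a toy illustration of that naming correspondence only. `toy_hipify` is a hypothetical helper, not part of ROCm, and the real HIPify tools handle far more than plain text substitution.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy illustration only: replaces a handful of CUDA runtime names with
// their HIP equivalents. `toy_hipify` is hypothetical and is NOT how the
// real HIPify tools are implemented.
std::string toy_hipify(std::string src) {
    // A small, hand-picked subset of CUDA-to-HIP name pairs.
    static const std::vector<std::pair<std::string, std::string>> table = {
        {"cuda_runtime.h", "hip/hip_runtime.h"},
        {"cudaMalloc", "hipMalloc"},
        {"cudaMemcpyHostToDevice", "hipMemcpyHostToDevice"},
        {"cudaMemcpyDeviceToHost", "hipMemcpyDeviceToHost"},
        {"cudaMemcpy", "hipMemcpy"},
        {"cudaFree", "hipFree"},
        {"cudaDeviceSynchronize", "hipDeviceSynchronize"},
    };
    for (const auto& entry : table) {
        std::size_t pos = 0;
        // Replace every occurrence of the CUDA name, left to right.
        while ((pos = src.find(entry.first, pos)) != std::string::npos) {
            src.replace(pos, entry.first.size(), entry.second);
            pos += entry.second.size();
        }
    }
    return src;
}
```

For example, passing the string `cudaMalloc(&p, n); cudaFree(p);` returns `hipMalloc(&p, n); hipFree(p);`.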
2.3 GPU-accelerated libraries
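The CUDA and ROCm base frameworks provide many supporting libraries, including basic math libraries, AI libraries, communication libraries, and parallel algorithm libraries. They are listed side by side below for comparison:
Math libraries

| NVIDIA | AMD | Description |
| --- | --- | --- |
| cuBLAS | rocBLAS | Basic linear algebra subprograms (BLAS) library |
| cuFFT | rocFFT | Fast Fourier transform (FFT) library |
| CUDA Math Library | | Standard math function library |
| cuRAND | | Random number generation (RNG) |
| cuSOLVER | rocSOLVER | Dense and sparse direct solvers |
| cuSPARSE | rocSPARSE / rocALUTION | BLAS for sparse matrices |
| cuTENSOR | rocWMMA | Tensor linear algebra library |
| AmgX | | Linear solvers for simulations and implicit unstructured methods |
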
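
As a reference point for what a BLAS library computes, the sketch below implements the general matrix multiply (GEMM) operation, C = alpha * A * B + beta * C, on the host. It assumes column-major storage, the convention cuBLAS and rocBLAS both follow, and no transposes. `ref_sgemm` is a hypothetical name, and this naive loop says nothing about how the GPU libraries are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive host reference for single-precision GEMM:
//   C (m x n) = alpha * A (m x k) * B (k x n) + beta * C
// All matrices are stored column-major without padding.
// `ref_sgemm` is a hypothetical helper for illustration only.
void ref_sgemm(std::size_t m, std::size_t n, std::size_t k, float alpha,
               const std::vector<float>& A, const std::vector<float>& B,
               float beta, std::vector<float>& C) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t j = 0; j < n; ++j) {          // column of C
        for (std::size_t i = 0; i < m; ++i) {      // row of C
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {  // inner dimension
                acc += A[i + p * m] * B[p + j * k];
            }
            C[i + j * m] = alpha * acc + beta * C[i + j * m];
        }
    }
}
```

A reference like this is mainly useful for checking the results of a GPU-accelerated call on small inputs.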
Parallel algorithm libraries

| NVIDIA | AMD | Description |
| --- | --- | --- |
| Thrust | Parallel STL / rocThrust | C++ parallel algorithms and data structures library |

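
Thrust and rocThrust expose an interface modeled on the C++ standard library's algorithms. The sketch below shows that algorithm style with plain host-side standard-library calls; it does not use Thrust itself. `sum_of_squares` is a hypothetical helper. With Thrust, the corresponding algorithms would be `thrust::transform` and `thrust::reduce` applied to device containers such as `thrust::device_vector`.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Computes the sum of squares with a transform step followed by a
// reduction step. `sum_of_squares` is a hypothetical helper; this host
// version only shows the algorithm style that Thrust mirrors.
int sum_of_squares(const std::vector<int>& v) {
    std::vector<int> squared(v.size());
    // Transform: square each element.
    std::transform(v.begin(), v.end(), squared.begin(),
                   [](int x) { return x * x; });
    // Reduce: add up the squared values.
    return std::accumulate(squared.begin(), squared.end(), 0);
}
```

Expressing the work as transform and reduce steps, rather than a hand-written loop, is what lets a library choose a parallel implementation behind the same call.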
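Image and video libraries

| NVIDIA | AMD | Description |
| --- | --- | --- |
| nvJPEG | | High-performance GPU-accelerated library for JPEG decoding |
| NVIDIA Performance Primitives (NPP) | | GPU-accelerated image, video, and signal processing functions |
| NVIDIA Video Codec SDK | | A complete set of APIs, samples, and documentation for hardware-accelerated video encoding and decoding |
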
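Communication libraries

| NVIDIA | AMD | Description |
| --- | --- | --- |
| NVSHMEM | | An implementation of the OpenSHMEM standard for GPU memory, with extensions to improve performance on GPUs |
| NCCL | RCCL | Multi-GPU, multi-node communication |
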
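
A typical operation provided by multi-GPU communication libraries such as NCCL and RCCL is the all-reduce collective: every participant contributes a buffer, and every participant ends up with the element-wise reduction of all buffers. The sketch below models only those semantics on the host, with one vector per rank. `toy_allreduce_sum` is a hypothetical helper and involves no GPUs or network communication.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of an all-reduce with a sum reduction.
// ranks[r] is the buffer owned by rank r; all buffers must have the same
// length. `toy_allreduce_sum` is hypothetical and only shows semantics.
void toy_allreduce_sum(std::vector<std::vector<float>>& ranks) {
    if (ranks.empty()) return;
    const std::size_t len = ranks[0].size();
    std::vector<float> total(len, 0.0f);
    // Reduce: accumulate every rank's contribution element by element.
    for (const auto& buf : ranks) {
        assert(buf.size() == len);
        for (std::size_t i = 0; i < len; ++i) {
            total[i] += buf[i];
        }
    }
    // Broadcast: every rank receives the full result.
    for (auto& buf : ranks) {
        buf = total;
    }
}
```

Real libraries avoid gathering everything in one place and instead exchange partial results between devices, but the final state of each rank's buffer is the same as in this model.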
Deep learning / AI libraries

| NVIDIA | AMD |
| --- | --- |
| | |

2.4 Development tools

| NVIDIA | AMD |
| --- | --- |
| | |