3.1 CUDA Runtime API - Overview

Foreword

Teacher Du has released a course on high-performance TensorRT deployment from scratch. I went through it once before, but I took no notes and have forgotten much of it. This time I am going through it again and taking notes along the way.

This lesson covers the Runtime API overview from the condensed CUDA tutorial.

The course outline is shown in the mind map below.

[Image: course outline mind map]

1. Runtime API Overview

For the Runtime API you need to know:

  1. The biggest difference between the Runtime API and the Driver API is lazy loading (lazy initialization).
  2. That is, cuInit is called automatically on the first Runtime API call, which avoids the Driver API problem of having to initialize explicitly before anything else works.
  3. Likewise, on the first API call that requires a context, the runtime performs the context association: it creates the context and sets it as the current context, calling cuDevicePrimaryCtxRetain to do so.
  4. Most APIs need a context, for example querying the current GPU's name and properties, or allocating and freeing memory.
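
The lazy behaviour described above can be observed with a small sketch that mixes one Driver API query, cuCtxGetCurrent, into a Runtime API program. The helper name `report_context`, the file name, and the build command are assumptions made for illustration and are not part of the course material. The exact point at which the context appears may differ between CUDA versions, so no output is claimed here.

```cuda
// lazy_init.cu (hypothetical file name)
// Assumed build command: nvcc lazy_init.cu -o lazy_init -lcuda
#include <cstdio>
#include <cuda.h>          // Driver API, used only for cuCtxGetCurrent
#include <cuda_runtime.h>  // Runtime API

// Hypothetical helper: prints the driver/context state of the calling thread
// and returns true if a context is currently bound to it.
static bool report_context(const char* when) {
    CUcontext ctx = nullptr;
    CUresult rc = cuCtxGetCurrent(&ctx);
    if (rc == CUDA_ERROR_NOT_INITIALIZED) {
        printf("%-28s driver not initialized (cuInit not called yet)\n", when);
        return false;
    }
    if (rc != CUDA_SUCCESS) {
        printf("%-28s cuCtxGetCurrent failed with code %d\n", when, (int)rc);
        return false;
    }
    printf("%-28s current context = %p\n", when, (void*)ctx);
    return ctx != nullptr;
}

int main() {
    // No Runtime API call has been made yet, and cuInit was never called by hand.
    report_context("before any runtime call:");

    // First Runtime API call: the runtime initializes the driver on our behalf.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        printf("no usable CUDA device: %s\n", cudaGetErrorString(err));
        return 1;
    }
    report_context("after cudaGetDeviceCount:");

    // A call that needs a context. cudaFree(nullptr) frees nothing; it is a
    // common idiom for forcing the primary context to be created and bound.
    err = cudaSetDevice(0);
    if (err == cudaSuccess) err = cudaFree(nullptr);
    if (err != cudaSuccess) {
        printf("context setup failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    bool bound = report_context("after cudaFree(nullptr):");
    return bound ? 0 : 1;
}
```

On a machine with a working GPU, the first line would typically report an uninitialized driver and the last line a non-null context, without the program ever calling cuInit or creating a context itself.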

The location of the Runtime API is shown in the figure below.

[Image: Runtime API location]

Figure 1-1 Runtime API location

For the Runtime API you also need to know:

  1. The CUDA Runtime is a higher-level, friendlier API that wraps the CUDA Driver API.
  2. It uses cuDevicePrimaryCtxRetain to associate a primary context with each device, so contexts no longer have to be managed by hand. The Runtime API provides no functions for managing contexts directly (this can still be done through the Driver API, but is usually unnecessary).
  3. Kernel functions are easier to launch, and .cpp files connect seamlessly with .cu files.
  4. It corresponds to the header cuda_runtime.h and the library libcudart.so.
  5. The Runtime API is released as part of the CUDA Toolkit.
  6. The main knowledge points are the use of kernel functions, warp layout, the memory model, and the use of streams.
  7. The main exercises implement reduction (sum), affine transformation, matrix multiplication, and model post-processing; mastering these is enough to solve most problems.
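
As a rough illustration of the kernel, memory, and stream points above, here is a minimal sketch of an element-wise vector addition launched on a user-created stream. The names `add_kernel`, `run_vector_add`, and `CHECK` are hypothetical and chosen for this example only; this is not one of the course's own exercises.

```cuda
// vector_add.cu (hypothetical file name); assumed build: nvcc vector_add.cu -o vector_add
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-checking macro: abort with a message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e_));                          \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Each thread handles one element; the bounds check covers the last partial block.
__global__ void add_kernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Adds two host vectors of length n on the GPU and verifies the result.
bool run_vector_add(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n, 0.0f);

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    float *da = nullptr, *db = nullptr, *dc = nullptr;
    CHECK(cudaMalloc(&da, bytes));
    CHECK(cudaMalloc(&db, bytes));
    CHECK(cudaMalloc(&dc, bytes));

    // Copies and the kernel are queued on the same stream, so they run in order.
    // With pageable host memory (std::vector) the copies are not truly
    // asynchronous; pinned memory would be needed for real overlap.
    CHECK(cudaMemcpyAsync(da, ha.data(), bytes, cudaMemcpyHostToDevice, stream));
    CHECK(cudaMemcpyAsync(db, hb.data(), bytes, cudaMemcpyHostToDevice, stream));

    const int block = 256;
    const int grid = (n + block - 1) / block;
    add_kernel<<<grid, block, 0, stream>>>(da, db, dc, n);
    CHECK(cudaGetLastError());  // catches launch-configuration errors

    CHECK(cudaMemcpyAsync(hc.data(), dc, bytes, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));  // wait for all queued work

    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (hc[i] != 3.0f) { ok = false; break; }
    }

    CHECK(cudaFree(da));
    CHECK(cudaFree(db));
    CHECK(cudaFree(dc));
    CHECK(cudaStreamDestroy(stream));
    return ok;
}

int main() {
    bool ok = run_vector_add(1 << 10);
    printf("vector add %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```

Note that the program never creates or selects a context: the first runtime call sets it up implicitly, and the `<<<grid, block, 0, stream>>>` launch syntax is what the Runtime API layer adds on top of the Driver API's explicit launch functions.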

Summary

This lesson gave an overview of the Runtime API, a higher-level wrapper around the Driver API that manages context creation automatically. For the Runtime API, we need to know how to use kernel functions, warps, the memory model, and streams. In terms of case studies, it is necessary to master affine transformation, model preprocessing, and model post-processing.

Origin blog.csdn.net/qq_40672115/article/details/131606143