Reference from the official site:

CUDA Execution Provider

The CUDA Execution Provider enables hardware-accelerated computation on Nvidia CUDA-enabled GPUs.


Install

Pre-built binaries of ONNX Runtime with the CUDA EP are published for most language bindings. Please refer to Install ORT.

Requirements

Please refer to the table below for the official GPU package dependencies of the ONNX Runtime inference package. Note that ONNX Runtime Training is aligned with PyTorch CUDA versions; refer to the Training tab on onnxruntime.ai for the supported versions.

Note: because of CUDA minor version compatibility, ONNX Runtime built with CUDA 11.4 should be compatible with any CUDA 11.x version. Please refer to Nvidia CUDA Minor Version Compatibility.
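The minor-version rule above can be expressed as a quick check. This is a minimal Python sketch under a simplifying assumption: it compares only the CUDA major version and ignores the minimum-driver requirements that NVIDIA documents for minor version compatibility. `cuda_major_minor` and `minor_version_compatible` are hypothetical helpers, not part of ONNX Runtime.

```python
from __future__ import annotations


def cuda_major_minor(version: str) -> tuple[int, int]:
    """Parse a CUDA version such as "11.4" or "11.0.3" into (major, minor)."""
    parts = version.split(".")
    minor = int(parts[1]) if len(parts) > 1 else 0
    return int(parts[0]), minor


def minor_version_compatible(built_with: str, installed: str) -> bool:
    """True if both versions share the same CUDA major version.

    Simplification (assumption): driver requirements are not checked.
    """
    return cuda_major_minor(built_with)[0] == cuda_major_minor(installed)[0]


print(minor_version_compatible("11.4", "11.8"))  # True
print(minor_version_compatible("11.4", "12.0"))  # False
```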

ONNX Runtime	CUDA	cuDNN	Notes
1.13	11.6	8.2.4 (Linux), 8.5.0.96 (Windows)	libcudart 11.4.43, libcufft 10.5.2.100, libcurand 10.2.5.120, libcublasLt 11.6.5.2, libcublas 11.6.5.2, libcudnn 8.2.4
1.11-1.12	11.4	8.2.4 (Linux), 8.2.2.26 (Windows)	libcudart 11.4.43, libcufft 10.5.2.100, libcurand 10.2.5.120, libcublasLt 11.6.5.2, libcublas 11.6.5.2, libcudnn 8.2.4
1.10	11.4	8.2.4 (Linux), 8.2.2.26 (Windows)	libcudart 11.4.43, libcufft 10.5.2.100, libcurand 10.2.5.120, libcublasLt 11.6.1.51, libcublas 11.6.1.51, libcudnn 8.2.4
1.9	11.4	8.2.4 (Linux), 8.2.2.26 (Windows)	libcudart 11.4.43, libcufft 10.5.2.100, libcurand 10.2.5.120, libcublasLt 11.6.1.51, libcublas 11.6.1.51, libcudnn 8.2.4
1.8	11.0.3	8.0.4 (Linux), 8.0.2.39 (Windows)	libcudart 11.0.221, libcufft 10.2.1.245, libcurand 10.2.1.245, libcublasLt 11.2.0.252, libcublas 11.2.0.252, libcudnn 8.0.4
1.7	11.0.3	8.0.4 (Linux), 8.0.2.39 (Windows)	libcudart 11.0.221, libcufft 10.2.1.245, libcurand 10.2.1.245, libcublasLt 11.2.0.252, libcublas 11.2.0.252, libcudnn 8.0.4
1.5-1.6	10.2	8.0.3	CUDA 11 can be built from source
1.2-1.4	10.1	7.6.5	Requires cublas10-10.2.1.243; cublas 10.1.x will not work
1.0-1.1	10.0	7.6.4	CUDA versions 9.1 through 10.1 and cuDNN versions 7.1 through 7.4 should also work with Visual Studio 2017
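To look up the row for a given ONNX Runtime release programmatically, the table can be transcribed into a dictionary. This sketch assumes that patch releases (for example 1.13.1) share the row of their minor release, and it repeats the single cuDNN version for both platforms where the table lists only one. `CudaRequirements` and `requirements_for` are hypothetical names.

```python
from typing import NamedTuple


class CudaRequirements(NamedTuple):
    cuda: str
    cudnn_linux: str
    cudnn_windows: str


# Transcribed from the requirements table; version ranges are expanded per minor release.
_TABLE = {
    "1.13": CudaRequirements("11.6", "8.2.4", "8.5.0.96"),
    "1.12": CudaRequirements("11.4", "8.2.4", "8.2.2.26"),
    "1.11": CudaRequirements("11.4", "8.2.4", "8.2.2.26"),
    "1.10": CudaRequirements("11.4", "8.2.4", "8.2.2.26"),
    "1.9": CudaRequirements("11.4", "8.2.4", "8.2.2.26"),
    "1.8": CudaRequirements("11.0.3", "8.0.4", "8.0.2.39"),
    "1.7": CudaRequirements("11.0.3", "8.0.4", "8.0.2.39"),
    "1.6": CudaRequirements("10.2", "8.0.3", "8.0.3"),
    "1.5": CudaRequirements("10.2", "8.0.3", "8.0.3"),
    "1.4": CudaRequirements("10.1", "7.6.5", "7.6.5"),
    "1.3": CudaRequirements("10.1", "7.6.5", "7.6.5"),
    "1.2": CudaRequirements("10.1", "7.6.5", "7.6.5"),
    "1.1": CudaRequirements("10.0", "7.6.4", "7.6.4"),
    "1.0": CudaRequirements("10.0", "7.6.4", "7.6.4"),
}


def requirements_for(ort_version: str) -> CudaRequirements:
    """Return the CUDA/cuDNN row for an ONNX Runtime version such as "1.13.1"."""
    key = ".".join(ort_version.split(".")[:2])  # keep major.minor only (assumption)
    try:
        return _TABLE[key]
    except KeyError:
        raise KeyError(f"ONNX Runtime {ort_version} is not listed in the table") from None


print(requirements_for("1.13.1").cuda)  # 11.6
```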

For older versions, please see the Readme and Build pages on the release branch.

Build

For build instructions, please see the BUILD page.

Configuration Options

The CUDA Execution Provider supports the following configuration options.

device_id

The device ID.

Default value: 0
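In the Python API, device_id goes into the options dictionary of a ("CUDAExecutionProvider", {...}) tuple, as in the Python sample. The helper below is a hypothetical convenience that targets one GPU and keeps the CPU provider as a fallback; it only builds the list, so it runs without onnxruntime installed.

```python
def cuda_providers(device_id: int = 0) -> list:
    """Build a providers list for one GPU, with the CPU provider as fallback."""
    if device_id < 0:
        raise ValueError("device_id must be non-negative")
    return [
        ("CUDAExecutionProvider", {"device_id": device_id}),
        "CPUExecutionProvider",
    ]


print(cuda_providers(1))
```

The result can be passed as `providers=cuda_providers(1)` to `ort.InferenceSession`.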

gpu_mem_limit

The size limit of the device memory arena, in bytes. This limit applies only to the execution provider's arena; total device memory usage may be higher.

Default value: max value of C++ size_t type (effectively unlimited)
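Because gpu_mem_limit is a byte count, the arithmetic matters: 2 GiB is 2,147,483,648 bytes, one more than the largest 32-bit signed integer, so the value has to be computed in a wider type. A small Python sketch of the conversion and of the default; `gib_to_bytes` is a hypothetical helper, and `SIZE_T_MAX` is derived from `ctypes` for the current platform.

```python
import ctypes

# Default limit: the largest value a C/C++ size_t can hold on this platform.
SIZE_T_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_size_t)) - 1
INT32_MAX = 2 ** 31 - 1


def gib_to_bytes(gib: float) -> int:
    """Convert GiB to the byte count expected by gpu_mem_limit."""
    return int(gib * 1024 ** 3)


limit = gib_to_bytes(2)
print(limit)                # 2147483648, the value used in the samples
print(limit > INT32_MAX)    # True: 2 GiB does not fit in a 32-bit signed int
print(limit <= SIZE_T_MAX)  # True
```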

arena_extend_strategy

The strategy for extending the device memory arena.

Value	Description
kNextPowerOfTwo (0)	subsequent extensions extend by larger amounts (multiplied by powers of two)
kSameAsRequested (1)	extend by the requested amount

Default value: kNextPowerOfTwo
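The difference between the two strategies can be illustrated with a toy model. This is a simplified assumption, not ONNX Runtime's arena implementation: free space is treated as a single pool (no fragmentation, rounding or bins), and for kNextPowerOfTwo each extension is at least the current chunk size, which then doubles. `simulate_arena` and the request sizes are hypothetical.

```python
def simulate_arena(requests, strategy, initial_chunk):
    """Return the size of each arena extension triggered by `requests`.

    Toy model only: free space is one pool, with no fragmentation, rounding or bins.
    """
    extensions = []
    free = 0                    # memory held by the arena but not handed out
    next_chunk = initial_chunk  # minimum size of the next kNextPowerOfTwo extension
    for size in requests:
        if size <= free:
            free -= size        # served from memory the arena already holds
            continue
        if strategy == "kSameAsRequested":
            grow = size         # extend by exactly the requested amount
        elif strategy == "kNextPowerOfTwo":
            grow = max(size, next_chunk)
            next_chunk *= 2     # later extensions get larger
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        extensions.append(grow)
        free += grow - size
    return extensions


reqs = [3, 3, 3, 3]  # hypothetical request sizes, in MiB
print(simulate_arena(reqs, "kNextPowerOfTwo", initial_chunk=4))   # [4, 8]
print(simulate_arena(reqs, "kSameAsRequested", initial_chunk=4))  # [3, 3, 3, 3]
```

In this model, kNextPowerOfTwo needs fewer, larger extensions and may reserve more memory than the workload requests, while kSameAsRequested reserves exactly what each request needs at the cost of more extensions.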

cudnn_conv_algo_search

The type of search done for cuDNN convolution algorithms.

Value	Description
EXHAUSTIVE (0)	expensive exhaustive benchmarking using cudnnFindConvolutionForwardAlgorithmEx
HEURISTIC (1)	lightweight heuristic based search using cudnnGetConvolutionForwardAlgorithm_v7
DEFAULT (2)	default algorithm using CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM

Default value: EXHAUSTIVE
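The Python and V2 samples pass cudnn_conv_algo_search as a name ("EXHAUSTIVE", "DEFAULT"), while the legacy C struct takes the numeric enum value. A small Python sketch of the mapping in the table above; `CudnnConvAlgoSearch` and `parse_algo_search` are hypothetical, and accepting lowercase names is a convenience of this sketch only, not a claim about ONNX Runtime's own parsing.

```python
from enum import IntEnum


class CudnnConvAlgoSearch(IntEnum):
    """Names and numeric values from the cudnn_conv_algo_search table."""
    EXHAUSTIVE = 0
    HEURISTIC = 1
    DEFAULT = 2


def parse_algo_search(value) -> CudnnConvAlgoSearch:
    """Accept either an option name (e.g. "HEURISTIC") or its number (e.g. 1)."""
    if isinstance(value, str):
        return CudnnConvAlgoSearch[value.upper()]  # KeyError for unknown names
    return CudnnConvAlgoSearch(value)              # ValueError for unknown numbers


print(parse_algo_search("DEFAULT").value)  # 2
print(parse_algo_search(1).name)           # HEURISTIC
```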

do_copy_in_default_stream

Whether to do copies in the default stream or use separate streams. The recommended setting is true. If false, race conditions are possible, although performance may be better.

Default value: true

cudnn_conv_use_max_workspace

See "Tuning performance for convolution-heavy models" for details on what this flag does. When used through the C API, this flag is supported only by the V2 version of the provider options struct. The V2 provider options struct can be created with CreateCUDAProviderOptions and updated with UpdateCUDAProviderOptions. See the C/C++ sample below for an example.

Default value: 0

cudnn_conv1d_pad_to_nc1d

See "Convolution input padding in the CUDA EP" for details on what this flag does. When used through the C API, this flag is supported only by the V2 version of the provider options struct. The V2 provider options struct can be created with CreateCUDAProviderOptions and updated with UpdateCUDAProviderOptions. See the C/C++ sample below for an example.

Default value: 0

enable_cuda_graph

See "Using CUDA Graphs in the CUDA EP" for details on what this flag does. When used through the C API, this flag is supported only by the V2 version of the provider options struct. The V2 provider options struct can be created with CreateCUDAProviderOptions and updated with UpdateCUDAProviderOptions.

Default value: 0
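The V2 struct is configured through parallel arrays of string keys and string values, as in the UpdateCUDAProviderOptions call of the C/C++ sample. The sketch below flattens a Python dictionary into that shape; `to_v2_key_values` is hypothetical, and rendering booleans as "1"/"0" is an assumption chosen to match the values used in that sample.

```python
from __future__ import annotations


def to_v2_key_values(options: dict) -> tuple[list[str], list[str]]:
    """Flatten CUDA EP options into parallel string lists (keys, values).

    Booleans become "1"/"0"; everything else is converted with str().
    """
    keys, values = [], []
    for key, value in options.items():
        keys.append(key)
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            values.append("1" if value else "0")
        else:
            values.append(str(value))
    return keys, values


keys, values = to_v2_key_values({
    "device_id": 0,
    "gpu_mem_limit": 2 * 1024 ** 3,
    "arena_extend_strategy": "kSameAsRequested",
    "cudnn_conv_algo_search": "DEFAULT",
    "do_copy_in_default_stream": True,
    "cudnn_conv_use_max_workspace": True,
    "cudnn_conv1d_pad_to_nc1d": True,
})
print(values)  # ['0', '2147483648', 'kSameAsRequested', 'DEFAULT', '1', '1', '1']
```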

Samples

Python

import onnxruntime as ort

model_path = '<path to model>'

providers = [
    ('CUDAExecutionProvider', {
        'device_id': 0,
        'arena_extend_strategy': 'kNextPowerOfTwo',
        'gpu_mem_limit': 2 * 1024 * 1024 * 1024,
        'cudnn_conv_algo_search': 'EXHAUSTIVE',
        'do_copy_in_default_stream': True,
    }),
    'CPUExecutionProvider',
]

session = ort.InferenceSession(model_path, providers=providers)
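Not every onnxruntime package is built with CUDA, so it can be simpler to check what the installed package offers before building the providers list. A hedged sketch: `select_providers` is a hypothetical helper that takes the list of available provider names, which in practice would come from `onnxruntime.get_available_providers()`.

```python
def select_providers(available, cuda_options=None):
    """Prefer CUDA when the installed build offers it, otherwise use CPU only."""
    if "CUDAExecutionProvider" in available:
        return [("CUDAExecutionProvider", cuda_options or {}), "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


print(select_providers(["CPUExecutionProvider"]))  # ['CPUExecutionProvider']
```

For example, `providers = select_providers(ort.get_available_providers(), {'device_id': 0})`, then `ort.InferenceSession(model_path, providers=providers)`; `session.get_providers()` reports which providers the session actually uses.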

C/C++

USING LEGACY PROVIDER OPTIONS STRUCT

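#include <onnxruntime_c_api.h>

OrtSessionOptions* session_options = /* ... */;

// Value-initialize so that fields not set below have well-defined values
OrtCUDAProviderOptions options{};
options.device_id = 0;
options.arena_extend_strategy = 0;  // 0 = kNextPowerOfTwo
options.gpu_mem_limit = 2ULL * 1024 * 1024 * 1024;  // 2 GiB; ULL avoids 32-bit int overflow
options.cudnn_conv_algo_search = OrtCudnnConvAlgoSearchExhaustive;
options.do_copy_in_default_stream = 1;

// OrtApi functions are called through the API table; a nullptr status means success
// (otherwise read it with api->GetErrorMessage and free it with api->ReleaseStatus)
const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);
OrtStatus* status = api->SessionOptionsAppendExecutionProvider_CUDA(session_options, &options);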

USING V2 PROVIDER OPTIONS STRUCT

#include <vector>
#include <onnxruntime_c_api.h>

// Create, Update and Append return an OrtStatus* (nullptr on success); check it in real code
const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);

OrtCUDAProviderOptionsV2* cuda_options = nullptr;
api->CreateCUDAProviderOptions(&cuda_options);

// Option names and values are passed as parallel arrays of strings
std::vector<const char*> keys{"device_id", "gpu_mem_limit", "arena_extend_strategy", "cudnn_conv_algo_search", "do_copy_in_default_stream", "cudnn_conv_use_max_workspace", "cudnn_conv1d_pad_to_nc1d"};
std::vector<const char*> values{"0", "2147483648", "kSameAsRequested", "DEFAULT", "1", "1", "1"};

api->UpdateCUDAProviderOptions(cuda_options, keys.data(), values.data(), keys.size());

OrtSessionOptions* session_options = /* ... */;
api->SessionOptionsAppendExecutionProvider_CUDA_V2(session_options, cuda_options);

// Finally, don't forget to release the provider options
api->ReleaseCUDAProviderOptions(cuda_options);

C#

using System.Collections.Generic;
using Microsoft.ML.OnnxRuntime;

var cudaProviderOptions = new OrtCUDAProviderOptions(); // Dispose when no longer needed

var providerOptionsDict = new Dictionary<string, string>();
providerOptionsDict["device_id"] = "0";
providerOptionsDict["gpu_mem_limit"] = "2147483648";
providerOptionsDict["arena_extend_strategy"] = "kSameAsRequested";
providerOptionsDict["cudnn_conv_algo_search"] = "DEFAULT";
providerOptionsDict["do_copy_in_default_stream"] = "1";
providerOptionsDict["cudnn_conv_use_max_workspace"] = "1";
providerOptionsDict["cudnn_conv1d_pad_to_nc1d"] = "1";

cudaProviderOptions.UpdateOptions(providerOptionsDict);

SessionOptions options = SessionOptions.MakeSessionOptionWithCudaProvider(cudaProviderOptions);  // Dispose when no longer needed

[onnxruntime] ONNX Runtime and CUDA version correspondence table
