Testing an ONNX Model with TensorRT

1. Test the model

The log below warns that TensorRT was linked against cuDNN 8.3.2 while the locally installed cuDNN is 8.2.1; it is unclear what impact, if any, this minor-version mismatch has (the run still finishes with PASSED).

 .\trtexec.exe --onnx=AnkleSeg.onnx
&&&& RUNNING TensorRT.trtexec [TensorRT v8400] # D:\ProgramFiles\TensorRT-8.4.0.6.Windows10.x86_64.cuda-11.6.cudnn8.3\TensorRT-8.4.0.6\bin\trtexec.exe --onnx=AnkleSeg.onnx
[06/17/2022-10:25:48] [I] === Model Options ===
[06/17/2022-10:25:48] [I] Format: ONNX
[06/17/2022-10:25:48] [I] Model: AnkleSeg.onnx
[06/17/2022-10:25:48] [I] Output:
[06/17/2022-10:25:48] [I] === Build Options ===
[06/17/2022-10:25:48] [I] Max batch: explicit batch
[06/17/2022-10:25:48] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[06/17/2022-10:25:48] [I] minTiming: 1
[06/17/2022-10:25:48] [I] avgTiming: 8
[06/17/2022-10:25:48] [I] Precision: FP32
[06/17/2022-10:25:48] [I] LayerPrecisions:
[06/17/2022-10:25:48] [I] Calibration:
[06/17/2022-10:25:48] [I] Refit: Disabled
[06/17/2022-10:25:48] [I] Sparsity: Disabled
[06/17/2022-10:25:48] [I] Safe mode: Disabled
[06/17/2022-10:25:48] [I] DirectIO mode: Disabled
[06/17/2022-10:25:48] [I] Restricted mode: Disabled
[06/17/2022-10:25:48] [I] Save engine:
[06/17/2022-10:25:48] [I] Load engine:
[06/17/2022-10:25:48] [I] Profiling verbosity: 0
[06/17/2022-10:25:48] [I] Tactic sources: Using default tactic sources
[06/17/2022-10:25:48] [I] timingCacheMode: local
[06/17/2022-10:25:48] [I] timingCacheFile:
[06/17/2022-10:25:48] [I] Input(s)s format: fp32:CHW
[06/17/2022-10:25:48] [I] Output(s)s format: fp32:CHW
[06/17/2022-10:25:48] [I] Input build shapes: model
[06/17/2022-10:25:48] [I] Input calibration shapes: model
[06/17/2022-10:25:48] [I] === System Options ===
[06/17/2022-10:25:48] [I] Device: 0
[06/17/2022-10:25:48] [I] DLACore:
[06/17/2022-10:25:48] [I] Plugins:
[06/17/2022-10:25:48] [I] === Inference Options ===
[06/17/2022-10:25:48] [I] Batch: Explicit
[06/17/2022-10:25:48] [I] Input inference shapes: model
[06/17/2022-10:25:48] [I] Iterations: 10
[06/17/2022-10:25:48] [I] Duration: 3s (+ 200ms warm up)
[06/17/2022-10:25:48] [I] Sleep time: 0ms
[06/17/2022-10:25:48] [I] Idle time: 0ms
[06/17/2022-10:25:48] [I] Streams: 1
[06/17/2022-10:25:48] [I] ExposeDMA: Disabled
[06/17/2022-10:25:48] [I] Data transfers: Enabled
[06/17/2022-10:25:48] [I] Spin-wait: Disabled
[06/17/2022-10:25:48] [I] Multithreading: Disabled
[06/17/2022-10:25:48] [I] CUDA Graph: Disabled
[06/17/2022-10:25:48] [I] Separate profiling: Disabled
[06/17/2022-10:25:48] [I] Time Deserialize: Disabled
[06/17/2022-10:25:48] [I] Time Refit: Disabled
[06/17/2022-10:25:48] [I] Skip inference: Disabled
[06/17/2022-10:25:48] [I] Inputs:
[06/17/2022-10:25:48] [I] === Reporting Options ===
[06/17/2022-10:25:48] [I] Verbose: Disabled
[06/17/2022-10:25:48] [I] Averages: 10 inferences
[06/17/2022-10:25:48] [I] Percentile: 99
[06/17/2022-10:25:48] [I] Dump refittable layers:Disabled
[06/17/2022-10:25:48] [I] Dump output: Disabled
[06/17/2022-10:25:48] [I] Profile: Disabled
[06/17/2022-10:25:48] [I] Export timing to JSON file:
[06/17/2022-10:25:48] [I] Export output to JSON file:
[06/17/2022-10:25:48] [I] Export profile to JSON file:
[06/17/2022-10:25:48] [I]
[06/17/2022-10:25:48] [I] === Device Information ===
[06/17/2022-10:25:48] [I] Selected Device: GeForce GTX 1650 Ti
[06/17/2022-10:25:48] [I] Compute Capability: 7.5
[06/17/2022-10:25:48] [I] SMs: 16
[06/17/2022-10:25:48] [I] Compute Clock Rate: 1.485 GHz
[06/17/2022-10:25:48] [I] Device Global Memory: 4096 MiB
[06/17/2022-10:25:48] [I] Shared Memory per SM: 64 KiB
[06/17/2022-10:25:48] [I] Memory Bus Width: 128 bits (ECC disabled)
[06/17/2022-10:25:48] [I] Memory Clock Rate: 6.001 GHz
[06/17/2022-10:25:48] [I]
[06/17/2022-10:25:48] [I] TensorRT version: 8.4.0
[06/17/2022-10:25:48] [I] [TRT] [MemUsageChange] Init CUDA: CPU +407, GPU +0, now: CPU 8408, GPU 905 (MiB)
[06/17/2022-10:25:49] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 8615 MiB, GPU 905 MiB
[06/17/2022-10:25:49] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 8849 MiB, GPU 979 MiB
[06/17/2022-10:25:49] [I] Start parsing network model
[06/17/2022-10:25:49] [I] [TRT] ----------------------------------------------------------------
[06/17/2022-10:25:49] [I] [TRT] Input filename:   AnkleSeg.onnx
[06/17/2022-10:25:49] [I] [TRT] ONNX IR version:  0.0.7
[06/17/2022-10:25:49] [I] [TRT] Opset version:    12
[06/17/2022-10:25:49] [I] [TRT] Producer name:    pytorch
[06/17/2022-10:25:49] [I] [TRT] Producer version: 1.10
[06/17/2022-10:25:49] [I] [TRT] Domain:
[06/17/2022-10:25:49] [I] [TRT] Model version:    0
[06/17/2022-10:25:49] [I] [TRT] Doc string:
[06/17/2022-10:25:49] [I] [TRT] ----------------------------------------------------------------
[06/17/2022-10:25:49] [I] Finish parsing network model
[06/17/2022-10:25:50] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[06/17/2022-10:25:50] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +440, GPU +160, now: CPU 9158, GPU 1139 (MiB)
[06/17/2022-10:25:51] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +488, GPU +170, now: CPU 9646, GPU 1309 (MiB)
[06/17/2022-10:25:51] [W] [TRT] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.2.1
[06/17/2022-10:25:51] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/17/2022-10:31:02] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[06/17/2022-10:31:02] [I] [TRT] Total Host Persistent Memory: 33504
[06/17/2022-10:31:02] [I] [TRT] Total Device Persistent Memory: 0
[06/17/2022-10:31:02] [I] [TRT] Total Scratch Memory: 306584064
[06/17/2022-10:31:02] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 6 MiB, GPU 1841 MiB
[06/17/2022-10:31:02] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.9192ms to assign 5 blocks to 32 nodes requiring 1694498816 bytes.
[06/17/2022-10:31:02] [I] [TRT] Total Activation Memory: 1694498816
[06/17/2022-10:31:02] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 10056, GPU 1505 (MiB)
[06/17/2022-10:31:02] [W] [TRT] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.2.1
[06/17/2022-10:31:02] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +25, now: CPU 0, GPU 25 (MiB)
[06/17/2022-10:31:02] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 10105, GPU 1451 (MiB)
[06/17/2022-10:31:02] [I] [TRT] Loaded engine size: 25 MiB
[06/17/2022-10:31:02] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 10105, GPU 1488 (MiB)
[06/17/2022-10:31:02] [W] [TRT] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.2.1
[06/17/2022-10:31:02] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +24, now: CPU 0, GPU 24 (MiB)
[06/17/2022-10:31:02] [I] Engine built in 314.206 sec.
[06/17/2022-10:31:02] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 9824, GPU 1418 (MiB)
[06/17/2022-10:31:02] [W] [TRT] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.2.1
[06/17/2022-10:31:02] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1616, now: CPU 0, GPU 1640 (MiB)
[06/17/2022-10:31:02] [I] Using random values for input input
[06/17/2022-10:31:02] [I] Created input binding for input with dimensions 1x1x128x128x128
[06/17/2022-10:31:02] [I] Using random values for output output
[06/17/2022-10:31:02] [I] Created output binding for output with dimensions 1x1x128x128x128
[06/17/2022-10:31:02] [I] Starting inference
[06/17/2022-10:31:11] [I] Warmup completed 1 queries over 200 ms
[06/17/2022-10:31:11] [I] Timing trace has 10 queries over 9.1949 s
[06/17/2022-10:31:11] [I]
[06/17/2022-10:31:11] [I] === Trace details ===
[06/17/2022-10:31:11] [I] Trace averages of 10 runs:
[06/17/2022-10:31:11] [I] Average on 10 runs - GPU latency: 835.82 ms - Host latency: 838.362 ms (end to end 1671.36 ms, enqueue 1.20157 ms)
[06/17/2022-10:31:11] [I]
[06/17/2022-10:31:11] [I] === Performance summary ===
[06/17/2022-10:31:11] [I] Throughput: 1.08756 qps
[06/17/2022-10:31:11] [I] Latency: min = 836.438 ms, max = 839.76 ms, mean = 838.362 ms, median = 838.264 ms, percentile(99%) = 839.76 ms
[06/17/2022-10:31:11] [I] End-to-End Host Latency: min = 1669.15 ms, max = 1673.3 ms, mean = 1671.36 ms, median = 1671.56 ms, percentile(99%) = 1673.3 ms
[06/17/2022-10:31:11] [I] Enqueue Time: min = 0.5633 ms, max = 1.66211 ms, mean = 1.20157 ms, median = 1.29932 ms, percentile(99%) = 1.66211 ms
[06/17/2022-10:31:11] [I] H2D Latency: min = 1.26025 ms, max = 1.31482 ms, mean = 1.26654 ms, median = 1.26025 ms, percentile(99%) = 1.31482 ms
[06/17/2022-10:31:11] [I] GPU Compute Time: min = 833.902 ms, max = 837.225 ms, mean = 835.82 ms, median = 835.699 ms, percentile(99%) = 837.225 ms
[06/17/2022-10:31:11] [I] D2H Latency: min = 1.27246 ms, max = 1.2793 ms, mean = 1.27515 ms, median = 1.2749 ms, percentile(99%) = 1.2793 ms
[06/17/2022-10:31:11] [I] Total Host Walltime: 9.1949 s
[06/17/2022-10:31:11] [I] Total GPU Compute Time: 8.3582 s
[06/17/2022-10:31:11] [I] Explanations of the performance metrics are printed in the verbose logs.
[06/17/2022-10:31:11] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8400] # D:\ProgramFiles\TensorRT-8.4.0.6.Windows10.x86_64.cuda-11.6.cudnn8.3\TensorRT-8.4.0.6\bin\trtexec.exe --onnx=AnkleSeg.onnx
PS D:\ProgramFiles\TensorRT-8.4.0.6.Windows10.x86_64.cuda-11.6.cudnn8.3\TensorRT-8.4.0.6\bin>
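The summary metrics above are internally consistent: throughput is the query count divided by the total host walltime, and total GPU compute time is the mean GPU latency times the query count. A quick sanity check in Python, using the values copied from the log above:

```python
# Sanity-check the relationships among trtexec's summary metrics,
# using values reported in the performance summary above.

queries = 10                   # "Timing trace has 10 queries"
walltime_s = 9.1949            # Total Host Walltime
mean_gpu_latency_ms = 835.82   # mean GPU Compute Time

# Throughput (qps) = queries / total host walltime
throughput_qps = queries / walltime_s

# Total GPU Compute Time = queries * mean GPU latency
total_gpu_compute_s = queries * mean_gpu_latency_ms / 1000.0

print(f"Throughput: {throughput_qps:.5f} qps")                  # log reports 1.08756 qps
print(f"Total GPU Compute Time: {total_gpu_compute_s:.4f} s")   # log reports 8.3582 s
```

Note that end-to-end host latency (~1671 ms) is roughly twice the per-query latency (~838 ms); with a single stream and synchronous data transfers, each query's H2D copy, compute, and D2H copy are serialized end to end.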

2. Convert the model to an engine

 .\trtexec.exe --onnx=AnkleSeg.onnx --saveEngine=AnkleSeg.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8400] # D:\ProgramFiles\TensorRT-8.4.0.6.Windows10.x86_64.cuda-11.6.cudnn8.3\TensorRT-8.4.0.6\bin\trtexec.exe --onnx=AnkleSeg.onnx --saveEngine=AnkleSeg.engine
[06/17/2022-10:38:23] [I] === Model Options ===
[06/17/2022-10:38:23] [I] Format: ONNX
[06/17/2022-10:38:23] [I] Model: AnkleSeg.onnx
[06/17/2022-10:38:23] [I] Output:
[06/17/2022-10:38:23] [I] === Build Options ===
[06/17/2022-10:38:23] [I] Max batch: explicit batch
[06/17/2022-10:38:23] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[06/17/2022-10:38:23] [I] minTiming: 1
[06/17/2022-10:38:23] [I] avgTiming: 8
[06/17/2022-10:38:23] [I] Precision: FP32
[06/17/2022-10:38:23] [I] LayerPrecisions:
[06/17/2022-10:38:23] [I] Calibration:
[06/17/2022-10:38:23] [I] Refit: Disabled
[06/17/2022-10:38:23] [I] Sparsity: Disabled
[06/17/2022-10:38:23] [I] Safe mode: Disabled
[06/17/2022-10:38:23] [I] DirectIO mode: Disabled
[06/17/2022-10:38:23] [I] Restricted mode: Disabled
[06/17/2022-10:38:23] [I] Save engine: AnkleSeg.engine
[06/17/2022-10:38:23] [I] Load engine:
[06/17/2022-10:38:23] [I] Profiling verbosity: 0
[06/17/2022-10:38:23] [I] Tactic sources: Using default tactic sources
[06/17/2022-10:38:23] [I] timingCacheMode: local
[06/17/2022-10:38:23] [I] timingCacheFile:
[06/17/2022-10:38:23] [I] Input(s)s format: fp32:CHW
[06/17/2022-10:38:23] [I] Output(s)s format: fp32:CHW
[06/17/2022-10:38:23] [I] Input build shapes: model
[06/17/2022-10:38:23] [I] Input calibration shapes: model
[06/17/2022-10:38:23] [I] === System Options ===
[06/17/2022-10:38:23] [I] Device: 0
[06/17/2022-10:38:23] [I] DLACore:
[06/17/2022-10:38:23] [I] Plugins:
[06/17/2022-10:38:23] [I] === Inference Options ===
[06/17/2022-10:38:23] [I] Batch: Explicit
[06/17/2022-10:38:23] [I] Input inference shapes: model
[06/17/2022-10:38:23] [I] Iterations: 10
[06/17/2022-10:38:23] [I] Duration: 3s (+ 200ms warm up)
[06/17/2022-10:38:23] [I] Sleep time: 0ms
[06/17/2022-10:38:23] [I] Idle time: 0ms
[06/17/2022-10:38:23] [I] Streams: 1
[06/17/2022-10:38:23] [I] ExposeDMA: Disabled
[06/17/2022-10:38:23] [I] Data transfers: Enabled
[06/17/2022-10:38:23] [I] Spin-wait: Disabled
[06/17/2022-10:38:23] [I] Multithreading: Disabled
[06/17/2022-10:38:23] [I] CUDA Graph: Disabled
[06/17/2022-10:38:23] [I] Separate profiling: Disabled
[06/17/2022-10:38:23] [I] Time Deserialize: Disabled
[06/17/2022-10:38:23] [I] Time Refit: Disabled
[06/17/2022-10:38:23] [I] Skip inference: Disabled
[06/17/2022-10:38:23] [I] Inputs:
[06/17/2022-10:38:23] [I] === Reporting Options ===
[06/17/2022-10:38:23] [I] Verbose: Disabled
[06/17/2022-10:38:23] [I] Averages: 10 inferences
[06/17/2022-10:38:23] [I] Percentile: 99
[06/17/2022-10:38:23] [I] Dump refittable layers:Disabled
[06/17/2022-10:38:23] [I] Dump output: Disabled
[06/17/2022-10:38:23] [I] Profile: Disabled
[06/17/2022-10:38:23] [I] Export timing to JSON file:
[06/17/2022-10:38:23] [I] Export output to JSON file:
[06/17/2022-10:38:23] [I] Export profile to JSON file:
[06/17/2022-10:38:23] [I]
[06/17/2022-10:38:23] [I] === Device Information ===
[06/17/2022-10:38:23] [I] Selected Device: GeForce GTX 1650 Ti
[06/17/2022-10:38:23] [I] Compute Capability: 7.5
[06/17/2022-10:38:23] [I] SMs: 16
[06/17/2022-10:38:23] [I] Compute Clock Rate: 1.485 GHz
[06/17/2022-10:38:23] [I] Device Global Memory: 4096 MiB
[06/17/2022-10:38:23] [I] Shared Memory per SM: 64 KiB
[06/17/2022-10:38:23] [I] Memory Bus Width: 128 bits (ECC disabled)
[06/17/2022-10:38:23] [I] Memory Clock Rate: 6.001 GHz
[06/17/2022-10:38:23] [I]
[06/17/2022-10:38:23] [I] TensorRT version: 8.4.0
[06/17/2022-10:38:23] [I] [TRT] [MemUsageChange] Init CUDA: CPU +407, GPU +0, now: CPU 8522, GPU 905 (MiB)
[06/17/2022-10:38:24] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 8725 MiB, GPU 905 MiB
[06/17/2022-10:38:24] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 8953 MiB, GPU 979 MiB
[06/17/2022-10:38:24] [I] Start parsing network model
[06/17/2022-10:38:24] [I] [TRT] ----------------------------------------------------------------
[06/17/2022-10:38:24] [I] [TRT] Input filename:   AnkleSeg.onnx
[06/17/2022-10:38:24] [I] [TRT] ONNX IR version:  0.0.7
[06/17/2022-10:38:24] [I] [TRT] Opset version:    12
[06/17/2022-10:38:24] [I] [TRT] Producer name:    pytorch
[06/17/2022-10:38:24] [I] [TRT] Producer version: 1.10
[06/17/2022-10:38:24] [I] [TRT] Domain:
[06/17/2022-10:38:24] [I] [TRT] Model version:    0
[06/17/2022-10:38:24] [I] [TRT] Doc string:
[06/17/2022-10:38:24] [I] [TRT] ----------------------------------------------------------------
[06/17/2022-10:38:24] [I] Finish parsing network model
[06/17/2022-10:38:24] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[06/17/2022-10:38:24] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +453, GPU +158, now: CPU 9282, GPU 1137 (MiB)
[06/17/2022-10:38:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +486, GPU +172, now: CPU 9768, GPU 1309 (MiB)
[06/17/2022-10:38:25] [W] [TRT] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.2.1
[06/17/2022-10:38:25] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/17/2022-10:43:37] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[06/17/2022-10:43:37] [I] [TRT] Total Host Persistent Memory: 33504
[06/17/2022-10:43:37] [I] [TRT] Total Device Persistent Memory: 0
[06/17/2022-10:43:37] [I] [TRT] Total Scratch Memory: 306584064
[06/17/2022-10:43:37] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 6 MiB, GPU 1841 MiB
[06/17/2022-10:43:37] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 1.6881ms to assign 5 blocks to 33 nodes requiring 1694498816 bytes.
[06/17/2022-10:43:37] [I] [TRT] Total Activation Memory: 1694498816
[06/17/2022-10:43:37] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 10251, GPU 1505 (MiB)
[06/17/2022-10:43:37] [W] [TRT] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.2.1
[06/17/2022-10:43:37] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +25, now: CPU 0, GPU 25 (MiB)
[06/17/2022-10:43:37] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 10301, GPU 1451 (MiB)
[06/17/2022-10:43:37] [I] [TRT] Loaded engine size: 25 MiB
[06/17/2022-10:43:37] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 10302, GPU 1488 (MiB)
[06/17/2022-10:43:37] [W] [TRT] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.2.1
[06/17/2022-10:43:37] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +24, now: CPU 0, GPU 24 (MiB)
[06/17/2022-10:43:37] [I] Engine built in 314.716 sec.
[06/17/2022-10:43:37] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 10032, GPU 1422 (MiB)
[06/17/2022-10:43:37] [W] [TRT] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.2.1
[06/17/2022-10:43:37] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1616, now: CPU 0, GPU 1640 (MiB)
[06/17/2022-10:43:37] [I] Using random values for input input
[06/17/2022-10:43:37] [I] Created input binding for input with dimensions 1x1x128x128x128
[06/17/2022-10:43:37] [I] Using random values for output output
[06/17/2022-10:43:37] [I] Created output binding for output with dimensions 1x1x128x128x128
[06/17/2022-10:43:37] [I] Starting inference
[06/17/2022-10:43:47] [I] Warmup completed 1 queries over 200 ms
[06/17/2022-10:43:47] [I] Timing trace has 10 queries over 9.11786 s
[06/17/2022-10:43:47] [I]
[06/17/2022-10:43:47] [I] === Trace details ===
[06/17/2022-10:43:47] [I] Trace averages of 10 runs:
[06/17/2022-10:43:47] [I] Average on 10 runs - GPU latency: 832.854 ms - Host latency: 835.392 ms (end to end 1623.42 ms, enqueue 1.15587 ms)
[06/17/2022-10:43:47] [I]
[06/17/2022-10:43:47] [I] === Performance summary ===
[06/17/2022-10:43:47] [I] Throughput: 1.09675 qps
[06/17/2022-10:43:47] [I] Latency: min = 832.799 ms, max = 838.636 ms, mean = 835.392 ms, median = 835.233 ms, percentile(99%) = 838.636 ms
[06/17/2022-10:43:47] [I] End-to-End Host Latency: min = 1620.69 ms, max = 1627.18 ms, mean = 1623.42 ms, median = 1622.96 ms, percentile(99%) = 1627.18 ms
[06/17/2022-10:43:47] [I] Enqueue Time: min = 0.6603 ms, max = 1.95264 ms, mean = 1.15587 ms, median = 1.08401 ms, percentile(99%) = 1.95264 ms
[06/17/2022-10:43:47] [I] H2D Latency: min = 1.2627 ms, max = 1.26925 ms, mean = 1.26441 ms, median = 1.26416 ms, percentile(99%) = 1.26925 ms
[06/17/2022-10:43:47] [I] GPU Compute Time: min = 830.261 ms, max = 836.093 ms, mean = 832.854 ms, median = 832.697 ms, percentile(99%) = 836.093 ms
[06/17/2022-10:43:47] [I] D2H Latency: min = 1.27148 ms, max = 1.27454 ms, mean = 1.27365 ms, median = 1.27344 ms, percentile(99%) = 1.27454 ms
[06/17/2022-10:43:47] [I] Total Host Walltime: 9.11786 s
[06/17/2022-10:43:47] [I] Total GPU Compute Time: 8.32854 s
[06/17/2022-10:43:47] [I] Explanations of the performance metrics are printed in the verbose logs.
[06/17/2022-10:43:47] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8400] # D:\ProgramFiles\TensorRT-8.4.0.6.Windows10.x86_64.cuda-11.6.cudnn8.3\TensorRT-8.4.0.6\bin\trtexec.exe --onnx=AnkleSeg.onnx --saveEngine=AnkleSeg.engine
PS D:\ProgramFiles\TensorRT-8.4.0.6.Windows10.x86_64.cuda-11.6.cudnn8.3\TensorRT-8.4.0.6\bin>
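With the engine serialized to AnkleSeg.engine, later benchmark runs can skip the roughly 314-second builder pass by loading the engine directly. trtexec supports this via the --loadEngine flag (the counterpart of the "Load engine:" build option shown in the logs above); a minimal invocation would look like:

```shell
# Reuse the serialized engine instead of rebuilding from ONNX.
# Only deserialization of the ~25 MiB engine remains, so startup
# drops from minutes to seconds. Note: a serialized engine is tied
# to the TensorRT version and GPU it was built on.
.\trtexec.exe --loadEngine=AnkleSeg.engine
```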

Reposted from blog.csdn.net/juluwangriyue/article/details/125329153