Triton Inference Server deployment study notes

Pull the image
docker pull nvcr.io/nvidia/tritonserver:22.07-py3

docker run --gpus all -itd -p8000:8000 -p8001:8001 -p8002:8002 -v /home/ai-developer/server/docs/examples/model_repository/:/models nvcr.io/nvidia/tritonserver:22.07-py3

docker exec -it a5bc bash
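The container ID prefix (a5bc here) differs on every machine. One way to look it up instead of copying it by hand is sketched below; it assumes the container was started from the image tag used in these notes, and triton_container_ids is a hypothetical helper name.

```shell
# Assumption: the container was created from this exact image tag.
IMAGE="nvcr.io/nvidia/tritonserver:22.07-py3"

# Print the IDs of running containers created from the Triton image.
# triton_container_ids is an illustrative name, not a docker or Triton command.
triton_container_ids() {
    docker ps --quiet --filter "ancestor=${IMAGE}"
}
```

The first ID can then be passed to docker exec, for example: docker exec -it "$(triton_container_ids | head -n 1)" bash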

tritonserver --model-repository=/models --strict-model-config=false
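Once tritonserver is running, its HTTP endpoint can be polled to confirm that it is ready to serve requests. The sketch below assumes the port mapping from the docker run command above (8000 for HTTP) and that curl is installed on the host; the helper name wait_for_triton and the default of 30 attempts are illustrative choices.

```shell
# Assumption: HTTP port 8000 is published on localhost, as in the docker run command.
TRITON_URL="http://localhost:8000"

# Poll the readiness endpoint until it answers successfully or the attempts run out.
# wait_for_triton is a hypothetical helper name, not part of Triton.
wait_for_triton() {
    attempts="${1:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl fail on HTTP error codes; the response body is discarded.
        if curl -sf -o /dev/null "${TRITON_URL}/v2/health/ready"; then
            echo "ready"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "not ready after ${attempts} attempts" >&2
    return 1
}
```

Calling wait_for_triton 60 would wait up to roughly a minute before giving up.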

Optional config file

TensorRT, TensorFlow SavedModel, and ONNX models do not require a config.pbtxt when the server is started with --strict-model-config=false; Triton derives the configuration from the model file.

In config.pbtxt, platform can be set to tensorrt_plan, onnxruntime_onnx, or pytorch_libtorch.
The matching backend values are tensorrt, onnxruntime, and pytorch.

dims: [ 3, -1, -1 ], where -1 marks a variable-size dimension
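For cases where a config.pbtxt is written by hand, the following sketch creates a minimal repository entry for an ONNX model with a variable-size input. The model name, tensor names, data types, and output shape are all hypothetical and have to match the real model; only the directory layout and the field names follow Triton's conventions.

```shell
# Hypothetical repository location and model name; adjust to the real setup.
MODEL_REPO="${MODEL_REPO:-./model_repository}"
MODEL_NAME="example_onnx"

# Triton expects <repository>/<model>/<version>/, with config.pbtxt next to
# the version directory. The model file itself would go in .../1/model.onnx.
mkdir -p "${MODEL_REPO}/${MODEL_NAME}/1"

cat > "${MODEL_REPO}/${MODEL_NAME}/config.pbtxt" <<'EOF'
# Tensor names, types, and the output shape below are placeholders.
name: "example_onnx"
backend: "onnxruntime"
max_batch_size: 0
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, -1, -1 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
EOF
```

With max_batch_size set to 0, dims describes the full tensor shape, so the input here is a 3-channel tensor whose height and width may vary per request.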

--model-control-mode=explicit (models are loaded and unloaded only on request)
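With explicit model control, loading and unloading go through the model repository HTTP API. The helpers below are a sketch that assumes the HTTP port is published on localhost:8000 and that curl is available; the function names are illustrative.

```shell
# Assumption: Triton's HTTP endpoint is reachable on localhost:8000.
TRITON_URL="http://localhost:8000"

# Build the repository API URL for a model and an action ("load" or "unload").
repo_url() {
    echo "${TRITON_URL}/v2/repository/models/$1/$2"
}

# Ask the server to load or unload one model by name.
load_model()   { curl -sf -X POST "$(repo_url "$1" load)"; }
unload_model() { curl -sf -X POST "$(repo_url "$1" unload)"; }

# List every model in the repository together with its current state.
list_models()  { curl -sf -X POST "${TRITON_URL}/v2/repository/index"; }
```

For example, load_model some_model followed by list_models would show whether the hypothetical model some_model reached the READY state.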

git clone https://github.com/NVIDIA/DeepLearningExamples.git

cd DeepLearningExamples/PyTorch/LanguageModeling/BERT/data/squad/

Download the dataset
sh squad_download.sh

Mounted model directory

cd /models

Download page for the demo model

https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_large_qa_squad11_amp

# Paste the wget command copied from the NGC page
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/bert_pyt_ckpt_large_qa_squad11_amp/versions/19.09.0/zip -O bert_pyt_ckpt_large_qa_squad11_amp_19.09.0.zip
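The conversion step later in these notes reads /models/bert_large_qa.pt and /models/vocab.txt. A small check, sketched here with the hypothetical helper name missing_inputs, can confirm that both files are in place after the archive has been unpacked and before the export is started.

```shell
# Assumption: the files used by the export step live directly under /models.
MODEL_DIR="${MODEL_DIR:-/models}"

# Print the path of every required file that is absent; print nothing if all exist.
# missing_inputs is an illustrative name, not part of any NVIDIA tooling.
missing_inputs() {
    for f in bert_large_qa.pt vocab.txt; do
        [ -f "${MODEL_DIR}/${f}" ] || echo "${MODEL_DIR}/${f}"
    done
}
```

An empty result from missing_inputs means the export command has the checkpoint and vocabulary it needs.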

Convert the checkpoint to ONNX

python3 triton/export_model.py \
    --input-path triton/model.py \
    --input-type pyt \
    --output-path /models/exported_model.onnx \
    --output-type onnx \
    --dataloader triton/dataloader.py \
    --ignore-unknown-parameters \
    --onnx-opset 13 \
    ${FLAG} \
    --config-file bert_configs/large.json \
    --checkpoint /models/bert_large_qa.pt \
    --precision fp16 \
    --vocab-file /models/vocab.txt \
    --max-seq-length 34 \
    --predict-file /opt/tritonserver/DeepLearningExamples/PyTorch/LanguageModeling/BERT/data/squad/v1.1/dev-v1.1.json \
    --batch-size 16
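Triton only picks up a model that sits in the layout <repository>/<model>/<version>/, and the ONNX Runtime backend looks for a file named model.onnx by default. The sketch below moves an exported file into that layout; the model name bert_onnx and the helper name install_onnx_model are hypothetical.

```shell
# Assumption: /models is the repository mounted into the container.
MODEL_REPO="${MODEL_REPO:-/models}"
MODEL_NAME="bert_onnx"   # hypothetical model name

# Move an exported ONNX file to <repo>/<model>/1/model.onnx.
install_onnx_model() {
    src="$1"
    dest_dir="${MODEL_REPO}/${MODEL_NAME}/1"
    mkdir -p "${dest_dir}"
    mv "${src}" "${dest_dir}/model.onnx"
}
```

For example, install_onnx_model /models/exported_model.onnx places the file where the server can find it; with --strict-model-config=false no config.pbtxt is needed for an ONNX model.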

Origin blog.csdn.net/dream_home8407/article/details/131772301