1. Using the tritonserver image
1) Pull the image
# <xx.yy> is the Triton release version
docker pull nvcr.io/nvidia/tritonserver:22.06-py3
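The release tag can be parameterized so the same commands work for any <xx.yy>; a minimal sketch (22.06 is just the version used in this note):

```shell
# Parameterize the Triton release tag (<xx.yy> in the NGC image naming scheme).
TRITON_VERSION=22.06
IMAGE="nvcr.io/nvidia/tritonserver:${TRITON_VERSION}-py3"
echo "$IMAGE"              # nvcr.io/nvidia/tritonserver:22.06-py3
# docker pull "$IMAGE"     # uncomment to actually pull
```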
2) Start the container
To populate the model repository, you can run ./fetch_models.sh inside the server repository; see section 2.2.
Start the GPU version
docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/zhouquanwei/workspace/triton/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models
Start the CPU version
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/zhouquanwei/workspace/triton/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models
The only difference is the --gpus=1 parameter
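The three published ports are Triton's HTTP (8000), gRPC (8001), and Prometheus metrics (8002) endpoints. As a sketch, both launch command lines can be generated from one helper, differing only in the --gpus flag (paths and image tag are the ones used above):

```shell
# Build the docker run command line; "gpu" adds --gpus=1, "cpu" omits it.
MODEL_DIR=/home/zhouquanwei/workspace/triton/server/docs/examples/model_repository
IMAGE=nvcr.io/nvidia/tritonserver:22.06-py3

build_cmd() {
  # $1: "gpu" or "cpu"
  local gpu_flag=""
  if [ "$1" = "gpu" ]; then gpu_flag="--gpus=1 "; fi
  echo "docker run ${gpu_flag}--rm -p8000:8000 -p8001:8001 -p8002:8002 -v ${MODEL_DIR}:/models ${IMAGE} tritonserver --model-repository=/models"
}

build_cmd gpu
build_cmd cpu
```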
Note: with Docker versions before 19.03, using the GPU requires specifying the graphics device by name; Docker 19.03 and later instead require nvidia-container-toolkit or nvidia-container-runtime to be installed.
My server runs CentOS; I installed nvidia-container-toolkit as follows:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum install -y nvidia-container-toolkit
Restart Docker after installation
systemctl restart docker
Check whether the --gpus flag is now supported
docker run --help | grep -i gpus
--gpus gpu-request GPU devices to add to the container ('all' to pass all GPUs)
When I re-ran the GPU command it still failed with an error, so I used the non-GPU version first.
nvidia-docker2.0 is a simple wrapper package; it mainly lets Docker use the NVIDIA Container Runtime by modifying the Docker configuration file /etc/docker/daemon.json.
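For reference, the runtime registration that nvidia-docker2/nvidia-container-runtime typically adds to /etc/docker/daemon.json looks like the following (written to a file under /tmp here so as not to touch the real config; exact contents may vary by version):

```shell
# Typical daemon.json entry registering the NVIDIA runtime (illustrative copy;
# the real file lives at /etc/docker/daemon.json).
cat > /tmp/daemon.json.example <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
cat /tmp/daemon.json.example
```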
After the command succeeds, you can enter the running container with
docker exec -it 8f89d733ff41 /opt/nvidia/nvidia_entrypoint.sh
3) Verify whether the startup is successful
curl -v localhost:8000/v2/health/ready
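The server is ready when /v2/health/ready returns HTTP 200 (there is also /v2/health/live for liveness); both paths come from the KServe v2 protocol that Triton implements. A small polling helper, assuming curl is available:

```shell
# Poll /v2/health/ready until it returns HTTP 200, or give up after N tries.
wait_ready() {
  # $1: host:port of the Triton HTTP endpoint, $2: max attempts (default 30)
  local i=0 max="${2:-30}"
  while [ "$i" -lt "$max" ]; do
    if [ "$(curl -s -o /dev/null -w '%{http_code}' "http://$1/v2/health/ready")" = "200" ]; then
      echo ready
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo timeout
  return 1
}
# Example: wait_ready localhost:8000
```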
4) Verify further by sending an inference request
## Pull the client SDK image
docker pull nvcr.io/nvidia/tritonserver:22.06-py3-sdk
## Start the client container
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:22.06-py3-sdk
## Send a request
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
Request 0, batch size 1
Image '/workspace/images/mug.jpg':
15.346230 (504) = COFFEE MUG
13.224326 (968) = CUP
10.422965 (505) = COFFEEPOT
Notes
1) In the nvcr.io/nvidia/tritonserver:22.06-py3 image, the tritonserver executable is located in /opt/tritonserver/bin inside the container; start it with
tritonserver --model-repository=/models
2) Contents of the /opt/tritonserver directory
/opt/tritonserver/bin: the tritonserver executable
/opt/tritonserver/lib: shared libraries
/opt/tritonserver/backends: backends
/opt/tritonserver/repoagents: repoagents
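Backends under /opt/tritonserver/backends follow a naming convention: each backend <name> lives in its own subdirectory and provides a libtriton_<name>.so shared library. A throwaway helper to illustrate the expected path (convention as I understand it; verify against your own install):

```shell
# Compose the conventional path of a backend's shared library.
backend_lib_path() {
  # $1: backend name, e.g. onnxruntime, pytorch, tensorrt
  echo "/opt/tritonserver/backends/$1/libtriton_$1.so"
}
backend_lib_path onnxruntime   # /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
```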
2. Compiling tritonserver
Triton Inference Server can be built either from source or inside a container.
2.1 Source code compilation
2.2 Container compilation
1) Clone the Triton Inference Server repository
cd /workspace/triton
git clone --recursive [email protected]:triton-inference-server/server.git
2) Create a model repository
cd /workspace/triton/server/docs/examples
./fetch_models.sh
After running the ./fetch_models.sh script, a new directory named 1 appears under /workspace/triton/server/docs/examples/model_repository/densenet_onnx, and an inception_v3_2016_08_28_frozen.pb.tar.gz archive appears under /tmp.
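The numbered directory is the model version: each model in the repository has a config.pbtxt plus one subdirectory per version holding the model file. Recreating the skeleton under /tmp for illustration (names follow the densenet_onnx example; model.onnx here is an empty placeholder):

```shell
# Skeleton of the model repository layout produced by fetch_models.sh.
REPO=/tmp/model_repository_example
mkdir -p "$REPO/densenet_onnx/1"
touch "$REPO/densenet_onnx/config.pbtxt"    # model configuration
touch "$REPO/densenet_onnx/1/model.onnx"    # version 1 of the model (placeholder)
find "$REPO" | sort
```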
3) Build (build.py performs a container build by default)
cd server
./build.py -v --enable-all