1. Using the tritonserver image

1) Pull the image

# <xx.yy> is the Triton version
docker pull nvcr.io/nvidia/tritonserver:22.06-py3


2) Start the container

When specifying the model repository, you can first run ./fetch_models.sh under the server repo to download the example models; see section 2.2.

Starting the GPU version

docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/zhouquanwei/workspace/triton/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models

Starting the CPU version

docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/zhouquanwei/workspace/triton/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models

The only difference is the --gpus=1 parameter
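For reference, --gpus also accepts "all" and explicit device lists. A variant of the same command exposing every GPU (this is standard Docker 19.03+ syntax, not taken from the original post):

# same command, but exposing all GPUs; use --gpus '"device=0,1"' to pick specific devices
docker run --gpus=all --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/zhouquanwei/workspace/triton/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models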

Note: with Docker versions before 19.03, you have to name the GPU devices explicitly when running with GPUs; with Docker 19.03 and later you only need to install nvidia-container-toolkit or nvidia-container-runtime.
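For older setups, a rough sketch of the pre-19.03 invocation (assuming nvidia-docker2 is installed, which registers the nvidia runtime; device selection goes through the NVIDIA_VISIBLE_DEVICES environment variable instead of --gpus):

# pre-19.03 style: pick the nvidia runtime and name the device explicitly
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/zhouquanwei/workspace/triton/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models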

My server runs CentOS, and I installed nvidia-container-toolkit as follows:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum install -y nvidia-container-toolkit

Restart docker after installation

systemctl restart docker

Check whether the --gpus parameter is now available:

docker run --help | grep -i gpus
      --gpus gpu-request               GPU devices to add to the container ('all' to pass all GPUs)
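A quick sanity check that containers can actually see the GPU (a minimal sketch; the CUDA image tag is just an example, any CUDA base image works):

# should print the usual nvidia-smi table from inside a container
docker run --rm --gpus=all nvidia/cuda:11.7.1-base-ubuntu20.04 nvidia-smi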

When I re-ran the GPU command I still hit an error, so I used the non-GPU version first.

nvidia-docker2.0 is a thin wrapper package; its main job is to let Docker use the NVIDIA container runtime by modifying the Docker configuration file /etc/docker/daemon.json.
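For reference, the configuration it writes typically looks roughly like this (a sketch of the usual nvidia-docker2 setup, not copied from my machine):

cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}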

After the server starts successfully, the way to enter the running container is:

docker exec -it  8f89d733ff41 /opt/nvidia/nvidia_entrypoint.sh
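The container ID (8f89d733ff41 above) is whatever docker ps reports for the running tritonserver container; a plain shell also works as the entrypoint:

# find the CONTAINER ID of the tritonserver container
docker ps
# then open an interactive shell in it
docker exec -it 8f89d733ff41 /bin/bash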

3) Verify whether the startup is successful

curl -v localhost:8000/v2/health/ready
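When the server is ready this endpoint answers with HTTP 200 and an empty body; anything else (or a connection error) means it is not up yet. A compact check that prints only the status code (a small sketch, not from the original post):

# prints 200 when the server is ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready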

4) Verify further by sending an inference request

## Pull the client SDK image
docker pull nvcr.io/nvidia/tritonserver:22.06-py3-sdk

## Start the client SDK container
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:22.06-py3-sdk

## Send a request
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
Request 0, batch size 1
Image '/workspace/images/mug.jpg':
    15.346230 (504) = COFFEE MUG
    13.224326 (968) = CUP
    10.422965 (505) = COFFEEPOT
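The same example client can also go through the gRPC endpoint on port 8001; assuming the -i (protocol) and -u (server URL) options of image_client behave as in the client examples, the call looks like:

# same request over gRPC instead of HTTP
/workspace/install/bin/image_client -i grpc -u localhost:8001 -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg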

Notes

1) In the nvcr.io/nvidia/tritonserver:22.06-py3 image, the tritonserver executable is located in the /opt/tritonserver/bin directory inside the container; start it with

tritonserver --model-repository=/models
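tritonserver takes more options than just --model-repository; a couple of commonly used ones (a sketch, not an exhaustive list):

# more verbose logging plus explicit ports (8000/8001/8002 are already the defaults)
tritonserver --model-repository=/models --log-verbose=1 --http-port=8000 --grpc-port=8001 --metrics-port=8002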

2) The contents of the /opt/tritonserver directory

/opt/tritonserver/bin: tritonserver executable

/opt/tritonserver/lib: stores shared libraries

/opt/tritonserver/backends: stores backends

/opt/tritonserver/repoagents: stores repoagents
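You can confirm this layout from inside the container; the exact listing varies a little between releases:

ls /opt/tritonserver
# expect to see at least: backends  bin  lib  repoagents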

2. Compile tritonserver

Triton Inference Server supports both source builds and container builds.

2.1 Source code compilation

2.2 Container compilation

1) Clone triton inference server

cd /workspace/triton
git clone --recursive git@github.com:triton-inference-server/server.git
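If SSH keys are not set up for GitHub, cloning over HTTPS works just as well:

git clone --recursive https://github.com/triton-inference-server/server.git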

2) Create a model repository

cd /workspace/triton/server/docs/examples
./fetch_models.sh

 

After executing the ./fetch_models.sh script, a new directory named 1 appears under the /workspace/triton/server/docs/examples/model_repository/densenet_onnx directory.

An inception_v3_2016_08_28_frozen.pb.tar.gz archive also appears in the /tmp directory.
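The 1 directory is the model version directory. After the script finishes, the densenet_onnx entry should look roughly like this (a sketch of the standard quickstart layout):

cd /workspace/triton/server/docs/examples
find model_repository/densenet_onnx
# model_repository/densenet_onnx
# model_repository/densenet_onnx/config.pbtxt
# model_repository/densenet_onnx/densenet_labels.txt
# model_repository/densenet_onnx/1
# model_repository/densenet_onnx/1/model.onnx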

3) Build (build.py performs a container build by default; a source build is also supported)

cd server
./build.py -v --enable-all
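--enable-all builds every feature and backend and takes a long time; a narrower container build is possible by listing features and backends explicitly (a sketch, adjust to your needs and check ./build.py --help for the exact flags):

# container build with only the HTTP/gRPC endpoints and two backends
./build.py -v --enable-logging --enable-stats --enable-gpu \
    --endpoint=http --endpoint=grpc \
    --backend=onnxruntime --backend=python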

 
