ppTSM (Paddle temporal-shift-module) training and deployment notes

Notes on using the ppTSM video classification algorithm

Introduction

  • Like image classification, video classification is a recognition task: given an input video, the model outputs a predicted label. When all the labels are action categories, the task is often called action recognition. Unlike image classification, video classification usually has to exploit temporal information across multiple frames. PP-TSM is a practical, industry-grade video classification model developed by PaddleVideo. It builds on state-of-the-art algorithms, balances accuracy against speed, and applies model slimming and accuracy optimization to meet industrial requirements.
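To make "using information across multiple frames" concrete: the configs used later in this post are named 16frames_uniform, which suggests that 16 frames are sampled uniformly from each video. The sketch below shows one common way to do that (split the video into equal segments and take the middle frame of each). The function name `uniform_frame_indices` is made up for illustration, and this is not taken from PaddleVideo's own sampler, which may differ in detail.

```python
from typing import List


def uniform_frame_indices(num_frames: int, num_segments: int = 16) -> List[int]:
    """Return one frame index from the middle of each equal-length segment."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    span = num_frames / num_segments  # segment length, may be fractional
    # Clamp to the last valid index so very short videos still work
    # (they simply repeat some frames).
    return [min(num_frames - 1, int(span * i + span / 2))
            for i in range(num_segments)]
```

For a 160-frame clip this picks frames 5, 15, ..., 155; a clip shorter than 16 frames repeats indices rather than failing.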

1. Resource preparation

I mainly use ppTSM from PaddleVideo for development.

2. Model training

  1. Dataset preparation:
    Prepare your own data in the ucf101 format. ucf101 download link: https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/pp-tsm.md
  2. Training command line:
    python -B -m paddle.distributed.launch --gpus="0,1,2,3" --log_dir=log_pptsm main.py --validate -c ./configs/recognition/pptsm/v2/pptsm_lcnet_ucf101_16frames_uniform.yaml --amp
  3. Model export:
    python tools/export_model.py -c configs/recognition/pptsm/v2/pptsm_lcnet_ucf101_16frames_uniform.yaml -p output/ppTSMv2/ppTSMv2_best.pdparams -o inference/PPTSMv2
  4. Model prediction, to verify the model's accuracy:
    python tools/predict.py --input_file data/example.avi --config configs/recognition/pptsm/v2/pptsm_lcnet_k400_16frames_uniform.yaml --model_file weights/ppTSMv2.pdmodel --params_file weights/ppTSMv2.pdiparams --use_gpu=True
  5. Conversion to ONNX with paddle2onnx:
    paddle2onnx --model_dir=./weights/ucf101 --model_filename=ppTSMv2.pdmodel --params_filename=ppTSMv2.pdiparams --save_file=./weights/ucf101/ppTSMv2/pptsmv2.onnx --opset_version=11 --enable_onnx_checker=True
  • Reference link for the above 5 steps: https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/pp-tsm.md
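For step 1, organizing your own data "in the ucf101 format" mostly comes down to producing annotation list files. Below is a minimal sketch, assuming one sub-directory per class under a root folder and assuming each list line has the form `<relative video path> <integer label>`. The function name `build_file_lists`, the 80/20 split, and the accepted extensions are all illustrative choices, so check the annotation files shipped with PaddleVideo for the exact format your config expects.

```python
import random
from pathlib import Path
from typing import Dict, List, Tuple

VIDEO_EXTS = {".avi", ".mp4"}  # assumed set of video extensions


def build_file_lists(root: str, val_ratio: float = 0.2, seed: int = 0
                     ) -> Tuple[List[str], List[str], Dict[str, int]]:
    """Scan root/<class_name>/<video> and return (train, val, label map)."""
    root_dir = Path(root)
    classes = sorted(d.name for d in root_dir.iterdir() if d.is_dir())
    label_of = {name: idx for idx, name in enumerate(classes)}
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[str] = []
    val: List[str] = []
    for name in classes:
        videos = sorted(p for p in (root_dir / name).iterdir()
                        if p.suffix.lower() in VIDEO_EXTS)
        rng.shuffle(videos)
        n_val = int(len(videos) * val_ratio)  # per-class validation count
        for i, path in enumerate(videos):
            line = f"{path.relative_to(root_dir).as_posix()} {label_of[name]}"
            (val if i < n_val else train).append(line)
    return train, val, label_of
```

The returned lists can be written out with something like `Path("train_list.txt").write_text("\n".join(train) + "\n")`, and the resulting file paths then go into the training YAML.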

3. Algorithm deployment

Scheme 1 paddle_inference:

The paddle_inference backend supports TensorRT and ONNX, on both CPU and GPU:
https://github.com/PaddlePaddle/PaddleVideo/tree/develop/deploy/cpp_infer

Scheme 2 tensorrt (focus of this article)

I chose TensorRT because my other algorithms are already deployed with it, and adding paddle_inference alongside would make the deployment too messy. See my GitHub for the detailed implementation.

4. Precision alignment

To make sure no accuracy is lost during model conversion, the inference results of the three backends (paddle, onnx, and tensorrt) need to be compared. A detailed comparison can be made using the outputs of predict.py, onnx_infer.py, and the C++ program. The main pitfall I ran into during development was that the onnx and tensorrt results were inconsistent; after some debugging, the two finally matched. Once the deployment is confirmed to be correct, what remains is training and tuning the model.
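A comparison like this can be scripted once each backend's class scores for the same input video have been dumped to plain lists of floats. The sketch below is only one way to do it: the function name `compare_outputs` and the tolerance of 1e-3 are assumptions for illustration, not values from the original workflow.

```python
import math
from typing import Dict, Sequence, Union


def compare_outputs(ref: Sequence[float], other: Sequence[float],
                    atol: float = 1e-3) -> Dict[str, Union[float, bool]]:
    """Compare two score vectors produced for the same input video."""
    if len(ref) != len(other) or not ref:
        raise ValueError("score vectors must be non-empty and equal in length")
    # Largest element-wise deviation between the two backends.
    max_abs = max(abs(a - b) for a, b in zip(ref, other))
    # Cosine similarity: close to 1.0 when the vectors point the same way.
    dot = sum(a * b for a, b in zip(ref, other))
    norm = math.sqrt(sum(a * a for a in ref)) * math.sqrt(sum(b * b for b in other))
    cosine = dot / norm if norm else 0.0
    # Index of the top-scoring class in each vector.
    top1_ref = max(range(len(ref)), key=lambda i: ref[i])
    top1_other = max(range(len(other)), key=lambda i: other[i])
    return {
        "max_abs_diff": max_abs,
        "cosine": cosine,
        "same_top1": top1_ref == top1_other,
        "within_tol": max_abs <= atol,  # atol is an assumed threshold
    }
```

Running it pairwise (paddle vs. onnx, then onnx vs. tensorrt) shows at which conversion step a mismatch first appears.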

5. Summary

I used to do video classification and recognition with traditional image processing algorithms (OpenCV's Gaussian mixture model and the inter-frame difference method). After switching to deep learning methods, accuracy and generality improved considerably, but more training data is needed to make the algorithm more robust.

Origin blog.csdn.net/zengwubbb/article/details/129782161