1. Introduction
I have recently been working through the yolov5 source code, and I expect that series to be ready soon. I also think it should be accompanied by some hands-on content. End-to-end practical articles are genuinely scarce: most only cover the training process, while yolov5 is usually used end to end, for example on Android or iOS phones.
This tutorial covers exactly that: starting from the dataset, it walks through the full training and deployment process in detail. If you follow along, you should come away with a clear picture of a complete project workflow. To keep things fun, I will build a little raccoon recognition app and install it on my Android phone to recognize pictures of raccoons.
2. Environment and data preparation
Environment:
- GPU server (nothing fancy; the dataset is quite small)
- Development laptop
- Android or iOS phone, used to install the app
- Software: Android Studio, FileZilla, Xshell
Use watch -n 0.1 nvidia-smi to monitor GPU usage in real time:
Data:
Here I recommend a good dataset website, where many common object detection datasets can be downloaded:
We will use the Raccoon dataset, which contains 196 images.
On the download page, I chose images resized to 416x416 and selected the txt label format that yolov5 can read directly.
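For reference, each line of a YOLO-format txt label stores "class x_center y_center width height", with coordinates normalized to [0, 1] relative to the image size. A minimal sketch of turning one such line back into a pixel-space box (the example values are made up):

```python
# Parse one line of a YOLO-format label file.
# Format per line: "<class_id> <x_center> <y_center> <width> <height>",
# all coordinates normalized to [0, 1].

def parse_yolo_label(line, img_w, img_h):
    """Convert one YOLO label line to a pixel-space (class_id, x1, y1, x2, y2) box."""
    cls, cx, cy, w, h = line.split()
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(cls), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

# Example: a raccoon box centered in a 416x416 image, half the image wide/high.
print(parse_yolo_label("0 0.5 0.5 0.5 0.5", 416, 416))  # (0, 104.0, 104.0, 312.0, 312.0)
```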
After downloading the dataset, split it into train/val sets; I used a 150:46 split.
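If your download is not already split, a small script can do it. A minimal sketch, assuming the images and labels live under train/images and train/labels with matching file stems (the directory names and counts are assumptions; adjust them to your layout):

```python
import random
import shutil
from pathlib import Path

def split_dataset(root, val_count=46, seed=0):
    """Randomly move `val_count` image/label pairs from train/ to val/."""
    root = Path(root)
    images = sorted((root / "train" / "images").glob("*.jpg"))
    random.Random(seed).shuffle(images)
    for img in images[:val_count]:
        label = root / "train" / "labels" / (img.stem + ".txt")
        for src, sub in ((img, "images"), (label, "labels")):
            dst = root / "val" / sub
            dst.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst / src.name))

# Usage: split_dataset("Raccoon", val_count=46)
```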
Once the data is ready, upload it to the GPU server.
Framework code:
Clone the source code to the server.
Then install requirements.txt; the yolov5 4.0 requirements are as follows:
# base ----------------------------------------
matplotlib>=3.2.2
numpy>=1.18.5
opencv-python>=4.1.2
Pillow
PyYAML>=5.3.1
scipy>=1.4.1
torch>=1.7.0
torchvision>=0.8.1
tqdm>=4.41.0
# logging -------------------------------------
tensorboard>=2.4.1
# wandb
# plotting ------------------------------------
seaborn>=0.11.0
pandas
# export --------------------------------------
# coremltools>=4.1
# onnx>=1.8.1
# scikit-learn==0.19.2 # for coreml quantization
# extras --------------------------------------
thop # FLOPS computation
pycocotools>=2.0 # COCO mAP
When installing the dependencies, it is recommended to use the Tsinghua PyPI mirror or another domestic mirror to speed up downloads.
In addition, the pretrained yolov5 weights can be uploaded directly to the weights directory via FileZilla.
After installation, run python detect.py. I hit the following problem:
Traceback (most recent call last):
  File "detect.py", line 5, in <module>
    import cv2
  File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/cv2/__init__.py", line 5, in <module>
    from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Solution:
sudo apt update
sudo apt install libgl1-mesa-glx
Once detect.py runs successfully, the yolov5 environment is set up correctly.
The entire directory is shown in the figure:
3. Training
- Training command
python train.py --data ./Raccoon/data.yaml --cfg models/yolov5s.yaml --weights 'weights/yolov5s.pt' --batch-size 48 --multi-scale --device 0,1 --sync-bn --epochs 200
Parameter meanings:
1. --data: the data config file, containing the locations of the train and val images and labels.
2. --cfg: the model config file, used to build the model dynamically.
3. --weights: pretrained weights, used for transfer learning.
4. --batch-size: batch size.
5. --multi-scale: multi-scale training; when enabled, each batch's images are randomly rescaled by up to +-50%.
6. --device: the GPUs to use. My server has two 2080 Ti cards, so I pass 0,1; this can be omitted on a single card.
7. --sync-bn: synchronized batch norm; can be omitted on a single card.
8. --epochs: number of training epochs.
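The data config file passed to --data is a small YAML file. A sketch of what Raccoon/data.yaml might look like (the paths here are assumptions; match them to wherever you unpacked the dataset):

```yaml
# Paths to the train/val image directories; yolov5 finds the labels
# by replacing "images" with "labels" in these paths.
train: ./Raccoon/train/images
val: ./Raccoon/val/images

# Number of classes and their names
nc: 1
names: ['raccoon']
```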
After training for 200 epochs, the validation mAP@0.5 reaches 0.875.
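For context, mAP@0.5 counts a prediction as correct when its IoU (intersection over union) with a ground-truth box is at least 0.5. A minimal IoU computation for two axis-aligned boxes:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Two 100x100 boxes overlapping by half: 5000 / 15000 ≈ 0.333, below the 0.5 bar.
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))
```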
Analysis results
The results are stored by default in the "runs/train" folder. Let's take a look:
Original test label:
Our prediction:
Most predictions look fine, though there are a few missed and wrong labels.
Learning curves for the overall training run:
PR curve:
Confusion matrix:
This is just a demonstration; for higher accuracy, experiment further on your own.
We obtain the final weights file best.pt, which we will deploy on-device later.
4. Deployment
Now that we have the PyTorch .pt model, it is time to put it on a device; I am using a OnePlus 8T phone. The conversion path for getting the PyTorch model onto Android is pt => onnx => ncnn. Let's go step by step:
1. Install the onnx-related libraries
pip install onnx coremltools onnx-simplifier
Successful installation:
Successfully installed attr-0.3.1 attrs-20.3.0 coremltools-4.1 mpmath-1.2.1 onnx-1.8.1 onnx-simplifier-0.3.3 onnxoptimizer-0.2.5 onnxruntime-1.7.0 packaging-20.9 sympy-1.7.1
2. Convert the freshly trained best.pt file to an onnx file.
python models/export.py --weights runs/train/exp8/weights/best.pt
This produces the best.onnx file:
(python367) root@1bd129ef64d3:/usr/cx/yolov5-master# ls runs/train/exp8/weights/
best.mlmodel best.onnx best.pt best.torchscript.pt last.pt
3. Use onnx-simplifier to simplify the model
python -m onnxsim best.onnx best-sim.onnx
Get a simplified model: best-sim.onnx.
4. Use the ncnn tools to convert .onnx to ncnn .param/.bin files
This requires building ncnn; the build steps are given below. Alternatively, you can use a prebuilt package and skip compiling:
- Prepare the compilation environment:
sudo apt install build-essential libopencv-dev cmake
- Compile protobuf
git clone https://github.com/protocolbuffers/protobuf.git
cd protobuf
git submodule update --init --recursive
./autogen.sh
./configure
make
make install
sudo ldconfig
- Compile ncnn and generate the onnx2ncnn tool
git clone https://github.com/Tencent/ncnn.git
cd ncnn
git submodule update --init
mkdir build
cd build
cmake ..
make -j8
make install
Compiling can be painful, so alternatively you can use a prebuilt package from the ncnn releases page: https://github.com/Tencent/ncnn/releases
Pick the build matching your Linux version; here that is Ubuntu 18.
Upload it to the server to obtain the onnx2ncnn conversion tool:
- Convert .onnx to .bin file
/usr/cx/ncnn-20210322-ubuntu-1804/bin/onnx2ncnn best-sim.onnx yolov5s.param yolov5s.bin
Report the following error:
Unsupported slice step !
Unsupported slice step !
Unsupported slice step !
Unsupported slice step !
Unsupported slice step !
Unsupported slice step !
Unsupported slice step !
Unsupported slice step !
The problem is that onnx2ncnn does not support slice operations with a step (here step 2), which come from yolov5's Focus layer; they need to be replaced with a Focus layer:
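For context, yolov5's Focus layer is a space-to-depth operation: it samples every second pixel in four offset patterns, turning a (C, H, W) image into a (4C, H/2, W/2) tensor before the first convolution. Those four strided slices are exactly the step-2 Slice ops onnx2ncnn rejects. A numpy sketch of the operation:

```python
import numpy as np

def focus(x):
    """yolov5 Focus slicing: (C, H, W) -> (4C, H/2, W/2).
    Stacks the four interleaved sub-grids of the image along the channel axis."""
    return np.concatenate([
        x[:, ::2, ::2],    # even rows, even cols
        x[:, 1::2, ::2],   # odd rows, even cols
        x[:, ::2, 1::2],   # even rows, odd cols
        x[:, 1::2, 1::2],  # odd rows, odd cols
    ], axis=0)

x = np.arange(3 * 416 * 416, dtype=np.float32).reshape(3, 416, 416)
print(focus(x).shape)  # (12, 208, 208) -- no pixels lost, just rearranged
```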
We can visualize the freshly generated yolov5s.param file in Netron:
Our model looks like this:
Now we modify the yolov5s.param file accordingly. The original is shown below; the 10 layers from Split through Concat need to be replaced with a single Focus layer:
7767517
185 212
Input images 0 1 images
Split splitncnn_input0 1 4 images images_splitncnn_0 images_splitncnn_1 images_splitncnn_2 images_splitncnn_3
Crop Slice_4 1 1 images_splitncnn_3 131 -23309=1,0 -23310=1,2147483647 -23311=1,1
Crop Slice_9 1 1 131 136 -23309=1,0 -23310=1,2147483647 -23311=1,2
Crop Slice_14 1 1 images_splitncnn_2 141 -23309=1,1 -23310=1,2147483647 -23311=1,1
Crop Slice_19 1 1 141 146 -23309=1,0 -23310=1,2147483647 -23311=1,2
Crop Slice_24 1 1 images_splitncnn_1 151 -23309=1,0 -23310=1,2147483647 -23311=1,1
Crop Slice_29 1 1 151 156 -23309=1,1 -23310=1,2147483647 -23311=1,2
Crop Slice_34 1 1 images_splitncnn_0 161 -23309=1,1 -23310=1,2147483647 -23311=1,1
Crop Slice_39 1 1 161 166 -23309=1,1 -23310=1,2147483647 -23311=1,2
Concat Concat_40 4 1 136 146 156 166 167 0=0
Convolution Conv_41 1 1 167 168 0=32 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=3456
After replacement, those 10 layers collapse into one custom Focus layer, as used in ncnn's yolov5 example (the output blob stays 167, so the following Conv_41 still connects):
YoloV5Focus focus 1 1 images 167
Note that fields must be separated by spaces only, not tabs, otherwise the app will crash later.
Correspondingly, the total layer count on the second line must change from 185 to 176, since we removed 10 layers and added 1, a net reduction of 9.
After the change, verify in Netron that the graph is still properly connected:
5. Model compression and half-precision conversion
ncnnoptimize yolov5s.param yolov5s.bin yolov5s-opt.param yolov5s-opt.bin 65536
This produces the final yolov5s-opt.param and yolov5s-opt.bin (the trailing 65536 flag tells ncnnoptimize to store the weights in fp16), which we rename to yolov5s.param and yolov5s.bin.
6. Package with Android Studio
1. Download the example source code
We base the packaging test on the official ncnn example. Download the example code:
https://github.com/nihui/ncnn-android-yolov5.git
After unpacking, open it in Android Studio and let Gradle build the project automatically.
2. Download the ncnn Android development dependencies
Open the ncnn releases page.
Download the Android package shown in the figure; the Vulkan variant is GPU-accelerated.
After decompression, four folders are obtained:
Copy them into the corresponding location in the example code (app/src/main/jni/):
3. Modify the file
- yolov5s.param
The outputs of the three Reshape layers need to be modified:
Change them all to 0=-1 so the output size is inferred automatically. If you skip this and run on a small image, you will find detection boxes blanketing the entire screen, or nothing detected at all.
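The reason the fixed Reshape breaks: each of yolov5's three output heads reshapes to (grid cells × anchors) boxes, and the grid size depends on the input resolution (yolov5 defaults: strides 8, 16, 32 with 3 anchors per cell, export size 640). With 0=-1, ncnn infers the count at runtime instead of assuming the export resolution. A quick check of the numbers:

```python
def num_predictions(img_size, strides=(8, 16, 32), anchors_per_cell=3):
    """Boxes per output head for a square input of side img_size."""
    return [(img_size // s) ** 2 * anchors_per_cell for s in strides]

# Exported at 640: the three heads produce 19200, 4800 and 1200 boxes.
print(num_predictions(640))  # [19200, 4800, 1200]
# Run on a 416 image: the counts differ, so a hard-coded reshape no longer fits.
print(num_predictions(416))  # [8112, 2028, 507]
```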
- CMakeLists.txt
Then modify CMakeLists.txt in the same directory, setting the ncnn_DIR variable to:
set(ncnn_DIR ${CMAKE_SOURCE_DIR}/${ANDROID_ABI}/lib/cmake/ncnn)
- Modify yolov5ncnn_jni.cpp
Search for the "output" keyword to locate the FPN output blobs. These must match the actual values in yolov5s.param: search for the Permute keyword there and fill the real output blob names into the output positions just located.
Finally, replace the coco class labels with our own; we have only one class, "raccoon":
7. Install the apk and test it
Connect the phone via USB and enable developer mode.
Click the green Run button in Android Studio to install the apk; Android Studio will also print debug output.
The apk then appears on the phone:
We click "Select Picture" and choose a picture of a little raccoon.
Then run GPU or CPU recognition and throw the Master Ball:
Oh no, the little raccoon has been caught by the Master Ball!!
According to the client logs, GPU inference on my phone takes about 100 ms.
5. Conclusion
In this tutorial we walked through the complete pipeline from training a yolov5 model to deploying it as an Android apk. Later, we will continue to dig into the yolov5 source code.
Finally, I would like to thank the authors whose articles helped me learn along the way:
Reference article material 1
Reference article material 2
If you found this helpful, please like, follow, and support the blogger!