OpenSitUp open source project: building a sports and fitness APP from scratch based on pose estimation

For more deep learning engineering practice projects, follow the official account: DL Engineering Practice

 

1. Project open source address

https://github.com/DL-Practise/OpenSitUp

2. Project introduction

Pose estimation is a branch of computer vision that estimates the pose of one or more people from human body keypoints, as shown below:

OpenSitUp is an open source project based on pose estimation. It aims to help anyone interested in pose estimation build, from scratch, a sit-up counting APP that runs on an Android phone. The main technical difficulty is making a computationally heavy pose estimation network run smoothly on a phone while implementing the sit-up counting function. Once you understand the principles of this project, it can easily be migrated to similar sports and fitness apps.

3. Project results

The following shows the final effect of the APP. Lying down and doing sit-ups in public in the crowded West Lake scenic area was rather embarrassing!

 

4. Project directory

Since the sit-up counting APP is developed from scratch, the project has to cover several sub-projects: data collection, labeling, training, deployment, APP development, and so on. The overall directory structure is shown in the figure below:

 

4.1 DataSet

The directory where the dataset is stored. I have pre-placed more than 300 labeled pictures here; they are already enough to train a model that achieves the effect shown in "Project results". For better performance, you can collect more sit-up pictures.

4.2 LabelTool

This is an annotation tool written for this project; it is mainly used to label human body keypoints. After you have collected more sit-up pictures, you can use this tool to annotate them and generate the corresponding labels.

4.3 Trainer

This is a PyTorch-based keypoint training tool, which contains a lightweight keypoint detection network designed for mobile phones.

4.4 SiteUpAndroid

The sit-up counting APP for Android.

5. Project workflow

5.1 Collect pictures

Since there is no ready-made sit-up dataset, I had to build one myself. Fortunately, there are plenty of resources on the Internet for common exercises such as sit-ups, so I collected data in two ways: downloading videos and downloading pictures. First, I searched for "sit-up" videos, downloaded about 10 video clips, and extracted a subset of frames from each video as training data. The keyframes extracted from the videos are shown in the figure below.

Using only frames extracted from videos has a serious problem: the backgrounds are too uniform, which easily leads to overfitting. So I also searched for pictures on the Internet, which provided images with much richer backgrounds, as shown in the figure below:

 

5.2 Annotate pictures

After collecting the data, it is time to label it. Although there are existing open source annotation tools, none of them fit my workflow well, so I developed my own keypoint annotation tool: the LabelTool released above. Since I wrote it myself, it is comfortable to use. Note that it has only been tested under Windows 10 with Python 3.6; other environments are untested. Launch the interface with the command python main_widget.py. The initial interface is very simple; the collected sit-up pictures can be opened through the "Open" button.

 

In the category field, 0 marks the head, 1 marks the knee, and 2 marks the crotch. (Because sit-ups have to be recognized on a mobile phone, the amount of computation must be kept as small as possible. Pose estimation usually predicts many keypoints across the whole body, but for sit-ups, accurately predicting the head, knee and crotch is enough to recognize the action, so only three points are labeled here.) Left-click to place a point, right-click to undo the previous one. I have to say that developing small UI tools with Python + Qt is very convenient; compared with C++, it frees up a lot of productivity!

After the pictures are labeled, a label file label.txt is generated in the picture directory, with content as follows:

     
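The exact layout of label.txt is shown in the screenshot above. Purely as a hypothetical sketch (the real field order may differ), a minimal reader for a keypoint label file of this kind could look like this in Python:

# Hypothetical format assumed here: one labeled point per line,
#   <image name> <class id> <x> <y>
# with class 0 = head, 1 = knee, 2 = crotch, as described above.
def load_labels(path):
    labels = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 4:
                continue  # skip lines that do not match the assumed format
            name, cls, x, y = parts
            labels.setdefault(name, []).append((int(cls), float(x), float(y)))
    return labels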

5.3 Algorithm principle

Let me first briefly introduce the algorithm behind sit-up counting. In pose estimation (keypoint detection), directly regressing keypoint coordinates is rarely used; instead, heatmaps are used to predict keypoint positions. This is similar to centerness in anchor-free object detection: the coordinate of each keypoint is taken as the location of the maximum response in its heatmap, as shown in the figure below (only part of the heatmap is displayed):

  
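To make this concrete, here is a minimal sketch (not code from the project) of decoding keypoint coordinates from heatmaps by taking the position of the maximum response in each channel:

import numpy as np

def decode_heatmaps(heatmaps, img_w, img_h):
    # heatmaps: array of shape (K, H, W), one channel per keypoint
    num_kps, hm_h, hm_w = heatmaps.shape
    points = []
    for k in range(num_kps):
        idx = int(np.argmax(heatmaps[k]))
        y, x = divmod(idx, hm_w)
        score = float(heatmaps[k, y, x])
        # map the heatmap cell back to input-image coordinates
        points.append((x * img_w / hm_w, y * img_h / hm_h, score))
    return points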

Why is direct coordinate regression less accurate? With regression, the final feature map is usually down-sampled to a very small size so that a global regression can be performed. But keypoint prediction is highly sensitive to position, and such small feature maps lose too much spatial information, so the predicted positions become very inaccurate. The heatmap approach, on the other hand, generally requires a relatively large final feature map, usually 1/2 or 1/4 of the input image, which makes it well suited to spatially sensitive tasks. (If the feature map is artificially compressed to a small size, the heatmap approach also becomes inaccurate.)

With that in mind, the final solution is to upsample the 7*7 feature map output by ShuffleNet to 3*56*56 (considering the target application and scene, 56*56 is enough to recognize sit-ups), where 3 is the number of keypoints. The output features are then activated by a sigmoid to obtain the 3*56*56 heatmaps.

Two more points are worth mentioning: the design of the heatmap labels and the balancing of the loss. First, label design. Simply converting a label into a one-hot heatmap does not work well: the points right next to a labeled point produce features that look almost the same to the network, so forcing everything except the labeled point to 0 hurts performance. Instead, the label heatmap is usually built by placing a Gaussian distribution around each labeled point, as shown in the figure below:
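As a minimal sketch of this label design (the sigma value and the example keypoint positions here are my own assumptions, not the project's settings), a Gaussian label heatmap can be generated like this:

import numpy as np

def gaussian_heatmap(size, center_x, center_y, sigma=2.0):
    # render one keypoint as a 2D Gaussian on a size x size grid
    xs = np.arange(size)
    ys = np.arange(size)[:, None]
    return np.exp(-((xs - center_x) ** 2 + (ys - center_y) ** 2) / (2.0 * sigma ** 2))

# a 3*56*56 label for head, knee and crotch at hypothetical heatmap positions
label = np.stack([gaussian_heatmap(56, 10, 12),
                  gaussian_heatmap(56, 40, 30),
                  gaussian_heatmap(56, 28, 25)])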

The second point is loss balancing. As the label heatmaps above show, whether one-hot or Gaussian, the vast majority of points are negative samples. If plain MSE is applied indiscriminately, the network essentially learns to output all-zero heatmaps, because the training gradient is dominated by the negative samples and the positive-sample gradient is too small. Positive and negative samples therefore need to be treated differently; here I set the positive-to-negative ratio to 10:1.
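One way to read this 10:1 balancing (the threshold separating positive from negative cells below is my own assumption) is a weighted MSE over the heatmaps:

import torch

def balanced_mse(pred, target, pos_weight=10.0, pos_thresh=0.1):
    # cells near a keypoint (high Gaussian label value) count as positives
    weight = torch.where(target > pos_thresh,
                         torch.full_like(target, pos_weight),
                         torch.ones_like(target))
    return (weight * (pred - target) ** 2).mean()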

5.4 Trainer training tool

The Trainer tool mainly includes four parts:

cfgs: configuration file directory

data: data reading directory

DLEngine: training engine

models: network model directory

First, in the keypoint directory under models, I implemented the ShuffleNet-based keypoint detection network discussed above: ShuffleNetV2HeatMap. Then, in the data directory, I implemented a dataset reader for the label files produced by LabelTool: person_keypoint_txt.py. Finally, the configuration file for this project lives in the key_point directory under the configuration folder cfgs: keypoint_shufflenetv2_heatmap_224_1.0_3kps.py. Its main fields are shown below:

 
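The actual fields are shown in the screenshot above; the following is only a hypothetical illustration of the kind of settings such a config file typically carries (every name and value here is an assumption, not the project's real configuration):

# hypothetical sketch, not the contents of keypoint_shufflenetv2_heatmap_224_1.0_3kps.py
MODEL = dict(
    name="ShuffleNetV2HeatMap",  # the keypoint network described above
    width_mult=1.0,
    input_size=224,
    num_keypoints=3,             # head, knee, crotch
    heatmap_size=56,
)
DATA = dict(
    reader="person_keypoint_txt",
    train_dir="../DataSet/",     # images plus label.txt from LabelTool
    batch_size=32,
)
TRAIN = dict(
    epochs=200,
    lr=0.001,
    save_dir="save/",
)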

Before starting training, set CFG_FILE in train.py to the above configuration file:

CFG_FILE = 'cfgs/key_point/keypoint_shufflenetv2_heatmap_224_1.0_3kps.py'

Then start training with the command python train.py.

5.5 Converting the model

After training completes in Trainer, the corresponding model files are generated in the save directory. However, these PyTorch models cannot be deployed directly to a phone; an inference library is needed. There are many open source inference libraries, such as MNN, NCNN and TNN. Here I chose NCNN, because it was open-sourced early, is widely used, and has good network and hardware support. Unfortunately, NCNN cannot import a PyTorch model directly: the model must first be converted to ONNX, and the ONNX model then imported into NCNN. Also note that converting a PyTorch model to ONNX produces many glue ops that NCNN does not support, so another open source tool, onnx-simplifier, is needed to trim the ONNX model before importing it into NCNN. The whole process is a bit cumbersome, so I wrote the export_ncnn.py script in the Trainer project, which converts a trained PyTorch model into an NCNN model in one step. After a successful conversion, three NCNN-related files are generated in the PyTorch model folder under the save directory: model.param, model.bin and ncnn.cfg.
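export_ncnn.py automates this chain; purely as a sketch of the individual steps (file names and the input size are assumptions here), the PyTorch-to-ONNX-to-simplified-ONNX part looks roughly like this, with NCNN's own onnx2ncnn converter handling the final step:

import torch
import onnx
from onnxsim import simplify

def export_for_ncnn(model, onnx_path="model.onnx", sim_path="model_sim.onnx"):
    # 1. trace the trained PyTorch model out to ONNX
    model.eval()
    dummy = torch.randn(1, 3, 224, 224)  # assumed 224*224 input size
    torch.onnx.export(model, dummy, onnx_path,
                      input_names=["input"], output_names=["heatmap"],
                      opset_version=11)
    # 2. strip the glue ops that NCNN cannot import
    simplified, ok = simplify(onnx.load(onnx_path))
    assert ok, "onnx-simplifier could not validate the simplified model"
    onnx.save(simplified, sim_path)
    # 3. NCNN's converter then produces the deployable files:
    #    onnx2ncnn model_sim.onnx model.param model.bin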

5.6 APP development

The Android APP mainly consists of one Activity class, two SurfaceView classes, an Alg class and a Camera class. The Alg class is responsible for calling the algorithm for inference and returning the results; under the hood it calls the inference functions of the NCNN library. The Camera class is responsible for opening and closing the camera and handling preview callbacks. The first SurfaceView (DisplayView) displays the camera preview, while the second SurfaceView (CustomView) draws the keypoints, the count and other information. The Activity sits on top as the manager of the whole APP: it creates the buttons, the SurfaceViews, the Alg class, the Camera class, and so on.

 

For the detailed code logic, see the SiteUpAndroid source code.


Original article: blog.csdn.net/cjnewstar111/article/details/119617203