Somatosensory Tetris: runs on a CPU, fully open source

Somatosensory Tetris in action

1. Background

As an aging code monkey, I would like to get some exercise every day. In practice, though, I can't always go outside to work out, so the next best thing is to play motion-controlled (somatosensory) games at home. The most popular option right now is Nintendo's fitness ring. I tried it at an offline store: the gameplay, graphics, and recognition accuracy are all very good, but the price is not friendly. For a "greasy middle-aged programmer" like me, it is too expensive! So I wondered whether I could build a somatosensory game myself using pose estimation. After thinking it over for a while, I concluded that it is entirely feasible!

2. Goal

Play Tetris through body poses, with room to add other small games later

Recognition must be accurate and fast, and must run on an ordinary CPU

The control poses can be adjusted to suit your own needs

3. Interface design

The interface is divided into two parts: the left side holds the pose configuration and game interface, and the right side holds the camera detection interface. The pose configuration interface displays the four poses needed to control Tetris: move one square left, move one square right, rotate (transform), and fast drop. Each pose can be modified with the mouse, and key bones can be selected. Key bones work as follows: a detected pose is considered to match a configured pose as long as its key bones agree with the key bones in the configuration. For example, for the "move left" pose only the left and right arms need to be checked. As long as the left and right arms are close to the configured pose, the "move left" action is triggered, and there is no need to check the bones of the torso or legs. This simplifies the calculation, improves efficiency, and also increases accuracy.
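As a rough illustration of the configuration described above, a pose could be stored as a set of joint positions plus the subset of bones marked as key bones. This is only a sketch: the class name `PoseConfig`, its fields, and the joint names are hypothetical and are not taken from the project's source code.

```python
from dataclasses import dataclass, field


@dataclass
class PoseConfig:
    """One configurable control pose (all names here are illustrative)."""
    command: str   # e.g. "move_left"
    joints: dict   # joint name -> (x, y) position in the editor
    bones: list    # (joint_a, joint_b) pairs forming the skeleton
    key_bones: set = field(default_factory=set)  # bones that are compared

    def set_key_bone(self, bone, is_key=True):
        """Mark (left click) or unmark (right click) a bone as a key bone."""
        if bone not in self.bones:
            raise ValueError(f"unknown bone: {bone}")
        if is_key:
            self.key_bones.add(bone)
        else:
            self.key_bones.discard(bone)

    def move_joint(self, name, x, y):
        """Drag a joint to a new position."""
        if name not in self.joints:
            raise KeyError(name)
        self.joints[name] = (x, y)


# Hypothetical "move left" pose: only the left arm bones are key bones,
# so the hip bone below would be ignored when comparing poses.
move_left = PoseConfig(
    command="move_left",
    joints={
        "left_shoulder": (0.6, 0.3),
        "left_elbow": (0.7, 0.2),
        "left_wrist": (0.7, 0.1),
        "left_hip": (0.6, 0.6),
    },
    bones=[
        ("left_shoulder", "left_elbow"),
        ("left_elbow", "left_wrist"),
        ("left_shoulder", "left_hip"),
    ],
)
move_left.set_key_bone(("left_shoulder", "left_elbow"))
move_left.set_key_bone(("left_elbow", "left_wrist"))
```

Keeping the key bones as a separate set means the skeleton itself never changes; only the subset used for comparison does.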

 

4. Pose Estimation

For somatosensory Tetris, developing the Tetris game itself is not difficult; the hard part is the pose estimation network. It must be accurate while requiring little computation (little enough to run on a CPU), which makes the network design a real challenge.

4.1 Introduction to Pose Estimation Networks

Current pose estimation methods fall roughly into three categories.

The first category is top-down pose estimation. It first locates each person with an object detection network and then feeds the cropped person into a single-person pose estimation network. Because there are two stages, it is naturally slow, but its accuracy is comparatively high.

The second category is bottom-up pose estimation. It first estimates all keypoints of all people and then groups them per person through a post-processing matching step. This post-processing is awkward: it bolts a complicated procedure onto the end of an otherwise end-to-end deep learning network, and it is also time-consuming.

The third category is CenterNet, from the "Objects as Points" paper. (Strictly speaking it is also a bottom-up method, but it is special enough to be treated as a separate category.) It applies an anchor-free object detection network to pose estimation. It does not need two stages like top-down methods, nor complicated post-processing to separate different people like bottom-up methods, because it directly outputs the center point of each human instance together with the offsets from that center point to each of the instance's keypoints. The process is therefore simple and efficient.

4.2 Lightweight Pose Estimation Network

The first type of pose estimation network (top-down) is hard to make lightweight, because it chains two networks in series; it is generally used in scenarios that are not sensitive to computational cost but demand high accuracy. The second type can be made lightweight by optimizing the backbone and the post-processing; the most typical example is Lightweight OpenPose. The third type, CenterNet, is comparatively much easier to slim down: the network itself is not complicated and there is no heavy post-processing stage, so optimization techniques applied to the network itself pay off directly. The general idea is therefore to build a lightweight network on the third type (CenterNet). Why not optimize a second-type network such as Lightweight OpenPose? Mainly because our project only needs single-person estimation. The post-processing of Lightweight OpenPose is relatively heavy (a large share of the computation goes into matching keypoints to people), whereas the CenterNet-style approach naturally separates each human instance without heavy post-processing, so it is simpler to optimize.

4.3 Final solution

At first I reused the pose estimation network from my earlier OpenSitUp project, which implemented sit-up counting on a mobile phone. That network uses MobileNet as its backbone and upsamples the final feature map to predict three keypoints: head, knee, and hip. I modified it to predict 13 keypoints, such as the head, shoulders, elbows, and wrists, and then trained it on COCO and MPII. Since my laptop's graphics card is an MX250 with 2 GB of memory, training was really slow: I got through only a little over 10 epochs in a week. Just as I was worrying about training speed, I saw that Google had released MoveNet, a pose estimation network aimed at mobile devices and modified from CenterNet. Its backbone is MobileNet, it is fast and accurate, and, crucially, pre-trained models are provided. So why not just use MoveNet directly! I was delighted to have the backing of a major company, and it also spared me the tedious training and hyperparameter tuning.

4.4 Principle of MoveNet

That said, MoveNet is not open source and has no paper, only blog posts, so we still need to work out MoveNet's structure and processing flow ourselves.

4.5 MoveNet structure

The input image first passes through MobileNet (5 downsampling steps) and then through 3 upsampling steps, so the resulting feature map is 1/4 the size of the original image. Finally, four branches output four tensors: the center-point tensor, the keypoint regression tensor, the keypoint heatmap, and the offset tensor.
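The 1/4 figure follows from simple arithmetic if every sampling step changes the resolution by a factor of 2, which is an assumption here. The sketch below works it through for a hypothetical 192-pixel square input with 17 keypoints. The per-branch channel layout (1 channel for the center, 2 per keypoint for regression and offset, 1 per keypoint for the heatmap) is also an assumption for illustration, not something stated in the text above.

```python
def feature_map_size(input_size, num_down=5, num_up=3):
    """Spatial size after num_down 2x downsamplings and num_up 2x upsamplings."""
    size = input_size
    for _ in range(num_down):
        size //= 2   # each downsampling step halves the resolution
    for _ in range(num_up):
        size *= 2    # each upsampling step doubles it again
    return size


def output_shapes(input_size, num_keypoints):
    """Assumed (height, width, channels) of the four output branches."""
    s = feature_map_size(input_size)
    return {
        "center": (s, s, 1),                              # one score per cell
        "keypoint_regression": (s, s, 2 * num_keypoints), # (dx, dy) per keypoint
        "keypoint_heatmap": (s, s, num_keypoints),        # one score per keypoint
        "offset": (s, s, 2 * num_keypoints),              # sub-cell (dx, dy)
    }


# 192 -> 96 -> 48 -> 24 -> 12 -> 6 (downsampling), then 6 -> 12 -> 24 -> 48.
size = feature_map_size(192)
shapes = output_shapes(192, num_keypoints=17)
```

With these assumptions `size` is 48, which is exactly 192 / 4.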

 

4.6 MoveNet process

The overall process is shown in the figure below. First, the center point of the human body is located using the center-point tensor. Since there may be several candidate center points, we pick the one closest to the center of the image (step 1). Then, starting from that center point, the regression vectors give the position of every keypoint relative to the center (step 2). At this stage all keypoints have already been found, but they are fairly coarse, because they are all predicted from the receptive field around the center point. Step 3 therefore combines them with the keypoint heatmaps to find the best keypoint locations. Finally, step 4 is needed because the prediction is made on a feature map at 1/4 the resolution of the original image: adding the offsets lets us map the keypoints back to the original image and correct them further.
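The four steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not MoveNet's actual decoding code: the tensor layout (nested lists indexed as `[k][y][x]`), the distance-weighted argmax used for steps 1 and 3, and the `smooth` constant are all choices made for this example. A real implementation would vectorize this with NumPy or PyTorch.

```python
import math


def weighted_argmax(score_map, ref_x, ref_y, smooth=1.0):
    """Return (x, y) of the cell maximizing score / (distance to ref + smooth)."""
    best, best_xy = -math.inf, (0, 0)
    for y, row in enumerate(score_map):
        for x, score in enumerate(row):
            value = score / (math.hypot(x - ref_x, y - ref_y) + smooth)
            if value > best:
                best, best_xy = value, (x, y)
    return best_xy


def decode_single_pose(center, regs, heatmaps, offsets, stride=4):
    """Decode one person's keypoints as (x, y) pixels in the input image.

    center[y][x]      -> center-point score
    regs[k][y][x]     -> (dx, dy) from a center cell to keypoint k, in cells
    heatmaps[k][y][x] -> score of keypoint k
    offsets[k][y][x]  -> sub-cell correction (dx, dy) for keypoint k
    """
    h, w = len(center), len(center[0])
    # Step 1: choose the strong center candidate nearest the image center.
    cx, cy = weighted_argmax(center, (w - 1) / 2, (h - 1) / 2)
    keypoints = []
    for k in range(len(heatmaps)):
        # Step 2: coarse keypoint position from the regression vector.
        dx, dy = regs[k][cy][cx]
        coarse_x, coarse_y = cx + dx, cy + dy
        # Step 3: refine with the heatmap peak nearest the coarse position.
        kx, ky = weighted_argmax(heatmaps[k], coarse_x, coarse_y)
        # Step 4: add the offset and scale back to input-image pixels.
        ox, oy = offsets[k][ky][kx]
        keypoints.append(((kx + ox) * stride, (ky + oy) * stride))
    return keypoints
```

Dividing each score by its distance to a reference point is one simple way to prefer the candidate near the image center in step 1, and the heatmap peak near the coarse estimate in step 3, so that distant peaks belonging to other people are suppressed.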

 

4.7 Pose comparison

How do we compare the recognized pose with a preset pose to obtain a pose score? I used an angle-matching algorithm on the key bones.

Take a standard pose, for example the one for the "move left" command, which is set to raising the left hand, as shown on the left of the figure below. For ease of calculation we set four key bones, shown in red. The middle of the figure shows the actual pose captured by the camera. To calculate the similarity between the two poses, we compare the angles of the corresponding key bones, as shown on the right of the figure below. The final similarity is computed from these key-bone angle differences.
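One plausible way to turn key-bone angles into a score is sketched below. The scoring rule is an assumption made for illustration (the mean over key bones of 1 minus the angle difference divided by 180 degrees) and may differ from the formula the project actually uses. The function names and joint names are likewise hypothetical.

```python
import math


def bone_angle(start, end):
    """Direction of the bone start->end in degrees, from atan2."""
    return math.degrees(math.atan2(end[1] - start[1], end[0] - start[0]))


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, 180]."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def pose_similarity(template, detected, key_bones):
    """Score in [0, 1]; 1.0 means every key bone points the same way.

    template, detected: dict of joint name -> (x, y)
    key_bones: iterable of (start_joint, end_joint) pairs to compare
    """
    scores = []
    for start, end in key_bones:
        t = bone_angle(template[start], template[end])
        d = bone_angle(detected[start], detected[end])
        scores.append(1.0 - angle_diff(t, d) / 180.0)
    if not scores:
        raise ValueError("at least one key bone is required")
    return sum(scores) / len(scores)


# Image coordinates: y grows downward, so a raised arm has decreasing y.
arm = [("shoulder", "elbow"), ("elbow", "wrist")]
raised = {"shoulder": (0, 0), "elbow": (0, -1), "wrist": (0, -2)}
raised_far_away = {"shoulder": (5, 5), "elbow": (5, 3), "wrist": (5, 1)}
sideways = {"shoulder": (0, 0), "elbow": (1, 0), "wrist": (2, 0)}

same = pose_similarity(raised, raised_far_away, arm)
different = pose_similarity(raised, sideways, arm)
```

Because only bone directions are compared, the score does not depend on where the person stands in the frame or how large they appear, which suits a game where the player's distance from the camera varies. A command could then be triggered when the score exceeds a chosen threshold.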

 

5. Installation (Windows)

5.1 Install the pytorch environment

If you already have it installed, skip this step. If not, you can refer to this video tutorial:

Installing the pytorch-cuda environment offline on Windows from scratch, with all installation packages provided. Worth bookmarking!

5.2 Download the PoseTetris source code

Clone the repository from the GitHub address below, or download the source directly in a browser:

GitHub - DL-Practise/PoseTetris: A somatosensory Tetris game

To install the dependencies, go to the PoseTetris root directory, open a command line (PowerShell), and run

pip install -r requirment.txt

5.3 Start Pose Tetris

Go to the PoseTetris root directory, open a command line (PowerShell), and run python main_widget.py to start the program.

5.4 Configure the poses

When the program opens, it shows the pose configuration interface by default. Tetris needs four commands in total: move left, move right, rotate (transform), and fast drop. Four default poses are already assigned to these commands: move left is raising the left hand, move right is raising the right hand, rotate is raising both hands at the same time, and fast drop is squatting (note that the knees need to point outward). To modify a pose, move the mouse over a joint, then left-click and drag it. You can also change the key bones (for their role, see the pose comparison section): move the mouse to the midpoint of a bone and left-click, and the bone turns red to show that it is now a key bone; right-click to unset it. Key bones are always shown in red. To restore the default control poses, click the pose reset button.

Somatosensory Tetris posture configuration

5.5 Start the game

Click to switch to the game interface, then click the start button on the right. Step back until the camera can capture your whole body. Now you can enjoy somatosensory Tetris. Have fun!

 

Origin blog.csdn.net/cjnewstar111/article/details/121191485