Hands-on: OpenPose explanation and code implementation

Some preliminaries

Consider the following questions first:

1. What is pose estimation?

In short: it is a keypoint-detection task, identifying the key points of specified parts of the human body;

2. What are the difficulties in pose estimation?

In terms of interference, occlusion of the human body has a large impact on detection;

How to match and connect the detected points in order, and make sure they belong to the same person;

How to improve performance so that the algorithm can run inference in real time;

3. How many key points does the COCO data set define, and what are they?

(Figure: the COCO keypoint definitions)

The data set defines 17 points. In practice, a neck point is added during training, computed from the two shoulder points, as sketched below;
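A minimal sketch of that neck derivation (assuming the standard COCO ordering, where indices 5 and 6 are the left and right shoulders):

import numpy as np

def add_neck(keypoints):
    # keypoints: (17, 3) array of [x, y, v] rows; returns (18, 3) with a neck appended
    l_sh, r_sh = keypoints[5], keypoints[6]  # left / right shoulder
    neck = (l_sh + r_sh) / 2.0               # midpoint of the two shoulders
    # the neck only counts as labeled when both shoulders are labeled (v > 0)
    neck[2] = 2 if (l_sh[2] > 0 and r_sh[2] > 0) else 0
    return np.vstack([keypoints, neck])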

4. What are the application fields?

  • Sports: e.g., fitness action recognition;
  • Safety: e.g., detecting falls on the road or drowning in swimming pools;
  • Special needs: e.g., checking whether an item is being worn, based on the key points and the object's location;

5. What categories are pose estimation methods divided into?

  • Top-down: first detect all people, then estimate the pose of each person within their box and output the results

    Advantages: high accuracy and high keypoint recall;

    Disadvantages: performance depends on the quality of the person detector, complexity is high, and real-time performance is relatively poor;

    Mainly used in offline projects with no real-time requirements!

  • Bottom-up: first detect all key points, then match and connect them

    Advantages: low computational cost; real-time performance is achievable;

    Disadvantages: lower accuracy and a complex matching strategy;

Paper interpretation

Paper address: https://arxiv.org/pdf/1611.08050.pdf

Reference article: https://zhuanlan.zhihu.com/p/514078285

First, take a look at the overall flow chart:

(Figure: the overall OpenPose pipeline)

First, a W × H image is fed into the network, which has two branches:

The first branch does keypoint detection: it generates heatmaps for the J body parts;

The second branch produces the directional relationships between key points, called PAFs (Part Affinity Fields): each connection (limb) is encoded as a vector field over its affinity region;

Finally, the two are combined: all key points are connected and a complete human skeleton is output;

Algorithm process

Key points to note:

1. The heatmap labels for the key points are generated with a Gaussian function;

2. PAF: Part Affinity Fields, the key method proposed in the paper;

3. Matching strategy: Hungarian matching (a minimal illustration follows);
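As a hedged illustration of the assignment step (not the paper's exact greedy per-limb procedure), SciPy's Hungarian solver can pick, from a matrix of PAF-based connection scores, the pairing that maximizes the total score; the numbers below are made up:

import numpy as np
from scipy.optimize import linear_sum_assignment

# rows: candidate source keypoints, columns: candidate sink keypoints of one limb type
scores = np.array([[0.9, 0.1, 0.2],
                   [0.3, 0.8, 0.1],
                   [0.1, 0.2, 0.7]])
rows, cols = linear_sum_assignment(-scores)  # negate: the solver minimizes total cost
pairs = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(pairs)  # [(0, 0), (1, 1), (2, 2)]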

1. Data production

The data set used is COCO, in which the annotation for each human skeleton point is [x, y, label]. The label value is 0, 1, or 2, meaning not labeled, labeled but occluded, and labeled and visible, respectively. Key points with label 0 need to be removed (a small sketch follows);
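A minimal sketch of that filtering, assuming a standard COCO person annotation dict whose 'keypoints' field is a flat list of 51 numbers:

import numpy as np

def labeled_keypoints(ann):
    # reshape the flat [x1, y1, v1, x2, y2, v2, ...] list into (17, 3) rows of [x, y, v]
    kpts = np.asarray(ann['keypoints'], dtype=float).reshape(-1, 3)
    return kpts[kpts[:, 2] > 0]  # drop v == 0 (not labeled); keep occluded and visible points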

Key point Gaussian heat map implementation:

import numpy as np

def putGaussianMaps(center, accumulate_confid_map, sigma, grid_y, grid_x, stride):
    start = stride / 2.0 - 0.5
    y_range = [i for i in range(int(grid_y))]
    x_range = [i for i in range(int(grid_x))]
    xx, yy = np.meshgrid(x_range, y_range)      # build the grid
    xx = xx * stride + start                    # each cell's position in the original image
    yy = yy * stride + start
    d2 = (xx - center[0]) ** 2 + (yy - center[1]) ** 2  # squared distance of each cell to the GT point
    exponent = d2 / 2.0 / sigma / sigma                 # the Gaussian exponent
    mask = exponent <= 4.6052                           # keep cells within the threshold (e^-4.6052 ≈ 0.01)
    cofid_map = np.exp(-exponent)                       # the Gaussian response
    cofid_map = np.multiply(mask, cofid_map)            # zero out cells outside the mask
    accumulate_confid_map += cofid_map                  # accumulate onto the running heatmap
    accumulate_confid_map[accumulate_confid_map > 1.0] = 1.0      # clip values above 1

    return accumulate_confid_map    # return the heatmap
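A hedged usage sketch (the 368×368 input with stride 8, giving a 46×46 grid, and sigma = 7.0 are assumed, illustrative values, not taken from the repository's config):

grid_y, grid_x, stride, sigma = 46, 46, 8, 7.0
heatmap = np.zeros((grid_y, grid_x))
# accumulate the Gaussians of the same body part for two people (made-up coordinates)
for center in [(120.0, 90.0), (250.0, 200.0)]:
    heatmap = putGaussianMaps(center, heatmap, sigma, grid_y, grid_x, stride)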

Implementation of the PAF ground-truth computation:

def putVecMaps(centerA, centerB, accumulate_vec_map, count, grid_y, grid_x, stride):
    centerA = centerA.astype(float)
    centerB = centerB.astype(float)

    thre = 1  # the limb width, a preset parameter
    centerB = centerB / stride  # scale the points down to feature-map coordinates
    centerA = centerA / stride

    limb_vec = centerB - centerA  # the vector between the two points
    norm = np.linalg.norm(limb_vec)  # its norm (length), needed to form a unit vector
    if norm == 0.0:  # the two points essentially coincide
        # print('limb is too short, ignore it...')
        return accumulate_vec_map, count
    limb_vec_unit = limb_vec / norm  # divide by the length to get the unit vector
    # print('limb unit vector: {}'.format(limb_vec_unit))

    # Bounding box of the region that may carry the direction
    # (this is where the width hyperparameter comes in),
    # clamped so it does not go beyond the feature-map border
    min_x = max(int(round(min(centerA[0], centerB[0]) - thre)), 0)
    max_x = min(int(round(max(centerA[0], centerB[0]) + thre)), grid_x)
    min_y = max(int(round(min(centerA[1], centerB[1]) - thre)), 0)
    max_y = min(int(round(max(centerA[1], centerB[1]) + thre)), grid_y)

    # build a grid over the rectangle that may contain the vectors
    range_x = list(range(int(min_x), int(max_x), 1))
    range_y = list(range(int(min_y), int(max_y), 1))
    xx, yy = np.meshgrid(range_x, range_y)
    ba_x = xx - centerA[0]  # the vector from centerA to each cell (x and y components)
    ba_y = yy - centerA[1]
    # cross product with the unit vector = the cell's perpendicular distance to the
    # limb line; this is the key step, the formula from the paper
    limb_width = np.abs(ba_x * limb_vec_unit[1] - ba_y * limb_vec_unit[0])
    mask = limb_width < thre  # 2D mask: True for cells lying on the limb

    vec_map = np.copy(accumulate_vec_map) * 0.0  # an all-zero array of the same shape

    # broadcast the mask to both channels and write it into vec_map
    vec_map[yy, xx] = np.repeat(mask[:, :, np.newaxis], 2, axis=2)
    # cells on the limb get the unit direction vector (via multiplication by the mask)
    vec_map[yy, xx] *= limb_vec_unit[np.newaxis, np.newaxis, :]

    # within the feature map (46x46), mark the cells covered by this limb
    # (where either the x or the y component is non-zero)
    mask = np.logical_or.reduce(
        (np.abs(vec_map[:, :, 0]) > 0, np.abs(vec_map[:, :, 1]) > 0))

    # accumulate_vec_map stores an average; restore the raw sums first
    accumulate_vec_map = np.multiply(
        accumulate_vec_map, count[:, :, np.newaxis])
    accumulate_vec_map += vec_map  # add the vectors formed by the current limb
    count[mask == True] += 1  # increment the per-cell counters for this region

    mask = count == 0

    count[mask == True] = 1  # avoid division by zero for untouched cells

    accumulate_vec_map = np.divide(accumulate_vec_map, count[:, :, np.newaxis])  # average the vectors
    count[mask == True] = 0  # restore the untouched counters

    return accumulate_vec_map, count

These two functions are the most important parts: they turn the raw annotations into the training targets. The code that traverses the raw data is not shown here; for details, see the data.py file in the source code;
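A hedged usage sketch for the PAF side (the grid size, stride, and coordinates are made-up, illustrative values):

grid_y, grid_x, stride = 46, 46, 8
paf = np.zeros((grid_y, grid_x, 2))           # x and y components of the unit vectors
count = np.zeros((grid_y, grid_x), dtype=np.int32)
centerA = np.array([120.0, 90.0])             # e.g. a right shoulder, in input-image pixels
centerB = np.array([150.0, 160.0])            # e.g. the matching right elbow
paf, count = putVecMaps(centerA, centerB, paf, count, grid_y, grid_x, stride)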

2. Model construction

Let’s take a look at the overall network architecture:

(Figure: the overall network architecture)

There are many choices for the model backbone; VGG19 is chosen here, producing a 46×46 feature map before Stage 1;

Next, several stages are stacked to keep enlarging the receptive field and improving the model. In each stage, the upper branch outputs the keypoint heatmaps (19 channels) and the lower branch outputs the PAF direction information (38 channels); the loss is computed for both, then the two outputs are concatenated with the backbone features and passed on to the next stage, so accuracy improves through continued stacking (see the sketch below);
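A minimal PyTorch sketch of one refinement stage (the layer sizes and layer count are illustrative, not the paper's exact configuration):

import torch
import torch.nn as nn

class Stage(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, 128, 7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 7, padding=3), nn.ReLU(inplace=True),
        )
        self.heat = nn.Conv2d(128, 19, 1)  # 18 keypoints + 1 background channel
        self.paf = nn.Conv2d(128, 38, 1)   # 19 limbs x 2 (x, y) components

    def forward(self, feats, prev_heat, prev_paf):
        # each stage sees the backbone features plus the previous stage's outputs
        x = self.trunk(torch.cat([feats, prev_heat, prev_paf], dim=1))
        return self.heat(x), self.paf(x)

# e.g. with 128-channel backbone features: in_ch = 128 + 19 + 38 = 185
stage = Stage(185)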

3. Inference

In the official source code, the PAF post-processing at inference time is encapsulated in a C++ library; it is compiled into a static library that can be called from the code;

The PAF processing mainly uses a line-integral computation: to decide which candidate points to connect, an integral of the PAF along each candidate connection is computed and used to score and pick the best connections (see the sketch below);
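A hedged sketch of that line-integral score (num_samples is an assumed parameter; the points are assumed to lie inside the field):

import numpy as np

def paf_score(paf, ptA, ptB, num_samples=10):
    # paf: (H, W, 2) field; ptA, ptB: (x, y) candidate keypoints on the grid
    vec = np.asarray(ptB, dtype=float) - np.asarray(ptA, dtype=float)
    norm = np.linalg.norm(vec)
    if norm == 0:
        return 0.0
    unit = vec / norm
    # sample evenly spaced points along the candidate connection
    xs = np.linspace(ptA[0], ptB[0], num_samples).round().astype(int)
    ys = np.linspace(ptA[1], ptB[1], num_samples).round().astype(int)
    samples = paf[ys, xs]                  # the PAF vectors at the sampled points
    return float(np.mean(samples @ unit))  # average alignment with the connection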

Run picture_demo.py to get the final result, as shown below:

Model results and inference time:

It can be seen that the backbone of the model is VGG19 (which is replaceable), and the feature-map output dimensions are 38 and 19, matching the defined skeleton-point classes and connection scheme;

Summary

Some details, such as the Hungarian matching algorithm, the PAF weight calculation, and model structure improvements, are not covered in depth here;

At present, OpenPose is mainly used by compiling it into a library that Python programs then call;

Compared with this Python code, that compiled version has very good real-time performance and meets the needs of most deployment scenarios;

If anyone wants the source code of this project, you can leave your email in the comment area. Everyone is welcome to discuss together; more of the latest technologies will be covered in the future;

Source: blog.csdn.net/weixin_40620310/article/details/132282135