Multi-Person Pose Estimation in OpenCV using OpenPose

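1. Network Architecture

2. Download the Model Weights

3. Step 1: Generate Output from the Image

4. Step 2: Detect Keypoints

5. Step 3: Find Valid Pairs

6. Step 4: Assemble Person-wise Keypoints and Draw Skeletons

7. Results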

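In a previous post, we used the OpenPose model to estimate the pose of a single person. This post discusses how to estimate the poses of multiple people at the same time.

When an image contains several people, pose estimation produces multiple independent keypoints. We need to sort these keypoints and work out which ones belong to the same person.

We will use the 18-point model trained on the COCO dataset. The keypoints defined by COCO and their indices are listed below:

COCO output format:
Nose-0, Neck-1, Right Shoulder-2, Right Elbow-3, Right Wrist-4, Left Shoulder-5, Left Elbow-6, Left Wrist-7, Right Hip-8, Right Knee-9, Right Ankle-10, Left Hip-11, Left Knee-12, Left Ankle-13, Right Eye-14, Left Eye-15, Right Ear-16, Left Ear-17, Background-18.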
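
The index list above can be kept as a simple lookup table. The snippet below is a minimal sketch; the names KEYPOINT_NAMES, BACKGROUND_INDEX and keypoint_index are illustrative and are not taken from the accompanying code.

```python
# Keypoint names in the order listed above (list position = COCO index).
KEYPOINT_NAMES = [
    "Nose", "Neck", "Right Shoulder", "Right Elbow", "Right Wrist",
    "Left Shoulder", "Left Elbow", "Left Wrist", "Right Hip", "Right Knee",
    "Right Ankle", "Left Hip", "Left Knee", "Left Ankle", "Right Eye",
    "Left Eye", "Right Ear", "Left Ear",
]
BACKGROUND_INDEX = 18  # index 18 is the background, not a body part


def keypoint_index(name):
    """Return the index of a keypoint name (hypothetical helper)."""
    return KEYPOINT_NAMES.index(name)


print(keypoint_index("Left Shoulder"))  # 5
```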

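1. Network Architecture

The architecture of OpenPose is shown below.

Figure 1: Architecture of multi-person pose estimation

The model takes a color image of size h x w as input. Its output is an array of matrices consisting of the confidence maps of the keypoints and the Part Affinity Heatmaps for each keypoint pair. The network above has two stages.

Stage 0: The first 10 layers of VGGNet are used to create feature maps for the input image.
Stage 1: A 2-branch multi-stage CNN.

The first branch predicts a set of 2D confidence maps of body part locations (elbow, knee, and so on). A confidence map is a grayscale image in which a higher value marks a location where a given body part is more likely to appear. For example, the confidence map for the left shoulder is shown in Figure 2 below: it has high values wherever a left shoulder appears. For the 18-point model, the first 19 matrices of the output are the 19 confidence maps (18 keypoints plus background).

Figure 2: Confidence map for the left shoulder for the given image

The second branch predicts a set of 2D vector fields of Part Affinity Fields (PAFs), which encode the degree of association between body parts (keypoints). The 20th through 57th matrices are the PAF matrices. The figure below shows the part affinity between the neck and the left shoulder. Note the high affinity between parts that belong to the same person.

Figure 3: Part affinity map for the neck-left shoulder pair for the given image

The confidence maps are used to find the keypoints, and the affinity maps are used to get the valid connections between the keypoints.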
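
The channel layout described above (19 confidence maps followed by 38 PAF matrices) can be written down as a small helper. This is only a sketch based on the counts given in the text; describe_channel and the two constants are hypothetical names, and the helper says nothing about which PAF channels belong to which limb.

```python
N_CONF_MAPS = 19  # channels 0..18: 18 keypoints plus background
N_PAF_MAPS = 38   # channels 19..56: the 20th through 57th matrices


def describe_channel(idx):
    """Classify an output channel index as a confidence map or a PAF map."""
    if 0 <= idx < N_CONF_MAPS:
        return ("confidence", idx)
    if N_CONF_MAPS <= idx < N_CONF_MAPS + N_PAF_MAPS:
        # Second element is the position within the PAF block.
        return ("paf", idx - N_CONF_MAPS)
    raise IndexError("channel index out of range: %d" % idx)


print(describe_channel(5))   # ('confidence', 5)
print(describe_channel(19))  # ('paf', 0)
print(describe_channel(56))  # ('paf', 37)
```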

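To follow along with this tutorial, download the accompanying source code (see the link at the end of this article).

I would like to thank my teammate Chandrashekara Keralapura for writing the C++ version of the code.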

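2. Download the Model Weights

Use the getModels.sh script provided with the code to download the model weights. Note that the configuration proto files are already present in the folder.

From the command line, cd into the downloaded folder and run the following: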

sudo chmod a+x getModels.sh
./getModels.sh

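Check the folder to make sure the model binary (the file with the .caffemodel extension) has been downloaded. If you cannot run the script above, you can download the model directly from http://posefs1.perception.cs.cmu.edu/OpenPose/models/pose/coco/pose_iter_440000.caffemodel. Once downloaded, place it in the "pose/coco/" folder.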
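
The check described above can also be scripted. The following is a small sketch that only tests whether the two files named in this tutorial exist under a base folder; missing_model_files is a hypothetical helper and is not part of the accompanying code.

```python
import os

# Paths used later in this tutorial, relative to the code folder.
REQUIRED_FILES = [
    "pose/coco/pose_deploy_linevec.prototxt",
    "pose/coco/pose_iter_440000.caffemodel",
]


def missing_model_files(base_dir="."):
    """Return the required files that are not present under base_dir."""
    return [rel for rel in REQUIRED_FILES
            if not os.path.isfile(os.path.join(base_dir, rel))]


# An empty list means both files are in place.
print(missing_model_files())
```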

3. Step 1: Generate Output from the Image

3.1 Load the Network

Python:

import cv2

protoFile = "pose/coco/pose_deploy_linevec.prototxt"
weightsFile = "pose/coco/pose_iter_440000.caffemodel"
 
net = cv2.dnn.readNetFromCaffe(protoFile, weightsFile)

C++

cv::dnn::Net inputNet = cv::dnn::readNetFromCaffe("./pose/coco/pose_deploy_linevec.prototxt","./pose/coco/pose_iter_440000.caffemodel");

3.2 Read the Image and Prepare the Input Blob

Python

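image1 = cv2.imread("group.jpg")
# Size of the original image, needed for the aspect ratio below
frameHeight, frameWidth = image1.shape[:2]
# Fix the input height and derive the width from the aspect ratio
inHeight = 368
inWidth = int((inHeight/frameHeight)*frameWidth)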
 
inpBlob = cv2.dnn.blobFromImage(image1, 1.0 / 255, (inWidth, inHeight),
                          (0, 0, 0), swapRB=False, crop=False)

C++

std::string inputFile = "./group.jpg";
 
if(argc > 1){
    inputFile = std::string(argv[1]);
}
 
cv::Mat input = cv::imread(inputFile,cv::IMREAD_COLOR);
cv::Mat inputBlob = cv::dnn::blobFromImage(input,1.0/255.0,
                                           cv::Size((int)((368*input.cols)/input.rows),368),
                                           cv::Scalar(0,0,0),false,false);
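
The width computation used when building the blob can be checked on its own. The sketch below repeats the same arithmetic as the Python snippet above (fixed height of 368, width scaled by the aspect ratio); network_input_size is a hypothetical helper, and the 720 x 1280 frame is just an example size.

```python
def network_input_size(frame_height, frame_width, in_height=368):
    """Return (in_width, in_height) for a frame, keeping its aspect ratio."""
    in_width = int((in_height / frame_height) * frame_width)
    return in_width, in_height


# A 1280 x 720 (width x height) image is fed to the network at 654 x 368.
print(network_input_size(720, 1280))  # (654, 368)
```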

3.3 Forward Pass through the Network

Python:

net.setInput(inpBlob)
output = net.forward()

C++

inputNet.setInput(inputBlob);
cv::Mat netOutputBlob = inputNet.forward();

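3.4 Sample Output

We first resize the output to the same size as the input and then check the confidence map of the nose keypoint. You can also use the cv2.addWeighted function to alpha-blend the probMap onto the image.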

i = 0
probMap = output[0, i, :, :]
probMap = cv2.resize(probMap, (frameWidth, frameHeight))
 
plt.imshow(cv2.cvtColor(image1, cv2.COLOR_BGR2RGB))
plt.imshow(probMap, alpha=0.6)

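Figure 4: Confidence map for the nose keypoint

4. Step 2: Detect Keypoints

As the figure above shows, the zeroth matrix is the confidence map for the nose. Likewise, the first matrix corresponds to the neck, and the remaining confidence maps follow in a fixed order. As discussed in our previous post, for an image with a single person we can find the location of each keypoint simply by taking the maximum of its confidence map. This does not work for images with multiple people, because a single confidence map may contain several keypoints.

Note: the explanation and code snippets in this section are taken from the getKeypoints() function.

For each keypoint, we apply a threshold (0.1 in this case) to the confidence map to produce a binary map.
Python: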

mapSmooth = cv2.GaussianBlur(probMap,(3,3),0,0)
mapMask = np.uint8(mapSmooth>threshold)

c++:

cv::Mat smoothProbMap;
cv::GaussianBlur( probMap, smoothProbMap, cv::Size( 3, 3 ), 0, 0 );
 
cv::Mat maskedProbMap;
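cv::threshold(smoothProbMap,maskedProbMap,threshold,255,cv::THRESH_BINARY);
// cv::findContours expects an 8-bit single-channel image
maskedProbMap.convertTo(maskedProbMap,CV_8U,1);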

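The code above produces a matrix containing several blobs in the regions corresponding to the current keypoint, as shown below.

Figure 5: Confidence map after applying the threshold

To find the exact location of the keypoints, we need to find the maximum of each blob. We do the following:

First, find all the contours of the region corresponding to the keypoints.
Create a mask for this region.
Extract the probMap for this region by multiplying the probMap with this mask.
Find the local maximum for this region. This is done for each contour (keypoint region).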

Python:

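# find the blobs
# ([-2] picks the contour list on both OpenCV 3.x, which returns three values, and 4.x, which returns two)
contours = cv2.findContours(mapMask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2]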
 
#for each blob find the maxima
for cnt in contours:
    blobMask = np.zeros(mapMask.shape)
    blobMask = cv2.fillConvexPoly(blobMask, cnt, 1)
    maskedProbMap = mapSmooth * blobMask
    _, maxVal, _, maxLoc = cv2.minMaxLoc(maskedProbMap)
    keypoints.append(maxLoc + (probMap[maxLoc[1], maxLoc[0]],))

C++

std::vector<std::vector<cv::Point> > contours;
cv::findContours(maskedProbMap,contours,cv::RETR_TREE,cv::CHAIN_APPROX_SIMPLE);
 
for(int i = 0; i < contours.size();++i){
    cv::Mat blobMask = cv::Mat::zeros(smoothProbMap.rows,smoothProbMap.cols,smoothProbMap.type());
 
    cv::fillConvexPoly(blobMask,contours[i],cv::Scalar(1));
 
    double maxVal;
    cv::Point maxLoc;
 
    cv::minMaxLoc(smoothProbMap.mul(blobMask),0,&maxVal,0,&maxLoc);
 
    keyPoints.push_back(KeyPoint(maxLoc, probMap.at<float>(maxLoc.y,maxLoc.x)));
}

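We save the x, y coordinates and the probability score for each keypoint. We also assign a distinct ID to every keypoint found. This will be useful later when connecting the keypoints into pairs.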
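
To make the idea concrete without OpenCV, here is a small pure-Python sketch of the same steps on a toy confidence map. It is not the getKeypoints() implementation: it uses a 4-connected flood fill in place of cv2.findContours, skips the Gaussian smoothing, and the names find_keypoints and assign_ids are hypothetical.

```python
def find_keypoints(prob_map, threshold=0.1):
    """Return (x, y, score) for the peak of every above-threshold blob."""
    rows, cols = len(prob_map), len(prob_map[0])
    seen = [[False] * cols for _ in range(rows)]
    keypoints = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or prob_map[r][c] <= threshold:
                continue
            # Flood-fill one blob, remembering its highest value.
            stack = [(r, c)]
            seen[r][c] = True
            best = (prob_map[r][c], c, r)
            while stack:
                y, x = stack.pop()
                if prob_map[y][x] > best[0]:
                    best = (prob_map[y][x], x, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and prob_map[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            keypoints.append((best[1], best[2], best[0]))
    return keypoints


def assign_ids(keypoints_per_part):
    """Append a unique running ID to every keypoint across all parts."""
    next_id = 0
    with_ids = []
    for kps in keypoints_per_part:
        part = []
        for kp in kps:
            part.append(kp + (next_id,))
            next_id += 1
        with_ids.append(part)
    return with_ids


# Toy confidence map with two separate blobs (two people).
prob_map = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.9, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.4, 0.0, 0.5, 0.7],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.6],
]
print(find_keypoints(prob_map))  # [(2, 1, 0.9), (5, 2, 0.7)]
```

Each blob yields exactly one keypoint, which is why a single confidence map can report one nose per person.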

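The result of keypoint detection on the input image is shown below. Detection works well even for a person who is only partially visible from the front and for a person facing away from the camera.

Figure 6: Detected keypoints overlaid on the input image

The keypoints can also be shown without the input image.

Figure 7: Detected keypoints on a black background

In Figure 6, we can see that all the keypoints have been found. However, when the keypoints are not overlaid on the image (Figure 7), it is hard to tell which keypoint belongs to which person. We need a robust way to map each keypoint to a person. This part is important, and it is easy to get wrong. To do the mapping, we find the valid connections between keypoints and then assemble these connections to build the skeleton of each person.

5. Step 3: Find Valid Pairs

A valid pair is a body part joining two keypoints that belong to the same person. The simplest way to find the valid pairs would be to take the minimum distance between one joint and all the possible candidates for the other joint. For example, in the figure below, we can compute the distance from the marked nose to every neck. The pair with the minimum distance should be the one belonging to the same person.

Figure 8: Connecting keypoints using a distance measure

This approach may not work for every pair, especially when the image contains many people or when people overlap. For example, consider the pair left elbow -> left wrist of the second person from the left: if the third person's left wrist is as close to that elbow as the second person's own left wrist (or closer), this approach cannot determine the valid pair.

Figure 9: Using only the minimum distance between keypoints can produce wrong connections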
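
The failure case is easy to reproduce with made-up numbers. The sketch below uses hypothetical coordinates for one left elbow and two left-wrist candidates; the helper names are illustrative only.

```python
import math


def distances_to_candidates(point, candidates):
    """Euclidean distance from point to every candidate (x, y)."""
    return [math.hypot(cx - point[0], cy - point[1]) for cx, cy in candidates]


def nearest_candidates(point, candidates):
    """Indices of all candidates that share the minimum distance."""
    dists = distances_to_candidates(point, candidates)
    best = min(dists)
    return [i for i, d in enumerate(dists) if d == best]


# Hypothetical coordinates: the person's own wrist and a neighbour's wrist.
elbow = (100, 100)
wrists = [(100, 140), (140, 100)]

print(distances_to_candidates(elbow, wrists))  # [40.0, 40.0]
print(nearest_candidates(elbow, wrists))       # [0, 1]
```

Both candidates are equally close, so distance alone gives no basis for choosing between them.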

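This is where the Part Affinity Maps come into play. They give the direction along with the affinity between two joint pairs. So, a pair should not only have the minimum distance, but its direction should also comply with the direction of the PAF heatmaps.

Given below is the heatmap for the left elbow -> left wrist connection.

Figure 10: Part affinity heatmap for the left elbow -> left wrist pair

Thus, in the case above, even though the distance measure identifies the pair wrongly, OpenPose gives the correct result, because the PAF complies only with the unit vector joining the elbow and the wrist of the second person.

The approach is implemented as follows:

Divide the line joining the two keypoints of the pair and find "n" points on this line.
Check whether the PAF at these points has the same direction as the line joining the two keypoints.
If the direction matches to a sufficient extent, the pair is valid.

Let us see how this is done in code. The following snippets belong to the getValidPairs() function in the provided code.

For each body part pair, we do the following:
1. Take the keypoints belonging to the pair and put them into separate lists, candA and candB. Every point in candA will be connected to some point in candB. The figure below shows the points in candA and candB for the pair Neck -> Right Shoulder.

Figure 11: Candidates for matching the pair Neck -> Nose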
Python

pafA = output[0, mapIdx[k][0], :, :]
pafB = output[0, mapIdx[k][1], :, :]
pafA = cv2.resize(pafA, (frameWidth, frameHeight))
pafB = cv2.resize(pafB, (frameWidth, frameHeight))

# Find the keypoints for the first and second limb
candA = detected_keypoints[POSE_PAIRS[k][0]]
candB = detected_keypoints[POSE_PAIRS[k][1]]

c++

//A->B constitute a limb
cv::Mat pafA = netOutputParts[mapIdx[k].first];
cv::Mat pafB = netOutputParts[mapIdx[k].second];

//Find the keypoints for the first and second limb
const std::vector<KeyPoint>& candA = detectedKeypoints[posePairs[k].first];
const std::vector<KeyPoint>& candB = detectedKeypoints[posePairs[k].second];

2. Find the unit vector joining the two candidate points. This gives the direction of the line joining them.

Python

d_ij = np.subtract(candB[j][:2], candA[i][:2])
norm = np.linalg.norm(d_ij)
if norm:
    d_ij = d_ij / norm

c++

std::pair<float,float> distance(candB[j].point.x - candA[i].point.x,candB[j].point.y - candA[i].point.y);

float norm = std::sqrt(distance.first*distance.first + distance.second*distance.second);

if(!norm){
    continue;
}

distance.first /= norm;
distance.second /= norm;

3. Create an array of 10 interpolated points on the line joining the two points.
Python

# Find p(u)
interp_coord = list(zip(np.linspace(candA[i][0], candB[j][0], num=n_interp_samples),
                        np.linspace(candA[i][1], candB[j][1], num=n_interp_samples)))
# Find L(p(u))
paf_interp = []
# (the loop variable is named idx so that it does not overwrite the pair index k)
for idx in range(len(interp_coord)):
    paf_interp.append([pafA[int(round(interp_coord[idx][1])), int(round(interp_coord[idx][0]))],
                       pafB[int(round(interp_coord[idx][1])), int(round(interp_coord[idx][0]))] ])

C++

//Find p(u)
std::vector<cv::Point> interpCoords;
populateInterpPoints(candA[i].point,candB[j].point,nInterpSamples,interpCoords);
//Find L(p(u))
std::vector<std::pair<float,float>> pafInterp;
for(int l = 0; l < interpCoords.size();++l){
    pafInterp.push_back(
        std::pair<float,float>(
            pafA.at<float>(interpCoords[l].y,interpCoords[l].x),
            pafB.at<float>(interpCoords[l].y,interpCoords[l].x)
        ));
}

4. Take the dot product of the PAF at these points with the unit vector d_ij.
Python

# Find E
paf_scores = np.dot(paf_interp, d_ij)
avg_paf_score = sum(paf_scores)/len(paf_scores)

c++

std::vector<float> pafScores;
float sumOfPafScores = 0;
int numOverTh = 0;
for(int l = 0; l< pafInterp.size();++l){
    float score = pafInterp[l].first*distance.first + pafInterp[l].second*distance.second;
    sumOfPafScores += score;
    if(score > pafScoreTh){
        ++numOverTh;
    }
     
    pafScores.push_back(score);
}
 
float avgPafScore = sumOfPafScores/((float)pafInterp.size());

5. Treat the pair as valid if 70% of the points satisfy the criterion.
Python

# Check if the connection is valid
# If the fraction of interpolated vectors aligned with the PAF is higher than the threshold -> valid pair
if ( len(np.where(paf_scores > paf_score_th)[0]) / n_interp_samples ) > conf_th :
    if avg_paf_score > maxScore:
        max_j = j
        maxScore = avg_paf_score

c++

if(((float)numOverTh)/((float)nInterpSamples) > confTh){
    if(avgPafScore > maxScore){
        maxJ = j;
        maxScore = avgPafScore;
        found = true;
    }
}
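
Steps 2 through 5 can be put together in a compact, dependency-free sketch. This is not the getValidPairs() code: the PAF components are plain nested lists indexed as [y][x], the default paf_score_th=0.1 is an assumed value (the text only fixes the 70% fraction and the 10 samples), and paf_pair_score and best_match are hypothetical names.

```python
import math


def paf_pair_score(point_a, point_b, paf_x, paf_y,
                   n_interp_samples=10, paf_score_th=0.1, conf_th=0.7):
    """Return (is_valid, average_score) for the candidate limb A -> B."""
    dx, dy = point_b[0] - point_a[0], point_b[1] - point_a[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return False, 0.0
    ux, uy = dx / norm, dy / norm  # unit vector d_ij

    scores = []
    for s in range(n_interp_samples):
        t = s / (n_interp_samples - 1)  # requires at least 2 samples
        x = int(round(point_a[0] + t * dx))
        y = int(round(point_a[1] + t * dy))
        # Dot product of the PAF vector at (x, y) with the unit vector.
        scores.append(paf_x[y][x] * ux + paf_y[y][x] * uy)

    aligned = sum(1 for sc in scores if sc > paf_score_th)
    avg_score = sum(scores) / len(scores)
    return aligned / n_interp_samples > conf_th, avg_score


def best_match(point_a, candidates_b, paf_x, paf_y):
    """Index of the valid candidate with the highest average score, or -1."""
    best_j, best_score = -1, -1.0
    for j, point_b in enumerate(candidates_b):
        valid, avg_score = paf_pair_score(point_a, point_b, paf_x, paf_y)
        if valid and avg_score > best_score:
            best_j, best_score = j, avg_score
    return best_j


# Toy 5 x 12 field: the PAF points in +x along row 2 and is zero elsewhere.
paf_x = [[1.0 if r == 2 else 0.0 for _ in range(12)] for r in range(5)]
paf_y = [[0.0] * 12 for _ in range(5)]

print(paf_pair_score((1, 2), (10, 2), paf_x, paf_y))      # (True, 1.0)
print(best_match((1, 2), [(1, 4), (10, 2)], paf_x, paf_y))  # 1
```

In the toy field, the candidate at (1, 4) is much closer to the start point than the one at (10, 2), yet only the latter lies along the direction of the PAF, so it is the one selected.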

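6. Step 4: Assemble Person-wise Keypoints and Draw Skeletons

Now that we have joined all the keypoints into pairs, we can assemble the pairs that share the same part detection candidates into full-body poses of multiple people.

Let us see how this is done. The following snippets are from the getPersonwiseKeypoints() function in the provided code.

1. We first create empty lists to store the keypoints for each person. We then go over each pair and check whether partA of the pair is already present in any of the lists. If it is, the keypoint belongs to that list, and partB of the pair belongs to the same person as well. So we add partB of the pair to the list where partA was found.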
Python

for j in range(len(personwiseKeypoints)):
    if personwiseKeypoints[j][indexA] == partAs[i]:
        person_idx = j
        found = 1
        break
 
if found:
    personwiseKeypoints[person_idx][indexB] = partBs[i]

C++

for(int j = 0; !found && j < personwiseKeypoints.size();++j){
    if(indexA < personwiseKeypoints[j].size() &&
       personwiseKeypoints[j][indexA] == localValidPairs[i].aId){
        personIdx = j;
        found = true;
    }
}/* j */
 
if(found){
    personwiseKeypoints[personIdx].at(indexB) = localValidPairs[i].bId;
}

2. If partA is not present in any of the lists, the pair belongs to a person for whom no list exists yet, so we create a new list.

Python

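# if partA is not found in any existing list, create a new one
elif not found and k < 17:
    row = -1 * np.ones(19)
    row[indexA] = partAs[i]
    row[indexB] = partBs[i]
    # append the new person (assumes personwiseKeypoints is a 2D NumPy array with 19 columns)
    personwiseKeypoints = np.vstack([personwiseKeypoints, row])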

c++

else if(k < 17){
    std::vector<int> lpkp(std::vector<int>(18,-1));
 
    lpkp.at(indexA) = localValidPairs[i].aId;
    lpkp.at(indexB) = localValidPairs[i].bId;
     
    personwiseKeypoints.push_back(lpkp);
}
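
The two cases above can be seen end to end in a short pure-Python sketch. It is a simplification, not the getPersonwiseKeypoints() code: it omits the k < 17 condition used in the snippets above, stores each person as a plain list of keypoint IDs, and uses an illustrative two-entry pose_pairs (Neck -> Right Shoulder, Right Shoulder -> Right Elbow); group_pairs_into_people is a hypothetical name.

```python
def group_pairs_into_people(valid_pairs, pose_pairs, n_keypoints=18):
    """Assemble matched pairs into one keypoint-ID list per person.

    valid_pairs[k] holds the (id_a, id_b) matches found for pose_pairs[k].
    A value of -1 in a person's list means that keypoint was not found.
    """
    people = []
    for k, (index_a, index_b) in enumerate(pose_pairs):
        for id_a, id_b in valid_pairs[k]:
            for person in people:
                if person[index_a] == id_a:
                    # partA already belongs to this person: add partB.
                    person[index_b] = id_b
                    break
            else:
                # partA is new: start a list for a new person.
                person = [-1] * n_keypoints
                person[index_a] = id_a
                person[index_b] = id_b
                people.append(person)
    return people


# Illustrative subset of limbs, using the COCO indices listed earlier.
pose_pairs = [(1, 2), (2, 3)]
# Keypoint IDs: necks 0 and 1, right shoulders 2 and 3, right elbows 4 and 5.
valid_pairs = [[(0, 2), (1, 3)], [(2, 4), (3, 5)]]

people = group_pairs_into_people(valid_pairs, pose_pairs)
print(len(people))     # 2
print(people[0][1:4])  # [0, 2, 4]
print(people[1][1:4])  # [1, 3, 5]
```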

7. Results

We go over each person and draw the skeleton on the input image.
Python:

for i in range(17):
    for n in range(len(personwiseKeypoints)):
        index = personwiseKeypoints[n][np.array(POSE_PAIRS[i])]
        if -1 in index:
            continue
        B = np.int32(keypoints_list[index.astype(int), 0])
        A = np.int32(keypoints_list[index.astype(int), 1])
        cv2.line(frameClone, (B[0], A[0]), (B[1], A[1]), colors[i], 2, cv2.LINE_AA)
 
 
cv2.imshow("Detected Pose" , frameClone)
cv2.waitKey(0)

c++

for(int i = 0; i< nPoints-1;++i){
    for(int n  = 0; n < personwiseKeypoints.size();++n){
        const std::pair<int,int>& posePair = posePairs[i];
        int indexA = personwiseKeypoints[n][posePair.first];
        int indexB = personwiseKeypoints[n][posePair.second];
 
        if(indexA == -1 || indexB == -1){
            continue;
        }
 
        const KeyPoint& kpA = keyPointsList[indexA];
        const KeyPoint& kpB = keyPointsList[indexB];
 
        cv::line(outputFrame,kpA.point,kpB.point,colors[i],2,cv::LINE_AA);
         
    }
}
 
cv::imshow("Detected Pose",outputFrame);
cv::waitKey(0);
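
The drawing loops above boil down to collecting one line segment per limb whose two keypoints were both found. The sketch below shows only that lookup, without any drawing; skeleton_segments is a hypothetical helper, and the keypoint coordinates and the two-entry pose_pairs are made-up example data.

```python
def skeleton_segments(people, keypoints_list, pose_pairs):
    """Return ((xA, yA), (xB, yB)) for every drawable limb of every person."""
    segments = []
    for person in people:
        for index_a, index_b in pose_pairs:
            id_a, id_b = person[index_a], person[index_b]
            if id_a == -1 or id_b == -1:
                continue  # one end of the limb was not detected
            xa, ya = keypoints_list[id_a][:2]
            xb, yb = keypoints_list[id_b][:2]
            segments.append(((xa, ya), (xb, yb)))
    return segments


# keypoints_list[id] = (x, y, score) for the keypoint with that ID.
keypoints_list = [(50, 40, 0.9), (60, 42, 0.8), (70, 60, 0.7)]
pose_pairs = [(1, 2), (2, 3)]
person = [-1] * 18
person[1], person[2] = 0, 1  # neck and right shoulder found, elbow missing

print(skeleton_segments([person], keypoints_list, pose_pairs))
# [((50, 40), (60, 42))]
```

Each returned segment corresponds to one cv2.line call in the Python loop above.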

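The figure below shows the skeleton of each detected person.

Try running the code yourself to verify the result.

That concludes the translated article. It was translated from https://www.learnopencv.com/multi-person-pose-estimation-in-opencv-using-openpose/
The original title is Multi-Person Pose Estimation in OpenCV using OpenPose.

The source code is available from https://www.learnopencv.com/multi-person-pose-estimation-in-opencv-using-openpose/
Regular readers of learnopencv will know that its code is hosted on GitHub, so getting it is not difficult, and I will not paste the direct link here.

For further background on this topic, search for Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields.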