Object Detection with Latent SVM and the Deformable Part Model (DPM)

Reposted from: Object Detection with Latent SVM and the Deformable Part Model (DPM)

I. Overview

Combining the Deformable Part Model with Latent SVM for object detection was proposed by P. Felzenszwalb. The representative work is the following three papers:

[1] P. Felzenszwalb, D. McAllester, D. Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. Proceedings of the IEEE CVPR 2008.

[2] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, September 2010.

[3] P. Felzenszwalb, R. Girshick, D. McAllester. Cascade Object Detection with Deformable Part Models. Proceedings of the IEEE CVPR 2010.

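Of these, [2] describes how to use DPM for detection (feature processing plus classification), and [3] describes how to speed up detection with a cascade. Overall, the authors' approach combines HOG features, the Deformable Part Model, and Latent SVM:

1. Each part is described by a HOG feature template, which is then matched against the image. A feature pyramid is used, i.e. HOG features are extracted at several resolutions.

2. With the Deformable Part Model, the score of a detection window during object detection equals the matching scores of the parts minus the cost of deforming the model.

3. Training has to learn the HOG template of every part as well as the parameters that measure the cost of each part's placement. The paper proposes the Latent SVM method, which turns learning the deformable part model into a classification problem: an SVM is trained with the part placements treated as latent values, and the model parameters become the SVM's separating hyperplane. In practice the authors use an iterative procedure that repeatedly updates the model.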
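The scoring rule in item 2 can be written compactly. The following is a sketch in the notation of [2]; the symbols are taken from that paper rather than from this article. For a model with root filter F_0, part filters F_1, ..., F_n, deformation coefficients d_i, part anchors v_i and bias b, a hypothesis that places filter i at position p_i = (x_i, y_i) in the feature pyramid H scores:

```latex
\[
\mathrm{score}(p_0, \dots, p_n)
  = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i)
  - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i)
  + b
\]
\[
(dx_i, dy_i) = (x_i, y_i) - \bigl(2\,(x_0, y_0) + v_i\bigr),
\qquad
\phi_d(dx, dy) = (dx,\; dy,\; dx^2,\; dy^2)
\]
```

The first sum is the filter matching score, and the second sum is the deformation cost that is subtracted from it. The factor 2 reflects that parts are placed at twice the resolution of the root.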

For details, see the following articles:

1. Object Detection: Principles and Implementation (6) - Object Detection Based on Deformable Part Models

2. Some notes on the Deformable Part Model (Why So Serious?, CSDN blog)

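II. Implementations

1. The authors provide an older code release, voc-release4.01, and a cascade-based optimized version, star-cascade. The latter is about 14 times faster with almost no loss of accuracy, and about 40 times faster at the cost of a little recall.

2. The newer release, voc-release5, includes star-cascade but has higher hardware requirements.

3. All of the above code targets Linux or Mac OS X. Based on it, Daniel Rodríguez Molina implemented a C++ version: LibPaBOD: a LIBrary for PArt-Based Object Detection in C++.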

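III. Training the Model

Latent SVM treats object detection as a classification problem, so a model must first be trained from samples before it can be used for detection. With the MATLAB code the model is a .mat file; with the C++ code it is an .xml file (as far as I know, P. Felzenszwalb's method can currently be trained only with the MATLAB code). The two model file formats are described below:

1. The *.xml format used in C/C++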

 <Model>
     <!-- Number of components -->
     <NumComponents>2</NumComponents>
     <!-- Number of features -->
     <P>31</P>
     <!-- Score threshold -->
     <ScoreThreshold>-1.0028649999999999</ScoreThreshold>
     <Component>
         <!-- Root filter description -->
         <RootFilter>
             <!-- Dimensions -->
             <sizeX>5</sizeX>
             <sizeY>9</sizeY>
             <!-- Weights (binary representation) -->
             <Weights>
                 ...
             </Weights>
             <!-- Linear term in score function -->
             <LinearTerm>-2.2535784347835031</LinearTerm>
         </RootFilter>
         <!-- Part filters description -->
         <PartFilters>
             <NumPartFilters>6</NumPartFilters>
             <PartFilter>...</PartFilter>
             <PartFilter>...</PartFilter>
             <PartFilter>...</PartFilter>
             <PartFilter>...</PartFilter>
             <PartFilter>...</PartFilter>
             <PartFilter>...</PartFilter>
         </PartFilters>
     </Component>
     <Component>
         <!-- Root filter description -->
         <RootFilter>
             <!-- Dimensions -->
             <sizeX>5</sizeX>
             <sizeY>9</sizeY>
             <!-- Weights (binary representation) -->
             <Weights>
                 ...
             </Weights>
             <!-- Linear term in score function -->
             <LinearTerm>-2.5835343890077622</LinearTerm>
         </RootFilter>
         <!-- Part filters description -->
         <PartFilters>
             <NumPartFilters>6</NumPartFilters>
             <!-- Part filter description -->
             <PartFilter>...</PartFilter>
             <PartFilter>...</PartFilter>
             <PartFilter>...</PartFilter>
             <PartFilter>...</PartFilter>
             <PartFilter>...</PartFilter>
             <PartFilter>...</PartFilter>
         </PartFilters>
     </Component>
 </Model>

Internal structure of a PartFilter:

 <PartFilter>
     <sizeX>6</sizeX>
     <sizeY>6</sizeY>
     <!-- Weights (binary representation) -->
     <Weights></Weights>
     <!-- Part filter offset -->
     <V>
         <Vx>3</Vx>
         <Vy>1</Vy>
     </V>
     <!-- Quadratic penalty function coefficients -->
     <Penalty>
         <dx>0.0004031731821276</dx>
         <dy>-0.0003745111759062</dy>
         <dxx>0.0100010270581015</dxx>
         <dyy>0.0205820897831230</dyy>
     </Penalty>
 </PartFilter>

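Note: the values above are for illustration only. The data in the Weights nodes is too large and has been omitted, and the PartFilter node is somewhat complex, so it is listed separately. The explanatory comments are brief and not exhaustive. In PartFilter, Vx and Vy are the part's displacement in HOG cells. The entries under the Penalty node are used to compute the deformation model.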
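How the Penalty coefficients could enter the score is sketched below. This is an illustration under stated assumptions, not the OpenCV implementation: the struct and function names are hypothetical, the cost is assumed to take the quadratic form dx*ddx + dy*ddy + dxx*ddx^2 + dyy*ddy^2 described in [2], and the part filter response is assumed to be available as a plain row-major array.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical mirror of the <Penalty> node: coefficients of the quadratic deformation cost.
struct Penalty {
    double dx, dy, dxx, dyy;
};

// Cost of displacing a part by (ddx, ddy) HOG cells from its anchor.
// Assumed form: dx*ddx + dy*ddy + dxx*ddx^2 + dyy*ddy^2.
double deformationCost(const Penalty& p, int ddx, int ddy) {
    return p.dx * ddx + p.dy * ddy + p.dxx * ddx * ddx + p.dyy * ddy * ddy;
}

// Best value of (filter response - deformation cost) over all placements of one part.
// response is a row-major width x height map of part filter scores;
// (ax, ay) is the anchor position of the part inside that map.
double bestPartScore(const std::vector<double>& response, int width, int height,
                     int ax, int ay, const Penalty& p) {
    assert(width > 0 && height > 0);
    assert(response.size() == static_cast<std::size_t>(width) * height);
    double best = -std::numeric_limits<double>::infinity();
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            double s = response[static_cast<std::size_t>(y) * width + x]
                     - deformationCost(p, x - ax, y - ay);
            if (s > best) best = s;
        }
    }
    return best;
}
```

The exhaustive double loop is chosen for readability only. The original papers compute the same maximum for every anchor at once with a generalized distance transform, which is much faster.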

[Reference: CSDN blog post "The Latent SVM model XML file in OpenCV"]

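2. The *.mat format used in MATLAB

The model file stores its parameters in the layout defined by the Latent SVM model, which relies on table lookups; see the original paper for details. This layout appears to have been chosen for computational convenience.

(1) Simple fields of model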

model.class = 'bottle'

model.year = 2011

model.note = 15-50-36

and so on.

(2) The key fields of model are:

filters: [1x42 struct]

rules: {1x81 cell}

symbols: [1x81 struct]

start: 2

The numbers are for illustration only.

model.symbols(i) =

type: 'T'

filter: 3

T: terminal, which corresponds to a filter and gives its index. N: non-terminal, which is not a filter but a deformation model.

model.filters(i) =

w: [7x11x31 double]

blocklabel: 1

symmetric: 'M'

size: [7 11]

flip: 0

symbol: 1

Each entry contains the filter weights and the blocklabel. symmetric is either 'M' or 'N': 'M' means the filter has a vertically mirrored counterpart, and 'N' means it does not. If symmetric == 'M', two filters share the same blocklabel.

flip indicates whether w has to be flipped to form the filter.

symbol records the index of the matching entry in model.symbols. For example, if model.symbols(8).filter = 3, then model.filters(3).symbol = 8.

model.rules is the key to reading the model. model.rules has the same size as model.symbols, and each entry of model.rules corresponds to one symbol.

LHS: left-hand side

RHS: right-hand side

The cell model.rules{i} holds a struct array that lists the rules for which i acts as the left-hand side symbol. So, model.rules{4}(1) and model.rules{4}(2) are the first two productions for symbol 4 in some imaginary grammar. The field model.start holds the distinguished start symbol for the grammar. Let's use that symbol as an example.

model.rules{model.start}(1)=

type: 'S'

lhs: 2

rhs: [1 11 15 19 23 27 31]

offset: { w: -3.394 blocklabel: 2}

anchor: {[0 0 0] [12 0 1] [4 5 1] [13 8 1] [0 0 1] [16 0 1] [0 5 1]}

Here is the first rule with model.start on the LHS. In the case of a 6 component mixture model, model.rules{model.start} is a struct array of 6 elements. The field lhs is simply a convenience field indicating what the production's LHS is. The field rhs holds the symbols that appear on the production's right-hand side. The field type is 'S' if this is a structural rule or 'D' if this is a deformation rule. Now I'll split the description into two cases.

Case 1: structural rule

The field offset holds the offset value and its blocklabel (these will be shared for mirrored components). The anchor field holds the parameters of the "structure function" that defines how each of the symbols in the rhs is placed relative to a placement of the lhs symbol. The format is [dx dy ds], where 2^ds is the scale change. So ds = 1 implies that the rhs symbol lives at twice the resolution of the lhs symbol. The values of dx, dy are HOG cell displacements at the rhs's native scale. Note that dx and dy are displacements, so they are 1 less than the anchor values that were defined in the old model. The first symbol on the rhs has anchor = [0 0 0], because this symbol is a terminal for the root filter.
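As a sketch of how an anchor [dx dy ds] could be applied: the struct and function names below are hypothetical, and the formula assumes the convention described above, where a cell at the lhs scale is first multiplied by 2^ds and then displaced by (dx, dy) cells at the rhs scale.

```cpp
#include <cassert>

// Hypothetical holder for one anchor entry [dx dy ds] of a structural rule.
struct Anchor {
    int dx, dy, ds;
};

// Position of a HOG cell at some pyramid level.
struct Cell {
    int x, y;
};

// Ideal (undeformed) position of an rhs symbol given the position of the lhs symbol.
// Assumption: the rhs lives at 2^ds times the lhs resolution, so the lhs cell is
// scaled by 2^ds and then shifted by (dx, dy) cells at the rhs scale.
Cell idealPlacement(Cell lhs, Anchor a) {
    assert(a.ds >= 0);      // only equal or finer resolutions are handled in this sketch
    int scale = 1 << a.ds;  // 2^ds
    return Cell{lhs.x * scale + a.dx, lhs.y * scale + a.dy};
}
```

For example, with the anchor [12 0 1] from the rule above and an lhs placement at cell (3, 2), the ideal part position would be cell (18, 4) at twice the resolution, while the root anchor [0 0 0] leaves the position unchanged.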

Case 2: deformation rule

Let's look at the second symbol on the rhs in the rule above.

model.rules{11} =

type: 'D'

lhs: 11

rhs: 10

def: {

w: [0.0209 -0.0015 0.0155 0.0010]

blocklabel: 8

flip: 0

symmetric: 'M'

}

offset: { w: 0 blocklabel: 7 }

Deformation rules don't have an anchor field, but do have a def field. The def field describes the deformation model for this rule, and so it includes the coefficients and the blocklabel. The def.symmetric field can be 'M' if there is a vertically mirrored deformation rule or 'N' if there is no symmetry constraint. In the case of 'M', flip indicates how to write features and read models (just like features.flip). There's an implicit assumption in the code that deformation rules only have one symbol on the right-hand side (though it need not be a terminal).

def.w is used to compute the deformation model; it corresponds to the contents of the Penalty node in the XML.

[Reference: CSDN blog post "Meaning of the Latent SVM model parameters in MATLAB"]

3. Converting a .mat file to an .xml file

Step 1: get the parameters

The code below is MATLAB.

1. Number of components

 NumComponents = length(model.rules{model.start})
 ncom = length(model.rules{model.start});

2. Number of features: this is fixed at 31 and is not recorded in the model file.

3. Score threshold = model.thresh

Step 2: get the root filter, part filters, and deformation parameters of each component

1. For each component, first get the index of its root filter, starting from model.start. There are two cases:

 for icom = 1:ncom  % loop over the components
    rhs = model.rules{model.start}(icom).rhs;
    layer = 1;  % assumed value: follow the first rule of the symbol (undefined in the original snippet)
    % assume the root filter is first on the rhs of the start rules
    if model.symbols(rhs(1)).type == 'T'
       % handle case where there's no deformation model for the root
       root = model.symbols(rhs(1)).filter;
    else
       % handle case where there is a deformation model for the root
       root = model.symbols(model.rules{rhs(1)}(layer).rhs).filter;
    end
 end

2. Within each component, get the number of part filters and the parameters of each part

 for icom = 1:NumComponents
    % the value of icom should not change this result; every component is expected to have the same number of parts
    npart = length(model.rules{model.start}(icom).rhs) - 1;
    for ipart = 2:npart+1
       irule = model.rules{model.start}(icom).rhs(ipart);
       filternum = model.symbols(model.rules{irule}.rhs).filter;
       % get each part's anchor [dx, dy, ds] and penalty [dx dy dxx dyy]
       Vx = model.rules{model.start}(icom).anchor{ipart}(1) + 1;
       Vy = model.rules{model.start}(icom).anchor{ipart}(2) + 1;
       dx = model.rules{irule}.def.w(2);
       dy = model.rules{irule}.def.w(4);
       dxx = model.rules{irule}.def.w(1);
       dyy = model.rules{irule}.def.w(3);
    end
 end
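The index mapping in the MATLAB snippet above is easy to get wrong when porting, because MATLAB indexes from 1 and C++ from 0. A small sketch of the same mapping in C++ follows. The struct and function names are hypothetical and belong to neither OpenCV nor voc-release, and the sketch assumes that every part lives one octave above the root (ds == 1), as in the example rule shown earlier.

```cpp
#include <array>
#include <cassert>

// Hypothetical container for the values written to the <V> and <Penalty> nodes of one part.
struct PartGeometry {
    int Vx, Vy;
    double dx, dy, dxx, dyy;
};

// anchor is the [dx dy ds] entry of the structural rule,
// defW is def.w of the matching deformation rule.
PartGeometry toPartGeometry(const std::array<int, 3>& anchor,
                            const std::array<double, 4>& defW) {
    assert(anchor[2] == 1);  // assumption of this sketch: parts are one octave finer than the root
    PartGeometry g;
    g.Vx = anchor[0] + 1;    // anchors are displacements; the XML offset is one larger
    g.Vy = anchor[1] + 1;
    g.dxx = defW[0];         // def.w(1) in MATLAB
    g.dx  = defW[1];         // def.w(2)
    g.dyy = defW[2];         // def.w(3)
    g.dy  = defW[3];         // def.w(4)
    return g;
}
```

With the anchor [12 0 1] and def.w = [0.0209 -0.0015 0.0155 0.0010] from the examples above, this would yield Vx = 13, Vy = 1, dxx = 0.0209, dx = -0.0015, dyy = 0.0155 and dy = 0.0010.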

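[Reference: CSDN blog post "Model conversion: Mat -> XML"]

In fact, a tool exists for this conversion: http://www.csie.ntu.edu.tw/~r99922070/code/. Every trained object model (.mat file) shipped with "voc-release" can be converted to an .xml file and used from C/C++ (OpenCV).

In summary, there are two ways to obtain a trained model. One is to train on Linux with the authors' MATLAB source code (the README in the source explains how). The other is to modify the authors' MATLAB code so that it can train on Windows, and then convert the .mat model file to an .xml file (@loadstar_kun).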

IV. Usage in OpenCV

OpenCV has shipped a C/C++ implementation of Latent SVM since version 2.2. The OpenCV change log says:

objdetect: LatentSVM object detector, implementing P. Felzenszwalb algorithm, has been contributed by Nizhniy Novgorod State University (NNSU) team.

The OpenCV documentation describes Latent SVM as follows:

Latent SVM - Discriminatively Trained Part Based Models for Object Detection

The object detector described below has been initially proposed by P.F. Felzenszwalb in [Felzenszwalb2010]. It is based on a Dalal-Triggs detector that uses a single filter on histogram of oriented gradients (HOG) features to represent an object category. This detector uses a sliding window approach, where a filter is applied at all positions and scales of an image. The first innovation is enriching the Dalal-Triggs model using a star-structured part-based model defined by a “root” filter (analogous to the Dalal-Triggs filter) plus a set of parts filters and associated deformation models. The score of one of star models at a particular position and scale within an image is the score of the root filter at the given location plus the sum over parts of the maximum, over placements of that part, of the part filter score on its location minus a deformation cost measuring the deviation of the part from its ideal location relative to the root. Both root and part filter scores are defined by the dot product between a filter (a set of weights) and a subwindow of a feature pyramid computed from the input image. Another improvement is a representation of the class of models by a mixture of star models. The score of a mixture model at a particular position and scale is the maximum over components, of the score of that component model at the given location.
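The two-level scoring described in the passage above can be sketched in a few lines. This is only an illustration with hypothetical function names, not code from OpenCV. It assumes that the best deformation-adjusted score of each part has already been computed, and that each component has a bias term corresponding to the LinearTerm node of the XML model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Score of one star model at a fixed root location: the root filter response, plus the
// best (part response - deformation cost) of every part, plus the component bias (assumed).
double starScore(double rootScore, const std::vector<double>& bestPartScores, double bias) {
    double s = rootScore + bias;
    for (double p : bestPartScores) s += p;
    return s;
}

// Mixture model: the score at a location is the maximum over the component scores.
// Returns the index of the winning component.
std::size_t bestComponent(const std::vector<double>& componentScores) {
    assert(!componentScores.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < componentScores.size(); ++i) {
        if (componentScores[i] > componentScores[best]) best = i;
    }
    return best;
}
```

A location would then be reported as a detection when the winning component's score exceeds the model's score threshold.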

In OpenCV there are C implementation of Latent SVM and C++ wrapper of it. C version is the structure CvObjectDetection and a set of functions working with this structure (see cvLoadLatentSvmDetector(), cvReleaseLatentSvmDetector(), cvLatentSvmDetectObjects()). C++ version is the class LatentSvmDetector and has slightly different functionality in contrast with C version - it supports loading and detection of several models.

OpenCV provides two demos: the C version, samples/c/latentsvmdetect.cpp, and the C++ version, samples/cpp/latentsvm_multidetect.cpp. The difference is that the C++ version can load several models and detect multiple object classes at once. To use it, put the trained model files in one folder and the images to be processed in another, set the input arguments, and run the detector. The following uses OpenCV 2.4.0 to show how the C++ version of Latent SVM is used (the source code is in \OpenCV2.4.0\modules\objdetect):

 #include <algorithm>  
 #include <cstdlib>  
 #include <iostream>  
 #include <string>  
 #include <vector>  
 #include "opencv2/objdetect/objdetect.hpp"  
 #include "opencv2/highgui/highgui.hpp"  
 #include "opencv2/contrib/contrib.hpp"  
   
 #ifdef WIN32  
 #include <io.h>  
 #else  
 #include <dirent.h>  
 #endif  
   
 #ifdef HAVE_CVCONFIG_H  
 #include <cvconfig.h>  
 #endif  
   
 #ifdef HAVE_TBB  
 #include "tbb/task_scheduler_init.h"  
 #endif  
   
 using namespace std;  
 using namespace cv;  
   
 static void help()  
 {  
     cout << "This program demonstrates the use of the latentSVM detector." << endl <<  
         "It reads in trained object models and then uses them to detect the objects in an image." << endl <<  
         endl <<  
         "Call:" << endl <<  
         "./latentsvm_multidetect <imagesFolder> <modelsFolder> [<overlapThreshold>][<threadsNumber>]" << endl <<  
         "<overlapThreshold> - threshold for the non-maximum suppression algorithm." << endl <<  
         "Example of <modelsFolder> is opencv_extra/testdata/cv/latentsvmdetector/models_VOC2007" << endl <<  
         endl <<  
         "Keys:" << endl <<  
         "'n' - to go to the next image;" << endl <<  
         "'esc' - to quit." << endl <<  
         endl;  
 }  
   
 static void detectAndDrawObjects( Mat& image, LatentSvmDetector& detector, const vector<Scalar>& colors, float overlapThreshold, int numThreads )  
 {  
     vector<LatentSvmDetector::ObjectDetection> detections;  
   
     TickMeter tm;  
     tm.start();  
     detector.detect( image, detections, overlapThreshold, numThreads);  
     tm.stop();  
   
     cout << "Detection time = " << tm.getTimeSec() << " sec" << endl;  
   
     const vector<string> classNames = detector.getClassNames();  
     CV_Assert( colors.size() == classNames.size() );  
   
     for( size_t i = 0; i < detections.size(); i++ )  
     {  
         const LatentSvmDetector::ObjectDetection& od = detections[i];  
         rectangle( image, od.rect, colors[od.classID], 3 );  
     }  
     // put text over the all rectangles  
     for( size_t i = 0; i < detections.size(); i++ )  
     {  
         const LatentSvmDetector::ObjectDetection& od = detections[i];  
         putText( image, classNames[od.classID], Point(od.rect.x+4,od.rect.y+13), FONT_HERSHEY_SIMPLEX, 0.55, colors[od.classID], 2 );  
     }  
 }  
   
 static void readDirectory( const string& directoryName, vector<string>& filenames, bool addDirectoryName=true )  
 {  
     filenames.clear();  
   
 #ifdef WIN32  
     struct _finddata_t s_file;  
     string str = directoryName + "\\*.*";  
   
     intptr_t h_file = _findfirst( str.c_str(), &s_file );  
     if( h_file != static_cast<intptr_t>(-1) )  
     {  
         do  
         {  
             if( addDirectoryName )  
                 filenames.push_back(directoryName + "\\" + s_file.name);  
             else  
                 filenames.push_back((string)s_file.name);  
         }  
         while( _findnext( h_file, &s_file ) == 0 );  
         _findclose( h_file ); // close the handle only if the search succeeded  
     }  
 #else  
     DIR* dir = opendir( directoryName.c_str() );  
     if( dir != NULL )  
     {  
         struct dirent* dent;  
         while( (dent = readdir(dir)) != NULL )  
         {  
             if( addDirectoryName )  
                 filenames.push_back( directoryName + "/" + string(dent->d_name) );  
             else  
                 filenames.push_back( string(dent->d_name) );  
         }  
         closedir( dir ); // release the directory handle  
     }  
 #endif  
   
     sort( filenames.begin(), filenames.end() );  
 }  
   
 int main(int argc, char* argv[])  
 {  
     help();  
   
     string images_folder, models_folder;  
     float overlapThreshold = 0.2f;  
     int numThreads = -1;  
     if( argc > 2 )  
     {  
         images_folder = argv[1];  
         models_folder = argv[2];  
         if( argc > 3 ) overlapThreshold = (float)atof(argv[3]);  
         if( overlapThreshold < 0 || overlapThreshold > 1)  
         {  
             cout << "overlapThreshold must be in interval [0,1]." << endl;  
             exit(-1);  
         }  
   
         if( argc > 4 ) numThreads = atoi(argv[4]);  
     }  
   
     vector<string> images_filenames, models_filenames;  
     readDirectory( images_folder, images_filenames );  
     readDirectory( models_folder, models_filenames );  
   
     LatentSvmDetector detector( models_filenames );  
     if( detector.empty() )  
     {  
         cout << "Models cannot be loaded" << endl;  
         exit(-1);  
     }  
   
     const vector<string>& classNames = detector.getClassNames();  
     cout << "Loaded " << classNames.size() << " models:" << endl;  
     for( size_t i = 0; i < classNames.size(); i++ )  
     {  
         cout << i << ") " << classNames[i] << "; ";  
     }  
     cout << endl;  
   
     cout << "overlapThreshold = " << overlapThreshold << endl;  
   
     vector<Scalar> colors;  
     generateColors( colors, detector.getClassNames().size() );  
   
     for( size_t i = 0; i < images_filenames.size(); i++ )  
     {  
         Mat image = imread( images_filenames[i] );  
         if( image.empty() )  continue;  
   
         cout << "Process image " << images_filenames[i] << endl;  
         detectAndDrawObjects( image, detector, colors, overlapThreshold, numThreads );  
   
         imshow( "result", image );  
   
         for(;;)  
         {  
             int c = waitKey();  
             if( (char)c == 'n')  
                 break;  
             else if( (char)c == '\x1b' )  
                 exit(0);  
         }  
     }  
   
     return 0;  
 }
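
The overlapThreshold argument in the sample above controls non-maximum suppression. The following is a minimal greedy sketch of that idea, not OpenCV's actual implementation: the Box struct, the function names, and the use of intersection over union as the overlap measure are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical detection record: an axis-aligned rectangle with a detector score.
struct Box {
    int x, y, w, h;
    double score;
};

// Intersection over union of two boxes (assumed overlap measure).
double iou(const Box& a, const Box& b) {
    int ix = std::max(0, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    int iy = std::max(0, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    double inter = static_cast<double>(ix) * iy;
    double uni = static_cast<double>(a.w) * a.h + static_cast<double>(b.w) * b.h - inter;
    return uni > 0.0 ? inter / uni : 0.0;
}

// Greedy non-maximum suppression: visit boxes from highest to lowest score and keep a box
// only if it does not overlap an already kept box by more than overlapThreshold.
std::vector<Box> nonMaxSuppression(std::vector<Box> boxes, double overlapThreshold) {
    assert(overlapThreshold >= 0.0 && overlapThreshold <= 1.0);
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& candidate : boxes) {
        bool suppressed = false;
        for (const Box& k : kept) {
            if (iou(candidate, k) > overlapThreshold) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(candidate);
    }
    return kept;
}
```

Under this measure, a lower threshold suppresses more aggressively. With the sample's default of 0.2, two detections that share more than a fifth of their combined area would collapse into the higher-scoring one.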