Analysis of the Dlib landmark algorithm (ERT, ensemble of regression trees)

Landmark detection is a technique for extracting feature points from human faces. The Dlib library provides a 68-point landmark detector for faces. A diagram of the result, with the numbering of the landmark points, can be found in the article "Calling Dlib Library for Marking Key Points of Human Faces". Downstream, the landmark points can be used to extract the eye and mouth regions for fatigue detection, while the nose and other parts can be used for 3D pose estimation.
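As a rough illustration of how the landmark points can be grouped into regions, the sketch below slices a list of 68 points by index. It assumes the common 68-point ordering (jaw first, mouth last) with 0-based indices, which is the ordering used by Dlib's 68-point face model; the names `REGIONS`, `extract_region` and `bounding_box` are made up for this example, and the landmark coordinates are dummy values rather than detector output.

```python
# Index ranges of the common 68-point layout, 0-based.
# Assumption: the predictor outputs points in this order.
# "left" and "right" are from the subject's point of view.
REGIONS = {
    "jaw": range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow": range(22, 27),
    "nose": range(27, 36),
    "right_eye": range(36, 42),
    "left_eye": range(42, 48),
    "mouth": range(48, 68),
}


def extract_region(points, name):
    """Return the (x, y) points that belong to one named face region."""
    if len(points) != 68:
        raise ValueError("expected 68 landmark points")
    return [points[i] for i in REGIONS[name]]


def bounding_box(points):
    """Axis-aligned box (left, top, right, bottom) around a set of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


# Dummy landmarks for illustration only: point i sits at (i, 2 * i).
landmarks = [(i, 2 * i) for i in range(68)]
mouth = extract_region(landmarks, "mouth")
left_eye_box = bounding_box(extract_region(landmarks, "left_eye"))
```

In a real pipeline the box around an eye or the mouth would be used to crop that region from the image before passing it to a fatigue classifier.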

The Dlib library uses the algorithm from "One Millisecond Face Alignment with an Ensemble of Regression Trees" (CVPR 2014): ERT (ensemble of regression trees), a cascaded regression method whose regression trees are learned by gradient boosting. The algorithm uses a cascade of regressors. A set of annotated face images is first used as the training set, and the model is then trained from it.

Dlib provides the shape_predictor_trainer object to train a shape_predictor from a set of training images, each annotated with the shapes you want to predict. To do this, shape_predictor_trainer uses the state-of-the-art method from the paper cited above.

Features are chosen with a correlation-based selection method: the target output r_i is projected onto a random direction w, and a pair of features (u, v) is selected such that the difference I_i(u') - I_i(v') has the highest sample correlation with the projected target w^T r_i over the training data.
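The selection rule above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not Dlib's implementation: the candidate pixels are given as a small table of gray values per sample, every pair is tried exhaustively, and the names `pearson` and `select_pixel_pair` are invented for the example.

```python
import math
import random


def pearson(xs, ys):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_pixel_pair(intensities, residuals, rng):
    """Pick the pixel pair whose difference best tracks the projected target.

    intensities[i][k] is the gray value of sample i at candidate pixel k,
    residuals[i] is the target vector r_i of sample i.
    """
    # Random unit direction w and the projected targets w^T r_i.
    w = [rng.gauss(0.0, 1.0) for _ in residuals[0]]
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    projected = [sum(wc * rc for wc, rc in zip(w, r)) for r in residuals]

    best_pair, best_corr = None, -1.0
    n_pixels = len(intensities[0])
    for u in range(n_pixels):
        for v in range(u + 1, n_pixels):
            diffs = [row[u] - row[v] for row in intensities]
            corr = pearson(diffs, projected)
            # Swapping u and v flips the sign, so keep the positive ordering.
            pair = (u, v) if corr >= 0 else (v, u)
            if abs(corr) > best_corr:
                best_pair, best_corr = pair, abs(corr)
    return best_pair, best_corr


# Toy data: the target is built from the difference of pixels 0 and 2,
# so that pair should be selected.
rng = random.Random(0)
intensities = [[rng.uniform(0, 255) for _ in range(4)] for _ in range(30)]
residuals = [[(row[0] - row[2]) * 1.0, (row[0] - row[2]) * 2.0]
             for row in intensities]
best_pair, best_corr = select_pixel_pair(intensities, residuals, rng)
```

In the toy data the targets are an exact multiple of one pixel difference, so the search recovers pixels 0 and 2 with a correlation close to 1; on real training data the best correlation is lower.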

Given an image, the algorithm first generates an initial shape, that is, a rough estimate of the feature point positions, and then uses gradient boosting to reduce the sum of squared errors between the current shape and the ground truth. Least squares is used to minimize the error, which yields the regressor for each level of the cascade. The core formula is shown below:

 


$$\hat{S}^{(t+1)} = \hat{S}^{(t)} + r_t\left(I, \hat{S}^{(t)}\right)$$

The core formula of the algorithm
Each r_t is trained by gradient boosting of regression trees, with least squares used to minimize the error. Here t is the index of the cascade level and r_t(∙, ∙) is the regressor at that level. The regressor takes as input the image I and the shape as updated by the previous level's regressor; the features used can be gray values or others. Each regressor is composed of many trees, and the parameters of each tree are learned from the coordinate differences between the current shape and the ground truth and from randomly selected pixel pairs.
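To make the "fit the residual, then add it" idea concrete, here is a minimal gradient boosting sketch with squared-error loss. It is a toy under stated assumptions and not the paper's training procedure: the targets are scalars instead of shape vectors, each weak learner is a one-split stump on a single input value instead of a tree over pixel differences, and the names `fit_stump`, `boost`, `predict` and the shrinkage value `nu=0.1` are chosen for this example.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: one threshold, one constant per side."""
    best = None
    for thr in sorted(set(xs))[:-1]:  # every split that leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean


def boost(xs, ys, n_stages=30, nu=0.1):
    """Gradient boosting for squared error: each stage fits the residuals."""
    f0 = sum(ys) / len(ys)  # constant initial estimate
    preds = [f0] * len(ys)
    stumps, errors = [], []
    for _ in range(n_stages):
        # For squared error the negative gradient is the residual itself.
        residuals = [y - p for y, p in zip(ys, preds)]
        errors.append(sum(r * r for r in residuals))
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrunken update, analogous to S(t+1) = S(t) + r_t(...).
        preds = [p + nu * stump(x) for x, p in zip(xs, preds)]
    errors.append(sum((y - p) ** 2 for y, p in zip(ys, preds)))
    return f0, stumps, errors


def predict(x, f0, stumps, nu=0.1):
    """Initial estimate plus the shrunken sum of all stage outputs."""
    return f0 + nu * sum(stump(x) for stump in stumps)


# Toy data: learn y = x^2 on ten points.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
f0, stumps, errors = boost(xs, ys)
```

The `errors` list records the sum of squared errors before each stage and after the last one, so it can be inspected to see the error shrink as stages are added.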

Unlike LBF, ERT stores the shape update ΔS directly in the leaf nodes while the trees are being learned. Starting from the mean shape as the initial position S, the input is passed through all the learned trees, and adding the ΔS of every leaf node it reaches to the mean shape gives the final positions of the facial key points. The overall process is shown below:

 


Regression process, minimize error
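The inference step described above (mean shape plus the ΔS of every leaf reached) can be sketched as follows. This is a simplified illustration, not Dlib's code: the tree layout, the classes `Split` and `Leaf`, and the function names are invented here, and the pixel lookups are fixed indices into a list of gray values, whereas the actual method samples pixels relative to the current shape estimate at each cascade level.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    delta: List[float]  # shape update ΔS stored in the leaf


@dataclass
class Split:
    u: int              # index of the first pixel
    v: int              # index of the second pixel
    threshold: float
    left: Union["Split", Leaf]
    right: Union["Split", Leaf]


def leaf_delta(node, pixels):
    """Walk one tree down to a leaf using pixel-difference tests."""
    while isinstance(node, Split):
        if pixels[node.u] - pixels[node.v] > node.threshold:
            node = node.left
        else:
            node = node.right
    return node.delta


def predict_shape(mean_shape, cascade, pixels):
    """Start from the mean shape and add the ΔS of every leaf reached."""
    shape = list(mean_shape)
    for trees in cascade:        # one list of trees per cascade level
        for tree in trees:
            delta = leaf_delta(tree, pixels)
            shape = [s + d for s, d in zip(shape, delta)]
    return shape


# Toy example: two landmarks stored as [x0, y0, x1, y1], two cascade levels.
mean_shape = [10.0, 10.0, 20.0, 10.0]
pixels = [200, 50, 120]
tree_a = Split(0, 1, 100.0, Leaf([1.0, 0.0, -1.0, 0.0]), Leaf([0.0, 0.0, 0.0, 0.0]))
tree_b = Split(1, 2, 0.0, Leaf([5.0, 5.0, 5.0, 5.0]), Leaf([0.0, 0.5, 0.0, 0.5]))
final_shape = predict_shape(mean_shape, [[tree_a], [tree_b]], pixels)
# final_shape == [11.0, 10.5, 19.0, 10.5]
```

Because every leaf already holds a full ΔS vector, prediction is just a sequence of threshold tests and vector additions, which is why this kind of model is fast at run time.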
Attached: for a comparison of facial feature extraction algorithms (ASM, CLM, ERT, etc.), please refer to the following blog:

http://blog.csdn.net/u013803245/article/details/51263808
——————————————————
Copyright Notice: This article is an original article by CSDN blogger "zzyy0929" and is released under the CC 4.0 BY-SA license. When reposting, please attach the original source link and this statement.
Original link: https://blog.csdn.net/zzyy0929/article/details/78323256

Origin www.cnblogs.com/Ph-one/p/12752158.html