Face value scoring with an open-source facial feature extractor

This article covers a practical model application only, not research on facial attractiveness. The results are for entertainment and reference only.

The method is also for reference only.

In general, the larger the dataset, the closer the results come to typical human aesthetic judgments. Because the dataset here is small, this is only an experiment.

Environment: Ubuntu 14.04, OpenCV 3.2.0, dlib 19.6, Python 2.7

1. Preparation:

1. Download the dlib library and the feature-extraction model.

The model generates a 128-dimensional feature vector through a convolutional neural network to represent a face. The network's inputs are the 68 facial landmark points and the whole image. Presumably the learned features are related to the coordinates of the 68 landmarks, which are normalized and further processed inside the network so that the extracted features are discriminative and unique.

Since facial attractiveness is related to the positions of the facial features and the expression at the time of the photo, this network is worth trying as a solution.

Dlib download:

http://dlib.net/

This model was originally built for face recognition; its prototype is a ResNet CNN. Residual networks mitigate the gradient vanishing/exploding problems that come with increasing network depth during training. The method achieves 99.38% face-recognition accuracy on LFW.

Model name: dlib_face_recognition_resnet_model_v1; trained for about 10,000 iterations on roughly 3 million images. The input-layer image size is 150×150.

Download links:

Feature-extraction network model:

http://dlib.net/files/dlib_face_recognition_resnet_model_v1.dat.bz2

68-point landmark location extraction model:

http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
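A minimal sketch of the extraction step, assuming the two model files above have been downloaded and unpacked into the working directory (the function name and paths are illustrative; dlib is imported lazily so the sketch reads without it installed):

```python
import numpy as np

def extract_descriptor(image_path,
                       shape_model="shape_predictor_68_face_landmarks.dat",
                       rec_model="dlib_face_recognition_resnet_model_v1.dat"):
    """Return the 128-d dlib face descriptor for the first face found,
    or None if no face is detected.  Model paths are illustrative."""
    import dlib  # deferred import: only needed when actually extracting
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor(shape_model)
    facerec = dlib.face_recognition_model_v1(rec_model)
    img = dlib.load_rgb_image(image_path)
    dets = detector(img, 1)          # upsample once to catch smaller faces
    if not dets:
        return None
    shape = predictor(img, dets[0])  # 68 landmark points
    return np.array(facerec.compute_face_descriptor(img, shape))
```

The descriptor returned here is the 128-value vector used throughout the rest of the article.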

2. Data preparation: collect facial images of different types, taking care to choose photos spanning different appearances. This step is somewhat subjective and has the greatest influence on the final score, so the dataset should be as large, and the images as typical, as possible.

We define six score levels: 95, 90, 85, 80, 70, 65.

Only 2 people scored 95; most other levels held around 15 people each, with the 85 level the largest at about 20 people, so the data roughly follow a normal distribution.

2. Generate the database

The sorted pictures are grouped into folders, one folder per score level. Provided a face can be detected, each image is fed into the network to extract features, and a label indicating its score category is attached, in preparation for the subsequent classification test.

In this way, each image is reduced to its 128 feature values plus a label.
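The resulting database layout can be pictured with synthetic stand-ins (the random vectors below are placeholders for real descriptors; variable names are my own):

```python
import numpy as np

# Synthetic stand-ins for real descriptors: each row is one image's
# 128-value feature vector, paired with a score label.
rng = np.random.RandomState(0)
features = rng.randn(6, 128)                   # 6 database images
labels = np.array([95, 90, 85, 80, 70, 65])    # one score label per image

# The database is simply the feature matrix plus the label column.
database = {"features": features, "labels": labels}
print(database["features"].shape)  # (6, 128)
```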

3. Score estimation based on nearest-neighbor matching (similar to KNN)

The data format is shown in the following table:

(Table: database storage format; each row holds one image's 128 feature values followed by its score label)


A new test image passed through the network likewise yields 128 values:

(Table: test image data format; a single row of 128 feature values)


Two measures of proximity are defined:
(1) Euclidean distance:

d(u, v) = sqrt( sum_i (u_i - v_i)^2 ), summed over the 128 feature dimensions

(2) Proximity based on linear-combination coefficients:
We transpose the data matrix in Table 1 to obtain the matrix shown below:

(Table: transposed sample data; each column is one database image's 128-dimensional feature vector)

Let the matrix above be A, and let b be the feature column vector formed by the test image. Solve the matrix equation:

A x = b


A is 128×n, x is n-dimensional, and b is 128-dimensional.
The solution x gives the weight of each column of A in representing b: the test image's feature vector is expressed as a linear combination of the database images' features, and the larger a coefficient, the closer the corresponding database image is to the test image.
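The solve can be sketched with numpy's least-squares routine (synthetic data; column 2 plays the role of the database image closest to the test image):

```python
import numpy as np

rng = np.random.RandomState(0)
A = rng.randn(128, 5)                  # transposed database: one 128-d column per image
b = A[:, 2] + 0.01 * rng.randn(128)    # test vector close to column 2

# Least-squares solution of A x = b; x[i] is the weight of database image i.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
closest = np.argsort(x)[::-1][:3]      # indices of the three largest coefficients
print(closest[0])  # 2
```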
Take the three nearest neighbors by Euclidean distance and the three largest coefficients in the linear combination, and weight each triple separately.

For the three nearest Euclidean neighbors, look up their score labels; treating the three as equally likely, weight them 1:1:1 (two or more of the three may fall in the same score category).

For the linear-combination method, take the three images with the largest coefficients and weight them the same way.

Finally the two methods are combined: the second scheme is judged more reliable and is weighted 0.6, the first 0.4.
As a safeguard, take the 5 nearest Euclidean neighbors and hold a category vote. If the score of the winning category differs markedly from the score computed above, blend the voting result into the total at a fixed proportion, adjusting the original score so the error does not grow too large.
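Putting the pieces together, a sketch of the combined estimate (function name and toy data are my own; the top-3 equal weighting and the 0.4/0.6 combination follow the text, while the voting safeguard is omitted for brevity):

```python
import numpy as np

def estimate_score(feats, labels, test_vec):
    """0.4 * mean score of the 3 nearest Euclidean neighbours
    + 0.6 * mean score of the 3 largest linear-combination coefficients."""
    d = np.linalg.norm(feats - test_vec, axis=1)           # Euclidean distances
    euc3 = labels[np.argsort(d)[:3]]
    x, *_ = np.linalg.lstsq(feats.T, test_vec, rcond=None)  # solve A x = b
    lin3 = labels[np.argsort(x)[::-1][:3]]
    return 0.4 * euc3.mean() + 0.6 * lin3.mean()

# Toy database: two images per score level.
rng = np.random.RandomState(1)
feats = rng.randn(12, 128)
labels = np.repeat([95, 90, 85, 80, 70, 65], 2)
test = feats[4] + 0.01 * rng.randn(128)    # near an 85-scored image
score = estimate_score(feats, labels, test)
```

Because the result is a convex combination of database scores, it always lands between the lowest and highest levels.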

 
(Figure: test image 1)
(Figure: test image 2)
(Figure: test image 3)

4. Extension: adding gender recognition

Prepare about 100 photos each of males and females, covering different age groups; with so few samples, (pretty) boys may be mistaken for girls.
Labels: 0 = girl, 1 = boy.
Classification is by voting: compute the Euclidean distance and cosine distance between the test image's features and every feature vector in the database, take the 10 nearest images under each distance, look up the gender of each corresponding original image, and vote; if more than half of the 20 votes (i.e., more than 10) go to one gender, that gender is the prediction.
The data results are as follows:
Genders of the 10 nearest pictures under Euclidean distance: [1,1,0,0,1,1,1,1,1,1]
Genders of the 10 nearest pictures under cosine distance: [1,1,1,0,0,1,1,1,1,1]
Result: male, with confidence = (8 + 8) / 20 = 0.8
The confidence indicates how reliable the result is; alternatively, prior knowledge can be used to estimate the probability of the predicted class.
Voting-based schemes have higher accuracy.
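The voting scheme can be sketched as follows (function name and the synthetic, linearly separated clusters are my own; the two distances and the majority rule follow the text):

```python
import numpy as np

def gender_vote(feats, genders, test_vec, k=10):
    """Vote with the k nearest neighbours under Euclidean and cosine
    distance; return (predicted_gender, confidence)."""
    euc = np.linalg.norm(feats - test_vec, axis=1)
    cosd = 1 - (feats @ test_vec) / (
        np.linalg.norm(feats, axis=1) * np.linalg.norm(test_vec))
    votes = np.concatenate([genders[np.argsort(euc)[:k]],
                            genders[np.argsort(cosd)[:k]]])
    males = int(votes.sum())               # labels: 0 = girl, 1 = boy
    if males > len(votes) // 2:
        return 1, males / len(votes)
    return 0, (len(votes) - males) / len(votes)

rng = np.random.RandomState(2)
boys = rng.randn(10, 128) + 1.0            # synthetic, well-separated clusters
girls = rng.randn(10, 128) - 1.0
feats = np.vstack([boys, girls])
genders = np.array([1] * 10 + [0] * 10)
test = rng.randn(128) + 1.0                # a sample drawn from the "boy" cluster
pred, conf = gender_vote(feats, genders, test)
```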

(Figure: gender test results)


[Note] Both the test and training images come from the internet.

SVM-based classification can also be used; the key code (using sklearn):

from sklearn import svm
import numpy as np

clf = svm.SVC(C=1, kernel='rbf', gamma=1, decision_function_shape='ovr')
clf.fit(dataMat, np.uint8(labelMat))
face_descriptor_trans = face_descriptor_trans.reshape(1, -1)  # sklearn expects a 2-D array
print(clf.decision_function(dataMat))
score = clf.predict(face_descriptor_trans)

However, on the face-value problem the classifier always predicted the third category, for reasons unknown; that category does contain slightly more pictures than the others, so class imbalance is a plausible cause. On the binary (gender) problem it performs well.

In addition, the proximity-matching and classification ideas here can be applied to other classification problems, such as expression recognition.

--------------------------------- Key code ---------------------------------

Euclidean distance and cosine distance calculation:

import numpy as np
from scipy.spatial.distance import pdist

def euler_dist(vector1, vector2):
    # pdist on a 2-row stack returns the pairwise Euclidean distance
    X = np.vstack([vector1, vector2])
    dist = pdist(X)
    return dist

def cos(vector1, vector2):
    dot_product = 0.0
    normA = 0.0
    normB = 0.0
    for a, b in zip(vector1, vector2):
        dot_product += a * b
        normA += a ** 2
        normB += b ** 2
    if normA == 0.0 or normB == 0.0:
        return None
    else:
        return dot_product / ((normA * normB) ** 0.5)

Convert the distance matrix into a list so it can be indexed:

dist1 = list(dist)

Sort dist to locate the indices of the nearest samples:

new_dist1 = sorted(dist)

Look up the category of each corresponding label:

score_1[j] = labelMat[np.uint8(loca_dist1[j])]

Vote:

for i in range(0, num_select):
    record_times[np.uint8(score_1[i])] = record_times[np.uint8(score_1[i])] + 1

Score weighting:

final_score = (score[np.uint8(score_1[0])] * 0.333
               + score[np.uint8(score_1[1])] * 0.333
               + score[np.uint8(score_1[2])] * 0.333)
