[Tianchi competition: clothing key point detection (fashionAI_landmark_detect), notes on pitfalls (1)]

The original idea: take an existing network that performs well on ImageNet classification as the backbone, use it to extract features from the clothing image dataset, and then regress those features to the coordinates of the key points to be located.
Note: the features extracted by a ConvNet are high-level features with a large receptive field, so relatively shallow information such as edges is largely ignored. As a result, regressing from VGG16 features gave an NE (normalized error) of 28%, which is relatively poor.
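For reference, here is a minimal sketch of how a normalized error like this can be computed. It assumes the FashionAI definition as commonly described: the Euclidean distance between each predicted and labeled visible key point, divided by a per-image normalization distance (the armpit distance for tops, the waistband width for trousers and skirts), averaged over all visible points. The function name and array layout are illustrative assumptions, not the competition's official scoring code.

```python
import numpy as np


def normalized_error(pred, gt, visible, norm_dist):
    """Average normalized key point error over all visible points.

    pred, gt:   arrays of shape (N, K, 2) holding (x, y) pixel coordinates
    visible:    array of shape (N, K); 1 where the labeled point is visible
    norm_dist:  array of shape (N,); per-image normalization distance
                (assumed: armpit distance for tops, waistband width for bottoms)
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    visible = np.asarray(visible, dtype=float)
    norm_dist = np.asarray(norm_dist, dtype=float)

    # Euclidean distance between each predicted and labeled point: shape (N, K)
    dist = np.linalg.norm(pred - gt, axis=-1)
    # Normalize by the image's reference distance; invisible points contribute 0
    weighted = dist / norm_dist[:, None] * visible
    return weighted.sum() / visible.sum()


# One image, two visible points, normalization distance of 100 px
pred = [[[10.0, 10.0], [50.0, 60.0]]]
gt = [[[13.0, 14.0], [50.0, 60.0]]]
print(normalized_error(pred, gt, [[1, 1]], [100.0]))  # 0.025
```

On this scale, an NE of 28% means predictions are off by roughly 0.28 of the reference distance on average.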
Possible improvements:
(1) Reduce the number of network layers used for feature extraction and see how the key point detection accuracy changes.

(2) Visualize what each layer has learned as a heat map. By comparing models with different accuracy rates, it should be possible to see which image features the key point detector pays most attention to.
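The post does not say which visualization method was used. One simple option, swapped in here and visualizing a layer's activations rather than its raw weights, is to average a conv layer's output over channels, min-max normalize it, and upsample it to the image size so it can be overlaid as a heat map. The function name is hypothetical, and nearest-neighbor upsampling is chosen only to keep the sketch dependency-free.

```python
import numpy as np


def activation_heatmap(feature_map, out_h, out_w):
    """Turn an (h, w, C) conv activation map into an (out_h, out_w) heat map in [0, 1]."""
    fmap = np.asarray(feature_map, dtype=float)
    heat = fmap.mean(axis=-1)          # average over channels -> (h, w)
    heat = heat - heat.min()           # shift so the minimum is 0
    if heat.max() > 0:
        heat = heat / heat.max()       # scale so the maximum is 1
    # Nearest-neighbor upsampling: map each output pixel to a source cell
    rows = np.arange(out_h) * heat.shape[0] // out_h
    cols = np.arange(out_w) * heat.shape[1] // out_w
    return heat[rows[:, None], cols[None, :]]
```

Running this on the same image for a high-accuracy and a low-accuracy model gives two maps that can be compared side by side.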

(3) During training, freeze different numbers of pretrained layers (working from the back of the network toward the front), or freeze none, and compare the experimental results. The images can also be randomly rotated and cropped to augment the dataset, and the effect of preprocessing such as mean subtraction and normalization on the experiment is worth examining.
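A minimal sketch of two of these steps, assuming images are NumPy arrays of shape (H, W, C) and key points are (x, y) pixel coordinates. A random crop must shift the labels by the crop offset. Per-channel mean subtraction and normalization return their statistics so the values computed on the training set can be reused on test images. The function names are hypothetical.

```python
import numpy as np


def random_crop(image, keypoints, crop_h, crop_w, rng):
    """Randomly crop an (H, W, C) image and shift (x, y) key points to match."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    cropped = image[top:top + crop_h, left:left + crop_w]
    # Labels move by the crop offset; some points may end up outside the crop
    shifted = np.asarray(keypoints, dtype=float) - np.array([left, top], dtype=float)
    inside = ((shifted[:, 0] >= 0) & (shifted[:, 0] < crop_w)
              & (shifted[:, 1] >= 0) & (shifted[:, 1] < crop_h))
    return cropped, shifted, inside


def demean_normalize(images, mean=None, std=None):
    """Per-channel mean subtraction and std normalization of a batch (N, H, W, C).

    Pass the training-set mean/std when normalizing validation or test images.
    """
    images = np.asarray(images, dtype=float)
    if mean is None:
        mean = images.mean(axis=(0, 1, 2))
    if std is None:
        std = images.std(axis=(0, 1, 2)) + 1e-8  # avoid division by zero
    return (images - mean) / std, mean, std


rng = np.random.default_rng(0)
img = np.zeros((200, 200, 3))
crop, kps, inside = random_crop(img, [[100, 100], [5, 5]], 150, 150, rng)
```

Points flagged as outside the crop would need to be marked invisible (or the crop resampled) before training.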

(4) How should the key point labels be transformed when the image is scaled?
Rescale the key points by converting their coordinates into ratios of the image size, measured from the origin. For example, if the image is 200*200 and a key point is labeled (100, 100), it becomes (0.5, 0.5). When the image is resized, the key points then need no transformation, and predictions can be converted back to pixel coordinates afterwards. Rotation and cropping still move each point within the frame, so for those the coordinates must be transformed together with the image. I took a big detour here, and I think this is very important, because rotating and cropping the dataset help a great deal in improving the training results.
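A minimal sketch of both transformations, with hypothetical function names. Relative coordinates survive resizing unchanged. For rotation, the key points get the same affine matrix as the image: the matrix below is the one documented for OpenCV's `cv2.getRotationMatrix2D` with scale 1 (positive angles rotate counter-clockwise on screen, y axis pointing down). If the image is rotated with `cv2.warpAffine` using that matrix, applying it to the points keeps the labels aligned.

```python
import numpy as np


def to_relative(points, width, height):
    """Pixel (x, y) -> ratios of the image width and height."""
    return np.asarray(points, dtype=float) / np.array([width, height], dtype=float)


def to_absolute(rel_points, width, height):
    """Ratios -> pixel (x, y) for an image of the given (possibly new) size."""
    return np.asarray(rel_points, dtype=float) * np.array([width, height], dtype=float)


def rotate_points(points, angle_deg, center):
    """Rotate pixel (x, y) points about `center` using the same 2x3 affine
    matrix as cv2.getRotationMatrix2D(center, angle_deg, 1.0)."""
    theta = np.deg2rad(angle_deg)
    a, b = np.cos(theta), np.sin(theta)
    cx, cy = center
    m = np.array([[a, b, (1 - a) * cx - b * cy],
                  [-b, a, b * cx + (1 - a) * cy]])
    pts = np.asarray(points, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    # Homogeneous coordinates (x, y, 1) times the transposed affine matrix
    return np.hstack([pts, ones]) @ m.T


# The example from the text: (100, 100) in a 200*200 image -> (0.5, 0.5)
rel = to_relative([[100, 100]], 200, 200)
print(rel)                         # [[0.5 0.5]]
print(to_absolute(rel, 400, 300))  # [[200. 150.]]
```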

(5) The idea of adding heatmaps: each labeled key point can be mapped to a two-dimensional map following a Gaussian distribution centered on that point. The network then regresses not a single point but a 2D heatmap, which is compared with the heatmap generated from the label through a loss function, and training iterates to reduce this error. The error of a single point thus becomes an error over a whole surface, which should noticeably improve accuracy and reduce errors. (The loss function needs further design and research.)
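A minimal sketch of the heatmap target, assuming the Gaussian $H(x, y) = \exp\!\left(-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}\right)$ centered on the key point $(x_0, y_0)$. Mean squared error is used here only as one common starting point, since the text leaves the loss design open. The prediction is decoded by taking the location of the maximum. The function names and `sigma=2.0` are illustrative assumptions.

```python
import numpy as np


def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Target heatmap: 2D Gaussian with peak value 1.0 at key point (cx, cy)."""
    xs = np.arange(width, dtype=float)[None, :]   # shape (1, W)
    ys = np.arange(height, dtype=float)[:, None]  # shape (H, 1)
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def heatmap_mse(pred, target):
    """Mean squared error between a predicted and a target heatmap."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))


def decode_heatmap(heatmap):
    """Recover an (x, y) prediction from the heatmap's maximum."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)


target = gaussian_heatmap(64, 64, cx=20, cy=30, sigma=2.0)
print(decode_heatmap(target))  # (20, 30)
```

In practice the heatmap is usually smaller than the input image, so label coordinates are scaled down (conveniently, from the relative coordinates of item (4)) before drawing the Gaussian, and the decoded point is scaled back up.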
(6) As mentioned earlier, the features learned by a convolutional neural network lose hierarchical information and do not capture the structural relationships within the image, whereas the key points of a garment are strongly correlated with each other and with their relative spatial positions. The capsule network (CapsNet), proposed recently by Geoffrey Hinton and colleagues, could therefore be used for feature extraction instead, to improve key point detection accuracy.
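For context, in the CapsNet paper (Sabour, Frosst and Hinton, 2017) each capsule outputs a vector whose length represents the probability that an entity is present and whose orientation encodes its pose, which is why it is attractive for part-to-part spatial relationships like those between clothing key points. The sketch below shows only the paper's "squash" non-linearity, $v = \frac{\lVert s \rVert^2}{1 + \lVert s \rVert^2} \frac{s}{\lVert s \rVert}$; dynamic routing and any key point head are not shown, and the `eps` term is an added assumption for numerical safety.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Capsule squashing: keep each vector's direction, map its length into [0, 1)."""
    s = np.asarray(s, dtype=float)
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    # |s|^2 / (1 + |s|^2) scales the length; dividing by |s| makes it a unit vector
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s


v = squash(np.array([3.0, 4.0]))
print(np.linalg.norm(v))  # close to 25 / 26, about 0.9615
```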
Note when programming: when instantiating an existing network, its input shape must be specified; otherwise an error is reported.
