[Paper sharing] Pedestrian attribute recognition based on attribute correlation

Pedestrian attribute recognition is widely used in pedestrian tracking and pedestrian re-identification.

Two basic challenges:

  1. Multi-label nature
  2. Distinctive characteristics of data samples, such as class imbalance and partial occlusion.

Schematic diagram of different methods:
[Figure: schematic comparison of different methods]

In this work, the authors propose the Cross Attribute and Feature Network (CAFN), which fully exploits the correlation between any pair of attributes to address these challenges in pedestrian attribute recognition.

  1. CAFN contains two modules: the Cross-Attribute Attention Module (C2AM) and the Cross-Feature Attention Module (CFAM).
  2. C2AM lets the network automatically learn a relationship matrix during training that quantifies the correlation between any pair of attributes in the attribute set; CFAM then fuses the different attribute features to generate more accurate and robust attribute features.

Method introduction

As the overall network architecture shows, CAFN includes the CFAM module, and CFAM in turn includes the C2AM module.
[Figure: overall CAFN architecture]
In other words, the architecture diagram shows the C2AM module (cross-attribute attention module) proposed by the authors, which resembles the self-attention module in the Transformer. The authors demonstrate through experiments that it builds stronger cross-attribute attention.
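The cross-attribute idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: here the relation matrix is a learnable parameter whose softmax-normalized rows re-weight the per-attribute features (the shapes, the row-softmax, and the matrix product are all assumptions for the sketch).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attribute_attention(features, relation_logits):
    """Re-weight per-attribute features with a learned attribute-relation matrix.

    features:        (M, D) - one feature vector per attribute
    relation_logits: (M, M) - learnable logits; softmax rows give the relation matrix
    """
    relation = softmax(relation_logits, axis=-1)  # each row sums to 1
    return relation @ features                    # (M, D) refined attribute features

# Toy example: M = 4 attributes, D = 8 feature dimensions
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
logits = rng.normal(size=(4, 4))
refined = cross_attribute_attention(feats, logits)
print(refined.shape)  # (4, 8)
```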
The CFAM module (cross-feature attention module) is analogous to the multiple heads of multi-head attention; the paper denotes the number of heads by h and uses h = 4.
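The multi-head-style fusion with h = 4 might look like the following sketch, where each head has its own learnable relation matrix. Averaging the heads is an assumption made for illustration; the paper's actual fusion may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_feature_fusion(features, relation_logits_per_head):
    """Fuse h cross-attribute "views" of the attribute features.

    features:                 (M, D)
    relation_logits_per_head: (h, M, M) - one learnable relation matrix per head
    """
    heads = [softmax(r, axis=-1) @ features for r in relation_logits_per_head]
    return np.mean(heads, axis=0)  # average fusion (an assumption for this sketch)

rng = np.random.default_rng(1)
feats = rng.normal(size=(5, 16))
logits = rng.normal(size=(4, 5, 5))  # h = 4 heads, as in the paper
fused = cross_feature_fusion(feats, logits)
print(fused.shape)  # (5, 16)
```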

Loss function

The author uses a weighted binary cross-entropy loss function:
[Figure: weighted binary cross-entropy loss formula]
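The original formula image is not reproduced here. For reference, a weighted BCE commonly used in pedestrian attribute recognition (e.g., the DeepMAR-style weighting; the exact weights in this paper may differ) has the form:

$$
\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} w_{ij}\Big(y_{ij}\log \hat{p}_{ij} + (1-y_{ij})\log(1-\hat{p}_{ij})\Big),
\qquad
w_{ij} = y_{ij}\,e^{1-r_j} + (1-y_{ij})\,e^{r_j}
$$

where $N$ is the number of samples, $M$ the number of attributes, $\hat{p}_{ij}$ the predicted probability, $y_{ij}$ the binary label, and $r_j$ the positive-label ratio of attribute $j$; rare positive attributes thus get larger weights, countering class imbalance.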

Experimental results

In order to verify the effectiveness of the proposed model, the authors conducted experiments on three public datasets: PETA, RAP, and PA-100K.

Let's first introduce these three datasets:

  • The PETA dataset [25] contains 8705 pedestrians with a total of 19,000 images (resolutions ranging from 17×39 to 169×365). Each pedestrian is labeled with 61 binary attributes and 4 multi-class attributes, although under the established protocol some attributes are not used: only the 35 attributes with a positive label ratio higher than 5% are kept. The PETA dataset is split in the same way as [18], with 9500, 1900, and 7600 images in the training, validation, and test sets respectively.
  • The RAP dataset [26] is collected from real indoor environments. A total of 26 cameras were used to capture surveillance scenes, yielding 41,585 samples with resolutions ranging from 36×92 to 344×554. Specifically, there are 33,268 training images and 8317 testing images. Each sample is annotated with 72 fine-grained attributes (69 binary and 3 multi-class), but only the 51 attributes with a positive label ratio higher than 1% are used.
  • The PA-100K dataset [16] was collected from 598 real outdoor surveillance cameras. It contains 100,000 samples in total, with resolutions between 50×100 and 758×454, making it the largest pedestrian attribute recognition dataset to date. The whole dataset is randomly split into training, validation, and test sets at a ratio of 8:1:1, and each image is labeled with 26 attributes.
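The positive-ratio filtering mentioned above (35 attributes above 5% on PETA, 51 above 1% on RAP) can be reproduced with a few lines of NumPy; the function name and the toy label matrix below are illustrative, not from the paper.

```python
import numpy as np

def filter_attributes(labels, min_positive_ratio):
    """Return indices of attributes whose positive-label ratio exceeds a threshold.

    labels: (num_samples, num_attributes) binary label matrix
    """
    ratios = labels.mean(axis=0)  # per-attribute positive ratio
    return np.flatnonzero(ratios > min_positive_ratio)

# Toy labels: attribute 0 is rare (2% positive), attribute 1 is common (60%)
labels = np.zeros((100, 2), dtype=int)
labels[:2, 0] = 1
labels[:60, 1] = 1
print(filter_attributes(labels, 0.05))  # [1]
```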



Qualitative analysis

Figure 5 gives examples from three different viewpoints in the PETA dataset for qualitative analysis. As can be seen, the proposed C2AM and CFAM4 successfully identify attributes such as age, gender, clothing, and footwear. In the first example, the pedestrian's clothing is not helpful for judging gender, but the long-hair attribute aids the gender prediction. In the second example, the lower half of the pedestrian's clothing is partially occluded, but the upper-body clothing attributes help correctly identify the lower-body ones. The third example provides a failure case: due to the correlation between short sleeves and shorts, C2AM incorrectly identifies the pants as shorts. However, the erroneous prediction is corrected well by CFAM4.

[Figure 5: qualitative examples from the PETA dataset]

The paper proposes using the correlation between attributes to assist the recognition of each individual attribute. To obtain this correlation information, the network learns a relationship matrix that quantifies the correlation of every pair of attributes in the attribute set. This part visualizes the relationship matrices learned after the network converges, as shown in Figure 6: the brighter the color, the stronger the correlation. The matrices capture fairly abstract information; for example, Figure 6a shows an obvious correlation between male and long hair. CAFN learns multiple different relationship matrices simultaneously, which jointly complete the final attribute recognition: the matrix in Figure 6b highlights the correlation between short sleeves and shorts, while the one in Figure 6c highlights the correlation between sneakers and shoes.
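As a toy illustration of how such a matrix can be read off, the following sketch extracts the most correlated attribute pairs; the attribute names and matrix values are invented for the example, not taken from the paper.

```python
import numpy as np

def top_correlated_pairs(relation, names, k=3):
    """Return the k most correlated attribute pairs from a relation matrix."""
    S = (relation + relation.T) / 2      # symmetrize the learned matrix
    iu = np.triu_indices_from(S, k=1)    # off-diagonal pairs only
    order = np.argsort(S[iu])[::-1][:k]  # strongest correlations first
    return [(names[iu[0][o]], names[iu[1][o]]) for o in order]

# Invented relation matrix over four attribute names (values are not from the paper)
names = ["male", "long hair", "short sleeves", "shorts"]
R = np.eye(4)
R[0, 1] = R[1, 0] = 0.9  # male <-> long hair
R[2, 3] = R[3, 2] = 0.8  # short sleeves <-> shorts
print(top_correlated_pairs(R, names, k=2))
```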

[Figure 6: visualization of the learned relationship matrices]

Conclusion

In this paper, considering how to exploit the correlation between arbitrary attribute pairs, the authors propose a novel architecture, CAFN, for pedestrian attribute recognition. It contains two basic modules, the cross-attribute attention module and the cross-feature attention module, and their cooperation improves CAFN's performance. The authors conducted experiments on three public datasets (PETA, RAP, PA-100K) and achieved convincing results, showing that CAFN outperforms most existing methods; extensive ablations further verify the effectiveness of the two key modules. In the future, it would be meaningful to explore the correlation between images and attributes from a multi-modal perspective, which could further improve the model's ability to distinguish different attributes.

Reference

This paper was published in the journal Multimedia Systems.

  • Impact factor: 3.9
  • Chinese Academy of Sciences journal ranking: Computer Science, Zone 4

[1] ZHAO R, LANG C, LI Z, et al. Pedestrian attribute recognition based on attribute correlation[J/OL]. Multimedia Systems, 2022, 28(3): 1069-1081. DOI:10.1007/s00530-022-00893-y.

Origin: blog.csdn.net/orDream/article/details/132507732