Pedestrian attribute recognition (1): training on the PA100k dataset

Preface

Recently I have been working on tasks related to pedestrian attribute recognition. This article records the training process for future reference.

There are many pedestrian attribute recognition repositories available online. For example, Baidu's open-source PP-Human attribute recognition, PULC human attribute recognition, and JD.com's JDAI-CV/fast-reid are all excellent works, but I won't use them directly here. Instead, I traced those projects back to their source and found that both Baidu's and JD.com's pedestrian attribute recognition are based on the Rethinking_of_PAR project, so I can study that project directly. This article covers training with this project; later articles will cover modifying the model and training on your own dataset.

Latest benchmark results for pedestrian attribute recognition datasets: https://paperswithcode.com/task/pedestrian-attribute-recognition/


1. Dataset preparation

The training is based on the PA100k dataset, currently the largest dataset for pedestrian attribute recognition. It contains 100,000 pedestrian images collected from outdoor surveillance cameras, each annotated with 26 commonly used attributes. Following the official split, the dataset is randomly divided into 80,000 training images, 10,000 validation images, and 10,000 test images.

I have packaged the dataset and uploaded it to my Baidu cloud disk for download. Link: https://pan.baidu.com/s/1WLWCZujhENVAL0Iz0BXnQQ Password: iulh. After downloading, unzip the images and the .mat annotation file, rename the release_data folder to data, and set it aside for now.

Clone the training repository:

git clone https://github.com/valencebond/Rethinking_of_PAR.git
cd Rethinking_of_PAR

This repository provides training pipelines for many pedestrian attribute datasets. The accuracy achieved on each dataset is shown below:
[Screenshot: accuracy metrics for each supported dataset]
Here I only train on PA100k; the preparation steps for the other datasets are similar. Since the author does not provide pre-trained models (the Google Drive links are dead), if you want to test the models you need to run the training yourself first, but the training time is not very long.

Because this project reads annotation files in .pkl format, the .mat file downloaded above needs to be parsed and saved as a .pkl file. Fortunately, the author provides a ready-made script at dataset/pedes_attr/preprocess/format_pa100k.py. Before using it, you should read dataset/pedes_attr/annotation.md. In short, it says the dataset's attributes must be sorted from top to bottom according to the following standard, with the unified attribute order being:

  1. head region
  2. upper region
  3. lower region
  4. foot region
  5. accessory/bag
  6. age
  7. gender
  8. others

For PA100k, the new order after sorting the 26 attributes is:

num_in_group = [2, 6, 6, 1, 4, 7]

(The last entry, 7, groups the remaining age, gender, and orientation attributes together: 3 + 1 + 3.)

'Hat','Glasses' [7,8] 2
'ShortSleeve','LongSleeve','UpperStride','UpperLogo','UpperPlaid','UpperSplice' [13,14,15,16,17,18] 6
'LowerStripe','LowerPattern','LongCoat','Trousers','Shorts','Skirt&Dress' [19,20,21,22,23,24] 6
'boots' [25] 1
'HandBag','ShoulderBag','Backpack','HoldObjectsInFront' [9,10,11,12] 4
'AgeOver60','Age18-60','AgeLess18' [1,2,3] 3
'Female' [0] 1
'Front','Side','Back' [4,5,6] 3

permutation = [7,8,13,14,15,16,17,18,19,20,21,22,23,24,25,9,10,11,12,1,2,3,0,4,5,6]
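
To see concretely what this permutation does, here is a minimal sketch (using a made-up multi-hot label vector; the index-to-attribute mapping follows the listing above) of how a label stored in the original .mat order is re-indexed into the new, region-sorted order:

import numpy as np

# original-order indices listed in the new, region-sorted order
permutation = [7, 8, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
               9, 10, 11, 12, 1, 2, 3, 0, 4, 5, 6]

# made-up 26-dim multi-hot label in the original PA100k index space,
# where (per the mapping above) 0 = Female, 7 = Hat, 22 = Trousers
old_label = np.zeros(26, dtype=np.int64)
old_label[[0, 7, 22]] = 1

# entry i of the reordered label is entry permutation[i] of the original label
new_label = old_label[permutation]

# in the new order, Hat is index 0, Trousers index 11, Female index 22
print(new_label[0], new_label[11], new_label[22])   # -> 1 1 1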

With that in mind, the format_pa100k.py file is much easier to read. Adjust the paths in it to match the dataset location you just downloaded, then run it to obtain the dataset_all.pkl file. Note that if the dataset is not stored under ./data/PA100k but in an external folder, an error will be raised when loading the dataset; in that case you also need to modify tools/function.py to point to your dataset path, otherwise it loads from ./data by default:
[Screenshot: the dataset root path setting in tools/function.py]
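
Before running format_pa100k.py, it can also help to take a quick look at the downloaded .mat annotation to confirm it loads correctly. A minimal sketch (the file name and path below are assumptions based on the download above; adjust them to your setup):

import scipy.io as sio

# load the PA100k annotation file; change the path to wherever you unzipped it
annotation = sio.loadmat('./data/PA100k/annotation.mat')

# print every field name and its shape so you can check that the label arrays
# and image name lists are present before converting to .pkl
for key, value in annotation.items():
    if key.startswith('__'):  # skip MATLAB header entries
        continue
    print(key, getattr(value, 'shape', type(value)))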

2. Start training

Modify the relevant settings in the configs/pedes_baseline/pa100k.yaml configuration file, such as the batch size, input height and width, backbone, and so on. After modifying, run directly in the terminal:

python train.py --cfg ./configs/pedes_baseline/pa100k.yaml

When output like the following appears, training has started:
[Screenshot: training log output]
If the logging interval feels too long, you can shorten the print interval in the training code:
[Screenshot: the print-interval setting in the training code]
After training, the model is saved under pa100k/img_model inside the exp_result folder. There is only one model file because the model's name is determined by a timestamp taken when training starts, so later checkpoints with better accuracy overwrite it. If you want to change the name of the saved model, you can modify this line of code so that the file name carries the best epoch and the best accuracy, which makes the information a little clearer:
[Screenshot: the line of code that sets the saved model file name]
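
For reference, here is a rough sketch of the naming idea (not the repository's actual code): embedding the best epoch and best metric in the checkpoint file name means a newly saved best model no longer overwrites the previous one, and produces names like best_11_0.8044.pth, which is the pattern the demo.py below loads.

import os
import torch

def save_best_checkpoint(model, save_dir, epoch, best_metric):
    # e.g. best_11_0.8044.pth: best epoch and best accuracy in the file name
    path = os.path.join(save_dir, f'best_{epoch}_{best_metric:.4f}.pth')
    torch.save(model.state_dict(), path)
    return path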

The optimal accuracy obtained by training is:
[Screenshot: best accuracy metrics from the training run]

3. Model testing

Because the author did not provide code for testing individual images, I modified the infer.py file to create demo.py, which runs inference on a single image or on all images in a folder:

import argparse
import json
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import pickle

from dataset.augmentation import get_transform
from dataset.multi_label.coco import COCO14
from metrics.pedestrian_metrics import get_pedestrian_metrics
from models.model_factory import build_backbone, build_classifier

import numpy as np
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from PIL import Image
from configs import cfg, update_config
from dataset.pedes_attr.pedes import PedesAttr
from metrics.ml_metrics import get_map_metrics, get_multilabel_metrics
from models.base_block import FeatClassifier
# from models.model_factory import model_dict, classifier_dict

from tools.function import get_model_log_path, get_reload_weight
from tools.utils import set_seed, str2bool, time_str
from models.backbone import swin_transformer, resnet, bninception,repvgg

set_seed(605)

clas_name = ['Hat','Glasses','ShortSleeve','LongSleeve','UpperStride','UpperLogo','UpperPlaid','UpperSplice','LowerStripe','LowerPattern','LongCoat','Trousers','Shorts','Skirt&Dress','boots','HandBag','ShoulderBag','Backpack','HoldObjectsInFront','AgeOver60','Age18-60','AgeLess18','Female','Front','Side','Back']

def main(cfg, args):
    exp_dir = os.path.join('exp_result', cfg.DATASET.NAME)
    model_dir, log_dir = get_model_log_path(exp_dir, cfg.NAME)

    train_tsfm, valid_tsfm = get_transform(cfg)
    print(valid_tsfm)

    backbone, c_output = build_backbone(cfg.BACKBONE.TYPE, cfg.BACKBONE.MULTISCALE)


    classifier = build_classifier(cfg.CLASSIFIER.NAME)(
        nattr=26,
        c_in=c_output,
        bn=cfg.CLASSIFIER.BN,
        pool=cfg.CLASSIFIER.POOLING,
        scale =cfg.CLASSIFIER.SCALE
    )

    model = FeatClassifier(backbone, classifier)

    if torch.cuda.is_available():
        model = torch.nn.DataParallel(model).cuda()

    model = get_reload_weight(model_dir, model, pth='best_11_0.8044.pth')                               # change this to the name of your saved model
    model.eval()

    with torch.no_grad():
        for name in os.listdir(args.test_img):
            print(name)
            img = Image.open(os.path.join(args.test_img,name))
            img = valid_tsfm(img).cuda()
            img = img.view(1, *img.size())
            valid_logits, attns = model(img)

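            # sigmoid turns the logits into independent per-attribute probabilities;
            # any attribute whose probability exceeds 0.5 is treated as present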
            valid_probs = torch.sigmoid(valid_logits[0]).cpu().numpy()
            valid_probs = valid_probs[0]>0.5

            res = []
            for i,val in enumerate(valid_probs):
                if val:
                    res.append(clas_name[i])
                if i == 22 and not val:  # index 22 is 'Female'; if it is not predicted, report male
                    res.append("male")
            print(res)
            print()

def argument_parser():
    parser = argparse.ArgumentParser(description="attribute recognition",
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument(
        "--test_img", help="test images", type=str,
        default="./test_imgs",

    )
    parser.add_argument(
        "--cfg", help="decide which cfg to use", type=str,
    )
    parser.add_argument("--debug", type=str2bool, default="true")

    args = parser.parse_args()

    return args

if __name__ == '__main__':
    args = argument_parser()
    update_config(cfg, args)

    main(cfg, args)

Run the command:

python demo.py --cfg ./configs/pedes_baseline/pa100k.yaml --test_img ./test_imgs

and you will get results similar to the following:
[Screenshot: demo output listing the predicted attributes for each test image]

In my output, the English attribute names in the list are translated into Chinese before being displayed and processed. That concludes the training. The next article will describe how to modify the project and add a new network for training.

Origin blog.csdn.net/qq_39056987/article/details/126385564