Multi-label classification paper notes | (including code reproduction and a summary of major pitfalls) Combining Metric Learning and Attention Heads... (MLD-TResNet-L-AAM/GAT+AAM)

Personal notes from a close reading of the paper, mainly translation plus my own takeaways. You are welcome to read along; if you are interested, leave a message in the comments and we can discuss it together.
Paper: https://arxiv.org/pdf/2209.06585v2.pdf
Code: https://github.com/openvinotoolkit/deep-object-reid/tree/multilabel

1. Paper translation + understanding

0. Summary

Multi-label image classification allows predicting a set of labels for a given image. Unlike multi-class classification, where each image is assigned only one label, this setup covers a wider range of applications. In this work we revisit two popular approaches to multi-label classification: transformer-based classification heads and graph branches that model label relationships. Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that with an appropriate training strategy graph-based methods can show only a small accuracy drop while spending fewer computational resources at inference time. In our training strategy, instead of the asymmetric loss (ASL), which is the de-facto standard for multi-label classification, we introduce a metric-learning modification of it. In each binary classification sub-problem it operates on L2-normalized feature vectors coming from the backbone and forces the angle between the normalized representations of positive and negative samples to be as large as possible. This provides better discriminative ability than binary cross-entropy loss applied to unnormalized features. With the proposed loss and training strategy, we obtain SOTA results among single-modality methods on widespread multi-label classification benchmarks such as MS-COCO, PASCAL-VOC, NUS-WIDE and Visual Genome 500. The source code of our method is available as part of the OpenVINO™ Training Extensions.

1 Introduction

Although the presence of multiple labels in an image is more natural than a single hard label, multi-label classification is less deeply developed. Due to the lack of dedicated multi-label datasets, researchers turned general object detection datasets such as MS-COCO (Lin et al., 2014) and PASCAL VOC (Everingham et al., 2009) into challenging multi-label classification benchmarks by removing the bounding boxes from the annotations and keeping only the class labels.

Despite recent progress on the above benchmarks, recent work has mainly focused on the resulting model accuracy while introducing promising architectures, without considering computational complexity (Liu et al., 2021) or while using outdated training techniques (Chen et al., 2019).

In this work, we revisit state-of-the-art multi-label classification methods, propose lightweight solutions suitable for real-time applications, and improve the performance-accuracy trade-off of existing models.

The main contributions of this article are as follows:

  • We propose a modification of ML-GCN (Chen et al., 2019) that adds a graph attention mechanism (Veličković et al., 2018) and fuses graph and CNN features in a more conventional way, rather than generating a set of binary classifiers in the graph branch.
  • We demonstrate that with an appropriate training strategy, the performance gap between transformer-based classification heads and label co-occurrence modeling via graph attention mechanisms can be reduced.
  • We are, to our knowledge, the first to apply the metric learning paradigm to the multi-label classification task, and we propose a modified version of the angular margin binary loss (Wen et al., 2021) that adds the ASL mechanism (Baruch et al., 2021).
  • We validate the effectiveness of our loss and overall training strategy through comprehensive experiments on a wide range of multi-label classification benchmarks: PASCAL VOC, MS-COCO, Visual Genome (Krishna et al., 2016) and NUS-WIDE (Chua et al., 2009).

2. Related work

Historically, multi-label classification has received less attention than the multi-class scenario, but there has nevertheless been plenty of progress in the field: advanced loss functions (Baruch et al., 2021), label co-occurrence modeling (Chen et al., 2019; Yuan et al., 2022), advanced classification heads (Liu et al., 2021; Ridnik et al., 2021b; Zhu and Wu, 2021), and architectures that account for the spatial distribution of objects by exploiting attention regions (Wang et al., 2017; Gao and Zhou, 2021).

The traditional approach is to turn a multi-label classification task into a set of binary classification tasks and solve it by optimizing a binary cross-entropy loss. Each per-class binary sub-task suffers from a severe positive/negative imbalance: the more classes the training dataset contains, the more negatives we get in each sub-task, since images usually contain only a small subset of the full label set. The improved asymmetric loss (Baruch et al., 2021), which down-weights and hard-thresholds easy negative samples, shows impressive results, reaching state-of-the-art on several popular multi-label datasets without any sophisticated architectural tricks. These results show that the right choice of loss function is critical for multi-label classification performance.

Another promising direction is to design class-specific classifiers instead of applying a fully connected layer to a single feature vector produced by the backbone. This approach does not introduce additional training steps and only slightly increases model complexity. The authors of (Zhu and Wu, 2021) proposed an alternative to the global average pooling layer that generates class-specific features for each class. Using a compact transformer classification head to generate these features (Liu et al., 2021; Ridnik et al., 2021b) has proven even more effective: such heads pool class-specific features using learnable embedding queries.

Taking into account the distribution of object locations or statistical label relationships requires either data preprocessing and additional assumptions (Chen et al., 2019; Yuan et al., 2022) or complex model architectures (Wang et al., 2017; Gao and Zhou, 2021). For example, (Chen et al., 2019; Yuan et al., 2022) represent labels with word embeddings and then build a directed graph over these label representations, where each node corresponds to a label. Stacked GCNs are then trained on this graph to obtain a set of object classifiers. This approach relies on the ability to represent labels as words, which is not always possible. Spatial distribution modeling requires placing an R-CNN-like module (Girshick et al., 2014) into the model (Wang et al., 2017; Gao and Zhou, 2021), which greatly increases the complexity of the training pipeline.

3. Method

In this section, we describe the whole training pipeline and the details of our approach. Our goal is not only to obtain competitive results, but also to make training more end-user friendly and more adaptive to the data. Therefore, following the principles described in (Prokofiev and Sovrasov, 2022), we use lightweight model architectures, hyperparameter optimization and early stopping.

3.1 Model architecture

We selected EfficientNetV2 (Tan and Le, 2021) and TResNet (Ridnik et al., 2021a) as backbones for multi-label image classification. Specifically, we conduct all experiments on TResNet-L and on EfficientNetV2 small and large. On top of these backbones we use two different feature aggregation methods and compare their accuracy and performance.

3.2 Transformer multi-label classification head

As a representative of transformer-based feature aggregation methods, we use the ML-Decoder (Ridnik et al., 2021b) classification head. It outputs up to K feature vectors (where K is the number of classes) instead of the single class-agnostic vector produced by a standard global average pooling (GAP) classification head. Given an input image x (dimension C × H × W), the backbone F with parameters W produces a downscaled multi-channel feature map f = F_W(x) (dimension S × H/d × W/d), where S is the number of output channels and d is the spatial downscaling factor. This feature map is then passed to the ML-Decoder head: v = MLD(f) (dimension M × L), where M is the embedding dimension and L ≤ K is the number of groups in the decoder. Finally, v is projected to K class logits via a fully connected projection (if L = K) or a group fully connected projection (if L < K), as described in (Ridnik et al., 2021b). In our experiments we set L = min(100, K). Additionally, we L2-normalize the weights of all dot products in this projection, in case we need to attach a metric learning loss to the ML-Decoder classification head.
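To make the projection step concrete, below is a minimal PyTorch sketch of an L2-normalized (cosine) projection for the simple case L = K, i.e. one embedding per class. This is my own illustration rather than the actual ML-Decoder code from the repository, and the class and argument names are made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedProjection(nn.Module):
    """Cosine projection from per-class embeddings to class logits (sketch only).

    Shapes follow the description above: M = embedding dimension, K = number of classes.
    """
    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        # one weight vector W_j per class, acting as a binary classifier
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # v: (B, K, M) class-specific embeddings produced by the decoder head
        v = F.normalize(v, dim=-1)            # L2-normalize the embeddings
        w = F.normalize(self.weight, dim=-1)  # L2-normalize the classifiers
        # cos(theta_j) = <v_j, W_j> for every class j
        return (v * w.unsqueeze(0)).sum(dim=-1)  # (B, K) cosine logits
```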

3.3 Graph attention multi-label branch (GAT)

The original design of the graph processing branch in (Chen et al., 2019) assumes that classifiers are generated in this branch and then applied directly to the features produced by the backbone. This approach is incompatible with transformer-based classification heads or any other processing of the raw spatial features, such as CSRA (Zhu and Wu, 2021). To lift this limitation, we propose the architecture shown in Figure 1.
Figure 1: Overall scheme of the proposed GAT-based (graph attention) feature re-weighting. The channels of the CNN spatial features are re-weighted using features obtained from the label relation graph with GAT. A pooling operation is then applied to the re-weighted features to obtain a vector representation. Finally, the resulting vector is fed into binary classifiers Wj and optimized with the AAM loss introduced in this paper.

We construct Z (the label correlation matrix) from estimates of conditional probabilities, instead of relying on GloVe embeddings and cosine similarity.
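As a rough illustration of what "constructing Z from conditional probabilities" can look like, here is a small sketch that estimates P(label j | label i) from a multi-hot matrix of training labels. It is only my reading of the idea; the thresholding and re-weighting tricks used in ML-GCN-style pipelines are omitted, and the function name is hypothetical.

```python
import numpy as np

def label_correlation_matrix(labels: np.ndarray) -> np.ndarray:
    """Estimate Z from training annotations: Z[i, j] ~ P(label j | label i).

    labels: (N, K) multi-hot matrix of the training set labels.
    """
    counts = labels.sum(axis=0)                           # (K,) occurrences of each label
    co_occur = labels.T @ labels                          # (K, K) joint occurrence counts
    return co_occur / np.clip(counts[:, None], 1, None)   # row i holds P(. | label i)
```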

We process the input with a graph attention layer and obtain the output h (dimension S × K). Then we extract the most influential features with a max pooling operation over the label nodes, obtaining weights r (dimension S) that are used to re-weight the CNN spatial features and produce f̃. Next, we apply global average pooling and global max pooling to f̃ in parallel and sum the results to get the final latent embedding ṽ. The embedding ṽ is finally passed to the binary classifiers. Instead of applying simple spatial pooling, the re-weighted features could also be passed to ML-Decoder or any other feature processing module.
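Below is a minimal sketch of the re-weighting and pooling pipeline as I read it from Figure 1, assuming the GAT output h is already computed (the GAT layer itself is omitted). Squashing the pooled weights with a sigmoid is my assumption, not something stated in the paper.

```python
import torch

def gat_reweight_and_pool(f: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """Re-weight CNN spatial features with label-graph features and pool them (sketch).

    f: (B, S, H, W) CNN spatial features from the backbone
    h: (S, K)       output of the graph attention layer, one column per label
    Returns the pooled embedding v~ of shape (B, S).
    """
    # max-pool over the K label nodes -> channel weights r of dimension S
    r = torch.sigmoid(h.max(dim=1).values)       # (S,)
    f_tilde = f * r.view(1, -1, 1, 1)            # channel-wise re-weighting
    # global average pooling + global max pooling, summed
    gap = f_tilde.mean(dim=(2, 3))               # (B, S)
    gmp = f_tilde.amax(dim=(2, 3))               # (B, S)
    return gap + gmp                             # embedding fed to the binary classifiers
```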

The main advantage of using the graph attention (GAT) branch to re-weight features over a transformer head is the small computational and model complexity overhead at inference time. Since the GAT branch receives the same input for any image, we can compute its output just once before running inference with the resulting model. At the same time, GAT requires a vector representation of each label. Such representations can be generated by a text-to-vector model if we have meaningful descriptions (or at least single words) for all labels. This condition does not always hold: some datasets may have unnamed labels, and in that case how to generate label representations remains an open question.

Here the author describes the GAT approach and the end-to-end solution. As for the GAP structure that follows, the author only says it can also be replaced by the ML-Decoder head, not that it must be. When I first read this I assumed GAT + ML-Decoder was the final form, but the later experiments show that is not the case. The main innovations of the paper are the GAT structure in Figure 1 and the AAM loss discussed next.

3.4 Angular margin binary classification (AAM, a loss combining ASL and metric learning)

Recently, the asymmetric loss (Baruch et al., 2021) has become the standard choice for multi-label classification. By design, it penalizes each logit with a modified binary cross-entropy loss. The asymmetric treatment of positive and negative samples allows ASL to down-weight the loss of negatives, addressing the positive/negative imbalance. However, from the point of view of the model's discriminative ability, this approach still leaves room for improvement.

Angular margin losses produce more discriminative features than cross-entropy loss, which is a desirable property for recognition tasks (Deng et al., 2018; Wen et al., 2021; Sovrasov and Sidnev, 2021).

We propose to combine the paradigms of (Baruch et al., 2021) and (Wen et al., 2021) to build a stronger loss for multi-label classification. Denote by cos θ_j the dot product between the normalized class embedding v_j generated by ML-Decoder (attached to the backbone or to the GAT-based head) and the normalized j-th binary classifier W_j. Then, for a training sample x with the corresponding set of embeddings v, the asymmetric angular margin loss (AAM) is expressed as:
[Formula: definition of the AAM loss.]
Here s is the scale parameter, m is the angular margin, k is the positive/negative weighting coefficient, and r+, r− are the weighting parameters from ASL. Although there are many hyperparameters, some of them can safely be fixed (such as r+ and r− from ASL). The effect of varying s saturates as s grows (see Figure 2b), so as long as its value is large enough we do not need to tune it precisely. In addition, m should be kept close to 0, since it partly replicates the effect of s and r and may even bring an undesirable increase in the negative part of the AAM loss (see Figure 2a). Section 4.5 provides a detailed analysis of the hyperparameters.
[Figure 2: the effect of varying m (a) and s (b) on the AAM loss.]
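Since the loss itself is only given as an image above, here is a hedged PyTorch sketch of how I read AAM: ASL-style asymmetric focusing (the r+/r− exponents, written gamma_pos/gamma_neg below) applied to scaled cosine logits, with an angular margin m on the positive term and k balancing the positive and negative parts. Both the exact form and the default values are my assumptions, not the paper's equation.

```python
import torch

def aam_loss(cos_theta, targets, s=23.0, m=0.05, k=0.8, gamma_pos=0.0, gamma_neg=1.0):
    """Sketch of an AAM-style loss (my interpretation, not the reference implementation).

    cos_theta: (B, K) cosine similarities between embeddings v_j and classifiers W_j
    targets:   (B, K) multi-hot ground-truth labels
    """
    p_pos = torch.sigmoid(s * (cos_theta - m))   # margin makes positives harder to satisfy
    p_neg = torch.sigmoid(-s * cos_theta)        # probability of being a negative
    loss_pos = targets * (1 - p_pos).pow(gamma_pos) * torch.log(p_pos.clamp(min=1e-8))
    loss_neg = (1 - targets) * (1 - p_neg).pow(gamma_neg) * torch.log(p_neg.clamp(min=1e-8))
    return -(k * loss_pos + (1 - k) * loss_neg).mean()
```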

The main topic here is the AAM loss, which combines ASL and metric learning. I had not studied either of them before, so this part confused me; it needs some prior knowledge that I will catch up on later. For now, let's just take a first look.

3.5 Details of training strategy

As in our previous work (Prokofiev and Sovrasov, 2022), we aim to make the training pipeline for multi-label classification reliable, fast and adaptive to the dataset, so we use the following components:

  • The SAM optimizer (Foret et al., 2020) without bias decay (He et al., 2019b) as the default optimizer;
  • EMA weight averaging to prevent overfitting;
  • Initial learning rate estimation procedure from (Prokofiev and Sovrasov, 2022);
  • OneCycle (Smith, 2018) learning rate scheduler;
  • Early stopping heuristic: if the best result on the validation subset does not improve within 5 epochs and the current evaluation result falls below an EMA of the scores since the previous best, the training process stops (a rough sketch of this heuristic follows this list);
  • Random flipping, predefined Randaugment (Cubuk et al., 2020) strategy and Cutout (Devries and Taylor, 2017) data augmentation.
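A small sketch of how I read the early-stopping heuristic above; the EMA smoothing factor and the exact stopping condition are my interpretation rather than the paper's code.

```python
class EarlyStopper:
    """Stop when the validation metric has not improved for `patience` epochs AND the
    current value falls below an EMA of the recent validation scores (sketch)."""
    def __init__(self, patience: int = 5, ema_alpha: float = 0.5):
        self.patience, self.alpha = patience, ema_alpha
        self.best, self.ema, self.since_best = float("-inf"), None, 0

    def step(self, metric: float) -> bool:
        # update the running EMA of validation scores
        self.ema = metric if self.ema is None else self.alpha * metric + (1 - self.alpha) * self.ema
        if metric > self.best:
            self.best, self.since_best = metric, 0
        else:
            self.since_best += 1
        return self.since_best >= self.patience and metric < self.ema
```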

4. Experiment

  • Evaluation criteria
    We use the commonly accepted metrics to evaluate multi-label classification models: mean average precision (mAP) over all categories, overall precision (OP), recall (OR) and F1-measure (OF1), and per-class precision (CP), recall (CR) and F1-measure (CF1). We use mAP as the primary metric; the others are reported in the high-level comparison of methods. Wherever a confidence threshold is required, we use a value of 0.5. The exact formulas for these metrics can be found in (Liu et al., 2021).
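For reference, a quick sketch of the overall and per-class precision/recall/F1 metrics using their standard definitions (mAP is omitted; see Liu et al., 2021 for the exact formulas used in the paper):

```python
import numpy as np

def multilabel_metrics(scores: np.ndarray, targets: np.ndarray, thr: float = 0.5) -> dict:
    """scores: (N, K) predicted probabilities; targets: (N, K) multi-hot labels."""
    preds = (scores >= thr).astype(np.float64)
    tp = (preds * targets).sum(0)          # per-class true positives
    fp = (preds * (1 - targets)).sum(0)    # per-class false positives
    fn = ((1 - preds) * targets).sum(0)    # per-class false negatives
    eps = 1e-9
    cp, cr = (tp / (tp + fp + eps)).mean(), (tp / (tp + fn + eps)).mean()
    op = tp.sum() / (tp.sum() + fp.sum() + eps)
    orec = tp.sum() / (tp.sum() + fn.sum() + eps)
    return dict(CP=cp, CR=cr, CF1=2 * cp * cr / (cp + cr + eps),
                OP=op, OR=orec, OF1=2 * op * orec / (op + orec + eps))
```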

  • Characteristics of each data set
    [Table: characteristics of each dataset used in the experiments.]

  • Testing on MS-COCO dataset
    Table 2 gives the results on MS-COCO. For this dataset we set s = 23, lr = 0.007, r− = 1, r+ = 0. With the AAM loss we obtain state-of-the-art results using TResNet-L as the backbone. At the same time, EfficientNetV2-s combined with ML-Decoder and the AAM loss outperforms TResNet-L with ASL while requiring 3.5× fewer FLOPS. The GCN/GAT branch performs slightly worse than ML-Decoder, but still beats EfficientNetV2-s + ASL at a marginal extra computational cost during inference.

  • Testing on Pascal-VOC dataset
    [Table 3: results on Pascal-VOC.]

The Pascal-VOC results are shown in Table 3. We set s = 17, lr = 0.005, r− = 2, r+ = 1 to train our models on this dataset. Our GAT-branch modification outperforms ML-Decoder when using EfficientNetV2-s, while the AAM loss provides a small performance boost and allows SOTA to be reached with TResNet-L. Furthermore, on Pascal-VOC, EfficientNetV2-s with any of the considered additional graph branches or heads shows a better speed/accuracy trade-off than TResNet-L with ASL.

As I understand it, the GAT re-weighting here is the author's end-to-end solution from Figure 1, while ML-Decoder + AAM is used to verify the effectiveness of AAM. Looking at the code, the author's open-source configuration is also the GAT re-weighting variant with an EfficientNetV2-s backbone.

  • Ablation experiments

The ablation tables below are screenshots from the paper; the idea is to control variables and verify the contribution of each module.

[Tables: ablation study results.]
To demonstrate the impact of each component on the overall pipeline, we add them one by one to the baseline.

As a baseline, we adopt the EfficientNetV2-s backbone with ASL loss and the SGD optimizer. We set all ASL hyperparameters and the learning rate as in (Baruch et al., 2021). We use the training strategy described in Section 3.5 in all experiments.

In Table 6 we can see that each component brings an improvement except adding the GAT branch. ML-Decoder is powerful enough to learn label-relation information on its own, so the extra cues from the GAT branch do not improve the results. Furthermore, we can see that tuning the r parameters benefits the AAM loss, but the metric learning approach itself brings an improvement even without it. Finally, adding the GAT branch to ML-Decoder does not improve accuracy, confirming that the additional information from GAT provides no new cues to ML-Decoder.

5 Conclusion

In this work, we revisit two popular multi-label classification approaches: transformer-based heads and label-graph branches. We improve the performance of these methods by applying our training strategy and a modern bag of tricks, and introduce a new multi-label classification loss called AAM. This loss combines the properties of the ASL loss and metric learning methods and allows obtaining competitive results on popular multi-label benchmarks. Although we demonstrate that graph branches perform very close to transformer-based heads, the graph-based approach has a major drawback: it relies on label representations provided by language models. A direction for future work could be a method that builds the label graph from representations extracted directly from images, without involving potentially meaningless label names.

2. Code Reproduction

Read all of the following before getting started. It's a bit messy; I'll tidy it up when I have time. There are a lot of pitfalls, so let me summarize them first:

  1. The official code does not pin versions for its Python dependencies, which is the root of all the trouble;
  2. The PyTorch version it implies is also wrong: the code uses torch 2.0 features, and the numpy version is off as well;
  3. So the package versions need to be updated, and some parts of the code need to be adjusted accordingly;
  4. Also, the code downloads files from links that need a VPN ("scientific Internet access") to reach. I don't have that on the Linux box, so I could only download the files locally and then find the right place to put them!

0. Write in front

I will list the versions here first. I updated conda to the latest version with conda update conda; my CUDA is 11.8 with a matching cuDNN, and Python is 3.8. The full package list follows.

pip freeze > requirements.txt

Collected list:

google-auth==2.23.3
google-auth-oauthlib==1.0.0
greenlet==3.0.0
grpcio==1.59.0
huggingface-hub==0.18.0
idna==3.4
importlib-metadata==6.8.0
importlib-resources==6.1.0
inplace-abn==1.1.0
Jinja2==3.1.2
joblib==1.3.2
kiwisolver==1.4.5
Mako==1.2.4
Markdown==3.5
MarkupSafe==2.1.1
matplotlib==3.7.3
mkl-fft==1.3.8
mkl-random==1.2.4
mkl-service==2.4.0
mpmath==1.3.0
networkx==3.1
numpy==1.24.4
oauthlib==3.2.2
onnx==1.14.1
opencv-python==4.8.1.78
optuna==3.4.0
packaging==23.2
Pillow==10.0.1
protobuf==4.24.4
ptflops==0.7.1.1
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycparser==2.21
pyOpenSSL==23.2.0
pyparsing==3.1.1
PySocks==1.7.1
python-dateutil==2.8.2
pytorchcv==0.0.67
PyYAML==6.0.1
randaugment==1.0.2
requests==2.31.0
requests-oauthlib==1.3.1
rsa==4.9
safetensors==0.4.0
scikit-learn==1.3.1
scipy==1.10.1
six==1.16.0
sklearn==0.0.post10
soupsieve==2.5
SQLAlchemy==2.0.22
sympy==1.11.1
tb-nightly==2.14.0a20230808
tensorboard-data-server==0.7.1
terminaltables==3.1.10
threadpoolctl==3.2.0
timm==0.9.7
torch==2.0.1
torch-lr-finder==0.2.1
torchaudio==2.0.2
-e git+https://github.com/openvinotoolkit/deep-object-reid.git@c92f63d802a03a873f2706462bdd2d5cf8a22de4#egg=torchreid
torchvision==0.15.2
tqdm==4.66.1
triton==2.0.0
typing-extensions==4.7.1
urllib3==1.26.16
werkzeug==3.0.0
yacs==0.1.8
zipp==3.17.0

Note: you can also take the official requirements file and pin these versions into it.

The corresponding CUDA version. Note that the cudatoolkit in conda must also be consistent with the CUDA version under /usr/local:

cudatoolkit               11.8.0               h6a678d5_0
cudnn                     8.9.2.26               cuda11_0

1. Create a conda environment (the following is all done in conda)

conda create --name torchreid python=3.8.18
  • Activate the environment
conda activate torchreid

or

source activate torchreid

2. Pull the code

  • Pull
git clone https://github.com/openvinotoolkit/deep-object-reid.git
  • switch to the multilabel branch (run inside the cloned repo)
cd deep-object-reid
git checkout multilabel

3. Installation environment

  • Install cudatoolkit (must install cuda first)
conda install cudatoolkit=11.8
  • Install cudnn (installing it directly here will pull a cuDNN build that matches the CUDA toolkit)
conda install cudnn
  • Configuration Environment
pip install -r requirements.txt
  • Encountered a problem 1
Collecting inplace_abn
  Using cached inplace-abn-1.1.0.tar.gz (137 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 36, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-div2jd3n/inplace-abn_810be5ed00194c44a5bcc754bf5501e0/setup.py", line 4, in <module>
          import torch
      ModuleNotFoundError: No module named 'torch'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
  • solution
pip install --upgrade setuptools
python -m pip install --upgrade pip
  • Encountered problem 2
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
  • solution
apt install libgl1-mesa-glx

4. Training commands

python tools/main.py --config-file configs/EfficientNetV2_small_gcn.yml --gpu-num 1 custom_datasets.roots "['datasets/COCO/train.json', 'datasets/COCO/val.json']" data.save_dir ./out/
  • Encountered problem 3

Due to network problems, two files could not be downloaded: model.safetensors and pytorch_model.bin (reaching them requires a VPN/proxy).

  • solution

Better to download locally:

https://huggingface.co/timm/tf_efficientnetv2_s.in21k/resolve/main/model.safetensors
https://cdn-lfs.huggingface.co/repos/1a/54/1a5499191575630110693f1105f43e325ae7f696b9f8d34db19ab1309ce0aa15/09dc7ef3e90ec4570d22ae1af1c12cbc1aff590b2c68dec1cbd781340e5a8ccc?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27pytorch_model.bin%3B+filename%3D%22pytorch_model.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1697800394&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY5NzgwMDM5NH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy8xYS81NC8xYTU0OTkxOTE1NzU2MzAxMTA2OTNmMTEwNWY0M2UzMjVhZTdmNjk2YjlmOGQzNGRiMTlhYjEzMDljZTBhYTE1LzA5ZGM3ZWYzZTkwZWM0NTcwZDIyYWUxYWYxYzEyY2JjMWFmZjU5MGIyYzY4ZGVjMWNiZDc4MTM0MGU1YThjY2M%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=rdjIofPONLg5roVfXki%7EycLUzpfL0VQWI4RaT6-1mTyh5gJoXxHZna1Z97YbSuU9t-KFi2Q5iGguyjKsHhZzSlRv-zVd0lPhPGmuMGcE7npRrZnPNFB2V5XFYtho8fWzKVwreGujYmaFE7Ge-zkTivnO7YkqmNs%7EeH69FRmbp5xAqW0LbsEspr0aOkaxqCCVoFXgPyJBra4dVaITeremwJSD0OcW2F2zNCdbo0nGcCDzrYsRijiUIAXweXpESL10uLt776efgzGhrDtuKKdVD68Q-3voq5PdTWTQaVP2VCDjkb66QECoAoUDtRjGJtK41zNfBs-1N6RGMDUWMNRM4A__&Key-Pair-Id=KVTP0A1DKRTAX

Then put them under /root/.cache/huggingface/hub/models--timm--tf_efficientnetv2_s.in21k/.
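An alternative I would try (my own suggestion, not from the original workflow): if some other machine can reach huggingface.co, download the whole model snapshot there with huggingface_hub and copy the resulting cache folder to the offline server.

```python
from huggingface_hub import snapshot_download

# downloads model.safetensors / pytorch_model.bin etc. into the local HF cache
path = snapshot_download(repo_id="timm/tf_efficientnetv2_s.in21k")
print(path)  # copy this folder into /root/.cache/huggingface/hub/ on the target machine
```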

  • Encountered problem 4
Couldn't apply path mapping to the remote file.
  • Solution:
    I encountered this problem because the remote server was not synchronized. Just wait for a while.

(There is also the problem of poor network. I downloaded it manually, forcibly changed the path, and ran it first)

  • Encountered problem 5
AttributeError: module 'torch' has no attribute 'frombuffer'
  • The solution
    frombuffer is a method added in torch 2.0.
    Reference: https://pytorch.org/get-started/previous-versions/
    Install torch 2.0:
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia

Then cuda in docker needs to be changed to 11.8, cudnn needs to be changed to the corresponding one; cudatoolkit in conda also needs to be changed to 11.8

Then also modify:

vim /path/to/deep-object-reid-multilabel/torchreid/models/gcn.py
vim /root/anaconda3/envs/torchreid/lib/python3.8/site-packages/randaugment/randaugment.py

  • Encountered problem 6

The script tries to download a .pth checkpoint, but the network fails and an error is reported (the URL requires a VPN/proxy to reach).

  • solution:

Manual download: https://drive.google.com/uc?export=download&id=1N0t0eShJS3L1cDiY8HTweKfPKJ55h141
then place it in /root/.cache/torch/checkpoints/ and rename it to uc?export=download&id=1N0t0eShJS3L1cDiY8HTweKfPKJ55h141.pth

  • Encountered problem 7
<class 'TypeError'> : _matmul_tensor_flops_hook() missing 1 required positional argument: 'other'
Traceback (most recent call last):
  File "/root/anaconda3/envs/torchreid/lib/python3.8/site-packages/ptflops/pytorch_engine.py", line 60, in get_flops_pytorch
    _ = flops_model(batch)
  File "/root/anaconda3/envs/torchreid/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/thomascai/code/deep-object-reid-multilabel/torchreid/models/gcn.py", line 157, in forward
    logits = self.head(glob_features.view(glob_features.size(0), -1))
  File "/root/anaconda3/envs/torchreid/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/thomascai/code/deep-object-reid-multilabel/torchreid/losses/am_softmax.py", line 37, in forward
    cos_theta = F.normalize(x.view(x.shape[0], -1), dim=1).mm(F.normalize(self.weight, p=2, dim=0))
  File "/root/anaconda3/envs/torchreid/lib/python3.8/site-packages/ptflops/pytorch_engine.py", line 333, in __call__
    flops = self.handler(*args, **kwds)
TypeError: _matmul_tensor_flops_hook() missing 1 required positional argument: 'other'
Traceback (most recent call last):
  File "/root/.pycharm_helpers/pydev/pydevd.py", line 1496, in _exec
python-BaseException
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/thomascai/code/deep-object-reid-multilabel/tools/main.py", line 143, in <module>
    main()
  File "/home/thomascai/code/deep-object-reid-multilabel/tools/main.py", line 80, in main
    print(f'Main model complexity: params={num_params:,} flops={macs * 2:,}')
TypeError: unsupported format string passed to NoneType.__format__
  • solution

I couldn't find a fix for this anywhere online. Commenting out the offending line (the FLOPs/params printout) works, so I commented it out for now.

  • Encountered problem 8
RuntimeError: DataLoader worker (pid 3273) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
  • solution

Not enough shared memory: package up the environment, restart the docker container, and raise the shared-memory limit (e.g. --shm-size=2g).

  • Encountered problem 9

Then it reports that CUDA is out of memory.

  • solution:

Adjust batch size:

vim configs/EfficientNetV2_small_gcn.yml

I finally got it running; by this point my head was buzzing.

Note:

I am running the COCO dataset here, so the images from the COCO2014 train and val sets need to be placed under the project directory datasets/COCO/images.

Then, in the configs/EfficientNetV2_small_gcn.yml config file, change the 'voc' entries to 'coco' (mind the matching case).


Origin blog.csdn.net/ThomasCai001/article/details/133798199