Using DiffusionDet to train on your own dataset (Pascal VOC format)


This post assumes that DiffusionDet and detectron2 are already installed and configured, i.e. DiffusionDet's demo.py already runs.
I did not follow the official procedure of creating symbolic links for the datasets, which I found cumbersome; I simply laid out the directories my own way.

DiffusionDet code link: https://github.com/ShoufaChen/DiffusionDet
detectron2 code link: https://github.com/facebookresearch/detectron2

1. Data format

① Directory format

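The paths printed in the training log (section 4) show the layout this post ends up with: datasets/VOC2007/ containing Annotations, ImageSets/Main and JPEGImages, relative to DiffusionDet-main, where train_net.py is run. detectron2's built-in voc_2007_* datasets look under datasets/ unless the DETECTRON2_DATASETS environment variable points elsewhere. A minimal sketch that creates this skeleton (make_voc_layout is a hypothetical helper):

```python
from pathlib import Path

# Sub-folders the Pascal VOC loader reads from (see the paths in the training log)
VOC_SUBDIRS = ("Annotations", "ImageSets/Main", "JPEGImages")


def make_voc_layout(root="datasets/VOC2007"):
    """Create the VOC2007 folder skeleton under `root`; safe to call repeatedly."""
    root = Path(root)
    for sub in VOC_SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root
```

Run from DiffusionDet-main, make_voc_layout() creates datasets/VOC2007 and its three sub-folders; calling it again is harmless because of exist_ok=True.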

② Under Annotations: XML files

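Each XML file describes one image. detectron2's load_voc_instances reads the image size (size/width, size/height) and, for every object, its name and its bndbox corners. A minimal stdlib sketch of reading those fields; the sample annotation is hypothetical (the class name bad and the coordinate 1260.29 are borrowed from later sections of this post, the image size is made up), and read_voc_annotation is an illustrative name:

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation, for illustration only
SAMPLE_XML = """<annotation>
  <filename>7_hunse_left (28).jpg</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>bad</name>
    <difficult>0</difficult>
    <bndbox>
      <xmin>100</xmin><ymin>200</ymin><xmax>1260.29</xmax><ymax>700</ymax>
    </bndbox>
  </object>
</annotation>"""


def read_voc_annotation(xml_text):
    """Return (width, height, [(class_name, [xmin, ymin, xmax, ymax]), ...])."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        # Parse corners as float so non-integer values such as 1260.29 are accepted
        coords = [float(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((obj.findtext("name"), coords))
    return width, height, objects
```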

③ Under ImageSets/Main: train.txt and val.txt

Each of the two files lists image IDs (file names without the extension), one per line.
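If you do not have the split files yet, they can be generated from the annotation folder. A minimal sketch, assuming a random 80/20 train/val split with a fixed seed (the ratio, the seed and the helper names split_ids and write_splits are illustrative assumptions):

```python
import random
from pathlib import Path


def split_ids(ids, val_ratio=0.2, seed=0):
    """Shuffle IDs reproducibly and split them into sorted (train, val) lists."""
    ids = sorted(ids)
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_ratio)
    return sorted(ids[n_val:]), sorted(ids[:n_val])


def write_splits(voc_root, val_ratio=0.2, seed=0):
    """Write ImageSets/Main/train.txt and val.txt from the XML files present."""
    voc_root = Path(voc_root)
    ids = [p.stem for p in (voc_root / "Annotations").glob("*.xml")]
    train, val = split_ids(ids, val_ratio, seed)
    main = voc_root / "ImageSets" / "Main"
    main.mkdir(parents=True, exist_ok=True)
    for name, subset in (("train", train), ("val", val)):
        # One ID per line, exactly as the file name appears (spaces included)
        text = "".join(i + "\n" for i in subset)
        (main / f"{name}.txt").write_text(text, encoding="utf-8")
    return train, val
```

IDs are written exactly as the file names appear. The stock detectron2 loader reads these files with np.loadtxt, which splits lines on whitespace, so IDs containing spaces lead to the path errors described in section 3 ③.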

④ Under JPEGImages: the images (.jpg)

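Before training, it is worth confirming that every ID in a split file has both files, because the loader builds Annotations/ID.xml and JPEGImages/ID.jpg for each ID (as the paths printed in section 4 show). A minimal stdlib sketch (check_split is a hypothetical helper); it also flags IDs that contain whitespace, which the stock np.loadtxt call splits into separate columns:

```python
from pathlib import Path


def check_split(voc_root, split="train"):
    """Return (missing_files, ids_with_whitespace) for ImageSets/Main/<split>.txt."""
    voc_root = Path(voc_root)
    ids_file = voc_root / "ImageSets" / "Main" / f"{split}.txt"
    missing, spaced = [], []
    for line in ids_file.read_text(encoding="utf-8").splitlines():
        fileid = line.strip()
        if not fileid:
            continue  # skip blank lines
        if any(ch.isspace() for ch in fileid):
            spaced.append(fileid)
        # The loader builds exactly these two paths for every ID
        for path in (voc_root / "Annotations" / f"{fileid}.xml",
                     voc_root / "JPEGImages" / f"{fileid}.jpg"):
            if not path.exists():
                missing.append(str(path))
    return missing, spaced
```

For example, check_split("datasets/VOC2007", "val") returns two empty lists when the val split is consistent and its IDs contain no spaces.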

2. Download the pre-trained model

https://github.com/ShoufaChen/DiffusionDet

I downloaded the COCO Res50 checkpoint (diffdet_coco_res50.pth) and put it in DiffusionDet-main/model/, a folder I created myself.


3. Modify the code

This part is a real headache; detectron2 nearly drove me mad. The main problem is the NumPy deprecation of np.str (see the detectron2 pull request "Fix for numpy deprecation of np.str": https://github.com/facebookresearch/detectron2/pull/4806).
If I find a better approach, I will update this post.

① Modify the configuration file diffdet.coco.res50.yaml

The file is diffdet.coco.res50.yaml under DiffusionDet-main/configs/.

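  1. The parts in the red boxes must be changed.
  2. I used the full (absolute) path for the WEIGHTS parameter; this is intentional and works.
  3. Set NUM_CLASSES to the number of categories in your own dataset.
  4. Do not change voc_2007_train and voc_2007_val on lines 11 and 12: they are the names of detectron2's built-in Pascal VOC datasets (read from datasets/VOC2007/ImageSets/Main/train.txt and val.txt), not your file names.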


② Modify the configuration file Base-DiffusionDet.yaml

If you later run out of GPU memory, come back to this file and reduce the batch size (SOLVER.IMS_PER_BATCH).
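As an alternative to editing the two YAML files, the same keys can be overridden on the command line: the log in section 4 shows train_net.py parsing an opts list (opts=[]). A minimal sketch, assuming train_net.py applies those trailing KEY VALUE pairs with cfg.merge_from_list(args.opts), as detectron2-based training scripts usually do, and assuming the key names MODEL.WEIGHTS, MODEL.DiffusionDet.NUM_CLASSES, DATASETS.TRAIN/TEST and SOLVER.IMS_PER_BATCH. build_train_argv is a hypothetical helper, and the batch size 4 is an arbitrary example:

```python
import shlex


def build_train_argv(weights, num_classes, ims_per_batch,
                     config_file="configs/diffdet.coco.res50.yaml"):
    """Return an argv list for train_net.py with config overrides appended.

    Assumes train_net.py applies trailing KEY VALUE pairs with
    cfg.merge_from_list(args.opts); values that parse as Python literals
    become literals, anything else (such as a path) stays a string.
    """
    overrides = [
        "MODEL.WEIGHTS", weights,
        "MODEL.DiffusionDet.NUM_CLASSES", str(num_classes),
        # Built-in detectron2 dataset names, written as tuple literals
        "DATASETS.TRAIN", "('voc_2007_train',)",
        "DATASETS.TEST", "('voc_2007_val',)",
        # Lower this if training runs out of GPU memory
        "SOLVER.IMS_PER_BATCH", str(ims_per_batch),
    ]
    return ["python", "train_net.py", "--config-file", config_file, *overrides]


if __name__ == "__main__":
    argv = build_train_argv(
        "/home/workspace_disk/DiffusionDet-main/model/diffdet_coco_res50.pth",
        num_classes=1,
        ims_per_batch=4,  # arbitrary example value
    )
    print(shlex.join(argv))  # paste into a shell, or pass argv to subprocess.run
```

Keeping the overrides on the command line leaves the shipped YAML files untouched, at the cost of a longer command.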

③ Modify pascal_voc.py in detectron2

In principle, training should run once the configuration files are modified as above, but running python train_net.py --config-file configs/diffdet.coco.res50.yaml fails with:

  File "/opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py", line 35, in load_voc_instances
    fileids = np.loadtxt(f, dtype=np.str)
  File "/opt/conda/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'str'.
`np.str` was a deprecated alias for the builtin `str`. To avoid this error in existing code, use `str` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.str_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

There are two ways to solve this:
  1. Install a NumPy version that still has the np.str alias, e.g. 1.20.1 (recommended). The alias was deprecated in NumPy 1.20 and removed in 1.24; my environment (see the log in section 4) runs numpy 1.24.2.
  2. Patch the code by brute force in /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py. Step by step:
Open the file:

vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py

Four places need to be changed; the detailed steps below walk through them. After the modifications, python train_net.py --config-file configs/diffdet.coco.res50.yaml runs.

Detailed steps:
I found this problem discussed in a detectron2 pull request on GitHub:
https://github.com/facebookresearch/detectron2/pull/4806

At first I thought I had to pull the latest detectron2 code and recompile it, but that turned out not to help: the PR did not give me a concrete fix, and the latest version of detectron2/detectron2/data/datasets/pascal_voc.py still had the same problem.

How I fixed it:
1. I first tried changing the NumPy version, but later decided it was better to patch detectron2 directly.
2. Since the error comes from /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py, that is the file to modify.
Run the following command:

vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py

I will not paste the original code here; I just mark what I changed. (My approach is brute force: whenever an error appears, I patch the source directly.) After this change, a new error appeared:

    DatasetCatalog.register(name, lambda: load_voc_instances(dirname, split, class_names))
  File "/opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py", line 41, in load_voc_instances
    anno_file = os.path.join(annotation_dirname, fileid + ".xml")
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U13'), dtype('<U4')) -> None

Solution:
Modify /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py again:

vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py

After changing the code, another error appeared:

File "/opt/conda/lib/python3.8/site-packages/iopath/common/file_io.py", line 604, in _open
    return open(  # type: ignore
FileNotFoundError: [Errno 2] No such file or directory: "datasets/VOC2007/Annotations/['7_hunse_left' '(28)'].xml"

The file paths are now assembled, but, whether because of the NumPy version or the change I just made, they contain stray [ ] and ' characters: my file 7_hunse_left (28).xml turned into ['7_hunse_left' '(28)'].xml. The next step is therefore to remove these characters from the paths.
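The bracketed path is consistent with file names that contain a space (the log in section 4 shows names such as 7_hunse_left (28)). np.loadtxt splits each line on whitespace by default, so every ID becomes a row of two strings; fileid + ".xml" then fails on that array with the UFuncTypeError above, and converting the row with str() produces exactly the ['7_hunse_left' '(28)'] text in the error. A small sketch that reproduces the splitting with an in-memory split file:

```python
import io

import numpy as np

# Two IDs in the style of this post's file names; each contains a space
split_file = io.StringIO("7_hunse_left (28)\n9_yiwu_right (9)\n")

# np.loadtxt splits on whitespace by default, so each line becomes two columns
fileids = np.loadtxt(split_file, dtype=str)
print(fileids.shape)    # (2, 2)
print(str(fileids[0]))  # ['7_hunse_left' '(28)']

# Stripping [ ] ' from that text recovers the original ID, space included
cleaned = str(fileids[0])
for ch in "[]'":
    cleaned = cleaned.replace(ch, "")
print(cleaned)          # 7_hunse_left (28)
```

This also explains why the character stripping works: numpy prints the row's elements separated by a single space, which is the same space the file name originally had.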
Once more, edit /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py:

vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py

Modify the code as follows:

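# Inside the for-fileid loop, after anno_file and jpeg_file are built:
# remove the [ ] ' characters left over from converting the ID to a string
rep = [('[', ''), (']', ''), ("'", '')]
for c, r in rep:
    if c in anno_file:
        anno_file = anno_file.replace(c, r)
for c, r in rep:
    if c in jpeg_file:
        jpeg_file = jpeg_file.replace(c, r)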
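A less invasive alternative, not the route taken in this post, is to avoid np.loadtxt entirely and read one ID per line. That keeps spaces inside IDs and makes both the string conversion and the character stripping unnecessary; in pascal_voc.py it would mean replacing the np.loadtxt line, inside the existing with PathManager.open(...) as f: block, with fileids = [line.strip() for line in f if line.strip()]. A self-contained sketch of the same idea (read_split_ids and voc_paths are hypothetical helper names):

```python
import os


def read_split_ids(split_file):
    """Read ImageSets/Main/<split>.txt, one ID per line, keeping inner spaces."""
    with open(split_file, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def voc_paths(dirname, fileid):
    """Build the annotation and image paths the VOC loader expects for one ID."""
    anno_file = os.path.join(dirname, "Annotations", fileid + ".xml")
    jpeg_file = os.path.join(dirname, "JPEGImages", fileid + ".jpg")
    return anno_file, jpeg_file
```

On Linux, voc_paths("datasets/VOC2007", "7_hunse_left (28)") gives the same two paths that appear in the training log in section 4.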

With this change, python train_net.py --config-file configs/diffdet.coco.res50.yaml runs successfully and prints the following:

4. Successful training

Command Line Args: Namespace(config_file='configs/diffdet.coco.res50.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[04/07 03:14:31 detectron2]: Rank of current process: 0. World size: 1
[04/07 03:14:33 detectron2]: Environment info:
----------------------  ---------------------------------------------------------
sys.platform            linux
Python                  3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
numpy                   1.24.2
detectron2              0.6 @/opt/conda/lib/python3.8/site-packages/detectron2
Compiler                GCC 7.3
CUDA compiler           CUDA 11.1
detectron2 arch flags   3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.8.1 @/opt/conda/lib/python3.8/site-packages/torch
PyTorch debug build     False
GPU available           Yes
GPU 0                   Tesla T4 (arch=7.5)
Driver version          470.161.03
CUDA_HOME               /usr/local/cuda
Pillow                  8.1.2
torchvision             0.9.1 @/opt/conda/lib/python3.8/site-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                  0.1.5.post20221221
iopath                  0.1.9
cv2                     4.7.0
----------------------  ---------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

[04/07 03:14:33 detectron2]: Command line arguments: Namespace(config_file='configs/diffdet.coco.res50.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[04/07 03:14:33 detectron2]: Contents of args.config_file=configs/diffdet.coco.res50.yaml:
_BASE_: "Base-DiffusionDet.yaml"
...(omitted)
    NUM_CLASSES: 1
    NUM_CLS: 1
    NUM_DYNAMIC: 2
VERSION: 2
VIS_PERIOD: 0

[04/07 03:14:33 detectron2]: Full config saved to ./output/config.yaml
anno_file : datasets/VOC2007/Annotations/7_hunse_left (28).xml
jpeg_file : datasets/VOC2007/JPEGImages/7_hunse_left (28).jpg
...(omitted)
anno_file : datasets/VOC2007/Annotations/9_yiwu_right (9).xml
jpeg_file : datasets/VOC2007/JPEGImages/9_yiwu_right (9).jpg
[04/07 03:14:37 d2.data.build]: Using training sampler TrainingSampler
[04/07 03:14:37 d2.data.common]: Serializing 255 elements to byte tensors and concatenating them all ...
[04/07 03:14:37 d2.data.common]: Serialized dataset takes 0.12 MiB
WARNING [04/07 03:14:37 d2.solver.build]: SOLVER.STEPS contains values larger than SOLVER.MAX_ITER. These values will be ignored.
[04/07 03:14:37 fvcore.common.checkpoint]: [Checkpointer] Loading from /home/workspace_disk/DiffusionDet-main/model/diffdet_coco_res50.pth ...
WARNING [04/07 03:14:37 fvcore.common.checkpoint]: Skip loading parameter 'head.head_series.1.class_logits.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (1,) in the model! You might want to double check if this is expected.
WARNING [04/07 03:14:37 fvcore.common.checkpoint]: Skip loading parameter 'head.head_series.5.class_logits.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (1,) in the model! You might want to double check if this is expected.
WARNING [04/07 03:14:37 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
head.head_series.0.class_logits.{bias, weight}
head.head_series.1.class_logits.{bias, weight}
head.head_series.2.class_logits.{bias, weight}
head.head_series.3.class_logits.{bias, weight}
head.head_series.4.class_logits.{bias, weight}
head.head_series.5.class_logits.{bias, weight}
[04/07 03:14:37 d2.engine.train_loop]: Starting training from iteration 0
[04/07 03:15:09 d2.utils.events]:  eta: 19:46:08  iter: 19  total_loss: 29.27  loss_ce: 2.284  loss_bbox: 0.9446  loss_giou: 1.874  loss_ce_0: 2.142  loss_bbox_0: 1  loss_giou_0: 1.886  loss_ce_1: 2.153  loss_bbox_1: 0.9317  loss_giou_1: 1.889  loss_ce_2: 1.589  loss_bbox_2: 0.8879  loss_giou_2: 1.887  loss_ce_3: 1.95  loss_bbox_3: 0.8332  loss_giou_3: 1.868  loss_ce_4: 2.502  loss_bbox_4: 0.8332  loss_giou_4: 1.867  time: 1.5567  data_time: 0.0228  lr: 7.2025e-07  max_mem: 9072M

5. Other possible problems

Question ①: CUDA out of memory

File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2205, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.76 GiB total capacity; 13.67 GiB already allocated; 21.75 MiB free; 13.72 GiB reserved in total by PyTorch)

Fix: reduce the batch size (SOLVER.IMS_PER_BATCH) in DiffusionDet-main/configs/Base-DiffusionDet.yaml.

Question ②: tuple.index(x): x not in tuple

  File "/opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py", line 80, in load_voc_instances
    {"category_id": class_names.index(cls), "bbox": bbox, "bbox_mode": BoxMode.XYXY_ABS}
ValueError: tuple.index(x): x not in tuple

The cause is a mismatch between the class names in your XML files and CLASS_NAMES: class_names.index(cls) fails for any object name that is not in the tuple. Fix: edit detectron2/data/datasets/pascal_voc.py and set CLASS_NAMES to your own categories.
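To find the offending names before training starts, a minimal stdlib sketch (unknown_classes is a hypothetical helper) can scan the Annotations folder and compare every object name with your CLASS_NAMES. Names are compared exactly as written, mirroring class_names.index(cls), which neither strips nor lower-cases them:

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def unknown_classes(annotations_dir, class_names):
    """Return object names used in the XML files that are missing from class_names."""
    found = set()
    for xml_path in Path(annotations_dir).glob("*.xml"):
        for obj in ET.parse(xml_path).getroot().findall("object"):
            name = obj.findtext("name")
            if name is not None:
                found.add(name)
    return sorted(found - set(class_names))
```

For example, unknown_classes("datasets/VOC2007/Annotations", CLASS_NAMES) returns an empty list once CLASS_NAMES covers every category in the annotations.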

6. Evaluation

① Modify DiffusionDet-main/train_net.py

I consulted several posts. Some rewrite the whole build_evaluator, which I found too invasive, so I only changed the function it calls.

② If you only have one category, be sure to fix detectron2/data/datasets/pascal_voc.py

Add a trailing comma; otherwise my single class bad is split into three categories (b, a and d).

CLASS_NAMES in detectron2/data/datasets/pascal_voc.py is a tuple, and a one-element tuple needs a trailing comma; this is an easy mistake to make.
With the comma in place, everything works.
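Why the comma matters: parentheses alone do not create a tuple, and detectron2 builds the dataset metadata with thing_classes=list(class_names), so a bare string becomes one class per character. A minimal sketch with the single class bad:

```python
# Without a trailing comma the parentheses are only grouping: this is a str
CLASS_NAMES_WRONG = ("bad")
# With the trailing comma it is a one-element tuple
CLASS_NAMES = ("bad",)

print(list(CLASS_NAMES_WRONG))  # ['b', 'a', 'd'] -> three "classes"
print(list(CLASS_NAMES))        # ['bad'] -> one class
```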

Question ①: 'NoneType' object has no attribute 'text'

The error is raised in /opt/conda/lib/python3.8/site-packages/detectron2/evaluation/pascal_voc_evaluation.py. Comparing that code with my XML files, I found that my annotations have no pose element, so obj.find("pose") returns None.
Fix: delete the line obj_struct["pose"] = obj.find("pose").text in pascal_voc_evaluation.py.

Question ②: ValueError: invalid literal for int() with base 10: '1260.29'

Some of my box coordinates are not integers, so change the four int() calls that parse the bounding-box coordinates in pascal_voc_evaluation.py to float().
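Both evaluation fixes concern parse_rec in pascal_voc_evaluation.py, which reads each object's pose, truncated and difficult tags and parses the four bndbox corners with int(). The sketch below is a more tolerant stdlib variant, not detectron2's code: parse_rec_tolerant is a hypothetical name, it takes the XML text rather than a file name, it keeps pose optional instead of deleting it, and it parses coordinates with float(). Defaulting a missing truncated or difficult tag to 0 is an assumption:

```python
import xml.etree.ElementTree as ET


def parse_rec_tolerant(xml_text):
    """Parse VOC objects, tolerating a missing <pose> and float coordinates."""
    objects = []
    for obj in ET.fromstring(xml_text).findall("object"):
        bbox = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            # findtext returns None instead of raising when <pose> is absent
            "pose": obj.findtext("pose"),
            # Assumption: a missing tag means "not truncated" / "not difficult"
            "truncated": int(obj.findtext("truncated", default="0")),
            "difficult": int(obj.findtext("difficult", default="0")),
            # float() accepts "1260.29", which int() rejects
            "bbox": [float(bbox.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")],
        })
    return objects
```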

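Question ③: module 'numpy' has no attribute 'bool'.

Install NumPy 1.23.1 with pip (pip install numpy==1.23.1). np.bool, like np.str, was removed in NumPy 1.24, and the environment in section 4 runs numpy 1.24.2.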
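np.bool and np.str belong to a group of NumPy aliases (together with np.int, np.float, np.object and np.complex) that were deprecated in NumPy 1.20 and removed in 1.24. If you prefer patching code over pinning NumPy, a small stdlib sketch can list every remaining use in an installed package, so that each one can be replaced by the builtin (str, bool, int, ...) as NumPy's error message suggests. find_removed_aliases and scan_tree are hypothetical helpers:

```python
import re
from pathlib import Path

# \b after the name skips np.str_, np.float32, np.int64, np.integer, ...
REMOVED_ALIAS = re.compile(r"\bnp\.(str|bool|int|float|object|complex)\b")


def find_removed_aliases(source):
    """Return (line_number, stripped_line) pairs that use a removed alias."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if REMOVED_ALIAS.search(line)
    ]


def scan_tree(package_dir):
    """Map each .py file under package_dir to its offending lines."""
    hits = {}
    for path in sorted(Path(package_dir).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        found = find_removed_aliases(text)
        if found:
            hits[str(path)] = found
    return hits
```

For example, scan_tree("/opt/conda/lib/python3.8/site-packages/detectron2") lists the offending lines with their line numbers. The pattern is a heuristic: it only matches the np. prefix, not numpy.str or other import names.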


Source: blog.csdn.net/Qingyou__/article/details/130008296