This post assumes that DiffusionDet and detectron2 are already installed and configured (that is, DiffusionDet's demo.py runs successfully).
I did not follow the official procedure of creating soft links to the dataset, which I find troublesome. I simply laid out the directories the way I am used to.
DiffusionDet code link: https://github.com/ShoufaChen/DiffusionDet
detectron2 code link: https://github.com/facebookresearch/detectron2
1. Data format
① Directory format
② Annotations/: the XML annotation files
③ ImageSets/Main/: train.txt and val.txt, each listing one image file name per line, without the extension
④ JPEGImages/: the images
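The two split files can be generated from the annotation folder instead of being written by hand. The following is only a minimal sketch: the helper name write_split_files, the 20% validation fraction, and the fixed seed are my own assumptions, and the datasets/VOC2007 root is taken from the paths that appear in the training log further down.

```python
import random
from pathlib import Path


def write_split_files(voc_root, val_fraction=0.2, seed=0):
    """Write ImageSets/Main/train.txt and val.txt from the XML files in Annotations/.

    Hypothetical helper: the name, the split fraction and the seed are
    illustrative assumptions, not part of DiffusionDet or detectron2.
    """
    voc_root = Path(voc_root)
    # One file ID per annotation: the file name without the .xml extension.
    file_ids = sorted(p.stem for p in (voc_root / "Annotations").glob("*.xml"))
    random.Random(seed).shuffle(file_ids)

    n_val = int(len(file_ids) * val_fraction)
    splits = {"val": sorted(file_ids[:n_val]), "train": sorted(file_ids[n_val:])}

    out_dir = voc_root / "ImageSets" / "Main"
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, ids in splits.items():
        # One ID per line, no extension.
        text = "".join(f"{file_id}\n" for file_id in ids)
        (out_dir / f"{name}.txt").write_text(text, encoding="utf-8")
    return splits


# Example call (path assumed from the log output in this post):
# write_split_files("datasets/VOC2007")
```

Calling write_split_files("datasets/VOC2007") from the DiffusionDet-main directory would create both files. Fixing the seed keeps the split reproducible between runs.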
2. Download the pre-trained model
https://github.com/ShoufaChen/DiffusionDet
I downloaded the COCO-Res50 model and placed it in DiffusionDet-main/model/ (a directory I created myself).
3. Modify the code
This part was a lot of trouble, and detectron2 wore me down. The main issue is the NumPy deprecation of np.str (see "Fix for numpy deprecation of np.str", https://github.com/facebookresearch/detectron2/pull/4806 ).
If I find a better method, I will update this post later.
① Modify the configuration file diffdet.coco.res50.yaml
The file is diffdet.coco.res50.yaml under DiffusionDet-main/configs/.
- The entries marked with a red box in the screenshot must be changed
- For the WEIGHTS parameter I wrote the full path (to avoid path mistakes)
- Set NUM_CLASSES to the number of categories in your own dataset
- Leave voc_2007_train and voc_2007_val on lines 11 and 12 unchanged; they are the registered dataset names for the VOC data format, not your file names
② Modify the configuration file Base-DiffusionDet.yaml
If GPU memory runs out later, reduce the batch size in this file (SOLVER.IMS_PER_BATCH in detectron2 configs).
③ Modify pascal_voc.py in detectron2
After modifying the configuration files as above, training should in principle run. However, running python train_net.py --config-file configs/diffdet.coco.res50.yaml
failed with the following error:
File "/opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py", line 35, in load_voc_instances
fileids = np.loadtxt(f, dtype=np.str)
File "/opt/conda/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'str'.
`np.str` was a deprecated alias for the builtin `str`. To avoid this error in existing code, use `str` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.str_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Here are the possible solutions first.
The first: change the numpy version to 1.20.1 (recommended).
The second: patch the code by brute force, which means modifying /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py. Step by step:
Open the file:
vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py
Four places need to be modified.
After the modification, python train_net.py --config-file configs/diffdet.coco.res50.yaml
runs.
Detailed solution steps:
I found this problem in a detectron2 pull request on GitHub:
https://github.com/facebookresearch/detectron2/pull/4806
At first I thought that installing detectron2 by pulling the latest code and compiling it would fix this. It does not: the latest version of detectron2/detectron2/data/datasets/pascal_voc.py still has the same problem, and I did not find a specific solution in the discussion.
Modification method:
1. I first tried changing the numpy version, but then decided it was better to change the detectron2 code directly.
2. Since the problem is in /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py, modify that file.
Run the following command:
vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py
I did not copy the original version here; I only mark where I changed it.
(My approach is blunt: when I hit a problem, I edit the source code directly.)
Later, a new problem appeared:
DatasetCatalog.register(name, lambda: load_voc_instances(dirname, split, class_names))
File "/opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py", line 41, in load_voc_instances
anno_file = os.path.join(annotation_dirname, fileid + ".xml")
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U13'), dtype('<U4')) -> None
Solution:
Again modify the /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py file.
Run the following command:
vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py
After modifying the code, another new problem appeared:
File "/opt/conda/lib/python3.8/site-packages/iopath/common/file_io.py", line 604, in _open
return open( # type: ignore
FileNotFoundError: [Errno 2] No such file or directory: "datasets/VOC2007/Annotations/['7_hunse_left' '(28)'].xml"
By now the file paths are all being joined, but because of the version difference or the code change just made, the joined path contains the characters [ ] and '. My original file name was 7_hunse_left (28).xml, and it turned into ['7_hunse_left' '(28)'].xml, so the next step is to modify the code and delete these unwanted characters from the path.
Again modify the /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py file.
Run the following command:
vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py
Modify the code:
# Characters to strip from the joined paths
rep = [('[', ''), (']', ''), ('\'', '')]
for c, r in rep:
    if c in anno_file:
        anno_file = anno_file.replace(c, r)
for c, r in rep:
    if c in jpeg_file:
        jpeg_file = jpeg_file.replace(c, r)
At this point, running python train_net.py --config-file configs/diffdet.coco.res50.yaml
succeeds.
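A note on why the brackets appear, offered as a likely explanation rather than a confirmed diagnosis: np.loadtxt splits each line on whitespace by default, and the file names here contain a space (the log below prints 7_hunse_left (28).xml), so each ID is read as two columns. One way to avoid both the np.str alias and the splitting is to read the split file with plain Python. The sketch below is not the detectron2 code; read_file_ids and annotation_and_image_paths are hypothetical helpers of my own.

```python
import os


def read_file_ids(split_file):
    """Return one file ID per non-empty line; spaces inside a name are kept.

    Hypothetical replacement for reading the split file with np.loadtxt.
    """
    with open(split_file, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def annotation_and_image_paths(dirname, fileid):
    """Join the VOC paths for one file ID (plain str, so + ".xml" works)."""
    anno_file = os.path.join(dirname, "Annotations", fileid + ".xml")
    jpeg_file = os.path.join(dirname, "JPEGImages", fileid + ".jpg")
    return anno_file, jpeg_file


# Example call (path assumed from the log output in this post):
# for fileid in read_file_ids("datasets/VOC2007/ImageSets/Main/train.txt"):
#     print(annotation_and_image_paths("datasets/VOC2007", fileid))
```

Because every ID stays an ordinary Python string, no brackets or quotes end up in the path, and the character-stripping step becomes unnecessary.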
4. Successful training
The following information is displayed:
Command Line Args: Namespace(config_file='configs/diffdet.coco.res50.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[04/07 03:14:31 detectron2]: Rank of current process: 0. World size: 1
[04/07 03:14:33 detectron2]: Environment info:
---------------------- ---------------------------------------------------------
sys.platform linux
Python 3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
numpy 1.24.2
detectron2 0.6 @/opt/conda/lib/python3.8/site-packages/detectron2
Compiler GCC 7.3
CUDA compiler CUDA 11.1
detectron2 arch flags 3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.8.1 @/opt/conda/lib/python3.8/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0 Tesla T4 (arch=7.5)
Driver version 470.161.03
CUDA_HOME /usr/local/cuda
Pillow 8.1.2
torchvision 0.9.1 @/opt/conda/lib/python3.8/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.7.0
---------------------- ---------------------------------------------------------
PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.0.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
[04/07 03:14:33 detectron2]: Command line arguments: Namespace(config_file='configs/diffdet.coco.res50.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[04/07 03:14:33 detectron2]: Contents of args.config_file=configs/diffdet.coco.res50.yaml:
_BASE_: "Base-DiffusionDet.yaml"
... (omitted)
NUM_CLASSES: 1
NUM_CLS: 1
NUM_DYNAMIC: 2
VERSION: 2
VIS_PERIOD: 0
[04/07 03:14:33 detectron2]: Full config saved to ./output/config.yaml
anno_file : datasets/VOC2007/Annotations/7_hunse_left (28).xml
jpeg_file : datasets/VOC2007/JPEGImages/7_hunse_left (28).jpg
... (omitted)
anno_file : datasets/VOC2007/Annotations/9_yiwu_right (9).xml
jpeg_file : datasets/VOC2007/JPEGImages/9_yiwu_right (9).jpg
[04/07 03:14:37 d2.data.build]: Using training sampler TrainingSampler
[04/07 03:14:37 d2.data.common]: Serializing 255 elements to byte tensors and concatenating them all ...
[04/07 03:14:37 d2.data.common]: Serialized dataset takes 0.12 MiB
WARNING [04/07 03:14:37 d2.solver.build]: SOLVER.STEPS contains values larger than SOLVER.MAX_ITER. These values will be ignored.
[04/07 03:14:37 fvcore.common.checkpoint]: [Checkpointer] Loading from /home/workspace_disk/DiffusionDet-main/model/diffdet_coco_res50.pth ...
WARNING [04/07 03:14:37 fvcore.common.checkpoint]: Skip loading parameter 'head.head_series.1.class_logits.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (1,) in the model! You might want to double check if this is expected.
WARNING [04/07 03:14:37 fvcore.common.checkpoint]: Skip loading parameter 'head.head_series.5.class_logits.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (1,) in the model! You might want to double check if this is expected.
WARNING [04/07 03:14:37 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
head.head_series.0.class_logits.{bias, weight}
head.head_series.1.class_logits.{bias, weight}
head.head_series.2.class_logits.{bias, weight}
head.head_series.3.class_logits.{bias, weight}
head.head_series.4.class_logits.{bias, weight}
head.head_series.5.class_logits.{bias, weight}
[04/07 03:14:37 d2.engine.train_loop]: Starting training from iteration 0
[04/07 03:15:09 d2.utils.events]: eta: 19:46:08 iter: 19 total_loss: 29.27 loss_ce: 2.284 loss_bbox: 0.9446 loss_giou: 1.874 loss_ce_0: 2.142 loss_bbox_0: 1 loss_giou_0: 1.886 loss_ce_1: 2.153 loss_bbox_1: 0.9317 loss_giou_1: 1.889 loss_ce_2: 1.589 loss_bbox_2: 0.8879 loss_giou_2: 1.887 loss_ce_3: 1.95 loss_bbox_3: 0.8332 loss_giou_3: 1.868 loss_ce_4: 2.502 loss_bbox_4: 0.8332 loss_giou_4: 1.867 time: 1.5567 data_time: 0.0228 lr: 7.2025e-07 max_mem: 9072M
5. Other possible problems
Question ①: CUDA out of memory
File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2205, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.76 GiB total capacity; 13.67 GiB already allocated; 21.75 MiB free; 13.72 GiB reserved in total by PyTorch)
Modify DiffusionDet-main/configs/Base-DiffusionDet.yaml and reduce the batch size.
Question ②: tuple.index(x): x not in tuple
File "/opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py", line 80, in load_voc_instances
{"category_id": class_names.index(cls), "bbox": bbox, "bbox_mode": BoxMode.XYXY_ABS}
ValueError: tuple.index(x): x not in tuple
The cause is that the category names in the annotations do not match the registered ones. Modify detectron2/data/datasets/pascal_voc.py
and change CLASS_NAMES to your actual categories.
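To see which names CLASS_NAMES has to contain, it can help to list every category that actually occurs in the XML files. This is a small sketch under the assumption that the annotations follow the usual VOC layout (object elements with a name child); count_class_names is a hypothetical helper, not part of detectron2.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_class_names(annotation_dir):
    """Count how often each <object><name> value occurs in a folder of VOC XML files."""
    counts = Counter()
    for xml_path in sorted(Path(annotation_dir).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        for obj in root.findall("object"):
            # A missing <name> is counted under the empty string.
            counts[obj.findtext("name", default="").strip()] += 1
    return counts


# Example call (path assumed from the log output in this post):
# print(count_class_names("datasets/VOC2007/Annotations"))
```

Every key printed by this function should appear in CLASS_NAMES with exactly the same spelling. A typo or a stray label in a single file is enough to trigger the ValueError above.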
6. Evaluation
① Change DiffusionDet-main/train_net.py
I referred to several posts. Some of them rewrite the entire build_evaluator, which I found too large a change, so I only changed the function that is called.
② If you have only one category, be sure to change detectron2/data/datasets/pascal_voc.py
Add a trailing comma; otherwise my single class name bad is split into three categories.
CLASS_NAMES is a tuple, and a tuple with only one element must include a comma. This was careless of me.
In detectron2/data/datasets/pascal_voc.py:
After this, everything works.
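The comma issue is plain Python behaviour and can be checked outside detectron2. A minimal illustration, using the class name bad from this post (the variable names are only for the example):

```python
# Parentheses alone do not make a tuple: this is just the string "bad".
without_comma = ("bad")
# The trailing comma makes a tuple with exactly one element.
with_comma = ("bad",)

print(type(without_comma).__name__, list(without_comma))
print(type(with_comma).__name__, list(with_comma))
```

Iterating over the first value yields the three characters b, a and d, which is why one class turned into three categories. The second value yields the single name bad.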
Question ①: 'NoneType' object has no attribute 'text'
The problem is in /opt/conda/lib/python3.8/site-packages/detectron2/evaluation/pascal_voc_evaluation.py.
I checked the code and then looked at my XML files, and found that they have no pose element.
So delete the line obj_struct["pose"] = obj.find("pose").text in pascal_voc_evaluation.py.
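The error arises because find() returns None when an element is missing, and None has no .text attribute. If you would rather keep the field than delete the line, a default value is one alternative. This is only a sketch: parse_object is a hypothetical stand-in for the relevant part of the evaluation code, and "Unspecified" is a placeholder value I chose.

```python
import xml.etree.ElementTree as ET


def parse_object(obj):
    """Read name and pose from one <object>; pose falls back to a placeholder."""
    return {
        "name": obj.findtext("name"),
        # findtext returns the default instead of failing when <pose> is absent.
        "pose": obj.findtext("pose", default="Unspecified"),
    }


# An object without a <pose> element, like the annotations in this post.
obj = ET.fromstring("<object><name>bad</name></object>")
print(parse_object(obj))
```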
Question ②: ValueError: invalid literal for int() with base 10: '1260.29'
Change the four int calls in pascal_voc_evaluation.py to float.
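The reason is that int() refuses a string with a decimal point, while float() accepts both forms. A small self-contained sketch of the idea (parse_bbox is a hypothetical helper and, apart from 1260.29, the coordinates are made up):

```python
import xml.etree.ElementTree as ET


def parse_bbox(bndbox):
    """Read the four corner coordinates as floats, so '1260.29' and '10' both parse."""
    return [float(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]


bndbox = ET.fromstring(
    "<bndbox><xmin>1260.29</xmin><ymin>10</ymin>"
    "<xmax>1300.5</xmax><ymax>42</ymax></bndbox>"
)

try:
    int(bndbox.findtext("xmin"))  # this is what the unmodified code effectively does
except ValueError as err:
    print("int() fails:", err)

print(parse_bbox(bndbox))
```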
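Question ③: module 'numpy' has no attribute 'bool'.
Changing numpy from 1.22.x to 1.23.1 fixed it for me. (The np.bool alias was removed in NumPy 1.24, so any version that still provides the alias avoids this error.)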
pip