Training Faster R-CNN (Linux, PyTorch) on your own dataset: process record

Table of contents

Preparation

Source code download

Configure the environment

Make a VOC dataset

Data directory structure

Train

Compile the CUDA dependencies

Pre-trained model

Modify the pascal_voc.py file

Start training

Problems encountered

Main reference articles


Preparation 

Source code download

Faster R-CNN source code for PyTorch 0.4.0: GitHub - jwyang/faster-rcnn.pytorch: A faster pytorch implementation of faster r-cnn
Faster R-CNN source code for PyTorch 1.0.0: GitHub - jwyang/faster-rcnn.pytorch at pytorch-1.0 (use this branch with PyTorch 1.0 or later; the 0.4.0 code caused many of the problems described below)

Configure the environment

Run the following command in the directory containing requirements.txt to install the required libraries:

pip install -r requirements.txt

After that, it is best to downgrade SciPy, for example to version 1.2.1; otherwise an error (cannot import imread) may be reported later:

pip uninstall scipy
pip install scipy==1.2.1
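A quick way to tell whether the downgrade is needed is to check whether scipy.misc.imread can be imported at all. This is a minimal sketch; the helper name scipy_has_imread is hypothetical. As far as I know, older SciPy only exposed imread when Pillow was installed, so a False result can also mean Pillow is missing.

```python
import importlib


def scipy_has_imread():
    """Return True if scipy.misc.imread is available in this environment."""
    try:
        misc = importlib.import_module("scipy.misc")
    except ImportError:
        # SciPy (or its misc module) is not importable at all
        return False
    # newer SciPy releases no longer provide imread
    return hasattr(misc, "imread")


if __name__ == "__main__":
    print("scipy.misc.imread available:", scipy_has_imread())
```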

Make a VOC dataset

To make your own VOC-format dataset, see my earlier post: Record: converting the Open Images v4 dataset to VOC format (ZZZZ_Y_'s blog, CSDN)

Create a soft link pointing to the dataset instead of copying it into the project path, which would take up extra disk space: Creating soft links on Windows and Linux (ZZZZ_Y_'s blog, CSDN)
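On Linux the link is usually created with ln -s; the same step can be sketched in Python with pathlib. The dataset path and the helper name link_dataset are hypothetical; the link name VOCdevkit2007 matches the data directory structure shown below.

```python
from pathlib import Path


def link_dataset(dataset_dir, project_data_dir, link_name="VOCdevkit2007"):
    """Create project_data_dir/link_name as a symlink to dataset_dir (like `ln -s`)."""
    link = Path(project_data_dir) / link_name
    link.parent.mkdir(parents=True, exist_ok=True)
    # do nothing if the link (or a real folder) is already there
    if not link.is_symlink() and not link.exists():
        # use an absolute target so the link works from any working directory
        link.symlink_to(Path(dataset_dir).resolve(), target_is_directory=True)
    return link


# Example (hypothetical paths):
# link_dataset("/data/datasets/VOCdevkit2007", "faster-rcnn.pytorch/data")
```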

Data directory structure

data
├─VOCdevkit2007
│  └─VOC2007
│      ├─Annotations
│      ├─ImageSets
│      │  ├─Layout
│      │  ├─Main
│      │  └─Segmentation
│      ├─JPEGImages
│      ├─SegmentationClass
│      └─SegmentationObject
└─pretrained_model

The pre-trained model is stored under pretrained_model.

The XML annotation files are stored under Annotations.

The JPG image files are stored under JPEGImages.

The Main folder under ImageSets stores the txt files for the training, validation, and test sets; each line contains the ID (file name without extension) of one image.
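The ImageSets/Main txt files can be generated from the images themselves. This is a minimal sketch under stated assumptions: trainval.txt and test.txt are, as I understand it, the files read by the voc_2007_trainval and voc_2007_test datasets used by trainval_net.py and test_net.py; the 80/20 split, the fixed seed, and the helper names are hypothetical; images without a matching XML annotation are skipped.

```python
import random
from pathlib import Path

VOC_SUBDIRS = ["Annotations", "ImageSets/Main", "JPEGImages"]


def make_voc_skeleton(root):
    """Create the minimal VOC2007 folders under root (e.g. data/VOCdevkit2007/VOC2007)."""
    root = Path(root)
    for sub in VOC_SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


def write_image_sets(root, test_ratio=0.2, seed=0):
    """Split JPEGImages/*.jpg IDs into ImageSets/Main/trainval.txt and test.txt."""
    root = Path(root)
    ids = sorted(p.stem for p in (root / "JPEGImages").glob("*.jpg"))
    # keep only images that have an annotation file with the same ID
    ids = [i for i in ids if (root / "Annotations" / (i + ".xml")).exists()]
    random.Random(seed).shuffle(ids)  # deterministic shuffle
    n_test = int(len(ids) * test_ratio)
    splits = {"test": sorted(ids[:n_test]), "trainval": sorted(ids[n_test:])}
    main = root / "ImageSets" / "Main"
    for name, split_ids in splits.items():
        # one image ID per line, no extension
        (main / (name + ".txt")).write_text("".join(i + "\n" for i in split_ids))
    return splits
```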

Train

Compile the CUDA dependencies

cd lib
python setup.py build develop

Pre-trained model

Place the pre-trained model in the data/pretrained_model folder (for --net res101 the code loads data/pretrained_model/resnet101_caffe.pth).

Modify the pascal_voc.py file

Modify the detection categories in faster-rcnn.pytorch/lib/datasets/pascal_voc.py

The category names must be lowercase!
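The category list in pascal_voc.py is a tuple whose first entry is '__background__'. As far as I can tell, _load_pascal_annotation() looks up each XML <name> after lower-casing and stripping it, so an entry such as 'Car' in the tuple would never match and the lookup raises a KeyError. A minimal sketch of that lookup; the example categories are hypothetical:

```python
# Hypothetical example categories; replace them with your own, all lowercase.
CLASSES = ('__background__',  # index 0 is always the background
           'person', 'car', 'dog')

# name -> index mapping, built the same way as _class_to_ind
class_to_ind = dict(zip(CLASSES, range(len(CLASSES))))


def class_index(xml_name):
    """Mimic the annotation lookup: the XML <name> text is lower-cased and stripped."""
    return class_to_ind[xml_name.lower().strip()]
```

With this mapping, class_index(" Car ") resolves to the 'car' entry, while a name that is not in the tuple raises a KeyError.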

Start training

CUDA_VISIBLE_DEVICES=0 python trainval_net.py --dataset pascal_voc --net res101 --bs 4 --nw 4 --lr 0.005 --lr_decay_step 5 --cuda --epochs 50

Test

Test with the following command:

CUDA_VISIBLE_DEVICES=9 python test_net.py --dataset pascal_voc --net res101 --checksession 1 --checkepoch 1 --checkpoint 6755 --cuda
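The --checksession, --checkepoch and --checkpoint values select a saved file. As far as I can tell from trainval_net.py, checkpoints are saved as <save_dir>/<net>/<dataset>/faster_rcnn_<session>_<epoch>_<step>.pth, where step is the last (0-based) iteration of the epoch; with 6756 iterations per epoch, as in the training log further down, that is 6755. The helper names below are hypothetical:

```python
import os


def checkpoint_path(load_dir, net, dataset, session, epoch, step):
    """Build <load_dir>/<net>/<dataset>/faster_rcnn_<session>_<epoch>_<step>.pth."""
    return os.path.join(load_dir, net, dataset,
                        "faster_rcnn_{}_{}_{}.pth".format(session, epoch, step))


def last_step_of_epoch(iters_per_epoch):
    """Steps are counted from 0, so the end-of-epoch checkpoint uses iters_per_epoch - 1."""
    return iters_per_epoch - 1
```

For the command above with the models directory (the one passed as --load_dir to demo.py), this gives models/res101/pascal_voc/faster_rcnn_1_1_6755.pth; if the file is not found, listing that folder shows which step numbers actually exist.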

demo.py

Run demo.py with the following command:

CUDA_VISIBLE_DEVICES=0 python demo.py --net res101 --checksession 1 --checkepoch 4 --checkpoint 13512 --cuda --load_dir models

Error: RuntimeError: Error(s) in loading state_dict for resnet:

RuntimeError: Error(s) in loading state_dict for resnet:
        size mismatch for RCNN_cls_score.weight: copying a param with shape torch.Size([16, 2048]) from checkpoint, the shape in current model is torch.Size([21, 2048]).
        size mismatch for RCNN_cls_score.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([21]).
        size mismatch for RCNN_bbox_pred.weight: copying a param with shape torch.Size([64, 2048]) from checkpoint, the shape in current model is torch.Size([84, 2048]).
        size mismatch for RCNN_bbox_pred.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([84]).

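The checkpoint was trained with 16 classes (15 categories plus __background__), while demo.py builds the model with the 21 default PASCAL VOC classes (20 plus background); the box-regression layer has 4 outputs per class, hence 64 versus 84. Running demo.py keeps reporting this error until the detection categories in demo.py are modified to match the classes used for training.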
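The shapes in the error follow from the number of classes. As I understand the ResNet head, RCNN_cls_score has one output per class and RCNN_bbox_pred has four per class unless class-agnostic regression is used. This sketch (hypothetical helper, feature size 2048 as in the error message) reproduces the numbers, which is a quick way to check how many classes, including background, a checkpoint expects:

```python
def expected_head_shapes(num_classes, feat_dim=2048, class_agnostic=False):
    """Parameter shapes of the final classification/regression layers."""
    bbox_out = 4 if class_agnostic else 4 * num_classes  # 4 box deltas per class
    return {
        "RCNN_cls_score.weight": (num_classes, feat_dim),
        "RCNN_cls_score.bias": (num_classes,),
        "RCNN_bbox_pred.weight": (bbox_out, feat_dim),
        "RCNN_bbox_pred.bias": (bbox_out,),
    }


# 16 classes (checkpoint) vs 21 classes (default VOC list in demo.py)
for n in (16, 21):
    print(n, expected_head_shapes(n))
```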

The server resources are insufficient because everyone is using it, so I couldn't run test and demo today.

Problems encountered

1. Error when compiling: invalid command 'develop'

usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: invalid command 'develop'

Reference: python setup.py develop · Issue #92 · django-extensions/django-extensions · GitHub

In setup.py, replace

from distutils.core import setup

replace with

from setuptools import setup

Then activate the virtual environment on the server terminal, cd to the corresponding directory, and execute the following commands in sequence:

conda activate your_env_name
cd lib
python setup.py build develop

The compilation then starts.

2. "cannot import imread": this can be solved by downgrading SciPy, for example to 1.2.1.

pip uninstall scipy
pip install scipy==1.2.1

3. libstdc++.so.6: version `GLIBCXX_3.4.30' not found

ImportError: /home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/cv2/cv2.abi3.so)

Check the GLIBCXX version supported in the system libstdc++.so.6 file:

strings  /usr/lib/x86_64-linux-gnu/libstdc++.so.6   | grep GLIBC

The output shows that the highest version is 3.4.30.

Check the GLIBCXX versions supported by libstdc++.so.6 in the Anaconda environment:

strings  /home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6 | grep GLIBCX

The highest version in the Anaconda environment is 3.4.29, but 3.4.30 is required.
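strings | grep does the job; if strings is not installed, the same check can be sketched in Python by scanning the library file for GLIBCXX_x.y.z tags. The helper names are hypothetical:

```python
import re

# version tags such as GLIBCXX_3.4.30 embedded in libstdc++.so.6
_GLIBCXX = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def _version_key(version):
    return tuple(int(part) for part in version.split("."))


def glibcxx_versions(lib_path):
    """Return the GLIBCXX versions found in a shared library, oldest first."""
    with open(lib_path, "rb") as f:
        data = f.read()
    found = {m.group(1).decode("ascii") for m in _GLIBCXX.finditer(data)}
    return sorted(found, key=_version_key)


def has_glibcxx(lib_path, required):
    """True if the library provides the required version tag, e.g. '3.4.30'."""
    return required in glibcxx_versions(lib_path)
```

For example, glibcxx_versions("/usr/lib/x86_64-linux-gnu/libstdc++.so.6")[-1] gives the highest version of the system library.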

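Reference: Ubuntu system anaconda error version `GLIBCXX_3.4.30' not found - Death_Knight - Blog Garden (cnblogs.com)

View the libstdc++.so.6-related files in the Anaconda environment (first cd to /home/zy/anaconda3/envs/pytorch/lib, which is where the torch/lib/../../../../libstdc++.so.6 path in the error resolves to):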

ls libstdc++.so
ls libstdc++.so -al
ls libstdc++.so.6 -al
ls libstdc++.so.6.0.29 -al

Use the following command to view libstdc++.so.6 under the system library path:

ls -al /usr/lib/x86_64-linux-gnu/libstdc++.so.6

Currently, libstdc++.so and libstdc++.so.6 in the Anaconda environment are links to libstdc++.so.6.0.29.

Use the following commands to point libstdc++.so and libstdc++.so.6 in the Anaconda environment to the library in the system path instead:

rm libstdc++.so
rm libstdc++.so.6
ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30 libstdc++.so
ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30 libstdc++.so.6

Checking again shows that the links now point to 6.0.30.

You can also try the following approach; I did not try it because the method above worked:

(Solved) ImportError: version `GLIBCXX_3.4.22' not found (Cocoa and Fish's blog, CSDN)

4. Running again, the previous error was solved, but a new problem appeared:

ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

Reference solution: [Solved] ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead. (ShuqiaoS's blog, CSDN)

I found that I was using the 0.4.0 code, which is why there were so many problems, so I had to switch to the PyTorch 1.0 code (pytorch-1.0 branch). It was really frustrating to spend so long before discovering that the version was 0.4.0.

If you are using 0.4.0, see Faster RCNN environment configuration (Our blog, CSDN); I believe that post also uses 0.4.0.

5. Running error ImportError: cannot import name '_mask'

Reference: ImportError: cannot import name '_mask' · Issue #410 · jwyang/faster-rcnn.pytorch · GitHub

Activate the virtual environment, cd into the data directory, and install the COCO API with the following commands:

cd data

git clone https://github.com/pdollar/coco.git

cd coco/PythonAPI

make

6. New problem: TypeError: load() missing 1 required positional argument: 'Loader'

Reason: newer versions of PyYAML (6.0 and later) no longer accept yaml.load() without a Loader argument.

Way 1: replace the call with one of the following three forms:

yaml.load(file,Loader=yaml.FullLoader)
yaml.safe_load(file)
yaml.load(file, Loader=yaml.CLoader)

Way 2: downgrade PyYAML from 6.0 to 5.4.1 (this is how I solved it; it felt the most convenient):

pip uninstall pyyaml
pip install pyyaml==5.4.1

Finally, no more errors, and training started running.

7. Oh no, another error:

ValueError: Caught ValueError in DataLoader worker process 1.
ValueError: operands could not be broadcast together with shapes (683,1024,4) (1,1,3) (683,1024,4)

I switched from multiple GPUs to a single GPU and no error was reported, but why are rpn_cls, rpn_box, etc. all nan?

Well, the same error as above was reported again. I found that my dataset file names all have 5 digits, whereas they should have 6.

8. After that modification, an error is reported: assert (boxes[:, 2] >= boxes[:, 0]).all() AssertionError

Reference: (linux) Faster RCNN-pytorch1.0 object detection 2: training your own dataset, GPU, PyCharm, training notes (chao_xy's blog, CSDN)

Modify the _load_pascal_annotation() function in lib/datasets/pascal_voc.py:

Remove the trailing - 1 from the lines that compute x1, y1, x2, and y2 from xmin, ymin, xmax, and ymax.

Modify the append_flipped_images() function in lib/datasets/imdb.py:

To fix the order of the box coordinates, add the following code below the line boxes[:, 2] = widths[i] - oldx1 - 1:

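# swap x1 and x2 for any box whose coordinates are still reversed
for b in range(len(boxes)):
    if boxes[b][2] < boxes[b][0]:
        # tuple assignment swaps the two values; an alias such as `aboxes = boxes`
        # does not copy, so both coordinates would end up equal to x2
        boxes[b][0], boxes[b][2] = boxes[b][2], boxes[b][0]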
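Since this assertion is triggered by box coordinates in the wrong order, it can help to scan the annotations for suspicious boxes before training. This is a hypothetical checker, not part of the repository; it flags boxes with xmin >= xmax, xmin below 1 (which the original - 1 offset would turn negative), or xmax beyond the image width read from <size><width>:

```python
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text):
    """Return (object name, reason) pairs for suspicious boxes in one VOC annotation."""
    root = ET.fromstring(xml_text)
    width = int(float(root.findtext("size/width", default="0")))
    problems = []
    for obj in root.findall("object"):
        name = obj.findtext("name", default="?")
        box = obj.find("bndbox")
        if box is None:
            problems.append((name, "missing bndbox"))
            continue
        xmin = float(box.findtext("xmin"))
        xmax = float(box.findtext("xmax"))
        if xmin >= xmax:
            problems.append((name, "xmin >= xmax"))
        if xmin < 1:
            problems.append((name, "xmin < 1 (negative after the - 1 offset)"))
        if width and xmax > width:
            problems.append((name, "xmax > image width"))
    return problems
```

To check a whole dataset, read every file in Annotations/ and print the file name whenever find_bad_boxes returns a non-empty list.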

9. The following error occurred when running again:

roidb[i]['img_id'] = imdb.image_id_at(i) 
IndexError: list index out of range

Reference: roidb[i]['image'] = imdb.image_path_at(i) · Issue #79 · rbgirshick/fast-rcnn · GitHub

This may be caused by a stale cache file: delete the cache file for the training data under the data/cache/ folder (fast-rcnn-master/data/cache/ in the referenced issue) and run again, which solved it.
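Deleting the cache can be sketched as removing the *.pkl files in the cache folder. This assumes the cached roidb files are the .pkl files in data/cache/ (for example a name like voc_2007_trainval_gt_roidb.pkl); the helper name is hypothetical, and only .pkl files are touched:

```python
from pathlib import Path


def clear_pkl_cache(cache_dir="data/cache"):
    """Delete cached *.pkl files so the roidb is rebuilt on the next run."""
    removed = []
    for pkl in sorted(Path(cache_dir).glob("*.pkl")):
        pkl.unlink()
        removed.append(pkl.name)
    return removed
```

Run it from the project root after changing the annotations, the ImageSets files, or the class list, since a cache built from the old data no longer matches.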

10. Error: ValueError: operands could not be broadcast together with shapes (1024,717,4) (1,1,3) (1024,717,4). It seems this problem was still not resolved.

This may be because some images have 4 channels (RGB plus alpha); keeping only the three RGB channels fixes it.

Reference: ValueError: operands could not be broadcast together with shapes (441,786,4) (1,1,3) (441,786,4) · Issue #599 · jwyang/faster-rcnn.pytorch · GitHub

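Insert the following before line 39 of lib/model/utils/blob.py (in prep_im_for_blob, before the pixel means are subtracted):

if im.shape[2] == 4:
    im = im[:, :, :3]  # keep only the first three channels, drop alpha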

After that, training runs normally:

CUDA_VISIBLE_DEVICES=3,4 python trainval_net.py --dataset pascal_voc --net res101 --bs 4 --nw 4 --lr 0.005 --lr_decay_step 5 --cuda --epochs 50
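The shapes in this error come from NumPy broadcasting: an H x W x 4 image cannot be combined with the (1, 1, 3) pixel-means array because the last dimensions (4 and 3) differ. The sketch below reproduces the error and the fix; PIXEL_MEANS is assumed to match cfg.PIXEL_MEANS in the repository, and the helper names are hypothetical. One caveat, also an assumption about the data loader: if the image is converted from RGB to BGR with im[:, :, ::-1] before prep_im_for_blob, slicing [:3] afterwards keeps alpha and drops red, so dropping alpha before the reversal is safer.

```python
import numpy as np

# Assumed to equal cfg.PIXEL_MEANS (BGR order), shape (1, 1, 3)
PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])


def drop_alpha(im):
    """Keep only the first three channels of an H x W x 4 image."""
    if im.ndim == 3 and im.shape[2] == 4:
        im = im[:, :, :3]
    return im


def rgb_to_bgr_without_alpha(im):
    # drop alpha first, then reverse; reversing first would keep A, B, G
    return drop_alpha(im)[:, :, ::-1]


def subtract_means(im):
    """Mean subtraction that works for both 3- and 4-channel images."""
    return drop_alpha(im).astype(np.float32) - PIXEL_MEANS


rgba = np.zeros((683, 1024, 4), dtype=np.uint8)
try:
    rgba.astype(np.float32) - PIXEL_MEANS  # last dims 4 vs 3: broadcasting fails
except ValueError as err:
    print("without the fix:", err)
print("with the fix:", subtract_means(rgba).shape)
```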

11. Another error was reported:

RuntimeError: Caught RuntimeError in DataLoader worker process 3.

RuntimeError: The expanded size of the tensor (1200) must match the existing size (0) at non-singleton dimension 1.  Target sizes: [600, 1200, 3].  Tensor sizes: [600, 0, 3]

1439
[session 1][epoch  1][iter 3800/6756] loss: 0.7875, lr: 5.00e-03
                        fg/bg=(10/1014), time cost: 53.474536
                        rpn_cls: 0.1268, rpn_box: 0.0292, rcnn_cls: 0.0855, rcnn_box 0.0142
Traceback (most recent call last):
  File "trainval_net.py", line 310, in <module>
    data = next(data_iter)
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zy/faster-rcnn.pytorch-pytorch-1.0/lib/roi_data_layer/roibatchLoader.py", line 177, in __getitem__
    padding_data[:, :data_width, :] = data[0]
RuntimeError: The expanded size of the tensor (1200) must match the existing size (0) at non-singleton dimension 1.  Target sizes: [600, 1200, 3].  Tensor sizes: [600, 0, 3]


Reference: "RuntimeError: The expanded size of the tensor (1200) must match the existing size (1199) at non-singleton dimension 1. Target sizes: [600, 1200, 3]. Tensor sizes: [600, 1199, 3] " · Issue #629 · jwyang/faster-rcnn.pytorch · GitHub

Try deleting the pkl files under the data/cache/ folder.

Errors during testing:

1. AttributeError: 'NoneType' object has no attribute 'text'

  File "/home/zy/faster-rcnn.pytorch-pytorch-1.0/lib/datasets/voc_eval.py", line 22, in parse_rec
    obj_struct['pose'] = obj.find('pose').text
AttributeError: 'NoneType' object has no attribute 'text'

Solution: in the source file, comment out the line obj_struct['pose'] = obj.find('pose').text

I found that my XML files contain no pose, truncated, or difficult tags, so I commented out all three lines.

Reference: AttributeError: 'NoneType' object has no attribute 'text' · Issue #580 · rbgirshick/py-faster-rcnn · GitHub

2. Error: KeyError: 'difficult'

difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
KeyError: 'difficult'

Solution: modify faster-rcnn.pytorch-pytorch-1.0/lib/datasets/voc_eval.py and comment out the code that reads difficult from the annotations.

References: Using py-faster-rcnn to train an object detection model (liuyan20062010's blog, CSDN)

AttributeError: 'NoneType' object has no attribute 'text' · Issue #580 · rbgirshick/py-faster-rcnn · GitHub

Modified places (excerpts from voc_eval.py). Instead of also commenting out the tp/fp logic, which would count every detection as a false positive and leave npos at 0, difficult is replaced with an all-False array so the rest of the evaluation can stay unchanged:

def parse_rec(filename):
  """ Parse a PASCAL VOC xml file """
  tree = ET.parse(filename)
  objects = []
  for obj in tree.findall('object'):
    obj_struct = {}
    obj_struct['name'] = obj.find('name').text
    #obj_struct['pose'] = obj.find('pose').text    # commented out: no <pose> in my XML
    #obj_struct['truncated'] = int(obj.find('truncated').text)    # commented out: no <truncated>
    #obj_struct['difficult'] = int(obj.find('difficult').text)    # commented out: no <difficult>
    bbox = obj.find('bndbox')
    obj_struct['bbox'] = [int(bbox.find('xmin').text),
                          int(bbox.find('ymin').text),
                          int(bbox.find('xmax').text),
                          int(bbox.find('ymax').text)]
    objects.append(obj_struct)

  return objects

  # ... inside voc_eval(): extract gt objects for this class
  class_recs = {}
  npos = 0
  for imagename in imagenames:
    R = [obj for obj in recs[imagename] if obj['name'] == classname]
    bbox = np.array([x['bbox'] for x in R])
    #difficult = np.array([x['difficult'] for x in R]).astype(np.bool)    # commented out: no 'difficult' key
    difficult = np.zeros(len(R), dtype=bool)    # added: treat every object as not difficult
    det = [False] * len(R)
    npos = npos + sum(~difficult)    # kept, so npos still counts every ground-truth box
    class_recs[imagename] = {'bbox': bbox,
                             'difficult': difficult,
                             'det': det}

  # ... inside voc_eval(), in the loop over detections (left unchanged):
      if ovmax > ovthresh:
        if not R['difficult'][jmax]:    # always True now that difficult is all False
          if not R['det'][jmax]:
            tp[d] = 1.
            R['det'][jmax] = 1
          else:
            fp[d] = 1.
      else:
        fp[d] = 1.
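An alternative to commenting lines out is to read the optional tags with ElementTree's Element.findtext() and a default value, so annotation files with and without pose/truncated/difficult both parse. This is a sketch of that swapped-in approach, not the repository's code; the defaults ('Unspecified', 0, 0) are assumptions. With difficult always present, the original difficult handling in voc_eval can stay, except that np.bool should be written as bool on NumPy 1.24 and later, where that alias was removed.

```python
import xml.etree.ElementTree as ET


def parse_rec(filename):
    """Parse a PASCAL VOC XML file, tolerating missing pose/truncated/difficult tags."""
    tree = ET.parse(filename)
    objects = []
    for obj in tree.findall("object"):
        bbox = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            # findtext() returns the default when the tag is absent
            "pose": obj.findtext("pose", default="Unspecified"),
            "truncated": int(obj.findtext("truncated", default="0")),
            "difficult": int(obj.findtext("difficult", default="0")),
            "bbox": [int(bbox.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")],
        })
    return objects
```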

Main reference articles

Detailed explanation of building and using Faster-RCNN.pytorch (adapted to PyTorch 1.0 and above) (Yale Mandala's blog, CSDN)

Using faster-rcnn.pytorch to train your own dataset (full version) (Wind Chaser, cnblogs.com)

Faster RCNN (PyTorch) configuration process record and problem solving (cc__cc__'s blog, CSDN)

(linux) Faster RCNN-pytorch1.0 object detection 2: training your own dataset, GPU, PyCharm, training notes (Lulu is not the blog of chubby paperer, CSDN)


Source: blog.csdn.net/ZZZZ_Y_/article/details/129572794