Preparation
Source code download
Faster R-CNN source code for PyTorch 0.4.0: GitHub - jwyang/faster-rcnn.pytorch: A faster pytorch implementation of faster r-cnn
Faster R-CNN source code for PyTorch 1.0.0: GitHub - jwyang/faster-rcnn.pytorch at pytorch-1.0
Configuration Environment
From the directory where requirements.txt is located, install the required libraries with the following command:
pip install -r requirements.txt
It is best to downgrade scipy afterwards, for example to version 1.2.1; otherwise an error may be reported later (see problem 2 below):
pip uninstall scipy
pip install scipy==1.2.1
Make a VOC dataset
To make your own VOC-format dataset, see: Record of converting the Open Image v4 dataset to VOC format (ZZZZ_Y_'s blog, CSDN)
Create a soft link that points to the dataset instead of copying the dataset into the project path, which would take up extra disk space. See: Creating soft links on Windows and Linux (ZZZZ_Y_'s blog, CSDN)
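The soft-link step can also be done from Python. This is only a minimal sketch: the function name link_dataset and the paths in the demonstration are hypothetical placeholders, not paths from this project. It relies on os.symlink, so no image data is copied.

```python
import os
import tempfile


def link_dataset(dataset_dir, link_path):
    """Create a symbolic link at link_path that points to dataset_dir."""
    dataset_dir = os.path.abspath(dataset_dir)
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a stale link; this never touches real data
    elif os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.makedirs(os.path.dirname(os.path.abspath(link_path)), exist_ok=True)
    os.symlink(dataset_dir, link_path, target_is_directory=True)
    return link_path


if __name__ == "__main__":
    # Demonstration in a temporary directory; replace with your real paths.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "storage", "VOCdevkit2007")
        os.makedirs(src)
        dst = link_dataset(src, os.path.join(tmp, "project", "data", "VOCdevkit2007"))
        print(dst, "->", os.readlink(dst))
```

On Linux this is equivalent to running ln -s with the same two paths.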
data directory structure
data
├─VOCdevkit2007
│  └─VOC2007
│      ├─Annotations
│      ├─ImageSets
│      │  ├─Layout
│      │  ├─Main
│      │  └─Segmentation
│      ├─JPEGImages
│      ├─SegmentationClass
│      └─SegmentationObject
└─pretrained_model
The pretrained model is stored under pretrained_model.
The XML annotation files are stored under Annotations.
The JPG image files are stored under JPEGImages.
The Main folder under ImageSets stores the txt files for the training, validation, and test sets; each file lists the IDs (file names without extension) of the images in that set.
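If the split files do not exist yet, they can be generated from the annotation folder. The following is a rough sketch, not part of the original project: the function name write_splits, the split fractions, and the random seed are all assumptions to adjust for your own data. It writes train.txt, val.txt, trainval.txt, and test.txt with one image ID per line.

```python
import os
import random
import tempfile


def write_splits(annotation_dir, main_dir, val_fraction=0.2, test_fraction=0.1, seed=0):
    """Write train/val/trainval/test txt files and return the size of each split."""
    ids = sorted(os.path.splitext(f)[0]
                 for f in os.listdir(annotation_dir) if f.endswith(".xml"))
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_test = int(len(ids) * test_fraction)
    n_val = int(len(ids) * val_fraction)
    splits = {
        "test": ids[:n_test],
        "val": ids[n_test:n_test + n_val],
        "train": ids[n_test + n_val:],
    }
    splits["trainval"] = splits["train"] + splits["val"]
    os.makedirs(main_dir, exist_ok=True)
    for name, members in splits.items():
        with open(os.path.join(main_dir, name + ".txt"), "w") as f:
            f.writelines(image_id + "\n" for image_id in sorted(members))
    return {name: len(members) for name, members in splits.items()}


if __name__ == "__main__":
    # Demonstration with ten empty annotation files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        ann = os.path.join(tmp, "Annotations")
        os.makedirs(ann)
        for i in range(10):
            open(os.path.join(ann, f"{i:06d}.xml"), "w").close()
        print(write_splits(ann, os.path.join(tmp, "ImageSets", "Main")))
```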
train
Compile CUDA dependent environment
cd lib
python setup.py build develop
pre-trained model
The pre-trained model should be stored in the pretrained_model folder
Modify the pascal_voc.py file
Change the detection categories in faster-rcnn.pytorch/lib/datasets/pascal_voc.py to the categories of your own dataset.
Note that the category names must be lowercase!
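Why lowercase matters can be shown with a small sketch. It assumes that the annotation loader lowercases each label before looking it up in the class table, as loaders in the py-faster-rcnn family do; the category names and the helper class_index below are hypothetical examples, not the project's own code.

```python
# Hypothetical categories; replace them with the labels used in your own XML files.
# '__background__' stays first so that it keeps index 0.
CLASSES = ("__background__", "person", "car", "dog")
CLASS_TO_IND = dict(zip(CLASSES, range(len(CLASSES))))


def class_index(xml_name):
    """Map an annotation label to its class index, normalising case and whitespace."""
    return CLASS_TO_IND[xml_name.lower().strip()]


if __name__ == "__main__":
    print(class_index(" Car "))    # a label written as 'Car' in the XML still resolves
    print("Car" in CLASS_TO_IND)   # an uppercase entry in the class tuple would never match
```

With this kind of lookup, an uppercase name in the class tuple would raise a KeyError for every annotation, because the lowered label can never equal it.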
Start training
CUDA_VISIBLE_DEVICES=0 python trainval_net.py --dataset pascal_voc --net res101 --bs 4 --nw 4 --lr 0.005 --lr_decay_step 5 --cuda --epochs 50
test
Test with the following code:
CUDA_VISIBLE_DEVICES=9 python test_net.py --dataset pascal_voc --net res101 --checksession 1 --checkepoch 1 --checkpoint 6755 --cuda
demo.py
Run demo.py with the following command
CUDA_VISIBLE_DEVICES=0 python demo.py --net res101 --checksession 1 --checkepoch 4 --checkpoint 13512 --cuda --load_dir models
Error: RuntimeError: Error(s) in loading state_dict for resnet:
RuntimeError: Error(s) in loading state_dict for resnet:
size mismatch for RCNN_cls_score.weight: copying a param with shape torch.Size([16, 2048]) from checkpoint, the shape in current model is torch.Size([21, 2048]).
size mismatch for RCNN_cls_score.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([21]).
size mismatch for RCNN_bbox_pred.weight: copying a param with shape torch.Size([64, 2048]) from checkpoint, the shape in current model is torch.Size([84, 2048]).
size mismatch for RCNN_bbox_pred.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([84]).
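The checkpoint was trained with 16 classes (15 categories plus background), but demo.py still builds the model with the 21 default VOC classes, so the error keeps being reported.
Modify the detection categories in demo.py so that they match the ones used for training.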
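The shapes in the error message are consistent with one classification output per class and four box outputs per class (64 = 4 x 16, 84 = 4 x 21). A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the 2048 feature size is simply taken from the message above.

```python
def head_shapes(num_classes, feature_dim=2048):
    """Weight shapes of the two detection heads for a class count that includes background."""
    return {
        "RCNN_cls_score.weight": (num_classes, feature_dim),
        "RCNN_bbox_pred.weight": (4 * num_classes, feature_dim),
    }


def classes_in_checkpoint(cls_score_shape):
    """Number of classes (including background) a checkpoint was trained with."""
    return cls_score_shape[0]


if __name__ == "__main__":
    print(head_shapes(16))                    # shapes stored in the checkpoint
    print(head_shapes(21))                    # shapes built from the default class list
    print(classes_in_checkpoint((16, 2048)))  # class count the demo class list must have
```

So the class list in demo.py needs 16 entries in total, with '__background__' counted as one of them.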
The server is fully occupied by other users today, so I could not run the test and demo steps.
problems encountered
1. Error invalid command 'develop' when compiling
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
error: invalid command 'develop'
Reference: python setup.py develop · Issue #92 · django-extensions/django-extensions · GitHub
In the setup.py file, replace
from distutils.core import setup
replace with
from setuptools import setup
Then activate the virtual environment on the server terminal, cd to the corresponding directory, and execute the following commands in sequence:
conda activate your_env_name
cd lib
python setup.py build develop
Compilation started!
2. The error "cannot import imread" appears. It can be solved by downgrading scipy, for example to 1.2.1
pip uninstall scipy
pip install scipy==1.2.1
3. libstdc++.so.6: version `GLIBCXX_3.4.30' not found
ImportError: /home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/cv2/cv2.abi3.so)
Check the GLIBCXX versions supported by the system's libstdc++.so.6 file:
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX
The highest version listed is 3.4.30.
Check the GLIBCXX versions supported by the libstdc++.so.6 file in the Anaconda environment:
strings /home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6 | grep GLIBCXX
The highest version in the Anaconda environment is 3.4.29, but the required version is 3.4.30.
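The same comparison can be scripted. The sketch below is only an illustration of what strings piped into grep does: it scans raw bytes for GLIBCXX_x.y.z markers. The function names are hypothetical, and the sample bytes in the demonstration are made up rather than read from a real library.

```python
import re


def glibcxx_versions(data):
    """Return the sorted GLIBCXX version tuples found in raw library bytes."""
    found = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)*)", data))
    return sorted(tuple(int(part) for part in v.decode().split(".")) for v in found)


def supports(data, required):
    """True if the library bytes advertise the required version tuple."""
    return required in glibcxx_versions(data)


if __name__ == "__main__":
    # Made-up sample; for a real check use: data = open(path_to_libstdcxx, "rb").read()
    sample = b"GLIBCXX_3.4\x00GLIBCXX_3.4.9\x00GLIBCXX_3.4.29\x00GLIBCXX_DEBUG_MESSAGE_LENGTH"
    print(glibcxx_versions(sample)[-1])
    print(supports(sample, (3, 4, 30)))
```

Versions are compared as integer tuples, so 3.4.29 sorts above 3.4.9, which a plain string sort would get wrong.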
View the files related to libstdc++.so.6 in the Anaconda environment (run these from the environment's lib directory, here /home/zy/anaconda3/envs/pytorch/lib):
ls libstdc++.so
ls libstdc++.so -al
ls libstdc++.so.6 -al
ls libstdc++.so.6.0.29 -al
Use the following command to view the related files of libstdc++.so.6 under the system library path:
ls -al /usr/lib/x86_64-linux-gnu/libstdc++.so.6
At present, libstdc++.so and libstdc++.so.6 in the Anaconda environment both link to libstdc++.so.6.0.29.
Use the following commands to repoint libstdc++.so and libstdc++.so.6 in the Anaconda environment to the library in the system path:
rm libstdc++.so
rm libstdc++.so.6
ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30 libstdc++.so
ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30 libstdc++.so.6
Checking again shows that the links now point to version 6.0.30.
You can also try this, I did not try the following after I succeeded with the above method:
4. Running again, the previous error was gone and a new problem appeared
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.
Reference solution: [Solved] ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead. (ShuqiaoS's blog, CSDN)
It turned out that I was using the 0.4.0 code, which is why there were so many problems, and now I have to switch to the pytorch-1.0 code. Really sad: I struggled for a long time before noticing that the version was 0.4.0.
Faster R-CNN source code for PyTorch 0.4.0: GitHub - jwyang/faster-rcnn.pytorch: A faster pytorch implementation of faster r-cnn
Faster R-CNN source code for PyTorch 1.0.0: GitHub - jwyang/faster-rcnn.pytorch at pytorch-1.0
For 0.4.0, see: Faster RCNN environment configuration (Our blog, CSDN). I think that post still uses 0.4.0.
5. Running error ImportError: cannot import name '_mask'
Reference: ImportError: cannot import name '_mask' · Issue #410 · jwyang/faster-rcnn.pytorch · GitHub
Activate the virtual environment, then install the COCO API in the data directory with the following commands:
cd data
git clone https://github.com/pdollar/coco.git
cd coco/PythonAPI
make
6. New problem: TypeError: load() missing 1 required positional argument: 'Loader'
Reason: newer versions of pyyaml (6.0 and later) no longer accept the old-style yaml.load() call without a Loader argument.
Option 1: replace the call with any one of the following three:
yaml.load(file, Loader=yaml.FullLoader)
yaml.safe_load(file)
yaml.load(file, Loader=yaml.CLoader)
Option 2: downgrade pyyaml from 6.0 to 5.4.1 (this is how I solved it, and it feels the most convenient)
pip uninstall pyyaml
pip install pyyaml==5.4.1
Finally no errors. I was so relieved.
7. Oh no, another error
ValueError: Caught ValueError in DataLoader worker process 1.
ValueError: operands could not be broadcast together with shapes (683,1024,4) (1,1,3) (683,1024,4)
I had been using multiple GPUs; after switching to a single GPU the error was no longer reported, but why are rpn_cls, rpn_box, etc. all nan?
Well, the same error as above came back. I also found that the file names in my dataset all have 5 digits, while they should have 6.
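8. After that change, a new error: assert (boxes[:, 2] >= boxes[:, 0]).all() AssertionError
In lib/datasets/pascal_voc.py, in the _load_pascal_annotation() function, remove the - 1 after xmin, ymin, xmax, and ymax. The likely cause is that a coordinate of 0 in the annotations becomes -1 after the subtraction and wraps around in the unsigned box array.
In lib/datasets/imdb.py, in the append_flipped_images() function, add the following code right after the line boxes[:, 2] = widths[i] - oldx1 - 1, so that xmin and xmax are swapped whenever they end up in the wrong order: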
for b in range(len(boxes)):
    if boxes[b][2] < boxes[b][0]:
        # swap xmin and xmax in one step; "aboxes = boxes" would only alias the array, not copy it
        boxes[b][0], boxes[b][2] = boxes[b][2], boxes[b][0]
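It can also help to look for the offending annotations directly instead of only patching the loader. The following sketch uses only the standard library; the function name find_bad_boxes and the sample XML are hypothetical. It reports boxes that have a negative corner or a non-positive width or height.

```python
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text):
    """Return (name, xmin, ymin, xmax, ymax) for every suspicious box in one VOC annotation."""
    bad = []
    root = ET.fromstring(xml_text)
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (int(float(bb.find(k).text))
                                  for k in ("xmin", "ymin", "xmax", "ymax"))
        if xmin < 0 or ymin < 0 or xmax <= xmin or ymax <= ymin:
            bad.append((obj.find("name").text, xmin, ymin, xmax, ymax))
    return bad


if __name__ == "__main__":
    # Made-up annotation with one valid box and one box whose xmax is smaller than xmin.
    sample = """<annotation>
      <object><name>car</name>
        <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>90</ymax></bndbox></object>
      <object><name>dog</name>
        <bndbox><xmin>200</xmin><ymin>30</ymin><xmax>150</xmax><ymax>80</ymax></bndbox></object>
    </annotation>"""
    print(find_bad_boxes(sample))
```

For a real dataset, read each file under Annotations and pass its contents to the function; any file with a non-empty result is worth fixing by hand.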
9. Another error occurred when running
roidb[i]['img_id'] = imdb.image_id_at(i)
IndexError: list index out of range
Reference: roidb[i]['image'] = imdb.image_path_at(i) · Issue #79 · rbgirshick/fast-rcnn · GitHub
This may be caused by a stale cache file. Delete the cached training-data file under the fast-rcnn-master/data/cache/ folder (data/cache/ in this project) and try again.
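Clearing the cache can be scripted as well. This is a minimal sketch under the assumption that the cached roidb files are .pkl files directly inside the cache folder; the function name clear_cache and the file names in the demonstration are only examples.

```python
import tempfile
from pathlib import Path


def clear_cache(cache_dir):
    """Delete cached .pkl files in cache_dir and return the names that were removed."""
    removed = []
    for pkl in sorted(Path(cache_dir).glob("*.pkl")):
        pkl.unlink()
        removed.append(pkl.name)
    return removed


if __name__ == "__main__":
    # Demonstration in a temporary directory; point cache_dir at data/cache for real use.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "voc_2007_trainval_gt_roidb.pkl").write_bytes(b"")
        (Path(tmp) / "notes.txt").write_text("kept")
        print(clear_cache(tmp))
```

Only .pkl files are touched; the cache is rebuilt from the annotations on the next training run.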
10. The error ValueError: operands could not be broadcast together with shapes (1024,717,4) (1,1,3) (1024,717,4) appeared again, so this problem was still not resolved
The cause may be that some images have 4 channels (RGB plus alpha); keep only the three RGB channels.
Insert the following before line 39 of lib/model/utils/blob.py:
if im.shape[2] == 4:
    im = im[:, :, :3]
Training then runs normally:
CUDA_VISIBLE_DEVICES=3,4 python trainval_net.py --dataset pascal_voc --net res101 --bs 4 --nw 4 --lr 0.005 --lr_decay_step 5 --cuda --epochs 50
11. Yet another error was reported
RuntimeError: Caught RuntimeError in DataLoader worker process 3.
RuntimeError: The expanded size of the tensor (1200) must match the existing size (0) at non-singleton dimension 1. Target sizes: [600, 1200, 3]. Tensor sizes: [600, 0, 3]
[session 1][epoch 1][iter 3800/6756] loss: 0.7875, lr: 5.00e-03
fg/bg=(10/1014), time cost: 53.474536
rpn_cls: 0.1268, rpn_box: 0.0292, rcnn_cls: 0.0855, rcnn_box 0.0142
Traceback (most recent call last):
  File "trainval_net.py", line 310, in <module>
    data = next(data_iter)
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zy/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zy/faster-rcnn.pytorch-pytorch-1.0/lib/roi_data_layer/roibatchLoader.py", line 177, in __getitem__
    padding_data[:, :data_width, :] = data[0]
RuntimeError: The expanded size of the tensor (1200) must match the existing size (0) at non-singleton dimension 1. Target sizes: [600, 1200, 3]. Tensor sizes: [600, 0, 3]
Try deleting the pkl files under fast-rcnn-master/data/cache/ (data/cache/ in this project), as in problem 9.
Errors reported during testing:
1. AttributeError: 'NoneType' object has no attribute 'text'
  File "/home/zy/faster-rcnn.pytorch-pytorch-1.0/lib/datasets/voc_eval.py", line 22, in parse_rec
    obj_struct['pose'] = obj.find('pose').text
AttributeError: 'NoneType' object has no attribute 'text'
Solution: open the source file and comment out the line obj_struct['pose'] = obj.find('pose').text
I found that pose, truncated, and difficult are all missing from my XML files, so I commented out all three lines.
2. Error: KeyError: 'difficult'
difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
KeyError: 'difficult'
Solution: in faster-rcnn.pytorch-pytorch-1.0/lib/datasets/voc_eval.py, comment out the code that reads the difficult field.
Reference: Using py-faster-rcnn to train an object detection model (liuyan20062010's blog, CSDN)
Modified places:
def parse_rec(filename):
    """ Parse a PASCAL VOC xml file """
    tree = ET.parse(filename)
    objects = []
    for obj in tree.findall('object'):
        obj_struct = {}
        obj_struct['name'] = obj.find('name').text
        # obj_struct['pose'] = obj.find('pose').text  # comment out this line
        # obj_struct['truncated'] = int(obj.find('truncated').text)  # comment out this line
        # obj_struct['difficult'] = int(obj.find('difficult').text)  # comment out this line
        bbox = obj.find('bndbox')
        obj_struct['bbox'] = [int(bbox.find('xmin').text),
                              int(bbox.find('ymin').text),
                              int(bbox.find('xmax').text),
                              int(bbox.find('ymax').text)]
        objects.append(obj_struct)
    return objects
# extract gt objects for this class
class_recs = {}
npos = 0
for imagename in imagenames:
    R = [obj for obj in recs[imagename] if obj['name'] == classname]
    bbox = np.array([x['bbox'] for x in R])
    # difficult = np.array([x['difficult'] for x in R]).astype(np.bool)  # comment out this line
    difficult = 0  # add this line
    det = [False] * len(R)
    # npos = npos + sum(~difficult)  # comment out this line
    npos = npos + len(R)  # add this line: without a difficult flag, every box counts as a positive
    class_recs[imagename] = {'bbox': bbox,
                             'difficult': difficult,
                             'det': det}

if ovmax > ovthresh:
    # if not R['difficult'][jmax]:  # comment out this line
    if not R['det'][jmax]:  # keep the rest, shifted one level left, so true positives are still counted
        tp[d] = 1.
        R['det'][jmax] = 1
    else:
        fp[d] = 1.
else:
    fp[d] = 1.
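An alternative to commenting lines out is to fall back to default values when a tag is missing from the XML. The sketch below is not the project's parse_rec: the name parse_rec_tolerant, the helper _text, and the default values are assumptions, and it takes the XML text directly instead of a file name so that it can be run on its own.

```python
import xml.etree.ElementTree as ET


def _text(obj, tag, default):
    """Return the text of a child tag, or the default if the tag is missing or empty."""
    node = obj.find(tag)
    return node.text if node is not None and node.text is not None else default


def parse_rec_tolerant(xml_text):
    """Parse a PASCAL VOC annotation, tolerating missing pose/truncated/difficult tags."""
    objects = []
    for obj in ET.fromstring(xml_text).findall("object"):
        bbox = obj.find("bndbox")
        objects.append({
            "name": obj.find("name").text,
            "pose": _text(obj, "pose", "Unspecified"),
            "truncated": int(_text(obj, "truncated", 0)),
            "difficult": int(_text(obj, "difficult", 0)),  # missing flag: treat as not difficult
            "bbox": [int(bbox.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax")],
        })
    return objects


if __name__ == "__main__":
    # Made-up annotation without pose, truncated, or difficult tags.
    sample = """<annotation><object><name>car</name>
      <bndbox><xmin>1</xmin><ymin>2</ymin><xmax>30</xmax><ymax>40</ymax></bndbox>
    </object></annotation>"""
    print(parse_rec_tolerant(sample))
```

Because every object then carries a difficult value of 0, the evaluation code that indexes the difficult array would not need to be edited at all under this approach.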
Main reference articles
Faster RCNN (Pytorch) configuration process records and problem solving (cc__cc__'s blog, CSDN)
(linux) Faster RCNN-pytorch1.0 target detection 2: Training your own data set (gpu, pycharm, training notes version)_Lulu is not the blog of chubby paperer-CSDN Blog