Paper
Faster ILOD: Incremental Learning for Object Detectors based on Faster RCNN (2020)
Paper: https://arxiv.org/abs/2003.03901
Code: https://github.com/CanPeng123/Faster-ILOD
the code
1. Requirements:
- PyTorch 1.0 from a nightly release. It will not work with 1.0 nor 1.0.1. Installation instructions can be found at https://pytorch.org/get-started/locally/
- torchvision from master
- cocoapi
- yacs
- matplotlib
- GCC >= 4.9
- OpenCV
- CUDA >= 9.0
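To confirm that the Python-side requirements are present in an environment, you can ask the import system whether each module can be found. The sketch below is only an illustration: the import names in REQUIRED_MODULES are my assumption of how the packages above are imported (for example, cv2 for OpenCV and pycocotools for cocoapi), and GCC and CUDA are system tools that it does not check.

```python
import importlib.util

# Assumed import names for the Python packages in the requirements list.
# GCC and CUDA are system-level tools and are not covered here.
REQUIRED_MODULES = ["torch", "torchvision", "pycocotools", "yacs", "matplotlib", "cv2"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Run it inside the activated conda environment; anything it reports as missing still needs to be installed.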
2. Step-by-step installation
# first, make sure that your conda is setup properly with the right environment
# for that, check that `which conda`, `which pip` and `which python` points to the
# right path. From a clean conda env, this is what you need to do
conda create --name maskrcnn_benchmark -y
conda activate maskrcnn_benchmark
# this installs the right pip and dependencies for the fresh python
conda install ipython pip
# maskrcnn_benchmark and coco api dependencies
pip install ninja yacs cython matplotlib tqdm opencv-python
# follow PyTorch installation in https://pytorch.org/get-started/locally/
# we give the instructions for CUDA 9.0
conda install -c pytorch pytorch-nightly torchvision cudatoolkit=9.0
export INSTALL_DIR=$PWD
# install pycocotools
cd $INSTALL_DIR
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install
# install cityscapesScripts
cd $INSTALL_DIR
git clone https://github.com/mcordts/cityscapesScripts.git
cd cityscapesScripts/
python setup.py build_ext install
# install apex
cd $INSTALL_DIR
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext
# install PyTorch Detection
cd $INSTALL_DIR
git clone https://github.com/facebookresearch/maskrcnn-benchmark.git
cd maskrcnn-benchmark
# the following will install the lib with
# symbolic links, so that you can modify
# the files if you want and won't need to
# re-build it
python setup.py build develop
unset INSTALL_DIR
# or if you are on macOS
# MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build develop
3. Faster-ILOD
After installing the maskrcnn_benchmark environment, copy the Faster-ILOD code over the corresponding maskrcnn-benchmark folders and run python setup.py build develop to recompile. Alternatively, download the Faster-ILOD code directly.
4. Run Faster-ILOD
Take the 15+5 setting as an example (15 classes are trained first, then 5 new classes are added):
1. Modify the dataset path
In Faster-ILOD/maskrcnn_benchmark/config/paths_catalog.py, find the paths for the VOC datasets and change them to your own.
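Before editing the catalog, it can help to confirm that the directory you plan to point it at looks like a standard PASCAL VOC year folder (Annotations, JPEGImages, ImageSets/Main). The following is a minimal sketch; the function name and the example path are hypothetical, and it does not read paths_catalog.py itself.

```python
from pathlib import Path

# Sub-directories found in a standard PASCAL VOC year folder (e.g. VOC2007).
VOC_SUBDIRS = ("Annotations", "JPEGImages", "ImageSets/Main")


def missing_voc_dirs(voc_root):
    """Return the expected VOC sub-directories that are absent under voc_root."""
    root = Path(voc_root)
    return [sub for sub in VOC_SUBDIRS if not (root / sub).is_dir()]


if __name__ == "__main__":
    # Hypothetical location; replace it with the path you put in paths_catalog.py.
    print(missing_voc_dirs("/path/to/VOCdevkit/VOC2007"))
```

An empty list means all three sub-directories were found.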
2. Modify the configuration file
The parameters in /configs/e2e_faster_rcnn_R_50_C4_1x.yaml can be adjusted as required; this file is left unchanged for now.
3. Train the base network
Run
python tools/train_first_step.py --config-file="./configs/e2e_faster_rcnn_R_50_C4_1x.yaml"
After it finishes successfully, the training output can be found in /home/incremental_learning_ResNet50_C4/RPN_15_classes_40k_steps.
4. Incremental training
(1) Edit e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml and e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml: update the full class list, the new classes, the old classes, the path to the final model trained in the previous stage, and the output path. Then run
python tools/train_incremental.py
The final training result is written to the configured output directory.
The files involved are:
- e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml
- e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml
- tools/train_incremental.py
In tools/train_incremental.py, the two config paths should point to your own files, for example:
source_model_config_file = "/home/chenfang/maskrcnn-benchmark/configs/e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml"
target_model_config_file = "/home/chenfang/maskrcnn-benchmark/configs/e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml"
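To make the old/new class lists concrete, here is a small sketch of a 15+5 split of the 20 PASCAL VOC classes. It assumes the classes are taken in alphabetical order, with the first 15 as old classes and the last 5 as new ones; check this against your own config files before relying on it. The helper split_classes is hypothetical and not part of Faster-ILOD.

```python
# The 20 PASCAL VOC object classes in alphabetical order.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def split_classes(classes, num_old):
    """Split a class list into (old, new) for one incremental step."""
    return classes[:num_old], classes[num_old:]


old_classes, new_classes = split_classes(VOC_CLASSES, 15)
print(old_classes)
print(new_classes)  # ['pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

# With one extra output reserved for background, the head sizes are:
print(len(old_classes) + 1, len(VOC_CLASSES) + 1)  # 16 21
```

The two sizes printed at the end (16 for the base model, 21 for the incremental model) are useful to keep in mind when reading shape-mismatch errors.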
5. Problems encountered
1. git clone fails because of network problems
Download the repository on a local machine and upload it to the server.
Installation packages can likewise be downloaded locally, uploaded to the server, and installed with pip install followed by the file path.
2. RuntimeError: Error compiling objects for extension
The PyTorch version is incompatible.
After looking at the solutions, I downgraded PyTorch to 1.5 and the build succeeded.
CUDA 10.1
Pytorch 1.4.0
torchvision 0.5.0
For more solutions, please refer to https://github.com/facebookresearch/maskrcnn-benchmark/issues/1236
3. RuntimeError: Output 0 of UnbindBackward is a view and its base or another view of its base has been modified inplace.
RuntimeError: Output 0 of UnbindBackward is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
Reference: https://blog.csdn.net/Ginomica_xyx/article/details/120491859
The cause is that self.bbox is modified in place more than once. On the second modification, it is ambiguous whether the operation applies to the original self.bbox or to the already modified one.
I tried to fix this in the code: copying self.bbox to a separate variable and operating on that variable did not work, and a deep copy did not work either.
According to the related issues, this ultimately comes down to a bug in PyTorch 1.7.0.
Downgrading PyTorch to 1.6.0 solves the problem.
4. unable to execute 'usr/local/cuda-10.0/bin/nvcc': No such file or directory
Reference:
How to view and modify the PATH environment variable in Linux: https://blog.csdn.net/qq_41251963/article/details/110120386
https://blog.csdn.net/tailonh/article/details/120322932
https://blog.csdn.net/G_inkk/article/details/124584873
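This error means the build looked for nvcc at a location that does not exist, so a first step is to find out where nvcc actually is. The sketch below looks in the bin directory of a given CUDA directory and then falls back to PATH. The helper name find_nvcc is hypothetical, and the use of CUDA_HOME as the starting point is an assumption about how the build locates the compiler.

```python
import os
import shutil


def find_nvcc(cuda_home=None):
    """Return the path of an nvcc executable, or None if none is found.

    Looks in <cuda_home>/bin first (when cuda_home is given), then on PATH.
    """
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return shutil.which("nvcc")


if __name__ == "__main__":
    print("CUDA_HOME =", os.environ.get("CUDA_HOME"))
    print("nvcc      =", find_nvcc(os.environ.get("CUDA_HOME")))
```

If the reported nvcc path differs from the one in the error message, the environment variables (PATH or CUDA_HOME) are the likely place to fix.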
5. error: cannot call member function 'void std::basic_string<_CharT, _Traits, _Alloc>::
Recompiling with python setup.py build develop fails with
RuntimeError: Error compiling objects for extension
and the underlying error is
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function 'void std::basic_string<_CharT, _Traits, _Alloc>::_Rep
Solution
Reference: https://blog.csdn.net/weixin_45328592/article/details/114646355
https://blog.csdn.net/qq_29695701/article/details/118548238
sudo gedit /usr/include/c++/7/bits/basic_string.tcc
Change
__p->_M_set_sharable()
to
(*__p)._M_set_sharable()
That's it.
If saving the file fails with
'readonly' option is set (add ! to override)
the current user does not have write permission. Either switch to root with sudo -i and then edit the file, or open the file directly with sudo vim.
Reference
https://blog.csdn.net/cheng_feng_xiao_zhan/article/details/53391474
RuntimeError: Error compiling objects for extension
may also have the following cause: the path in the error message contains an extra colon, which indicates that an environment variable is set incorrectly.
Solution: open ~/.bashrc
sudo vim ~/.bashrc
and change
export CUDA_HOME=$CUDA_HOME:/usr/local/cuda
to
export CUDA_HOME=/usr/local/cuda
then reload it
source ~/.bashrc
Reference:
https://blog.csdn.net/loovelj/article/details/110490986
https://www.codeleading.com/article/95735054818/
https://blog.csdn.net/zt1091574181/article/details/113611468
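The extra colon arises because CUDA_HOME is meant to hold a single directory, not a colon-separated list like PATH. If CUDA_HOME was previously unset, $CUDA_HOME:/usr/local/cuda expands to :/usr/local/cuda. A minimal sketch of a check for this, using a hypothetical helper name:

```python
def check_cuda_home(value):
    """Return a list of problems found in a CUDA_HOME value (empty if it looks fine)."""
    if not value:
        return ["CUDA_HOME is not set"]
    problems = []
    # ":" is the path-list separator on Linux; CUDA_HOME should be one directory.
    if ":" in value:
        problems.append("CUDA_HOME contains ':'; it should be a single directory")
    return problems


print(check_cuda_home(":/usr/local/cuda"))  # one problem reported
print(check_cuda_home("/usr/local/cuda"))   # []
```

You could pass os.environ.get("CUDA_HOME") to the function to test the value in the current shell.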
6. AttributeError: 'tuple' object has no attribute 'values'
Change loss_dict to loss_dict[0].
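The idea behind this fix can be sketched in plain Python. It assumes, as the fix implies, that the model returns a tuple whose first element is the dictionary of losses. The function name and the loss values are illustrative only; in real training the values would be tensors.

```python
def total_loss(model_output):
    """Sum the losses whether the output is a dict or a tuple whose first item is the dict."""
    loss_dict = model_output[0] if isinstance(model_output, tuple) else model_output
    return sum(loss_dict.values())


# Illustrative numbers only.
losses = {"loss_classifier": 0.5, "loss_box_reg": 0.25}
print(total_loss(losses))             # 0.75
print(total_loss((losses, "extra")))  # 0.75
```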
7. RuntimeError: The size of tensor a (16) must match the size of tensor b (21) at non-singleton dimension 0
This error occurs during incremental learning, apparently when the checkpoint from base training is loaded. Changing the optimizer argument to None resolves it:
checkpointer_target = DetectronCheckpointer(
    cfg_target, model_target, optimizer=None, scheduler=scheduler,
    save_dir=output_dir_target, save_to_disk=save_to_disk, logger=logger_target)
8. Incremental learning skips training and goes straight to testing
The base model was trained for 40,000 iterations, so arguments_target["iteration"] is already 40,000 when its checkpoint is loaded. If the iteration limit for incremental training is also 40,000, the program considers training finished and goes straight to testing. Change the limit to 80,000 in e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml:
MAX_ITER: 80000 # number of iterations
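The arithmetic behind this can be sketched as follows. It assumes, as the description implies, that the training loop resumes at the iteration stored in the checkpoint and stops at MAX_ITER. The function is hypothetical and not taken from the repository.

```python
def iterations_to_run(start_iter, max_iter):
    """Number of training steps left when resuming at start_iter with a limit of max_iter."""
    return max(0, max_iter - start_iter)


print(iterations_to_run(40000, 40000))  # 0
print(iterations_to_run(40000, 80000))  # 40000
```

With MAX_ITER at 80000, the incremental stage therefore trains for another 40,000 iterations on top of the base model.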
PS: switching between multiple CUDA versions
List the CUDA versions installed under /usr/local/:
cd /usr/local
ls
bin cuda cuda-10.2 etc include man share
cud cuda-10.1 cuda-11.0 games lib sbin src
View the current cuda version
nvcc -V
Or use
stat cuda
to see where the cuda symbolic link currently points:
File: cuda -> /usr/local/cuda-10.1
Size: 20 Blocks: 0 IO Block: 4096 symbolic link
Device: 812h/2066d Inode: 2757665 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2022-06-06 21:34:32.342489356 +0800
Modify: 2022-05-22 15:11:26.498549390 +0800
Change: 2022-05-22 15:11:26.498549390 +0800
Birth: -
To switch to version 10.2, delete the current link and then recreate it pointing to 10.2. Only two commands are needed:
sudo rm -rf cuda
sudo ln -s /usr/local/cuda-10.2 /usr/local/cuda
Check the CUDA version again:
nvcc -V
The version has been switched:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
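The link target can also be read from Python, which is handy in a setup script that needs to know which CUDA directory is active. This is a small sketch with a hypothetical helper name; it assumes the link lives at /usr/local/cuda as in the listing above.

```python
import os


def cuda_link_target(link="/usr/local/cuda"):
    """Return the directory the cuda link resolves to, or None if the path does not exist."""
    if not os.path.exists(link):
        return None
    return os.path.realpath(link)


if __name__ == "__main__":
    print(cuda_link_target())
```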