Faster-ILOD and maskrcnn_benchmark: installation process and problems encountered

Paper

Faster ILOD: Incremental Learning for Object Detectors based on Faster RCNN (2020)

Paper: https://arxiv.org/abs/2003.03901
Code: https://github.com/CanPeng123/Faster-ILOD

Code

1. Requirements:

  • PyTorch 1.0 from a nightly release. It will not work with 1.0 nor
    1.0.1. Installation instructions can be found in https://pytorch.org/get-started/locally/

  • torchvision from master

  • cocoapi

  • yacs

  • matplotlib

  • GCC >= 4.9

  • OpenCV

  • CUDA >= 9.0

2. Step-by-step installation

# first, make sure that your conda is setup properly with the right environment
# for that, check that `which conda`, `which pip` and `which python` point to the
# right path. From a clean conda env, this is what you need to do

conda create --name maskrcnn_benchmark -y
conda activate maskrcnn_benchmark

# this installs the right pip and dependencies for the fresh python
conda install ipython pip

# maskrcnn_benchmark and coco api dependencies
pip install ninja yacs cython matplotlib tqdm opencv-python

# follow PyTorch installation in https://pytorch.org/get-started/locally/
# we give the instructions for CUDA 9.0
conda install -c pytorch pytorch-nightly torchvision cudatoolkit=9.0

export INSTALL_DIR=$PWD

# install pycocotools
cd $INSTALL_DIR
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install

# install cityscapesScripts
cd $INSTALL_DIR
git clone https://github.com/mcordts/cityscapesScripts.git
cd cityscapesScripts/
python setup.py build_ext install

# install apex
cd $INSTALL_DIR
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

# install PyTorch Detection
cd $INSTALL_DIR
git clone https://github.com/facebookresearch/maskrcnn-benchmark.git
cd maskrcnn-benchmark

# the following will install the lib with
# symbolic links, so that you can modify
# the files if you want and won't need to
# re-build it
python setup.py build develop


unset INSTALL_DIR

# or if you are on macOS
# MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build develop
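As a quick sanity check after these steps, the sketch below lists which of the expected top-level modules cannot be found in the active environment. The module names are my assumption of the import names for the packages installed above (for example cv2 for opencv-python and cityscapesscripts for cityscapesScripts); adjust the list to your setup.

```python
import importlib.util

# Assumed import names of the packages installed above; edit as needed.
EXPECTED_MODULES = [
    "torch", "torchvision", "yacs", "cv2", "matplotlib",
    "pycocotools", "cityscapesscripts", "apex", "maskrcnn_benchmark",
]


def missing_modules(names):
    """Return the names that cannot be located in the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    print("all modules found" if not missing else "missing: " + ", ".join(missing))
```

find_spec only locates a module without importing it, so a compiled extension that is present but broken will still only show up on an actual import.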

3. Faster-ILOD

After the maskrcnn_benchmark environment is installed, either copy the Faster-ILOD code over the corresponding maskrcnn-benchmark folders and run python setup.py build develop to recompile, or download the Faster-ILOD code directly.
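Copying one source tree over another can be done with cp -r, but a short Python helper can also report which maskrcnn-benchmark files were replaced. This is a sketch: overlay_tree and the directory names in the usage line are hypothetical, and skipping .git is my own choice so the target repository's history is not touched.

```python
import os
import shutil


def overlay_tree(src_dir, dst_dir):
    """Copy every file under src_dir into dst_dir, overwriting existing files.

    Returns the sorted relative paths of files that already existed in dst_dir,
    i.e. the files that were replaced.
    """
    overwritten = []
    for root, dirs, files in os.walk(src_dir):
        dirs[:] = [d for d in dirs if d != ".git"]  # do not descend into .git
        rel_root = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel_root))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            target = os.path.join(target_root, name)
            if os.path.exists(target):
                overwritten.append(os.path.normpath(os.path.join(rel_root, name)))
            shutil.copy2(os.path.join(root, name), target)
    return sorted(overwritten)
```

Usage (hypothetical paths): print(overlay_tree("Faster-ILOD", "maskrcnn-benchmark")), then run python setup.py build develop inside maskrcnn-benchmark.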

4. Run Faster-ILOD

Take the 15+5 setting (train on 15 classes first, then add the remaining 5) as an example:

1. Modify the dataset path

Open Faster-ILOD/maskrcnn_benchmark/config/paths_catalog.py, find the paths for VOC, and change them to your own dataset location.
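After editing the paths, it is worth checking that they actually resolve. The sketch below assumes the VOC entries follow maskrcnn-benchmark's DatasetCatalog layout (a DATA_DIR root plus per-split "data_dir" entries such as voc/VOC2007); the DATA_DIR value is a placeholder and the Faster-ILOD copy of the file may differ.

```python
import os

# Assumed to mirror the VOC entries in paths_catalog.py; DATA_DIR is a placeholder.
DATA_DIR = "/path/to/datasets"
VOC_DATASETS = {
    "voc_2007_train": {"data_dir": "voc/VOC2007", "split": "train"},
    "voc_2007_val": {"data_dir": "voc/VOC2007", "split": "val"},
    "voc_2007_test": {"data_dir": "voc/VOC2007", "split": "test"},
}
# Standard PASCAL VOC sub-folders expected under each dataset root.
REQUIRED_SUBDIRS = ("Annotations", "ImageSets", "JPEGImages")


def missing_voc_dirs(data_dir, datasets):
    """Return the dataset names whose VOC folder structure is incomplete."""
    missing = []
    for name, attrs in datasets.items():
        root = os.path.join(data_dir, attrs["data_dir"])
        if not all(os.path.isdir(os.path.join(root, sub)) for sub in REQUIRED_SUBDIRS):
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_voc_dirs(DATA_DIR, VOC_DATASETS) or "all VOC paths found")
```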

2. Modify the configuration file

/configs/e2e_faster_rcnn_R_50_C4_1x.yaml
Parameters in this file can be adjusted as needed; it is left unchanged for now.
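The YAML file is a nested set of keys addressed with dotted names (for example SOLVER.MAX_ITER, which comes up again in problem 8 below). maskrcnn-benchmark reads its configs with yacs, which also supports dotted overrides; the sketch below is a simplified plain-dict stand-in for that behaviour, not the yacs code itself, and the example values are hypothetical.

```python
def apply_overrides(cfg, opts):
    """Apply a flat [key, value, key, value, ...] list of dotted keys to a nested dict.

    Simplified imitation of yacs' CfgNode.merge_from_list: unknown keys raise
    KeyError instead of silently creating new entries.
    """
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in key/value pairs")
    for dotted_key, value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for part in parents:
            node = node[part]  # KeyError if a parent section does not exist
        if leaf not in node:
            raise KeyError(dotted_key)
        node[leaf] = value
    return cfg


# Hypothetical values, not taken from e2e_faster_rcnn_R_50_C4_1x.yaml.
config = {"SOLVER": {"MAX_ITER": 40000, "BASE_LR": 0.01}, "OUTPUT_DIR": "out"}
apply_overrides(config, ["SOLVER.MAX_ITER", 80000])
```

Rejecting unknown keys mirrors the yacs design choice: a typo in a key name fails loudly instead of being ignored.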

3. Training the basic network

Run python tools/train_first_step.py --config-file="./configs/e2e_faster_rcnn_R_50_C4_1x.yaml"
After it finishes successfully, the training output can be found in /home/incremental_learning_ResNet50_C4/RPN_15_classes_40k_steps.
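The next step needs the path of the final base model. The sketch below looks it up in the output directory, assuming maskrcnn-benchmark's checkpointer conventions: a text file named last_checkpoint that records the newest checkpoint path, and model_final.pth saved at the end of training. latest_checkpoint is a hypothetical helper name.

```python
import os


def latest_checkpoint(output_dir):
    """Return the path of the most recent checkpoint in output_dir, or None.

    Prefers the path recorded in 'last_checkpoint'; falls back to 'model_final.pth'.
    """
    tag_file = os.path.join(output_dir, "last_checkpoint")
    if os.path.isfile(tag_file):
        with open(tag_file) as f:
            recorded = f.read().strip()
        if recorded:
            return recorded
    final = os.path.join(output_dir, "model_final.pth")
    return final if os.path.isfile(final) else None
```

Usage: latest_checkpoint("/home/incremental_learning_ResNet50_C4/RPN_15_classes_40k_steps") gives the path to put into the incremental-training configs.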

4. Incremental training

(1) In e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml and e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml, set the categories (all, new, and old), the path of the final model trained in the previous stage, and the output path. Then run python tools/train_incremental.py; the final training result is written to the corresponding output directory.

  • e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml
  • e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml
  • tools/train_incremental.py (point the two config paths at your own files):
source_model_config_file = "/home/chenfang/maskrcnn-benchmark/configs/e2e_faster_rcnn_R_50_C4_1x_Source_model.yaml"
target_model_config_file = "/home/chenfang/maskrcnn-benchmark/configs/e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml"
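For the 15+5 setting, the old and new category lists are a split of the 20 VOC classes. A small sketch, assuming the usual alphabetical VOC ordering with the first 15 classes as old and the last 5 as new; the returned dictionary keys are hypothetical and are not the YAML field names. The counts include the background class, as MODEL.ROI_BOX_HEAD.NUM_CLASSES does in maskrcnn-benchmark, which is why the base and target models have 16 and 21 outputs (the same numbers as in the size-mismatch error of problem 7 below).

```python
# PASCAL VOC classes in their usual alphabetical order.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def incremental_split(num_old, classes=VOC_CLASSES):
    """Split classes into old/new groups; class counts include background."""
    old, new = classes[:num_old], classes[num_old:]
    return {
        "old_classes": old,
        "new_classes": new,
        "source_num_classes": len(old) + 1,      # base model: old classes + background
        "target_num_classes": len(classes) + 1,  # target model: all classes + background
    }


split = incremental_split(15)
```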

5. Problems encountered

1. git clone fails because of network problems

Download the repository on a local machine and upload it to the server. Installation packages can likewise be downloaded locally, uploaded to the server, and installed with pip install <path-to-package-file>.

2. RuntimeError: Error compiling objects for extension

The PyTorch version is not compatible. Following the solutions in the issue below, I downgraded PyTorch to a version below 1.5, after which the build succeeded with:

  • CUDA 10.1

  • PyTorch 1.4.0

  • torchvision 0.5.0

For more solutions, please refer to https://github.com/facebookresearch/maskrcnn-benchmark/issues/1236
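To check quickly whether an environment resembles the combination that worked here, the installed version can be parsed and compared. The threshold below (anything older than 1.5 treated as likely to compile) is an assumption drawn from this single report, not an official compatibility rule.

```python
import re


def parse_version(version):
    """Turn strings like '1.4.0+cu101' or '1.10' into a comparable tuple (1, 4, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def likely_builds_maskrcnn(torch_version):
    """Heuristic from this post: the extensions compiled with PyTorch older than 1.5."""
    return parse_version(torch_version) < (1, 5, 0)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print(torch.__version__, torch.version.cuda, likely_builds_maskrcnn(torch.__version__))
```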

3. RuntimeError: Output 0 of UnbindBackward is a view and its base or another view of its base has been modified inplace.

RuntimeError: Output 0 of UnbindBackward is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.


Reference: https://blog.csdn.net/Ginomica_xyx/article/details/120491859

The cause is that self.bbox is modified in place more than once. The tensors being modified are views of self.bbox returned by unbind, and after the first in-place change autograd can no longer tell whether an operation refers to the original self.bbox or the modified one, so it raises this error.
I first tried copying self.bbox into a separate variable and operating on that, and then tried a deep copy; neither worked.
The related issues show that this ultimately comes down to a bug in PyTorch 1.7.0.
Downgrading PyTorch to 1.6.0 solved the problem.

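4. unable to execute 'usr/local/cuda-10.0/bin/nvcc': No such file or directory

The build is trying to run nvcc from a CUDA 10.0 directory that does not exist on this machine, so the CUDA-related environment variables (PATH, CUDA_HOME) point to the wrong place and have to be corrected.

References (including how to view and modify the PATH environment variable in Linux):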

https://blog.csdn.net/qq_41251963/article/details/110120386
https://blog.csdn.net/tailonh/article/details/120322932
https://blog.csdn.net/G_inkk/article/details/124584873
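To see which nvcc the shell would actually find, PATH can be searched entry by entry. shutil.which("nvcc") performs the same lookup; the loop is spelled out here only to show how PATH is scanned, and find_nvcc is a hypothetical helper name.

```python
import os


def find_nvcc(path_value=None):
    """Return the first executable named nvcc on the given PATH string, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # empty entries (a shell treats them as the current directory) are skipped here
        candidate = os.path.join(directory, "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(find_nvcc() or "nvcc is not on PATH")
```

If nothing is found, the usual fix is to put the bin directory of an installed toolkit on PATH, for example export PATH=/usr/local/cuda/bin:$PATH in ~/.bashrc.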

5. error: cannot call member function 'void std::basic_string<_CharT, _Traits, _Alloc>::

Recompiling with python setup.py build develop fails with RuntimeError: Error compiling objects for extension. The underlying error is:
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ' void std::basic_string<_CharT, _Traits, _Alloc>::_Rep
Solution:

Reference: https://blog.csdn.net/weixin_45328592/article/details/114646355
https://blog.csdn.net/qq_29695701/article/details/118548238

sudo gedit /usr/include/c++/7/bits/basic_string.tcc

Change

__p->_M_set_sharable()

to

(*__p)._M_set_sharable()

That's it.
If saving the file fails with
'readonly' option is set (add ! to override)
the current user does not have write permission. Either switch to root with sudo -i first and then edit the file, or open it directly with sudo vim.

Reference
https://blog.csdn.net/cheng_feng_xiao_zhan/article/details/53391474
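The same one-line edit to basic_string.tcc can be scripted, which also keeps a backup of the header. This is a sketch: patch_basic_string is a hypothetical helper name, the .bak suffix is an arbitrary choice, and writing under /usr/include needs root.

```python
import shutil

OLD = "__p->_M_set_sharable()"
NEW = "(*__p)._M_set_sharable()"


def patch_basic_string(path, backup_suffix=".bak"):
    """Replace __p->_M_set_sharable() with (*__p)._M_set_sharable().

    Returns the number of replacements; a second run returns 0 because the
    patched text no longer contains the old form.
    """
    with open(path) as f:
        text = f.read()
    count = text.count(OLD)
    if count:
        shutil.copy2(path, path + backup_suffix)  # keep the unmodified header
        with open(path, "w") as f:
            f.write(text.replace(OLD, NEW))
    return count
```

Usage, as root: patch_basic_string("/usr/include/c++/7/bits/basic_string.tcc"), then rerun python setup.py build develop.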

RuntimeError: Error compiling objects for extension
may also be caused by:

  • a PyTorch version mismatch

  • problems when switching between multiple CUDA versions

Solution for the CUDA case: if the path in the error message contains an extra colon, the environment variable is set incorrectly. CUDA_HOME must be a single directory, but appending to it PATH-style produces a value such as :/usr/local/cuda when CUDA_HOME was previously empty.

Open ~/.bashrc (sudo vim ~/.bashrc) and change
export CUDA_HOME=$CUDA_HOME:/usr/local/cuda
to
export CUDA_HOME=/usr/local/cuda

then reload it:
source ~/.bashrc
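A quick way to catch a malformed CUDA_HOME before rebuilding is to check that it is a single directory containing bin/nvcc. A minimal sketch; cuda_home_problems is a hypothetical helper, and the bin/nvcc layout is the standard CUDA toolkit layout.

```python
import os


def cuda_home_problems(value):
    """Return a list of problems with a CUDA_HOME value (empty list means it looks fine)."""
    if not value:
        return ["CUDA_HOME is empty or unset"]
    if os.pathsep in value:
        return ["CUDA_HOME contains '%s'; it must be one directory, not a PATH-style list" % os.pathsep]
    nvcc = os.path.join(value, "bin", "nvcc")
    if not os.path.isfile(nvcc):
        return ["%s does not exist" % nvcc]
    return []


if __name__ == "__main__":
    print(cuda_home_problems(os.environ.get("CUDA_HOME", "")) or "CUDA_HOME looks fine")
```

For example, the value produced by the broken line, ":/usr/local/cuda", is reported as a PATH-style list.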

Reference:
https://blog.csdn.net/loovelj/article/details/110490986
https://www.codeleading.com/article/95735054818/
https://blog.csdn.net/zt1091574181/article/details/113611468

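6. AttributeError: 'tuple' object has no attribute 'values'

Here loss_dict is a tuple whose first element holds the loss dictionary, so change loss_dict to loss_dict[0] where .values() is called on it.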
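The fix amounts to unwrapping the tuple before summing the losses. A hypothetical helper that accepts either form, assuming (as the fix implies) that the dictionary is the tuple's first element; it works the same with loss tensors as with the plain floats used here.

```python
def total_loss(loss_output):
    """Sum the values of a loss dictionary.

    If the model returned a tuple, the dictionary is assumed to be its first
    element, matching the loss_dict[0] fix.
    """
    loss_dict = loss_output[0] if isinstance(loss_output, tuple) else loss_output
    return sum(loss for loss in loss_dict.values())


losses = total_loss(({"loss_classifier": 0.5, "loss_box_reg": 0.25}, None))
```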

7. RuntimeError: The size of tensor a (16) must match the size of tensor b (21) at non-singleton dimension 0

This error occurs during incremental training, most likely when the checkpoint of the base model is loaded: the optimizer state saved with the 15-class model (16 outputs including background) does not match the 21 outputs of the target model. Pass optimizer=None to the target checkpointer so that the saved optimizer state is not restored:

checkpointer_target = DetectronCheckpointer(
    cfg_target, model_target, optimizer=None, scheduler=scheduler,
    save_dir=output_dir_target, save_to_disk=save_to_disk, logger=logger_target)

8. Incremental training skips training and goes straight to testing

The base model was trained for 40,000 iterations, so when its checkpoint is loaded, arguments_target["iteration"] is restored as 40,000. If MAX_ITER for incremental training is also 40,000, the program considers training already finished and goes directly to testing. Raise it in e2e_faster_rcnn_R_50_C4_1x_Target_model.yaml, for example to 80,000 (under SOLVER):

MAX_ITER: 80000 # number of iterations
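Why nothing runs can be seen from how an iteration-based loop resumes. The sketch below is a simplified imitation of how maskrcnn-benchmark's iteration-based batch sampler continues from a restored iteration count; it is not the library code, and iteration_based_batches is a hypothetical name.

```python
def iteration_based_batches(batches, start_iter, max_iter):
    """Yield batches for iterations start_iter + 1 .. max_iter, cycling over batches."""
    batches = list(batches)
    if not batches:
        return  # avoid looping forever on an empty sampler
    iteration = start_iter
    while True:
        for batch in batches:
            if iteration >= max_iter:
                return
            iteration += 1
            yield batch


# Restored iteration == MAX_ITER: no batches at all, so training is skipped.
resumed = list(iteration_based_batches(["b0", "b1", "b2"], 40000, 40000))
```

With start_iter at 40,000 and MAX_ITER raised to 80,000, the same loop would yield 40,000 batches.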

PS: switching between multiple CUDA versions

Check which CUDA versions are installed under /usr/local/:

cd /usr/local 
ls
bin  cuda       cuda-10.2  etc    include  man   share
cud  cuda-10.1  cuda-11.0  games  lib      sbin  src

Check the current CUDA version:

nvcc -V

Or run stat cuda (inside /usr/local) to see where the cuda symlink currently points:

  File: cuda -> /usr/local/cuda-10.1
  Size: 20              Blocks: 0          IO Block: 4096   symbolic link
Device: 812h/2066d      Inode: 2757665     Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-06-06 21:34:32.342489356 +0800
Modify: 2022-05-22 15:11:26.498549390 +0800
Change: 2022-05-22 15:11:26.498549390 +0800
 Birth: -

To switch to 10.2, delete the current link and recreate it pointing to cuda-10.2. Only two commands are needed (run in /usr/local):

sudo rm -rf cuda
sudo ln -s /usr/local/cuda-10.2  /usr/local/cuda
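An alternative to rm followed by ln is to create the new link under a temporary name and rename it over the old one, so /usr/local/cuda never disappears in between. A sketch with hypothetical helper names; it needs root for /usr/local, and it refuses to replace /usr/local/cuda if that is a real directory rather than a link.

```python
import os


def current_cuda_target(link="/usr/local/cuda"):
    """Return where the cuda symlink points, or None if it is not a symlink."""
    return os.readlink(link) if os.path.islink(link) else None


def switch_cuda(version_dir, link="/usr/local/cuda"):
    """Point `link` at version_dir by renaming a fresh temporary symlink over it."""
    if not os.path.isdir(version_dir):
        raise FileNotFoundError(version_dir)
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(version_dir, tmp_link)
    os.replace(tmp_link, link)  # rename(2) swaps the link in a single step on POSIX
```

Usage, as root: switch_cuda("/usr/local/cuda-10.2"), then check with nvcc -V.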

Check the cuda version at this time

nvcc -V

The output shows that the version has been switched:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
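When switching versions from a script, the release number can be read from the nvcc -V output shown above. A minimal sketch that parses the "release X.Y" part; nvcc_release is a hypothetical helper name.

```python
import re
import subprocess


def nvcc_release(output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    try:
        out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        print("nvcc not found on PATH")
    else:
        print(nvcc_release(out))
```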


Source: blog.csdn.net/chenfang0529/article/details/124333649