YOLOX algorithm debugging record

YOLOX builds on YOLOv3 and achieves performance comparable to YOLOv5. Its model structure is shown below:

[Figure: YOLOX model structure]
Since I only want to use YOLOX for comparative experiments, I don't need to understand the model structure in depth.
I have previously debugged YOLOv5, YOLOv7, and YOLOv8. Compared with those, the YOLOX environment configuration is similar, but its parameter settings are scattered across several files, which makes them tedious to change. For example, the number of epochs is inherited from the yolox_base.py file rather than specified directly in train.py. Without further ado, let's start debugging.
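To illustrate why the epoch count lives in yolox_base.py rather than train.py, here is a minimal sketch of the inheritance pattern. The class names SketchBaseExp and SketchYoloxLExp are hypothetical stand-ins, not the real YOLOX classes; the real base class has many more fields, and the default values below are illustrative only.

```python
class SketchBaseExp:
    """Hypothetical stand-in for the base experiment defined in yolox_base.py."""

    def __init__(self):
        # Defaults that every experiment inherits unless a subclass overrides them
        # (illustrative values, not copied from YOLOX).
        self.max_epoch = 300
        self.num_classes = 80
        self.eval_interval = 10


class SketchYoloxLExp(SketchBaseExp):
    """Hypothetical stand-in for an experiment file such as exps/default/yolox_l.py."""

    def __init__(self):
        super().__init__()
        # Only the settings that differ from the base are overridden here.
        self.max_epoch = 1
        self.num_classes = 3


exp = SketchYoloxLExp()
print(exp.max_epoch, exp.num_classes, exp.eval_interval)  # prints: 1 3 10
```

Any attribute the subclass does not override (eval_interval here) silently keeps the base value, which is why a setting can be hard to find when you only look at the experiment file.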

Environment configuration

Debugging YOLOX is broadly similar to debugging YOLOv5. The difference is that YOLOX must first be installed, by running:

python setup.py develop

Otherwise, running the code fails with an error saying that the yolox module cannot be found:

[Figure: error reporting that the yolox module cannot be found]
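A quick way to confirm the installation step worked, before launching training, is to check whether Python can locate the yolox package. This is a minimal sketch using only the standard library's importlib; it checks importability only, not that the installed copy is the one in your working directory.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if is_importable("yolox"):
    print("yolox is importable")
else:
    print("yolox is not importable; run `python setup.py develop` in the YOLOX repo first")
```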

When the installation succeeds, the output looks like this. Note that I found it very difficult to get this working on my local machine, whereas on the server it worked easily.

[Figure: output of a successful installation]

Next comes the conda environment setup, which is essentially the same as for YOLOv5 and can be done directly with these commands:

conda create -n yolox python=3.8
source activate yolox
pip install -r requirements.txt

Dataset configuration

YOLOX uses the COCO dataset format. Unlike the other YOLO versions, the dataset path is not passed as a training or testing parameter; it is written directly in the dataset-loading code. We only need to arrange the directories the way YOLOX expects and put the dataset in the datasets/COCO folder. Alternatively, you can create a symbolic link, as I did:

ln -s /data/datasets/coco/ /home/ubuntu/outputs/yolox/YOLOX-main/datasets/COCO/

However, this approach kept raising the following error:

File "/home/ubuntu/outputs/yolox/YOLOX-main/yolox/data/datasets/datasets_wrapper.py", line 177, in __del__
if self.cache and self.cache_type == "ram":
AttributeError: 'COCODataset' object has no attribute 'cache'
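My reading of this traceback, which I have not verified against the YOLOX source: the error is raised in __del__, and Python also runs __del__ on objects whose __init__ failed partway. So the missing cache attribute is probably a side effect, and the real failure is an earlier exception in the dataset constructor (for example, an annotation file not found through the symbolic link) that happened before self.cache was assigned. The hypothetical class PartialDataset below reproduces that pattern:

```python
class PartialDataset:
    """Hypothetical stand-in for a dataset whose __init__ can fail early."""

    def __init__(self, ann_path: str):
        # If this open() fails, the attributes below are never assigned.
        with open(ann_path) as f:
            self.annotations = f.read()
        self.cache = False
        self.cache_type = "ram"

    def __del__(self):
        # Python still calls __del__ on the half-built object. Reading
        # self.cache directly would raise AttributeError here, which Python
        # reports as "Exception ignored in ... __del__"; getattr avoids that.
        if getattr(self, "cache", False) and self.cache_type == "ram":
            print("releasing RAM cache")


def try_load(path: str) -> str:
    """Return the name of the original exception raised in __init__, or "ok"."""
    try:
        PartialDataset(path)
    except OSError as exc:
        return type(exc).__name__
    return "ok"


print(try_load("/no/such/dir/instances_train2017.json"))  # prints: FileNotFoundError
```

In practice this means scrolling further up the log for the first exception, rather than debugging the cache attribute itself.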

Unable to get around it, I copied the dataset into this directory instead. Running again produced this error:

assert img is not None, f"file named {img_file} not found"
AssertionError: file named /home/ubuntu/outputs/yolox/YOLOX-main/datasets/COCO/val2017/000000567197.jpg not found

On closer inspection, the problem was the directory structure: YOLOX looks for train2017 and val2017 directly under datasets/COCO, without an extra images level, so I removed that level. The final directory structure is:

[Figure: final dataset directory structure]

Model training

Starting training at first failed with this error:

<class 'torch.autograd.variable.Variable'>
RuntimeError: FIND was unable to find an engine to execute this computation

This happened because installing the environment pulled in torch 2.0 by default. Switching to an older torch version fixes it:

conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
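After downgrading, it is worth confirming which torch build the yolox environment actually picks up. A minimal sketch: the 1.12 target simply mirrors the conda command above, and torch.__version__ and torch.version.cuda are the standard attributes PyTorch exposes. The helper names major_minor and is_expected_torch are my own.

```python
def major_minor(version: str) -> tuple:
    """Parse a version string such as '1.12.0+cu116' or '2.0.1' into (major, minor)."""
    core = version.split("+")[0]  # drop a local build tag such as +cu116
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def is_expected_torch(version: str, expected: tuple = (1, 12)) -> bool:
    """True if the version matches the torch release pinned in the conda command."""
    return major_minor(version) == expected


try:
    import torch
except ImportError:
    print("torch is not installed in this environment")
else:
    print("torch", torch.__version__, "CUDA", torch.version.cuda)
    if not is_expected_torch(str(torch.__version__)):
        print("warning: expected torch 1.12.x for this setup")
```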

Next, several parameters need to be modified. The first is the model name; I used yolox-l:

parser.add_argument("-n", "--name", type=str, default="yolox-l", help="model name")

Then point -f (--exp_file) at the yolox-l experiment file, which is where the remaining settings are read from, and modify the parameters in that file:

parser.add_argument(
    "-f",
    "--exp_file",
    default="/home/ubuntu/outputs/yolox/YOLOX-main/exps/default/yolox_l.py",
    type=str,
    help="plz input your experiment description file",
)
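Editing the default= values is one option; the same settings can also be passed as flags. The sketch below rebuilds only these two arguments with the standard argparse module (it is not the full YOLOX train.py parser) to show how defaults and command-line flags interact:

```python
import argparse

# A reduced copy of the two arguments shown above; the real train.py defines more.
parser = argparse.ArgumentParser("YOLOX train parser (sketch)")
parser.add_argument("-n", "--name", type=str, default="yolox-l", help="model name")
parser.add_argument(
    "-f",
    "--exp_file",
    default="/home/ubuntu/outputs/yolox/YOLOX-main/exps/default/yolox_l.py",
    type=str,
    help="plz input your experiment description file",
)

# With no command-line arguments, the edited defaults are used.
defaults = parser.parse_args([])
# Flags on the command line override the defaults without editing the script.
override = parser.parse_args(["-n", "yolox-s", "-f", "exps/default/yolox_s.py"])

print(defaults.name, defaults.exp_file)
print(override.name, override.exp_file)
```

Editing defaults is convenient for a single fixed setup; passing flags keeps train.py untouched when switching between models.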

Modify /home/ubuntu/outputs/yolox/YOLOX-main/exps/default/yolox_l.py as follows. One mistake to avoid here: being used to DETR-style models, I initially set num_classes to 4 by adding a background class, but YOLOX needs no background class, so my dataset should have only 3 classes.

[Figure: modified yolox_l.py]

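import os

from yolox.exp import Exp as MyExp


class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.depth = 1.0
        self.width = 1.0
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # Annotation file names (looked up under datasets/COCO/annotations)
        self.train_ann = "instances_train2017.json"
        self.val_ann = "instances_val2017.json"

        # Number of object classes, with no background class.
        # I originally set 4 here out of DETR habit; 3 is correct for my dataset.
        self.num_classes = 3

        self.max_epoch = 1          # train for a single epoch
        self.data_num_workers = 8   # dataloader worker processes
        self.print_interval = 1     # log every iteration
        self.eval_interval = 1      # evaluate after every epoch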
        self.eval_interval = 1

Next is the batch-size parameter. YOLOX uses a fairly large amount of GPU memory, so I set the batch size to 6.

[Figure: batch-size setting]
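One side effect worth checking when shrinking the batch size: if I read yolox_base.py correctly, the learning rate is derived as basic_lr_per_img × batch_size, with basic_lr_per_img = 0.01 / 64 by default, so it scales down linearly with the batch. Treat both the formula and the default as assumptions to verify against your copy of the code. The arithmetic for batch size 6:

```python
# Assumed default from yolox_base.py: a learning rate of 0.01 for a batch of 64 images.
BASIC_LR_PER_IMG = 0.01 / 64.0


def scaled_lr(batch_size: int, lr_per_img: float = BASIC_LR_PER_IMG) -> float:
    """Linear LR scaling: per-image learning rate times the total batch size."""
    return lr_per_img * batch_size


for bs in (64, 6):
    print(f"batch_size={bs:>2}  lr={scaled_lr(bs):.7f}")
```

With batch size 6 the learning rate comes out to about 0.00094, roughly a tenth of the batch-64 value.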
Training is fairly fast, at about 45 minutes per epoch. Below are the results after 1 epoch; because no pre-trained model was used, the scores are very low. Another problem is the wrong num_classes setting noted above: there should be only 3 classes, not 4 with a background class.

[Figure: results after 1 epoch of training without a pre-trained model]

Pre-trained model fine-tuning

We can use the official YOLOX-L weights as a pre-trained model and fine-tune from them so that training converges quickly. Those weights were trained with num_classes=80, but we can keep our own setting (num_classes=3); the mismatch in the number of classes is handled automatically when the checkpoint is loaded. With the pre-trained model, training converges noticeably faster and accuracy improves quickly.

parser.add_argument("-c", "--ckpt", default="/home/ubuntu/outputs/yolox/YOLOX-main/yolox_l.pth.tar", type=str, help="checkpoint file")
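As far as I can tell from YOLOX's checkpoint-loading utility, the 80-versus-3 class mismatch is handled by skipping any checkpoint tensor whose shape does not match the model, so those layers keep their fresh initialization. A minimal, framework-free sketch of that idea; the function name filter_matching, the parameter names, and the shapes are illustrative assumptions, and anything with a .shape attribute (a torch tensor, a NumPy array) would work in place of the SimpleNamespace stand-ins:

```python
from types import SimpleNamespace


def filter_matching(model_state: dict, ckpt_state: dict) -> dict:
    """Keep only checkpoint entries whose name exists in the model with the same shape."""
    kept = {}
    for name, param in model_state.items():
        if name not in ckpt_state:
            print(f"missing in checkpoint, keeping init: {name}")
        elif tuple(ckpt_state[name].shape) != tuple(param.shape):
            print(f"shape mismatch, keeping init: {name} "
                  f"{tuple(ckpt_state[name].shape)} vs {tuple(param.shape)}")
        else:
            kept[name] = ckpt_state[name]
    return kept


def t(*shape):
    """Shape-only stand-in for a tensor."""
    return SimpleNamespace(shape=shape)


# Illustrative names/shapes: an 80-class COCO checkpoint vs. a 3-class model.
ckpt = {"backbone.stem.weight": t(64, 12, 3, 3), "head.cls_preds.0.weight": t(80, 256, 1, 1)}
model = {"backbone.stem.weight": t(64, 12, 3, 3), "head.cls_preds.0.weight": t(3, 256, 1, 1)}

loaded = filter_matching(model, ckpt)
print(sorted(loaded))  # prints: ['backbone.stem.weight']
```

With PyTorch, the filtered dict would then be passed to model.load_state_dict(kept, strict=False), so the skipped classification layers simply keep their initial weights.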

Results after one epoch of fine-tuning from the pre-trained model:
[Figure: results after one epoch of fine-tuning]

Model evaluation

Next, configure the parameters of eval.py. The evaluation command takes this form:

python -m yolox.tools.eval -n yolox-s -c yolox_s.pth -b 64 -d 8 --conf 0.001 [--fp16] [--fuse]
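When the same evaluation is run repeatedly, it can help to build the command in a script. A minimal sketch using only the flags from the command above; the helper name eval_command is mine, and the model name and checkpoint path in the example are placeholders for your own experiment and weights:

```python
import shlex


def eval_command(name: str, ckpt: str, batch: int = 64, devices: int = 8,
                 conf: float = 0.001, fp16: bool = False, fuse: bool = False) -> list:
    """Assemble the yolox.tools.eval invocation shown above as an argv list."""
    cmd = ["python", "-m", "yolox.tools.eval",
           "-n", name, "-c", ckpt,
           "-b", str(batch), "-d", str(devices),
           "--conf", str(conf)]
    # The bracketed flags in the command above are optional switches.
    if fp16:
        cmd.append("--fp16")
    if fuse:
        cmd.append("--fuse")
    return cmd


cmd = eval_command("yolox-l", "path/to/ckpt.pth", batch=6, devices=1, fuse=True)
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would then launch the evaluation.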

Alternatively, as with train.py, you can set these through the argument defaults in eval.py; the two main ones to change are:

[Figure: the two modified arguments in eval.py]

Then run python eval.py. At this point I found that the downloaded weight file caused an error, so I trained for 1 epoch myself and evaluated the saved weights instead; that worked without problems. The weights are saved under YOLOX_outputs. One problem remains, though: the scores are very low.

[Figure: evaluation results]
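To pick up the self-trained weights without typing the path by hand, a small helper can search the output folder. This sketch assumes the trainer writes files named best_ckpt.pth and latest_ckpt.pth into YOLOX_outputs/<exp_name>/ (with exp_name taken from the experiment file name, yolox_l here); the helper name find_checkpoint is mine, so check the actual file names in your own YOLOX_outputs folder before relying on it.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumed checkpoint file names, in order of preference.
PREFERRED = ("best_ckpt.pth", "latest_ckpt.pth")


def find_checkpoint(output_dir: str, exp_name: str) -> Optional[Path]:
    """Return the first preferred checkpoint under output_dir/exp_name, else None."""
    exp_dir = Path(output_dir) / exp_name
    for name in PREFERRED:
        candidate = exp_dir / name
        if candidate.is_file():
            return candidate
    return None


# Self-check on a throwaway directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "yolox_l").mkdir()
    (Path(tmp) / "yolox_l" / "latest_ckpt.pth").write_bytes(b"")
    found = find_checkpoint(tmp, "yolox_l")
    found_name = found.name if found else None
    missing = find_checkpoint(tmp, "yolox_s")

print(found_name, missing)  # prints: latest_ckpt.pth None
```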

Model inference

First, download the pre-trained model; I chose YOLOX-L. Note that downloading this file required getting around the firewall (for example, with a VPN) from where I was. The downloaded weight file has a .tar extension, so I assumed it had to be extracted:

tar -xvf yolox_l.pth.tar

But unexpectedly, an error was reported:

tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

This looked like a bug. The commonly suggested solution is:

gzip -d xxxx.tar.gz   # for .tar.gz files
tar -xf xxxx.tar      # for .tar files
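Rather than trying extraction commands one after another, it helps to check what the file actually is, since the extension can be misleading. A minimal sketch using the standard library's zipfile and tarfile modules (the function name archive_kind is mine); the self-check builds its own throwaway files instead of using the real weights:

```python
import tarfile
import tempfile
import zipfile
from pathlib import Path


def archive_kind(path: str) -> str:
    """Guess the real container format of a file, ignoring its extension."""
    if zipfile.is_zipfile(path):
        return "zip"  # torch.save has written zip-based files by default since PyTorch 1.6
    with open(path, "rb") as f:
        if f.read(2) == b"\x1f\x8b":  # gzip magic number
            return "gzip"
    if tarfile.is_tarfile(path):
        return "tar"
    return "unknown"


with tempfile.TemporaryDirectory() as tmp:
    # A zip archive disguised with a .pth.tar name.
    fake_ckpt = Path(tmp) / "weights.pth.tar"
    with zipfile.ZipFile(fake_ckpt, "w") as zf:
        zf.writestr("archive/data.pkl", b"not a real checkpoint")
    # A genuine tar archive containing a small text file.
    note = Path(tmp) / "note.txt"
    note.write_text("hello")
    real_tar = Path(tmp) / "real.tar"
    with tarfile.open(real_tar, "w") as tf:
        tf.add(str(note), arcname="note.txt")
    kinds = (archive_kind(str(fake_ckpt)), archive_kind(str(real_tar)))

print(kinds)  # prints: ('zip', 'tar')
```

If the downloaded weights report "zip", they are most likely an ordinary PyTorch checkpoint that should be loaded as-is rather than extracted.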

That still did not work. As a last resort, I renamed the file with a .zip suffix and extracted it with unzip. This time extraction succeeded, but the result was a folder rather than the single .pth file I had seen in other projects. Sure enough, running the code then failed:

super().__init__(open(name, mode))
IsADirectoryError: [Errno 21] Is a directory: '/home/ubuntu/outputs/yolox/YOLOX-main/yolox_l.pth'

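It turns out that the YOLOX weight file does not need to be extracted at all and can be used as-is. Most likely the .pth.tar file is simply a PyTorch checkpoint saved in PyTorch's zip-based format rather than a real tar archive, which would also explain why unzip produced a folder. So just point the checkpoint argument at the original file: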

parser.add_argument("-c", "--ckpt", default="/home/ubuntu/outputs/yolox/YOLOX-main/yolox_l.pth.tar", type=str, help="ckpt for eval")

I also specified size=224; these settings, along with the model's parameter count and computation cost, are shown in demo.py:
[Figure: demo.py settings with the model's parameter and computation statistics]
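Why the input size changes the computation figure but not the parameter count: for a fully convolutional detector like YOLOX, the weights do not depend on image size, while convolution FLOPs grow roughly with the number of pixels, that is, with the square of the side length. A rough back-of-the-envelope sketch under that assumption (it ignores any size-independent overhead, and the 640 reference size is an assumption for illustration):

```python
def flops_scale(new_size: int, ref_size: int = 640) -> float:
    """Approximate FLOPs ratio when a square input goes from ref_size to new_size."""
    return (new_size / ref_size) ** 2


def scaled_gflops(gflops_at_ref: float, new_size: int, ref_size: int = 640) -> float:
    """Scale a reported GFLOPs figure to another square input size."""
    return gflops_at_ref * flops_scale(new_size, ref_size)


ratio = flops_scale(224)
print(f"224x224 needs roughly {ratio:.3f} of the 640x640 computation")
```

Plugging the GFLOPs value reported at the reference size into scaled_gflops gives an estimate for 224; going from 640 to 224 cuts computation to about an eighth.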

The inference results are as follows:

[Figure: inference results]


Origin blog.csdn.net/pengxiang1998/article/details/132368252