[Personally tested] Configuring the MMDetection 3.0 environment on a cluster

This article records how I set up MMDetection in a cluster environment.

Environment: the cluster is a local one with management nodes and compute nodes, which run on different hosts. I work as an ordinary user without superuser (root) privileges.

The MMDetection source code can be downloaded from the open-mmlab/mmdetection repository on GitHub.


This article focuses on the environment configuration. The official tutorial already describes the full process, so what is recorded here are the problems specific to this hardware setup, kept for later reference.

The official installation guide is in the "Get Started" section of the MMDetection documentation.

Only the differences from the official guide are recorded here. In this cluster only the management node can reach the Internet, so creating the environment and downloading packages are done on the management node. Compiling the source code, however, has to be done on a compute node.

Environment configuration details

On the official website:

conda install pytorch torchvision -c pytorch

Actual configuration:
This installation is also done on the management node: activate the corresponding environment, then run the command below. The management node generally has only a multi-core CPU, but the computation generally needs a GPU, so install PyTorch together with the CUDA toolkit. The command follows the one given on the PyTorch official website.

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

After the command finishes, start Python in that environment and check that PyTorch can see the GPU. On a node without a GPU this check returns False, so run it on a compute node:

import torch
print(torch.cuda.is_available())
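A slightly fuller version of this check can be sketched as below. It is only an illustration, not part of the official guide: the helper name check_torch is made up here, and the function returns early when torch is not installed instead of raising.

```python
import importlib.util


def check_torch():
    """Return a small report on the PyTorch install in the current environment."""
    # find_spec avoids an ImportError when torch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    report = {
        "installed": True,
        "torch_version": torch.__version__,
        # CUDA version this torch build was compiled against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in check_torch().items():
        print(f"{key}: {value}")
```

If built_with_cuda is set but cuda_available is False, the build itself supports CUDA and the current node simply has no usable GPU, which is what one would expect on a CPU-only management node.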

The next issue is the build step.
The official command is:

pip install -v -e .

This command needs a number of dependencies: it reads requirements.txt in the source tree and installs any package that is missing from the environment. That is where the compute node's lack of Internet access becomes a problem. So before running it, switch to the management node, activate the corresponding environment, and run the following command first.

pip install -r requirements.txt
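Before moving to the offline compute node, it can help to confirm that nothing from the requirements files is still missing. The script below is a minimal sketch under some assumptions: the helper names are made up for illustration, only plain "name>=version" lines and "-r other.txt" includes are handled, and it checks whether a package is present, not whether its version satisfies the constraint.

```python
import re
from importlib import metadata
from pathlib import Path


def read_requirements(path):
    """Yield requirement lines from a file, following '-r other.txt' includes."""
    path = Path(path)
    for raw in path.read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("-r "):
            # Included files are resolved relative to the including file.
            yield from read_requirements(path.parent / line[3:].strip())
        elif line and not line.startswith("-"):
            yield line


def requirement_name(line):
    """Extract the distribution name from a line such as 'numpy>=1.20'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line.strip())
    return match.group(0) if match else None


def missing_packages(lines):
    """Return the names from the given lines that are not installed."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req_file = Path("requirements.txt")  # run from the source tree root
    if req_file.is_file():
        for name in missing_packages(read_requirements(req_file)):
            print(f"not installed: {name}")
```

Run it on the management node inside the activated environment. An empty output suggests the compute node will not need to download anything during the build.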

After the installation succeeds, switch to a compute node, activate the same environment, and run pip install -v -e . on the node with the GPU. This step is very fast. If no error is reported, the mmdet package is generated.

According to the official documentation, this package can also be installed with mim install mmdet, but to be honest that command did not work for me. The cause is probably the CUDA and torch versions I selected; other version combinations may have a prebuilt mmdet. When I followed the link on the official site for my combination, there was no mmdet at all.

Sample test

The official documentation gives two commands for verifying the installation.
The first command downloads the config file and the pre-trained model:

mim download mmdet --config rtmdet_tiny_8xb32-300e_coco --dest .

The second command runs the test, but note that the path of the config file has to be adjusted:

python demo/image_demo.py demo/demo.jpg ./configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py --weights rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth --device cpu
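Since a wrong path is the easiest mistake to make here, a quick existence check before launching the demo may save a failed run. This is only a sketch: missing_files is a hypothetical helper, and the paths are copied from the command above, so they assume the script is run from the root of the source tree.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    # Paths copied from the demo command; adjust them to your checkout.
    needed = [
        "demo/image_demo.py",
        "demo/demo.jpg",
        "./configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py",
        "rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth",
    ]
    for path in missing_files(needed):
        print(f"missing: {path}")
```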

However, running the first command produces an error:

ModuleNotFoundError: No module named 'mmdet'

We already installed mmdet with pip install -v -e . when building from source, so why is there still an error? In my case, one more command was needed:

python setup.py install

This adds the mmdet package to the current environment so that it can be imported and run.
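To see whether the interpreter in the active environment can actually find mmdet, and from which location, a small check along these lines can be used. The helper name locate_module is invented for this sketch; it relies only on the standard library.

```python
import importlib.util
import sys


def locate_module(name):
    """Return where a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no single file; report their search paths instead.
    return ", ".join(spec.submodule_search_locations or [])


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("mmdet:", locate_module("mmdet"))
```

If the interpreter path is not the one inside the conda environment, the ModuleNotFoundError may simply mean that the command ran in a different environment from the one mmdet was installed into.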


After that, run the test command again. This time it completes and produces the detection result (shown as a screenshot in the original post).
Once the test passes, the environment is configured successfully. Of course, if you are not working under similar hardware constraints, the configuration is simpler and essentially foolproof.


Origin blog.csdn.net/qq_29750461/article/details/131290084