This post re-records the PaddlePaddle installation process, mainly because the environment was not set up correctly when the server was first initialized.
Basic environment
Cloud disk deployment
conda install
Anaconda installation
First, download the installer:
sudo wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
Once the download completes, run the installer:
bash Anaconda3-2020.02-Linux-x86_64.sh
Install Anaconda
The installer first prompts you to press ENTER to continue (Please, press ENTER to continue):
Keep pressing Enter to page through the license terms until you are asked whether to accept them, then type yes.
The installer then reports that Anaconda3 will now be installed into this location:
/root/anaconda3, which is the default. Press Enter to accept it.
Installation then begins. When asked whether to initialize Anaconda3, type yes.
Once the installation succeeds, reload the shell configuration:
source ~/.bashrc
With conda available, we can create and activate an environment as usual and install the required packages.
paddle installation
Create a conda environment
conda create -n paddle python=3.7
conda activate paddle
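Before installing anything, it can help to confirm that the new environment is really the active one. The following is a minimal sketch, assuming the environment name paddle and Python 3.7 from the commands above; the helper name describe_environment is made up for this example. It relies on the CONDA_DEFAULT_ENV variable that conda activate sets.

```python
import os
import sys


def describe_environment():
    """Return (active conda env name or None, 'major.minor' Python version)."""
    # CONDA_DEFAULT_ENV is set by `conda activate`; it is absent outside conda.
    env_name = os.environ.get("CONDA_DEFAULT_ENV")
    version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
    return env_name, version


env_name, version = describe_environment()
print("conda env:", env_name)
print("python:", version)
if env_name != "paddle" or version != "3.7":
    print("warning: expected the 'paddle' environment with Python 3.7")
```

If the warning appears, run conda activate paddle again before continuing with the pip commands.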
PyTorch installation
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
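A quick way to check that the wheel installed above matches the GPU setup is to ask torch which CUDA version it was built for and whether it can see a device. This is a small sketch rather than a full diagnostic; the function name torch_cuda_summary is hypothetical, and it returns None when torch cannot be imported.

```python
def torch_cuda_summary():
    """Return torch/CUDA details as a dict, or None if torch cannot be imported."""
    try:
        import torch
    except (ImportError, OSError):
        # OSError covers a missing shared library at import time.
        return None
    return {
        "torch": torch.__version__,            # e.g. a version ending in +cu111
        "built_for_cuda": torch.version.cuda,  # CUDA version the wheel targets
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_cuda_summary())
```

If cuda_available is reported as False, the driver or the CUDA build of the wheel is the first thing to look at.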
paddle installation
python -m pip install paddlepaddle-gpu==2.4.2 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
Other dependent package installation
pip install -r requirements.txt --index-url https://pypi.douban.com/simple
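After the installs finish, it is worth listing what pip actually installed, since a mismatched build is the usual source of the errors below. The sketch below is one way to do that, assuming the distribution names used in the commands above; installed_version is a hypothetical helper. It falls back to pkg_resources because importlib.metadata is not available on Python 3.7.

```python
def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for Python 3.7 (ships with setuptools)
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


for name in ("paddlepaddle-gpu", "torch", "torchvision", "torchaudio"):
    print(name, installed_version(name))
```

Once the paddle import itself works, Paddle's own paddle.utils.run_check() can be used to confirm that the GPU build runs.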
Error:
ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory
Solution:
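Obtain libcudart.so.10.2 from another source and upload it to the server (the command below scans /usr/local/cuda-11.2/lib64, so place it there).
Then refresh the dynamic linker cache so the library can be found: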
sudo ldconfig /usr/local/cuda-11.2/lib64
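To confirm that the linker can now find the library, you can try loading it directly. This is a minimal sketch using ctypes from the standard library; the helper name can_load is made up for this example, and the library name is the one from the error message above.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic linker can locate and load the library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the shared object cannot be found or opened.
        return False
    return True


print("libcudart.so.10.2 loadable:", can_load("libcudart.so.10.2"))
```

If this still prints False after running ldconfig, the file is probably not in a directory the linker scans.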
Configuring the Paddle environment is not difficult in itself, but the system that came pre-installed on the server caused a series of errors.
Training process:
NOTE: Be sure to match this file.
AFFormer environment configuration
Environment: CUDA 11.2
Install PyTorch
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
Install MMCV
pip install mmcv==2.0.0rc4 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10/index.html
Install MMCV-full
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html
other dependencies
pip install timm
pip install opencv-python
pip install einops
Errors encountered
Error 1:
ModuleNotFoundError: No module named 'mmseg'
Solution 1:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple mmsegmentation
But this still throws an error:
from mmseg.apis import init_random_seed, set_random_seed, train_segmentor
ImportError: cannot import name 'init_random_seed' from 'mmseg.apis'
The cause is a version mismatch. Installing a pinned version fixes it:
pip install mmsegmentation==0.30.0
Error 2:
File "/home/ubuntu/anaconda3/envs/afformer/lib/python3.7/site-packages/mmseg/__init__.py", line 62, in <module>
f'MMCV=={mmcv.__version__} is used but incompatible. ' \
AssertionError: MMCV==1.7.1 is used but incompatible. Please install mmcv>=2.0.0rc4.
Solution 2: Modify the MMCV version check in /home/ubuntu/anaconda3/envs/afformer/lib/python3.7/site-packages/mmseg/__init__.py.
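To see why MMCV 1.7.1 is rejected, it helps to spell out the kind of comparison behind that assertion. The sketch below is a simplified illustration, not mmseg's actual implementation: parse_version and satisfies_minimum are hypothetical helpers, and the only bound used is the 2.0.0rc4 minimum quoted in the error message.

```python
import re


def parse_version(text):
    """Turn '1.7.1' or '2.0.0rc4' into a tuple that sorts in release order."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?", text)
    if match is None:
        raise ValueError("unrecognised version: " + text)
    major, minor, patch, rc = match.groups()
    release = (int(major), int(minor), int(patch))
    # A release candidate sorts before the final release with the same number.
    return release + ((0, int(rc)) if rc is not None else (1, 0))


def satisfies_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


print("1.7.1 meets 2.0.0rc4:", satisfies_minimum("1.7.1", "2.0.0rc4"))
```

With the versions from the error message the check fails, which is why either the installed MMCV or the bounds in the file have to change.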
Error 3:
AttributeError: module 'torch.nn' has no attribute 'SiLU'
Solution 3: torch.nn.SiLU is missing because the installed torch version is too old (SiLU was added in PyTorch 1.7). Switch to a torch version that provides it.
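A short check can tell you whether a given torch version is expected to provide SiLU before you reinstall anything. This is a sketch that assumes the usual major.minor prefix in torch version strings; torch_has_silu is a hypothetical helper name, and the live check only runs when torch can be imported.

```python
import re


def torch_has_silu(version_string):
    """Return True if this torch version is 1.7 or newer, where nn.SiLU exists."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognised version: " + version_string)
    return (int(match.group(1)), int(match.group(2))) >= (1, 7)


try:
    import torch
    print(torch.__version__, "expected to have SiLU:", torch_has_silu(torch.__version__))
    print("torch.nn.SiLU present:", hasattr(torch.nn, "SiLU"))
except (ImportError, OSError):
    print("torch is not importable in this environment")
```

If the two printed answers disagree, more than one torch installation is probably visible to the interpreter.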
Run
Change into the tools directory and run python train.py.
This reports an error:
raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "EncoderDecoder: 'CLS is not in the models registry'"
This happens because the source code already contains its own mmseg package and the pip-installed package has the same name, so the wrong one is imported. Uninstall the pip-installed package.
Then run:
python setup.py develop
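To confirm which mmseg Python will actually import, you can look up where the module resolves without importing it. The sketch below uses importlib from the standard library; module_origin is a hypothetical helper name. The result depends on the import path, so run it the same way you run the training script.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the module is not importable at all
    return spec.origin


print("mmseg resolves to:", module_origin("mmseg"))
```

A path inside site-packages means the pip-installed copy is still being picked up; a path inside the repository means the bundled mmseg is in use.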
The final installed environment is: