"Alchemy" AutoDL Alchemy Diary

As a digression, let me explain why the blogger chose AutoDL. At the beginning, the blogger had a 2070 and a 3080 Ti, but their video memory was on the small side: one 8 GB, one 12 GB. That makes it very hard to reproduce results from papers, and when you need experiments for your own little paper, the lack of hardware really shows. With no funding from the advisor and no money to buy cards myself, the only option left was cloud services. Alibaba Cloud, Tencent Cloud, Baidu Cloud, Huawei Cloud, and the like are ridiculously expensive: 20+ yuan per hour, 400+ yuan per day. By the time one paper is finished, you could have bought two graphics cards. There are also services such as Colab, but I have not used it much. Of course, if you use Paddle, Baidu's AI Studio is the obvious recommendation: it can be tried for free, Paddle's design is roughly the same as PyTorch's, and it is quick to pick up, so it is highly recommended.

The last option is AutoDL. I had heard of it before but never used it; fortunately, while I was still at a loss, a friend in my group recommended it, and I registered immediately to try it out. The price is genuinely cheap: taking the 3090 as an example, it costs only about 1.5 yuan per hour, roughly 40 yuan per day, which looks far better than the other clouds and is a real lifesaver for students. There are also dedicated tutorials on the site, so answers to common problems are easy to find, which makes it convenient to use. Each instance provides a 20 GB system disk for code and related files, plus a 50 GB data disk, which is enough for ordinary datasets. The environment comes pre-installed and can be used directly.

Click the link to register and try it out: https://www.autodl.com/ . New users get a 10-yuan trial coupon.



1. Decompress the dataset

The dataset used by the blogger is COCO 2017, which lives on AutoDL under /root/autodl-pub/COCO2017. The system disk is only 20 GB, so decompressing the dataset onto it may raise a disk quota exceeded error. Therefore, we decompress the dataset onto the data disk AutoDL provides at /root/autodl-tmp

AutoDL itself ships common datasets under the path /root/autodl-pub/, and they do not count against data disk space


cd /root/autodl-pub/COCO2017
unzip -d /root/autodl-tmp/  ann*.zip
unzip -d /root/autodl-tmp/  val*.zip
unzip -d /root/autodl-tmp/  test*.zip
unzip -d /root/autodl-tmp/  train*.zip
#1. Extract the archive into the current directory
unzip test.zip

#2. Extract into a specified directory with the -d option
unzip -d /temp test.zip

#3. Add -n to avoid overwriting files that already exist
unzip -n test.zip
unzip -n -d /temp test.zip

#4. List the files in the archive without extracting them
unzip -l test.zip

#5. List the files together with their compression ratios
unzip -v test.zip

#6. Test whether the zip file is corrupted
unzip -t test.zip

#7. Extract test.zip into /tmp, overwriting existing files with -o
unzip -o test.zip -d /tmp/
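
Since the disk quota exceeded error comes from running out of space, it can help to check how much room is left before and after extracting. Below is a small sketch using standard tools; the paths follow the AutoDL layout described above.

# Free space on the system disk and on the data disk
df -h / /root/autodl-tmp

# Size of what has been extracted onto the data disk
du -sh /root/autodl-tmp/*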

2. Compress the folder

Although AutoDL provides upload and download functions, downloading only works for single files, not for folders. Therefore, you need to compress a folder before downloading it to your local machine

  1. Use zip to package the folder; the command is
zip -r -q -o pack.zip mark/

The command above packs all files under the directory mark/ into a single archive. In the command line,
the -r parameter means recursively package all contents of subdirectories,
the -q parameter means quiet mode, i.e. nothing is printed to the screen, and
the -o parameter sets the archive's timestamp to that of its newest entry; the output file name (pack.zip) is simply given as the first argument, not introduced by -o.

  2. Use the -e parameter to create an encrypted archive; the command is
zip -r -e -o pack.zip mark/

Then enter the password twice
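
If you prefer tar over zip, a gzip-compressed tarball achieves the same single-file download. A minimal sketch, using the same example directory:

# Pack the folder into one compressed archive
tar -czf pack.tar.gz mark/

# Unpack it locally after downloading
tar -xzf pack.tar.gz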


3. Configure the conda environment

AutoDL instances come with Miniconda, and when running on a GPU the pre-installed CUDA build of PyTorch is used, so you do not need to install it separately; after all, the instance is billed even while you are installing.

AutoDL does not create a conda environment by default, so we need to create our own. But note: a freshly created environment cannot be used directly. If you try to activate it right away, the following error appears

[Screenshot: error message when activating the environment before conda is initialized]
This error is caused by the fact that we have not initialized conda. First, check the system's default shell with the command echo $SHELL, then initialize conda with the command conda init bash && source /root/.bashrc. After initialization, some AutoDL system information is printed, and the environment can then be used normally

# Check which shell the system uses
echo $SHELL

# Initialize conda
conda init bash && source /root/.bashrc


Then you can install whatever libraries you need yourself. AutoDL has already installed matching versions of PyTorch, torchvision, and other libraries, so there is no need to install them again
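
For completeness, here is a minimal sketch of creating and using your own environment; the name myenv and the Python version are placeholders, adjust them to your project.

# Create a new environment (name and Python version are examples)
conda create -n myenv python=3.8 -y

# Activate it (this only works after the conda init step above)
conda activate myenv

# Install extra libraries as needed
pip install nvitop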



4. Monitor graphics card information

On Windows, the blogger generally checks GPU information directly under "Performance" in Task Manager. However, Task Manager can only monitor coarse information such as GPU memory, temperature, and utilization, without much detail, and this method only works on Windows, not on Linux.

On Linux, it is recommended to use the nvitop library instead. You can install it directly with pip install nvitop, and after the installation completes, simply enter nvitop in the terminal. The figure below shows the monitoring interface.

nvitop can be used not only on Linux but also on Windows, in exactly the same way

[Screenshot: the nvitop monitoring interface]

In addition to nvitop, you can also use the nvidia-smi command to check GPU status

nvidia-smi: show current graphics card usage
nvidia-smi -L: list all graphics cards
nvidia-smi -l 2: dynamically display usage information, refreshing every 2 seconds (the interval value can be changed)
nvidia-smi -lms: like -l, but the refresh interval is given in milliseconds
nvidia-smi dmon: device monitor
nvidia-smi -i n: display only the specified graphics card (with multiple cards, n is the index of the card)

When displaying in real time with nvidia-smi -l 2, the monitoring output keeps scrolling in the terminal; personally, I find it less clear and concise than nvitop.
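
If you want to log GPU statistics rather than watch them, nvidia-smi also offers a query mode that prints selected fields as CSV. A small sketch; the field list here is just an example:

# Print timestamp, GPU name, memory, and utilization as CSV, refreshing every 2 seconds
nvidia-smi --query-gpu=timestamp,name,memory.used,memory.total,utilization.gpu --format=csv -l 2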


5. TensorBoard monitoring

AutoDL provides built-in TensorBoard monitoring, but the log files need to be placed under /root/tf-logs, otherwise TensorBoard cannot pick them up

cp -r ./runs/Nov22-16-54\ resnet101_evolution_head_COCO\ 2017/ /root/tf-logs/

cd /root
tensorboard --logdir=tf-logs
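
To avoid copying logs over after every run, one possible shortcut (a habit of mine, not an official AutoDL recipe) is to replace /root/tf-logs with a symlink pointing at the directory your training actually writes to:

# Point /root/tf-logs at the real log directory (/root/project/runs is an example path)
rm -rf /root/tf-logs
ln -s /root/project/runs /root/tf-logs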

6. Install CUDA/cuDNN

AutoDL has already installed this part. It is recorded here only in case it is needed in the future.

If the PyTorch and CUDA versions that AutoDL installs are not what we need and we have to install them separately, we can first select the Miniconda/CUDA=1x.x base image in AutoDL, then install the required versions ourselves

PyTorch and CUDA versions must match; for example, PyTorch 1.9.0 requires CUDA 11.1
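
A quick way to check which CUDA version the installed PyTorch was actually built against, and whether it can see the GPU (a one-liner sketch, assuming the pre-installed PyTorch described above):

# Print PyTorch's CUDA build version and whether a GPU is available
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"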


Query the default CUDA/cuDNN version

Note that the CUDA version shown by nvidia-smi is only the highest version supported by the installed driver; it does not mean that this CUDA version is actually installed in the instance

Execute the following in the terminal to check the CUDA version bundled with the default image (the installation directory is /usr/local/):

# Query the CUDA version built into the platform image
$ ldconfig -p | grep cuda
        libnvrtc.so.11.0 (libc6,x86-64) => /usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc.so.11.0
        libnvrtc.so (libc6,x86-64) => /usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc.so
        libnvrtc-builtins.so.11.0 (libc6,x86-64) => /usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.0

# Query the cuDNN version built into the platform image
$ ldconfig -p | grep cudnn
        libcudnn_ops_train.so.8 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8
        libcudnn_ops_train.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so
        libcudnn_ops_infer.so.8 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8
        libcudnn_ops_infer.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so

The number after .so in the output above is the version number. If you installed CUDA through conda, you can check it with the following commands:

$ conda list | grep cudatoolkit
cudatoolkit               10.1.243             h6bb024c_0    defaults
$ conda list | grep cudnn
cudnn                     7.6.5                cuda10.1_0    defaults

Install other versions of CUDA/cuDNN

Method 1: Install using conda

Advantage: simple
Disadvantage: generally no header files are included; if you need to compile against CUDA, install according to Method 2

Method:

$ conda install cudatoolkit==xx.xx
$ conda install cudnn==xx.xx

If you don't know which version numbers are available, search for them:

$ conda search cudatoolkit
Loading channels: done
# Name                       Version           Build  Channel             
cudatoolkit                      9.0      h13b8566_0  anaconda/pkgs/main  
cudatoolkit                      9.2               0  anaconda/pkgs/main  
cudatoolkit                 10.0.130               0  anaconda/pkgs/main  
cudatoolkit                 10.1.168               0  anaconda/pkgs/main  
cudatoolkit                 10.1.243      h6bb024c_0  anaconda/pkgs/main  
cudatoolkit                  10.2.89      hfd86e86_0  anaconda/pkgs/main  
cudatoolkit                  10.2.89      hfd86e86_1  anaconda/pkgs/main  
cudatoolkit                 11.0.221      h6bb024c_0  anaconda/pkgs/main  
cudatoolkit                   11.3.1      h2bc3f7f_2  anaconda/pkgs/main

Method 2: Download and install the installation package

CUDA download address: https://developer.nvidia.com/cuda-toolkit-archive

Installation method:

# After downloading the .run installer:
$ chmod +x xxx.run   # add execute permission
$ ./xxx.run          # run the installer

cuDNN download address: https://developer.nvidia.com/cudnn


Installation method:

Unzip the archive first, then move the dynamic libraries and header files into the corresponding directories

$ mv cuda/include/* /usr/local/cuda/include/
$ chmod +x cuda/lib64/* && mv cuda/lib64/* /usr/local/cuda/lib64/

After the installation is complete, increase the environment variable:

$ echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:${LD_LIBRARY_PATH} \n" >> ~/.bashrc
$ source ~/.bashrc && ldconfig
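
You can then verify that everything is in place (if nvcc is not on your PATH, use the full path /usr/local/cuda/bin/nvcc):

# Confirm the toolkit version that is now active
nvcc --version

# Confirm the dynamic linker can see the CUDA/cuDNN libraries
ldconfig -p | grep -E 'cudart|cudnn'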

Hint:
The default image ships with CUDA and cuDNN installed natively. If you additionally install cudatoolkit and the like through conda, the conda-installed cudatoolkit will generally take precedence by default.



end

Finally, if you think AutoDL is good, you can register through the link https://www.autodl.com/ . Registering via an invitation also gets you a 10-yuan voucher, which amounts to a free 10-yuan trial. If it works well for you, you can top up and keep using it; after all, freebies like this are getting rarer and rarer.


Origin: blog.csdn.net/ViatorSun/article/details/127385170