As a digression,
let's talk about why I chose AutoDL. I originally had a 2070 and a 3080Ti, but their video memory is on the small side, 8G and 12G respectively, which makes it very hard to reproduce paper results, and when you need experiments for your own paper you really feel the lack of hardware. My advisor had no funding and I couldn't afford better cards myself, so I turned to cloud services. Alibaba Cloud, Tencent Cloud, Baidu Cloud, Huawei Cloud and the like are absurdly expensive: 20+ yuan per hour, 400+ yuan per day; by the time one paper is done you could have bought two graphics cards. There are also services like Colab, though I haven't used it much. Of course, if you work with Paddle, Baidu's AI Studio is the obvious recommendation: it has a free tier, and Paddle's API is broadly similar to PyTorch's, so it is quick to pick up.
Last comes AutoDL. I had heard of it before but never used it; fortunately, while I was still undecided, a friend in my group recommended it, and I registered right away to try it. The price really is cheap: a 3090, for example, costs only about 1.5 yuan per hour, roughly 40 yuan per day, far easier on the wallet than the other clouds, a real lifesaver for students in particular. It also has dedicated tutorials that cover most common problems, which makes it convenient to use. Each instance comes with a 20G system disk for code and related files, plus a 50G data disk, which is enough for typical datasets, and the environment comes preinstalled and ready to use.
Click the link to register and try it out: https://www.autodl.com/ . New users get a 10 yuan trial coupon.
1. Decompress the dataset
The dataset used here is COCO2017; on AutoDL its path is /root/autodl-pub/COCO2017.
The system disk is only 20G, so if you decompress the dataset onto it you may get the error disk quota exceeded (超出磁盘配额). Therefore, decompress the dataset to the data disk AutoDL provides at /root/autodl-tmp.
AutoDL itself hosts common public datasets under /root/autodl-pub/, and they do not count against your data-disk quota.
cd /root/autodl-pub/COCO2017
unzip -d /root/autodl-tmp/ ann*.zip
unzip -d /root/autodl-tmp/ val*.zip
unzip -d /root/autodl-tmp/ test*.zip
unzip -d /root/autodl-tmp/ train*.zip
#1. Extract into the current directory
unzip test.zip
#2. Extract into a specified directory with the -d option
unzip -d /temp test.zip
#3. Use -n to avoid overwriting files that already exist
unzip -n test.zip
unzip -n -d /temp test.zip
#4. List the contents of the archive without extracting
unzip -l test.zip
#5. List the contents in detail, including the compression ratio
unzip -v test.zip
#6. Test whether the zip file is corrupt
unzip -t test.zip
#7. Extract test.zip into /tmp, overwriting any existing files (-o)
unzip -o test.zip -d /tmp/
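If you prefer to script the extraction, the same operations can be mirrored with Python's standard zipfile module; a minimal sketch (the function name and paths are illustrative, not AutoDL-specific):

```python
import os
import zipfile

def extract(zip_path, dest_dir, overwrite=True):
    """Extract zip_path into dest_dir, roughly mirroring `unzip -o`/`unzip -n` with -d."""
    with zipfile.ZipFile(zip_path) as zf:
        assert zf.testzip() is None   # like `unzip -t`: None means no corrupt member
        names = zf.namelist()         # like `unzip -l`: list contents without extracting
        if overwrite:
            zf.extractall(dest_dir)   # like `unzip -o -d dest_dir`
        else:
            for name in names:        # like `unzip -n`: keep files that already exist
                if not os.path.exists(os.path.join(dest_dir, name)):
                    zf.extract(name, dest_dir)
    return names
```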
2. Compress the folder
Although AutoDL supports uploading and downloading, downloads only work for single files, not for folders. So to get a folder onto your local machine, you need to compress it first and download the archive.
- Use zip to package the folder:
zip -r -q -o pack.zip mark/
The command above packs everything under the directory mark/ into pack.zip.
The -r option recurses into all subdirectories,
-q enables quiet mode, i.e. nothing is printed to the screen,
and -o sets the archive's timestamp to that of its newest file.
- Use the -e option to create an encrypted archive:
zip -r -e -o pack.zip mark/
Then enter the password twice when prompted.
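The packaging step can also be scripted with Python's standard zipfile module; below is a minimal sketch that walks a folder recursively, like `zip -r` (the folder name is illustrative). Note that zipfile cannot create password-protected archives, so for the encrypted case stick with `zip -e`.

```python
import os
import zipfile

def zip_dir(src_dir, zip_path):
    """Recursively pack src_dir into zip_path, like `zip -r zip_path src_dir`."""
    src_dir = os.path.normpath(src_dir)
    parent = os.path.dirname(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for fname in files:
                full = os.path.join(root, fname)
                # store paths relative to the parent so the folder name is kept
                zf.write(full, os.path.relpath(full, parent))
    return zip_path
```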
3. Configure the conda environment
AutoDL ships with Miniconda, and the image already includes the CUDA build of PyTorch, so there is no need to install it separately; after all, the instance keeps billing while you sit through an installation.
AutoDL does not create a conda environment by default, so we need to create our own. But note: a freshly created environment cannot be activated right away. If you try to enter it directly, the following error appears.
This error occurs because conda has not been initialized. First check the system's default shell with echo $SHELL, then initialize conda with conda init bash && source /root/.bashrc. After initialization some AutoDL system information is printed, and conda is ready to use.
# check which shell the system uses
echo $SHELL
# initialize conda
conda init bash && source /root/.bashrc
Then you can install whatever libraries you need. AutoDL has already installed matching versions of PyTorch, torchvision and other libraries, so there is no need to reinstall them.
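For reference, a typical sequence for creating and using your own environment after initialization (the environment name "paper" and the Python version are illustrative):

```shell
# create a new environment (the name "paper" is hypothetical)
conda create -n paper python=3.8 -y
# activate it (works once `conda init bash` has been run)
conda activate paper
# install any extra libraries you need
pip install nvitop
```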
4. Monitor graphics card information
On Windows I usually check GPU information under "Performance" in Task Manager. However, Task Manager only shows coarse information such as GPU memory, temperature and utilization, without much detail, and this method only works on Windows, not on Linux.
On Linux, the nvitop library is recommended instead. Install it with pip install nvitop, and after installation simply run nvitop in the terminal. The figure below shows the monitoring interface.
nvitop works not only on Linux but also on Windows, used in the same way.
In addition to nvitop, you can also use the nvidia-smi command:
nvidia-smi: show current GPU usage
nvidia-smi -L: list all GPUs
nvidia-smi -l 2: refresh the usage display every 2 seconds (the interval value can be changed)
nvidia-smi -lms: like -l, but with a millisecond refresh interval
nvidia-smi dmon: device monitor
nvidia-smi -i n: show only the specified GPU (with multiple GPUs, n is the index of the card)
When nvidia-smi -l 2 runs, the monitoring output keeps scrolling in the terminal; personally I find it less clear and concise than nvitop.
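If you want GPU stats inside a script rather than an interactive view, nvidia-smi's CSV query mode is easy to parse; a minimal sketch (the sample line in the test is illustrative, not real output from my instance):

```python
import subprocess

# nvidia-smi query flags: one plain CSV line of stats per GPU
QUERY = ["nvidia-smi",
         "--query-gpu=name,memory.used,memory.total,utilization.gpu",
         "--format=csv,noheader,nounits"]

def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    gpus = []
    for line in text.strip().splitlines():
        name, used, total, util = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "used_mib": int(used),
                     "total_mib": int(total), "util_pct": int(util)})
    return gpus

# on a GPU machine you would feed it real output:
# gpus = parse_gpu_csv(subprocess.check_output(QUERY, text=True))
```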
5. Tensorboard monitoring
AutoDL provides built-in TensorBoard monitoring, but the log files must be placed under /root/tf-logs, otherwise TensorBoard cannot pick them up.
cp -r ./runs/Nov22-16-54\ resnet101_evolution_head_COCO\ 2017/ /root/tf-logs/
cd /root
tensorboard --logdir=tf-logs
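Since cp only copies a snapshot of the logs, a symlink can be more convenient: TensorBoard follows symlinks, so new events written during training show up without re-copying (the run directory path below is illustrative):

```shell
# link the live run directory into /root/tf-logs instead of copying it
ln -s /root/my_project/runs/my_run /root/tf-logs/my_run
tensorboard --logdir=/root/tf-logs
```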
6. Install CUDA/cuDNN
AutoDL already has this installed; it is recorded here only in case it is needed in the future.
If the PyTorch and CUDA versions preinstalled by AutoDL are not what you need, you can first select a Miniconda/CUDA=1x.x platform image when creating the instance, then install the rest yourself.
PyTorch and CUDA versions must match; for example, PyTorch 1.9.0 requires CUDA 11.1.
Query the default CUDA/cuDNN version
Note that the CUDA version shown by nvidia-smi is the highest version the installed driver supports, not necessarily the CUDA version actually installed on the instance.
Run the following in the terminal to check the CUDA version that ships with the default image (installed under /usr/local/):
# query the CUDA version built into the platform image
$ ldconfig -p | grep cuda
libnvrtc.so.11.0 (libc6,x86-64) => /usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc.so.11.0
libnvrtc.so (libc6,x86-64) => /usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc.so
libnvrtc-builtins.so.11.0 (libc6,x86-64) => /usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.0
# query the cuDNN version built into the platform image
$ ldconfig -p | grep cudnn
libcudnn_ops_train.so.8 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8
libcudnn_ops_train.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so
libcudnn_ops_infer.so.8 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8
libcudnn_ops_infer.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so
The number at the end of each line above is the version. If you installed CUDA through conda, check it with the following commands:
$ conda list | grep cudatoolkit
cudatoolkit 10.1.243 h6bb024c_0 defaults
$ conda list | grep cudnn
cudnn 7.6.5 cuda10.1_0 defaults
Install other versions of CUDA/cuDNN
Method 1: Install using conda
Advantage: simple
Disadvantage: header files are generally not included; if you need to compile against CUDA, install via Method 2 instead
Method:
$ conda install cudatoolkit==xx.xx
$ conda install cudnn==xx.xx
If you don't know the exact version number, search for it:
$ conda search cudatoolkit
Loading channels: done
# Name Version Build Channel
cudatoolkit 9.0 h13b8566_0 anaconda/pkgs/main
cudatoolkit 9.2 0 anaconda/pkgs/main
cudatoolkit 10.0.130 0 anaconda/pkgs/main
cudatoolkit 10.1.168 0 anaconda/pkgs/main
cudatoolkit 10.1.243 h6bb024c_0 anaconda/pkgs/main
cudatoolkit 10.2.89 hfd86e86_0 anaconda/pkgs/main
cudatoolkit 10.2.89 hfd86e86_1 anaconda/pkgs/main
cudatoolkit 11.0.221 h6bb024c_0 anaconda/pkgs/main
cudatoolkit 11.3.1 h2bc3f7f_2 anaconda/pkgs/main
Method 2: Download and install the installation package
CUDA download address: https://developer.nvidia.com/cuda-toolkit-archive
Installation method:
After downloading the .run installer:
$ chmod +x xxx.run  # make the installer executable
$ ./xxx.run         # run the installer
cuDNN download address: https://developer.nvidia.com/cudnn
Installation method:
Unzip the archive first, then copy the shared libraries and header files into the corresponding directories:
$ mv cuda/include/* /usr/local/cuda/include/
$ chmod +x cuda/lib64/* && mv cuda/lib64/* /usr/local/cuda/lib64/
After the installation is complete, increase the environment variable:
$ echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:${LD_LIBRARY_PATH}" >> ~/.bashrc
$ source ~/.bashrc && ldconfig
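After sourcing, you can sanity-check that the variable was picked up; a small helper sketch (the function name is illustrative):

```python
import os

def on_library_path(directory, env=None):
    """Return True if `directory` is listed in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    return directory in env.get("LD_LIBRARY_PATH", "").split(":")

# after `source ~/.bashrc` you would expect:
# on_library_path("/usr/local/cuda/lib64/")  -> True
```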
Hint:
The default image has native CUDA and cuDNN built in. If you have also installed cudatoolkit etc. via conda, the conda-installed cudatoolkit will generally take precedence by default.
end
Finally, if you find AutoDL useful, you can register via the link https://www.autodl.com/ . Registering through an invitation also gets you a 10 yuan voucher, effectively a free 10 yuan trial. If you like it, you can top up and keep using it; after all, free perks are getting rarer these days.