Project development tutorials and common problems and solutions
Table of contents
1. Building a Python development environment
1. Install cuda cudnn (required for getting started with deep learning)
(1) Windows installation method
(2) Ubuntu18.04 installation method
2. Install Python (Anaconda is recommended)
(1) How to install python on Windows
(2) How to install python on Ubuntu 18.04
(3) Python development tool (IDE): Pycharm is strongly recommended
3. Pytorch installation (required for getting started with deep learning)
4. Install project dependencies
5. Run the project Demo
6. Visualization tool: how to use TensorBoard
2. Common problems and solutions
(1) No module named problem
(2) The directory path is filled in incorrectly
(3) Chinese path problem (special attention)
(4) Picture format problem
(5) OSError: The paging file is too small to complete the operation
This is a basic tutorial for developing the projects published by AI eating big melon. For any source code downloaded from the public account [AI eating big melon], you can follow this blog post to set up the development environment.
Please refer to the following installation tutorials (beginners should read them first and configure the development environment):
- Project development tutorials and common problems and solutions
- Video Tutorial: 1 Teach you how to install CUDA and cuDNN (1)
- Video Tutorial: 2 Teach you how to install CUDA and cuDNN (2)
- Video tutorial: 3 How to create a pycharm environment with Anaconda
- Video tutorial: 4 How to use the python environment created by Anaconda in pycharm
[Respect originality, please indicate the source for reprinting] https://blog.csdn.net/guyuealian/article/details/129163343
1. Building a Python development environment
If the project uses a deep learning framework such as PyTorch or TensorFlow, you need to configure a GPU development environment.
1. Install cuda cudnn (required for getting started with deep learning)
Deep learning models are computationally expensive. Running them on the CPU is very slow, so a GPU is used to accelerate the computation in parallel. Deep learning frameworks such as PyTorch and TensorFlow all support GPU training. Using the GPU requires an NVIDIA graphics card (for example a GTX 1080 or an RTX 2070), the matching graphics driver, and the CUDA and cuDNN libraries. CUDA is a parallel computing platform and programming model invented by NVIDIA that dramatically increases computing performance by harnessing the processing power of graphics processing units (GPUs). cuDNN (CUDA Deep Neural Network library) is a GPU acceleration library for deep neural networks, also built by NVIDIA.
(1) Windows installation method
- Video Tutorial: 1 Teach you how to install CUDA and cuDNN (1)
- Video Tutorial: 2 Teach you how to install CUDA and cuDNN (2)
- Refer to the installation tutorial: CUDA and cuDNN installation tutorial under Windows10 system
(2) Ubuntu18.04 installation method
- Refer to the installation tutorial: Install cuda and cudnn on ubuntu18.04
- After installing the NVIDIA graphics driver, run nvidia-smi to see the highest CUDA version the driver supports. When downloading the CUDA installer, choose a version that is less than or equal to that version.
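# Install the NVIDIA graphics driver first:
# Install CUDA: in the installer, clear the [X] next to Driver so that the driver is not installed again
sudo sh cuda_12.0.0_525.60.13_linux.run
# Add the environment variables to ~/.bashrc
export PATH=/usr/local/cuda-12.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda-12.0
# Reload the environment variables; afterwards run nvidia-smi to check GPU status and nvcc -V to check the CUDA version
source ~/.bashrc
# Install cuDNN: download the cuDNN build that matches your CUDA version, extract it, then copy the cuDNN files
sudo cp include/cudnn*.h /usr/local/cuda-12.0/include
sudo cp lib/libcudnn* /usr/local/cuda-12.0/lib64/
sudo chmod a+r /usr/local/cuda-12.0/include/cudnn*.h /usr/local/cuda-12.0/lib64/libcudnn*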
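The version rule above (the CUDA toolkit version must not exceed the version that nvidia-smi reports) can be checked with a small helper. This is only a sketch: the function names are hypothetical, and the version strings are example values that you would replace with the ones from your own machine. Versions are compared as tuples of integers because plain string comparison would rank "9.2" above "10.0".

```python
def parse_version(text):
    """Turn a version string such as '12.0' into a tuple of ints: (12, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def toolkit_is_supported(toolkit_version, driver_max_version):
    """Return True if the CUDA toolkit version does not exceed the highest
    CUDA version that nvidia-smi reports for the installed driver."""
    return parse_version(toolkit_version) <= parse_version(driver_max_version)


# Example values only: replace them with what nvidia-smi shows on your machine.
print(toolkit_is_supported("12.0", "12.2"))  # True
print(toolkit_is_supported("12.0", "11.4"))  # False
```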
2. Install Python (Anaconda is recommended)
(1) How to install python on Windows
- Video tutorial: 3 How to create a pycharm environment with Anaconda
- Video tutorial: 4 How to use the python environment created by Anaconda in pycharm
- Refer to the installation tutorial: Anaconda super detailed installation tutorial (in Windows environment)
- Conda usage tutorial: Create and use virtual environment with Conda in Windows
(2) How to install python on Ubuntu 18.04
- Refer to the installation tutorial: ubuntu 18.04 install conda environment and create virtual environment
(3) Python development tool (IDE): Pycharm is strongly recommended
PyCharm is a Python IDE built by JetBrains. It supports debugging, syntax highlighting, project management, code navigation, smart hints, auto-completion, unit testing, version control, and more.
3. Pytorch installation (required for getting started with deep learning)
PyTorch is an open source Python machine learning library based on Torch.
- Official website: https://pytorch.org/
- Install: Previous PyTorch Versions | PyTorch
Choose the build that matches your installed CUDA version. For example, if you installed CUDA 11.0, install the corresponding torch build:
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
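After installation, a short Python check can confirm that PyTorch is importable and can see the GPU. The sketch below uses a hypothetical helper name, check_torch_cuda; the torch calls it relies on are torch.__version__, torch.cuda.is_available() and torch.cuda.get_device_name(). The import is wrapped in try/except so that the script reports a missing install instead of crashing.

```python
def check_torch_cuda():
    """Collect basic facts about the PyTorch install without raising errors."""
    info = {"torch_installed": False, "cuda_available": False}
    try:
        import torch  # imported lazily so the check also runs before install
    except ImportError:
        return info
    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # Name of the first visible GPU
        info["gpu_name"] = torch.cuda.get_device_name(0)
    return info


print(check_torch_cuda())
```

If cuda_available is False although a GPU is present, the usual suspects are a CPU-only torch build or a CUDA version that does not match the driver.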
4. Install project dependencies
The project's Python dependencies are installed the same way on Windows and Ubuntu. Most projects ship with a requirements.txt file that lists the Python packages the project needs together with their version numbers. For example, a first entry of numpy==1.18.5 means that the project uses the numpy library at version 1.18.5. You can install that version with pip:
pip install numpy==1.18.5
# or
pip install numpy==1.18.5 -i https://pypi.tuna.tsinghua.edu.cn/simple
The URL after -i is the package index to download from. Downloads from the default index can be slow in mainland China, so you can use -i to specify a mirror and speed up installation.
The other packages can be installed one by one with pip in the same way, or you can install all dependencies at once:
pip install -r requirements.txt
PS: Dependencies are generally backward compatible, so in most cases it is enough to install a version greater than or equal to the one listed in requirements.txt.
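To see how the installed packages compare with the pinned versions, the requirement lines can be read and looked up with the standard library module importlib.metadata (Python 3.8 or later). This is a minimal sketch: the helper names are hypothetical, only simple name==version lines are handled, and the input list stands in for the lines of a real requirements.txt.

```python
from importlib import metadata


def parse_requirement(line):
    """Split a line such as 'numpy==1.18.5' into ('numpy', '1.18.5').
    Lines without '==' return (name, None); blanks and comments return None."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    if "==" in line:
        name, version = line.split("==", 1)
        return name.strip(), version.strip()
    return line, None


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def report(lines):
    """Yield (name, pinned_version, installed_version) for each requirement."""
    for line in lines:
        parsed = parse_requirement(line)
        if parsed is not None:
            name, pinned = parsed
            yield name, pinned, installed_version(name)


# Example input; in a real project, read the lines from requirements.txt.
for row in report(["numpy==1.18.5", "# a comment", "tqdm"]):
    print(row)
```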
5. Run the project Demo
The demo.py file that comes with the project generally accepts command line arguments through argparse, so it can be run from the terminal.
The project usually provides a Linux run script (bash). If you are developing on Linux, you can copy and paste it directly into the terminal (note that you must first cd into the project root directory, otherwise the files cannot be found).
# Test images
image_dir='data/test_image' # directory of the test images
weights="data/model/yolov5s_640/weights/best.pt" # model file
out_dir="runs/test-result" # directory for saving the detection results
python demo.py --image_dir $image_dir --weights $weights --out_dir $out_dir
- In the bash script above, $image_dir, $weights and $out_dir expand to the values assigned to image_dir, weights and out_dir, much like reading a variable after a Python assignment statement.
- The Windows Command Prompt (cmd) does not support this variable assignment syntax!
- If Git is installed on your Windows system, you can run the commands above unchanged in the Git Bash terminal.
If you are developing on Windows without Git Bash, drop the variables and write the values inline:
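# This one-line command works in both Linux and Windows terminals, although the long line is less readable
# Use double quotes: the Windows Command Prompt passes single quotes through as part of the value
python demo.py --image_dir "data/test_image" --weights "data/model/yolov5s_640/weights/best.pt" --out_dir "runs/test-result"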
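For readers who have not used argparse, the sketch below shows how a script could declare the three options used in the command above. It is not the project's actual demo.py: the default values and help texts are assumptions made for illustration, and only the option names come from the command line shown here.

```python
import argparse


def parse_args(argv=None):
    """Parse the three options used by the demo command above."""
    parser = argparse.ArgumentParser(description="Run detection on test images")
    # The defaults below are illustrative assumptions, not the project's values.
    parser.add_argument("--image_dir", type=str, default="data/test_image",
                        help="directory containing the test images")
    parser.add_argument("--weights", type=str,
                        default="data/model/yolov5s_640/weights/best.pt",
                        help="path to the model weights file")
    parser.add_argument("--out_dir", type=str, default="runs/test-result",
                        help="directory where detection results are saved")
    return parser.parse_args(argv)


# Passing a list is equivalent to typing the options on the command line.
args = parse_args(["--image_dir", "data/test_image", "--out_dir", "runs/demo"])
print(args.image_dir, args.weights, args.out_dir)
```

Options that are left out fall back to their defaults, which is why a plain python demo.py often works when the files sit in the expected locations.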
6. Visualization tool: how to use TensorBoard
- Installation: use pip to install the dependencies tensorboard==2.5.0 and tensorboardX==2.1
- How to use: enter the command in the terminal:
# Requires tensorboard==2.5.0 and tensorboardX==2.1
# Basic usage
tensorboard --logdir=path/to/log/
# For example
tensorboard --logdir=work_space/mobilenet_v2_1.0_CrossEntropyLoss_20230228174645/log
- Open http://localhost:6006/ in a browser
- You can see the visualization of the training process:
2. Common problems and solutions
(1) No module named problem
- If a "No module named ***" error occurs, install the missing package with pip install ***. For example, suppose the following error appears:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'basetrainer'
Please use pip to install:
pip install basetrainer -i https://pypi.tuna.tsinghua.edu.cn/simple
The URL after -i is the package index to download from. Downloads from the default index can be slow in mainland China, so you can use -i to specify a mirror and speed up installation.
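Missing modules can also be detected before running the project, using importlib.util.find_spec from the standard library. The helper name missing_modules is hypothetical, and the sketch only handles top-level module names (no dotted names). Note that the name used in import is not always the name used with pip install.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'json' ships with Python; 'basetrainer' is found only if it has been installed.
print(missing_modules(["json", "basetrainer"]))
```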
(2) The directory path is filled in incorrectly
Beginners developing on Windows often get this wrong: when filling in a path, pay attention to the path separator, which differs between Windows and Linux.
Windows path separator: [\] (written as [\\] inside a Python string); Python and most of its libraries also accept [/]. Which one should you use? In most cases, use [/].
Linux (Ubuntu) path separator: only [/]
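The separator rules above can be illustrated with the standard library. The paths are the example paths used earlier in this tutorial; the variable names are only for illustration.

```python
import os
from pathlib import PureWindowsPath

# Forward slashes work on Windows, Linux, and macOS.
image_dir = "data/test_image"

# os.path.join inserts the separator of the current operating system.
weights = os.path.join("data", "model", "yolov5s_640", "weights", "best.pt")

# A raw string keeps backslashes literal, so "\t" is not read as a tab.
windows_style = r"data\test_image"

# as_posix() converts a Windows-style path to forward slashes.
print(PureWindowsPath(windows_style).as_posix())  # data/test_image
```

Without the r prefix, "data\test_image" would contain a tab character in place of \t, which is a common source of "file not found" errors on Windows.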
(3) Chinese path problem (special attention)
On Windows, the paths of the project files and data directories should not contain Chinese characters; otherwise problems can occur, such as OpenCV failing to read images.
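A quick way to spot a problematic path is to test whether it contains any non-ASCII character. The helper name has_non_ascii is hypothetical, and the second path is a made-up example of a directory with a Chinese name.

```python
def has_non_ascii(path):
    """Return True if the path contains any non-ASCII character."""
    return not str(path).isascii()


print(has_non_ascii("data/test_image"))  # False
print(has_non_ascii("data/测试图片"))     # True
```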
(4) Picture format problem
By default, the project only supports the two formats *.jpg and *.png. If your pictures are bmp, tif or another format, add the corresponding pattern to the postfix parameter.
For example, the function is called with the parameter postfix=["*.jpg", "*.png"].
If you don't know where to modify it, search the project for postfix.
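The sketch below shows how a postfix list of this kind typically filters files. It is not the project's own function: find_images is a hypothetical name, and the implementation simply runs glob once per pattern.

```python
import glob
import os


def find_images(image_dir, postfix=("*.jpg", "*.png")):
    """Return a sorted list of files in image_dir matching any pattern."""
    files = []
    for pattern in postfix:
        files.extend(glob.glob(os.path.join(image_dir, pattern)))
    return sorted(files)


# Adding "*.bmp" and "*.tif" makes those formats visible as well.
images = find_images("data/test_image", postfix=["*.jpg", "*.png", "*.bmp", "*.tif"])
print(len(images), "images found")
```

With the default patterns, a folder that only contains bmp files yields an empty list, which matches the symptom described above.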
(5) OSError: The paging file is too small to complete the operation
On a low-spec computer, this error may occur during training.
Cause: the computer does not have enough memory for the training job.
Solution: reduce the training parameters workers and batch-size, for example set --batch-size=8 --workers=0, or the equivalent settings in the configuration file:
batch_size: 8
num_workers: 0