K210 study notes (12) - MaixHub local training model (Windows)


Foreword

Although MaixHub can train models online, datasets are limited to less than 20 MB, and most of the time you have to wait in a queue before training starts, which may not meet our needs. So I set up the environment on Windows and trained the model locally. A note for complete beginners: although MaixHub local training does not require a virtual machine, you still need to follow the tutorial strictly, step by step; otherwise it is easy to make mistakes.

1. Environment configuration

1. Install python3.8

Python 3.8 is recommended here; because of some not-yet-understood problems with Python 3.9, the environment configuration may fail. Click to download Python 3.8. Double-click the Python 3.8 installer, check the options as shown in the figure, and then click Install Now.
After the installation is complete, press Win+R, enter cmd to open a command window, and type python to check whether the installation succeeded. If the Python 3.8 version information is displayed, the installation is successful.
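To check the interpreter version from inside Python instead of reading the startup banner, the following minimal sketch tests whether the running interpreter is 3.8.x, the version recommended above. The helper name is_recommended_python is hypothetical and not part of MaixHub.

```python
import sys


def is_recommended_python(version_info=sys.version_info):
    """Return True if the (major, minor) version is 3.8, the version this guide recommends."""
    return tuple(version_info[:2]) == (3, 8)


if __name__ == "__main__":
    print(sys.version)  # full version string of the running interpreter
    if not is_recommended_python():
        print("Warning: this guide recommends Python 3.8; other versions may fail to configure.")
```

Save it under any name (for example check_python.py) and run it with python check_python.py.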

2. Install pip

pip is installed by default together with Python 3.8, so there is no need to install it separately. You can check which packages are installed with pip list. The warning it may print only means that the current pip is not the latest version, so you can ignore it.
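pip list reads installed-package metadata. A rough equivalent from inside Python uses the standard-library importlib.metadata module (added in Python 3.8). This is only an inspection sketch; it is not how pip itself is implemented, and the helper name installed_packages is hypothetical.

```python
from importlib.metadata import distributions  # standard library since Python 3.8


def installed_packages():
    """Return {package name: version} for every distribution visible to this interpreter."""
    packages = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that have no Name field
            packages[name] = dist.version
    return packages


if __name__ == "__main__":
    # Print in a pip-freeze-like "name==version" form, sorted case-insensitively.
    for name, ver in sorted(installed_packages().items(), key=lambda kv: kv[0].lower()):
        print(f"{name}=={ver}")
```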

3. Install CUDA10.1

Open the link to download CUDA 10.1 and select the matching system version and installer type.
Run the downloaded installer and keep clicking Next.
Note: If the computer already has another version of CUDA installed, refer to "Windows One Graphics Card Configure Multiple CUDA Versions" and, after installation, check whether the two CUDA paths have been added to the system environment variables.

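The note above asks you to check that the CUDA 10.1 paths are in the system environment variables. A minimal sketch of that check follows. It assumes the default install location and that the two paths are the bin and libnvvp directories the CUDA installer usually adds to PATH; adjust CUDA_ROOT and EXPECTED_DIRS if your setup differs. The comparison is case-insensitive because Windows paths are.

```python
import os

# Assumption: default CUDA 10.1 install location and the two directories the
# installer normally adds to PATH. Change these if you installed elsewhere.
CUDA_ROOT = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1"
EXPECTED_DIRS = [CUDA_ROOT + r"\bin", CUDA_ROOT + r"\libnvvp"]


def _norm(entry):
    """Normalize a PATH entry: trim spaces, drop trailing slashes, ignore case."""
    return entry.strip().rstrip("\\/").lower()


def missing_from_path(path_value, expected=EXPECTED_DIRS, sep=";"):
    """Return the expected directories that do not appear in a PATH-style string."""
    entries = {_norm(p) for p in path_value.split(sep) if p.strip()}
    return [d for d in expected if _norm(d) not in entries]


if __name__ == "__main__":
    missing = missing_from_path(os.environ.get("PATH", ""))
    print("All CUDA paths found" if not missing else f"Missing from PATH: {missing}")
```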

4. Install CUDNN

Click here to go to the official cuDNN download page and download cuDNN v7.6.5.32 for CUDA 10.1.
After downloading, you will get the archive cudnn-10.1-windows10-x64-v7.6.5.32.zip. Unzip it to get three folders,
and copy them into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1. The CUDA environment is now configured.
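Copying the three cuDNN folders by hand works fine; the sketch below does the same merge with shutil.copytree. It assumes the archive unpacks to a folder containing bin, include and lib (the usual layout of the cuDNN 7.x Windows zip) and that you unzipped it to the hypothetical location in CUDNN_DIR. Writing into Program Files normally requires running the command prompt as administrator.

```python
import shutil
from pathlib import Path

# Assumptions: CUDNN_DIR is a hypothetical unzip location; CUDA_DIR is the default CUDA 10.1 path.
CUDNN_DIR = Path(r"C:\Downloads\cuda")
CUDA_DIR = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1")


def copy_cudnn(cudnn_dir, cuda_dir, folders=("bin", "include", "lib")):
    """Merge each cuDNN folder into the matching CUDA folder; return the names that were copied."""
    copied = []
    for name in folders:
        src = Path(cudnn_dir) / name
        if src.is_dir():
            # dirs_exist_ok=True (Python 3.8+) merges into the existing CUDA folder
            # instead of failing because it already exists.
            shutil.copytree(src, Path(cuda_dir) / name, dirs_exist_ok=True)
            copied.append(name)
    return copied


if __name__ == "__main__":
    print("Copied:", copy_cudnn(CUDNN_DIR, CUDA_DIR))
```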

5. tensorflow installation

tensorflow-gpu 2.3.0 is strict about the CUDA version: CUDA must be 10.1 and cuDNN must be a v7.x build for CUDA 10.1, otherwise the GPU cannot be used for training.
Press Win+R, enter cmd to open a command prompt, and enter:

pip install tensorflow-gpu==2.3.0

If the download is slow, use a mirror:

pip install tensorflow-gpu==2.3.0 -i https://pypi.mirrors.ustc.edu.cn/simple

6. MaixHub local training code download

Go here to download the local training code. After opening the link, you can clone it with git or click Download ZIP to get a compressed package. Unzip it anywhere, as long as you remember where. Then download ncc-win7-x86_64 and unzip it; you will get a folder called ncc-win7-x86_64. Rename this folder to ncc_v0.1 and copy it into the maix_train/tools/ncc folder. (If there is no ncc folder, create one; the path must be exactly right.)

Figure: ncc path creation
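The rename-and-copy step can also be scripted. This is a sketch under assumptions: install_ncc is a hypothetical helper, the example paths are placeholders, and D:\maix_train mirrors the path used later in this article.

```python
import shutil
from pathlib import Path


def install_ncc(extracted_ncc, maix_train_root):
    """Copy the unzipped ncc-win7-x86_64 folder to maix_train/tools/ncc/ncc_v0.1."""
    ncc_parent = Path(maix_train_root) / "tools" / "ncc"
    ncc_parent.mkdir(parents=True, exist_ok=True)  # create tools/ncc if it does not exist
    target = ncc_parent / "ncc_v0.1"
    # Copying to the new name renames it in one step; raises FileExistsError if ncc_v0.1 exists.
    shutil.copytree(extracted_ncc, target)
    return target


if __name__ == "__main__":
    # Placeholder paths: replace with where you unzipped ncc and maix_train.
    print(install_ncc(r"D:\downloads\ncc-win7-x86_64", r"D:\maix_train"))
```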

2. Local training steps

1. Install dependencies

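Open the unzipped folder, open the requirements.txt file, delete the line tensorflow>=2.3.1, then save and close. (tensorflow-gpu 2.3.0 is already installed; otherwise pip would also install a separate tensorflow package, version 2.3.1 or later.)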
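If you would rather not edit requirements.txt by hand, this sketch removes only the plain tensorflow requirement and leaves every other line untouched (including a tensorflow-gpu line, if one were present). The helper name and regular expression are my own illustration, not part of maix_train. Run it from the unzipped folder before the pip install step.

```python
import re
from pathlib import Path

# Matches a requirement for the plain "tensorflow" package, with or without a
# version specifier, but not other packages such as "tensorflow-gpu".
TF_LINE = re.compile(r"^\s*tensorflow\s*(?:[<>=!~].*)?$", re.IGNORECASE)


def drop_tensorflow(requirements_text):
    """Return the requirements text with the plain tensorflow requirement removed."""
    kept = [line for line in requirements_text.splitlines() if not TF_LINE.match(line)]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    req = Path("requirements.txt")  # run from the unzipped maix_train folder
    req.write_text(drop_tensorflow(req.read_text()))
```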
Press Win+R, enter cmd, change to the folder where you saved the code, and enter:

pip install -r requirements.txt

If the download is very slow, you can use the Alibaba Cloud mirror:

pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/

2. Dataset preparation

The preparation of the data set is similar to the online training of MaixHub. For details, please refer to "K210 Study Notes (11) - MaixHub Online Training Model (Online Alchemy)"

3. Start training

Initialize first

python train.py init

Put the dataset into the datasets folder of the local training code. For classification training, enter:

python train.py -t classifier -z datasets/test_classifier_datasets.zip train

If the dataset is an uncompressed folder, enter:

python train.py -t classifier -d datasets/test_classifier_datasets train

For object detection, enter:

python train.py -t detector -z datasets/test_detector_xml_format.zip train

Note: in these commands, replace the part after datasets/ with the name of your own dataset.
For example, my dataset is named datasets.zip, so my object detection training command is:

python train.py -t detector -z datasets/datasets.zip train

After entering the command, training starts. When it finishes, you will get an out folder containing the trained model files.
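The training commands above differ only in the task type (-t) and in whether the dataset is a zip (-z) or a folder (-d). A hypothetical wrapper that builds the same command line and runs it with subprocess is sketched below; sys.executable makes sure train.py runs under the same Python that has tensorflow-gpu installed.

```python
import subprocess
import sys


def build_train_cmd(task, dataset):
    """Build the train.py command line: -z for a .zip dataset, -d for an unzipped folder."""
    if task not in ("classifier", "detector"):
        raise ValueError("task must be 'classifier' or 'detector'")
    flag = "-z" if str(dataset).lower().endswith(".zip") else "-d"
    return [sys.executable, "train.py", "-t", task, flag, str(dataset), "train"]


if __name__ == "__main__":
    # Run from the maix_train folder after `python train.py init`.
    cmd = build_train_cmd("detector", "datasets/datasets.zip")
    subprocess.run(cmd, check=True)  # raises CalledProcessError if training fails
```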

3. Problems that may arise during training

1. Version error

This is caused by mismatched package versions. Install the matching versions, for example:

pip install numpy==1.19.0
pip install tifffile==2021.6.14
pip install imageio==2.9.0
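To see which packages are off before reinstalling, the sketch below compares installed versions with a small table of pins using importlib.metadata. The pins are the examples quoted above plus tensorflow-gpu 2.3.0 from the installation step; your own error message may name different packages or versions. Versions are compared as exact strings, so 1.19 and 1.19.0 count as different.

```python
from importlib.metadata import PackageNotFoundError, version

# Example pins from this article; adjust to what your error message asks for.
PINS = {
    "tensorflow-gpu": "2.3.0",
    "numpy": "1.19.0",
    "tifffile": "2021.6.14",
    "imageio": "2.9.0",
}


def version_mismatches(pins=PINS, get_version=version):
    """Return {package: (installed or None, wanted)} for packages not at the pinned version."""
    bad = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            bad[name] = (installed, wanted)
    return bad


if __name__ == "__main__":
    for name, (have, want) in version_mismatches().items():
        print(f"{name}: installed {have}, expected {want} -> pip install {name}=={want}")
```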

2. Insufficient video memory

The following error occurs:

2022-04-21 16:50:38,364 - [ERROR]: failed: TrainFailReason.ERROR_INTERNAL, error occurred when train, error: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

The cause is insufficient video memory (my computer is underpowered, which is why I got this error). You can fix it by adding the following code so that TensorFlow allocates video memory on demand instead of reserving it all up front.
In D:\maix_train\maix\train\classifier (the path where the training code is stored), find __init__.py
and add the following code after the import statements:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

Then find __init__.py in D:\maix_train\maix\train\detector
and add the following code after the import statements:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

Run the training task again, and the problem is solved.
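The ConfigProto and session lines above use the TensorFlow 1.x compatibility API. TensorFlow 2.x also provides tf.config.experimental.set_memory_growth, which likewise makes TensorFlow allocate GPU memory on demand. The sketch below is an alternative, not the fix used here, and it has not been tested inside maix_train; it assumes it runs near the top of __init__.py, before any GPU has been initialized. enable_gpu_memory_growth is a hypothetical helper name.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand; return how many GPUs were configured."""
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # TensorFlow is not installed in this interpreter
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before the GPU is initialized, otherwise TensorFlow raises RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    print("GPUs configured for memory growth:", enable_gpu_memory_growth())
```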

3. No compressed package is generated after training

This happens when maix_train/tools/ncc/ncc_v0.1 is not set up correctly, or when the package you downloaded was not ncc-win7-x86_64.
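A quick diagnostic for this case: the sketch below checks that maix_train/tools/ncc/ncc_v0.1 exists and is not empty, and lists what is actually in tools/ncc, which makes a folder left under its original name ncc-win7-x86_64 easy to spot. check_ncc is a hypothetical helper; pass it the root of your maix_train folder.

```python
from pathlib import Path


def check_ncc(maix_train_root):
    """Return a list of problems with the ncc setup (an empty list means it looks OK)."""
    ncc_dir = Path(maix_train_root) / "tools" / "ncc" / "ncc_v0.1"
    if not ncc_dir.is_dir():
        parent = ncc_dir.parent
        # Show what is there instead, e.g. a folder still named ncc-win7-x86_64.
        found = sorted(p.name for p in parent.iterdir()) if parent.is_dir() else []
        return [f"missing {ncc_dir} (found in tools/ncc: {found})"]
    if not any(ncc_dir.iterdir()):
        return [f"{ncc_dir} is empty; re-extract ncc-win7-x86_64 into it"]
    return []


if __name__ == "__main__":
    print(check_ncc(r"D:\maix_train") or "ncc setup looks OK")
```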

4. Determine whether GPU is being used for training

1. Start training and check whether GPU information like that shown in the figure is displayed.
If it is not, your CUDA and cuDNN environment has not been installed correctly. Uninstall all NVIDIA-related software and drivers (really uninstall them; do not just delete the files!), then configure the CUDA and cuDNN environment again.

2. Check in Task Manager whether the GPU's video memory is being used; look at memory usage, not the GPU utilization rate.

3. The "no GPU, will use CPU" message that appears at the beginning of training is just a notice; it does not mean the GPU is unused.
4. Check whether your laptop sounds like it is about to take off (just kidding).
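Besides Task Manager, you can ask TensorFlow directly. The sketch below reports the TensorFlow version, whether the build has CUDA support, and which GPUs TensorFlow sees, using tf.test.is_built_with_cuda and tf.config.list_physical_devices. An empty GPU list means TensorFlow will fall back to the CPU. gpu_report is a hypothetical helper name.

```python
def gpu_report():
    """Summarize what TensorFlow sees; return a dict so it can be printed or logged."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"tensorflow": None, "built_with_cuda": False, "gpus": []}
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    report = gpu_report()
    print(report)
    if not report["gpus"]:
        print("TensorFlow sees no GPU: recheck the CUDA 10.1 / cuDNN 7.6.5 setup.")
```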

5. "Internal: no kernel image is available for execution on the device" appears during training

The installed TensorFlow version is wrong; the environment needs to be reinstalled.

6. "failed: TrainFailReason.ERROR_PARAM, datasets not valid: datasets format error: datasets error, not support format, please check" appears

This means the dataset was not prepared strictly according to the required format. Check your folder names; that usually solves it. Pay special attention to the images folder, where it is easy to drop the s.
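A typo such as image instead of images is easy to miss by eye. The sketch below lists required subfolders that are missing and suggests close matches with difflib. Only images comes from this article; the names of any other required folders depend on the MaixHub dataset format (see the online training notes) and must be supplied by you. check_folder_names is a hypothetical helper and works on an unzipped dataset folder.

```python
import difflib
from pathlib import Path


def check_folder_names(dataset_dir, required=("images",)):
    """Return {missing folder: [similar existing folder names]} for required subfolders not found."""
    existing = [p.name for p in Path(dataset_dir).iterdir() if p.is_dir()]
    problems = {}
    for name in required:
        if name not in existing:
            # Suggest near-misses such as "image" or "Images".
            problems[name] = difflib.get_close_matches(name, existing, n=3, cutoff=0.6)
    return problems


if __name__ == "__main__":
    print(check_folder_names("datasets/test_classifier_datasets") or "folder names look OK")
```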


Summary

If you do not follow the steps strictly, MaixHub local training is still error-prone. For example, I was stuck for a long time installing CUDA: I had changed the installation path, and that path was then deleted automatically. On the nth attempt I simply kept clicking Next with the defaults, and the problem did not occur again. By this point I have a basic understanding of the MAIX BIT (K210), and it should be enough for competitions and graduation projects. I will post more MAIX BIT (K210) study notes later, possibly on MaixHub local training under Linux and some hands-on K210 projects (though as a student I only have one board and no money for accessories).


Origin blog.csdn.net/Thousand_drive/article/details/124276265