Errors encountered in deep learning project deployment [record]

1. Reproducing the VoteNet paper project (the code was most recently updated on 2/6/2020)
using PyCharm

1. unsupported Microsoft Visual Studio version! Only the versions between 2013 and 2017 (inclusive) are supported! (error return value 2)

Solution: I had VS2019 installed on my machine. This project contains many C++ files that must be compiled ("Compile the CUDA layers for PointNet++, which we used in the backbone network"), and the build only supports Visual Studio versions 2013 through 2017. I uninstalled VS2019 and installed VS2017, after which the build passed.
(The error is raised when running the python setup.py install command under the pointnet2 directory.)

A successful build prints output like the following:

Creating library build\temp.win-amd64-3.7\Release\_ext_src/src\_ext.lib and object build\temp.win-amd64-3.7\Release\_ext_src/src\_ext.exp
Generating code
Finished generating code
creating build\bdist.win-amd64
creating build\bdist.win-amd64\egg
creating build\bdist.win-amd64\egg\pointnet2
copying build\lib.win-amd64-3.7\pointnet2\_ext.pyd -> build\bdist.win-amd64\egg\pointnet2
creating stub loader for pointnet2\_ext.pyd
byte-compiling build\bdist.win-amd64\egg\pointnet2\_ext.py to _ext.cpython-37.pyc
creating build\bdist.win-amd64\egg\EGG-INFO
copying pointnet2.egg-info\PKG-INFO -> build\bdist.win-amd64\egg\EGG-INFO
copying pointnet2.egg-info\SOURCES.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying pointnet2.egg-info\dependency_links.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying pointnet2.egg-info\top_level.txt -> build\bdist.win-amd64\egg\EGG-INFO
writing build\bdist.win-amd64\egg\EGG-INFO\native_libs.txt
zip_safe flag not set; analyzing archive contents...
pointnet2.__pycache__._ext.cpython-37: module references __file__
creating dist
creating 'dist\pointnet2-0.0.0-py3.7-win-amd64.egg' and adding 'build\bdist.win-amd64\egg' to it
removing 'build\bdist.win-amd64\egg' (and everything under it)
Processing pointnet2-0.0.0-py3.7-win-amd64.egg
creating w:\conda\envs\votenet\lib\site-packages\pointnet2-0.0.0-py3.7-win-amd64.egg
Extracting pointnet2-0.0.0-py3.7-win-amd64.egg to w:\conda\envs\votenet\lib\site-packages
Adding pointnet2 0.0.0 to easy-install.pth file

Installed w:\conda\envs\votenet\lib\site-packages\pointnet2-0.0.0-py3.7-win-amd64.egg
Processing dependencies for pointnet2==0.0.0
Finished processing dependencies for pointnet2==0.0.0

The build also generates some compiled files.
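To confirm that the install really put the extension where Python can find it, a quick lookup helps. This is only a minimal sketch: it assumes the compiled module is importable as pointnet2._ext (inferred from the pointnet2\_ext.pyd path in the log above), and the helper name module_is_installed is made up for illustration.

```python
import importlib.util


def module_is_installed(name: str) -> bool:
    """Return True if Python's import system can locate the module `name`."""
    try:
        # For a dotted name such as "pointnet2._ext", the parent package
        # is imported first so that the submodule can be looked up.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError,
        # which is a subclass of ImportError.
        return False


# Assumed module names, taken from the build log; adjust to your layout.
for name in ("pointnet2", "pointnet2._ext"):
    print(name, "found" if module_is_installed(name) else "NOT found")
```

Run it inside the same conda environment that was used for the build; a "NOT found" line usually means the install went into a different environment.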

2. PowerShell does not recognize CUDA_VISIBLE_DEVICES=0 as a command
The specific error is as follows:

CUDA_VISIBLE_DEVICES=0 : The term 'CUDA_VISIBLE_DEVICES=0' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ CUDA_VISIBLE_DEVICES=0 python train.py --dataset sunrgbd --log_dir lo ...
+ ~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (CUDA_VISIBLE_DEVICES=0:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

This error was reported when, after downloading the SUN RGB-D dataset, I ran the command given in the instructions: CUDA_VISIBLE_DEVICES=0 python train.py --dataset sunrgbd --log_dir log_sunrgbd

I first followed other blog posts and set the GPU selection part, CUDA_VISIBLE_DEVICES=0, as a system environment variable, which did not help in my case (2022.7.1).
I then deleted that prefix and ran python train.py --dataset sunrgbd --log_dir log_sunrgbd directly, which reported a different error:

PS W:\pycharmprogram\votenet-main> python train.py --dataset sunrgbd --log_dir log_sunrgbd
Traceback (most recent call last):
  File "train.py", line 40, in <module>
    from tf_visualizer import Visualizer as TfVisualizer
  File "W:\pycharmprogram\votenet-main\utils\tf_visualizer.py", line 12, in <module>
    import tf_logger
  File "W:\pycharmprogram\votenet-main\utils\tf_logger.py", line 6, in <module>
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'
PS W:\pycharmprogram\votenet-main> 

The wrong TensorFlow version had been installed, so the module could not be imported; switching to the GPU version fixed it.
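Back to the CUDA_VISIBLE_DEVICES problem: the VAR=value prefix in front of a command is POSIX shell syntax, which PowerShell does not understand (in PowerShell the variable would be set with $env:CUDA_VISIBLE_DEVICES = "0" before calling python). A shell-independent alternative is to set the variable for the child process from Python. The following is a sketch only, not something verified on the setup above; the helper names env_with_visible_gpus and run_training are made up, and the train.py arguments are the ones quoted in the text.

```python
import os
import subprocess
import sys


def env_with_visible_gpus(gpu_ids: str) -> dict:
    """Return a copy of the current environment with CUDA_VISIBLE_DEVICES set."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids
    return env


def run_training(gpu_ids: str = "0") -> int:
    """Launch train.py with the arguments from the text, restricted to gpu_ids."""
    cmd = [sys.executable, "train.py",
           "--dataset", "sunrgbd", "--log_dir", "log_sunrgbd"]
    # The variable only affects the child process, not the calling shell.
    return subprocess.run(cmd, env=env_with_visible_gpus(gpu_ids)).returncode


# Example (run from the project root, where train.py lives):
# run_training("0")
```

Because the variable is passed through the env argument, the same script works unchanged from PowerShell, cmd, or bash.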

3. Deprecation warnings: the TensorFlow API used by the paper's code has since been updated

WARNING:tensorflow:From W:\pycharmprogram\votenet-main\utils\tf_logger.py:19: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.
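The warning is harmless and already names its replacement. If it is worth silencing, one option is to pick the writer class through the compat path when it exists. This is a hedged sketch: get_file_writer_class is a hypothetical helper, not part of the VoteNet code, and it assumes a TensorFlow 1.x style setup like the one producing this warning (under TensorFlow 2.x eager execution, FileWriter is not usable as-is).

```python
def get_file_writer_class(tf_module):
    """Prefer tf.compat.v1.summary.FileWriter, else fall back to tf.summary.FileWriter."""
    compat = getattr(tf_module, "compat", None)
    v1 = getattr(compat, "v1", None)
    if v1 is not None:
        # Newer releases expose the old API under tf.compat.v1.
        return v1.summary.FileWriter
    # Older releases only have the original location.
    return tf_module.summary.FileWriter


# Usage (requires TensorFlow to be installed):
# import tensorflow as tf
# writer = get_file_writer_class(tf)("log_sunrgbd")
```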

4. Insufficient video memory

RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 4.00 GiB total capacity; 1.54 GiB already allocated; 400.76 MiB free; 12.59 MiB cached)
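The message says that a 512 MiB allocation was requested while only about 400 MiB was free on a 4 GiB card. No fix is recorded here, but the usual mitigation is a smaller batch size. The sketch below shows that idea under stated assumptions: run_step is a hypothetical callable that runs one training step at a given batch size, and the helper name is made up.

```python
def run_with_smaller_batches(run_step, batch_size: int, min_batch_size: int = 1) -> int:
    """Call run_step(batch_size), halving batch_size after each out-of-memory error.

    Returns the first batch size that succeeds. Any other RuntimeError, or an
    out-of-memory error at min_batch_size, is re-raised unchanged.
    """
    while True:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as err:
            # PyTorch reports GPU exhaustion as a RuntimeError whose
            # message contains "out of memory".
            if "out of memory" not in str(err) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
```

In practice it is often simpler to lower the batch size by hand once a working value is known; the loop is mainly useful for finding that value.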

A further error (its screenshot is not reproduced here) was fixed simply by upgrading the version.

Missing data: the data folder is missing.

Problems encountered with IA-SSD:

(iassd) root@container-b8cd11b252-a9f12073:~/autodl-tmp/IA-SSD# git clone https://github.com/yifanzhang713/spconv1.0.git
Cloning into 'spconv1.0'...
fatal: unable to access 'https://github.com/yifanzhang713/spconv1.0.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.

Step 1: apt-get update
Step 2: apt-get install curl
That is all it takes.
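The two steps, followed by a retry of the clone, can be scripted. This is a rough sketch assuming a Debian/Ubuntu container with root access (as the prompt in the log suggests); the function names are made up, and the -y flag is an addition so that apt-get does not stop to ask for confirmation.

```python
import subprocess

REPO_URL = "https://github.com/yifanzhang713/spconv1.0.git"


def fix_and_clone_commands(repo_url: str = REPO_URL) -> list:
    """Return the commands for the two steps, followed by a retry of the clone."""
    return [
        ["apt-get", "update"],
        ["apt-get", "install", "-y", "curl"],  # -y answers the confirmation prompt
        ["git", "clone", repo_url],
    ]


def run_all(commands) -> None:
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Example (needs root inside the container):
# run_all(fix_and_clone_commands())
```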

There was a small problem while parsing the data: some samples in the middle failed, and the data after them parsed without problems.
Running test.py:

numba.cuda.cudadrv.error.NvvmSupportError: No supported GPU compute capabilities found. Please check your cudatoolkit version matches your CUDA version.

Solution to this problem:
The GPU is used normally during training, which shows that the cuda10.0 and pytorch1.1 installation is fine, so the complaint about the cudatoolkit version was puzzling. The traceback points into the numba package. Opening nvvm.py shows that it requires CUDA 10.2 at minimum, while the server has CUDA 10.0. The server's CUDA cannot be changed, so I downgraded numba instead.
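The mismatch comes down to a plain version comparison. The sketch below only illustrates that comparison with the numbers from the text (CUDA 10.0 installed, 10.2 required); it is not numba's actual check, and both function names are made up.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as "10.0" into a comparable tuple (10, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def cuda_is_supported(installed: str, minimum_required: str) -> bool:
    """Return True if the installed CUDA version meets a library's minimum."""
    return parse_version(installed) >= parse_version(minimum_required)


# With the values from the text, the check fails, which is why
# an older numba release is needed on this server.
print(cuda_is_supported("10.0", "10.2"))
```

Comparing tuples of integers avoids the trap of comparing version strings as text, where "10.10" would sort before "10.2".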

For matching versions, refer to numba's official dependency table.

Running test.py again for verification produces the test results:

2022-09-24 21:19:44,252 INFO Result is save to /root/autodl-tmp/output/kitti_models/IA-SSD/default/eval/epoch_no_number/val/default
2022-09-24 21:19:44,252 INFO Evaluation done.*

train.py training screenshots (images omitted)

Origin blog.csdn.net/qq_44114055/article/details/125434986