Running the official PointSIFT source code on Google Colab

pointSIFT official GitHub repository: https://github.com/MVIG-SJTU/pointSIFT

This article records the problems I ran into while running the source code. I plan to analyze the source code itself, along with related point cloud deep learning code such as PointNet, in a later post.

The official source code targets TensorFlow 1.4.1. I also tried running it under TensorFlow 2.x, but that was a lot of trouble: some functions had to be converted from 1.4 to 2.x and had no one-to-one equivalent, so I eventually gave up. Here I use Google Colab instead, so the environment should be consistent and the steps below can be reproduced one by one.

Colab address

1. Create a new notebook and select the GPU mode
2. Colab does not use TensorFlow 1.x by default, so the version has to be selected manually (this manual switch may stop being available in the future)

%tensorflow_version 1.x

This line must run at the very beginning of the session, before TensorFlow is imported, for the switch to succeed.

3. Clone the code from GitHub

!git clone https://github.com/MVIG-SJTU/pointSIFT

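4. Set the environment variables. This step is carried over from other articles in this series. It is optional: skip it at first and add it only if something fails. Note that each ! line runs in its own subshell, so variables assigned this way do not persist into later cells.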

!TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
!TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
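
A minimal sketch of one way to make these two values visible to later shell commands, assuming that is the goal of this step: store them in os.environ, which child processes started from the notebook inherit. The helper names export_tf_paths and read_in_child are hypothetical and not part of pointSIFT.

```python
import os
import subprocess
import sys


def export_tf_paths(include_dir, lib_dir):
    """Store the TensorFlow include/lib paths in this process's environment.

    Child processes started afterwards inherit os.environ, unlike a
    variable assigned inside a single `!` line.
    """
    os.environ["TF_INC"] = include_dir
    os.environ["TF_LIB"] = lib_dir


def read_in_child(name):
    """Return the value a freshly started child process sees for `name`."""
    code = "import os; print(os.environ.get(%r, ''))" % name
    out = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


# In the notebook this would be called as follows (needs TensorFlow):
#   import tensorflow as tf
#   export_tf_paths(tf.sysconfig.get_include(), tf.sysconfig.get_lib())
```

The compile scripts in this article hard-code their paths, so they work without these variables; the sketch only matters if a script refers to $TF_INC or $TF_LIB.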

5. View the TensorFlow path

import tensorflow as tf
# include path
print(tf.sysconfig.get_include())
# library path 
print(tf.sysconfig.get_lib())

6. Go into the source tree and edit the .sh compile scripts in the four folders under /content/pointSIFT/tf_utils/tf_ops. For example, tf_grouping_compile.sh becomes:

#!/bin/bash
/usr/local/cuda-10.0/bin/nvcc tf_grouping_g.cu -o tf_grouping_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

# TF1.2
#g++ -std=c++11 tf_grouping.cpp tf_grouping_g.cu.o -o tf_grouping_so.so -shared -fPIC -I /usr/local/lib/python2.7/dist-packages/tensorflow/include -I /usr/local/cuda-8.0/include -lcudart -L /usr/local/cuda-8.0/lib64/ -O2 -D_GLIBCXX_USE_CXX11_ABI=0

# TF1.4 (paths adapted to Colab's TensorFlow 1.15.2)
g++ -std=c++11 tf_grouping.cpp tf_grouping_g.cu.o -o tf_grouping_so.so -shared -fPIC -I /tensorflow-1.15.2/python3.7/tensorflow_core/include -I /usr/local/cuda-10.0/include -I /tensorflow-1.15.2/python3.7/tensorflow_core/include/external/nsync/public -lcudart -L /usr/local/cuda-10.0/lib64/ -L/tensorflow-1.15.2/python3.7/tensorflow_core -ltensorflow_framework -O2 -D_GLIBCXX_USE_CXX11_ABI=0

The main change is replacing cuda-8.0 with cuda-10.0; list /usr/local/ to see which CUDA versions Colab provides. The other change is pointing the TensorFlow paths at Colab's installation. Pay attention here: some of the paths end in include, some do not, and one continues past include (external/nsync/public), so edit each one carefully. The article "pointSIFT early adopters" is another useful reference. Again, the .sh files in all four folders need to be modified.
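
The path edits described above can be sketched as a small text rewrite. This is only an illustration under stated assumptions: the helper patch_compile_script is hypothetical, and the old TensorFlow location it looks for is taken from the commented TF1.2 line, which may not match every script in the repository.

```python
import re


def patch_compile_script(text, cuda_root, tf_root):
    """Return `text` with CUDA and TensorFlow paths rewritten.

    cuda_root: for example "/usr/local/cuda-10.0"
    tf_root:   directory that contains TensorFlow's `include` folder
    """
    # Any /usr/local/cuda-<version> prefix becomes the chosen CUDA root.
    text = re.sub(r"/usr/local/cuda-\d+(?:\.\d+)*",
                  lambda match: cuda_root, text)
    # Assumed original TensorFlow location (from the commented TF1.2 line);
    # adjust this string if a script uses a different path.
    old_tf = "/usr/local/lib/python2.7/dist-packages/tensorflow"
    return text.replace(old_tf, tf_root)


sample = (
    "/usr/local/cuda-8.0/bin/nvcc a.cu -o a.cu.o -c\n"
    "g++ a.cpp a.cu.o -o a_so.so "
    "-I /usr/local/lib/python2.7/dist-packages/tensorflow/include "
    "-L /usr/local/cuda-8.0/lib64/\n"
)
patched = patch_compile_script(
    sample, "/usr/local/cuda-10.0",
    "/tensorflow-1.15.2/python3.7/tensorflow_core")
print(patched)
```

The rewrite only touches path prefixes. Flags such as -ltensorflow_framework and the extra nsync include directory still have to be checked by hand against the example script.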

7. Provide the unversioned .so file. This step is another major pitfall. The TensorFlow directory contains libtensorflow_framework.so.1 (libtensorflow_framework.so.2 for TensorFlow 2), but the -ltensorflow_framework linker flag looks for libtensorflow_framework.so, with no version suffix. My solution is to make a copy under the unversioned name.

%cd /tensorflow-1.15.2/python3.7/tensorflow_core
!cp libtensorflow_framework.so.1 libtensorflow_framework.so
!ls
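
The same copy can be done from Python without hard-coding the version suffix. This is a hedged sketch, not what the original notebook does; ensure_unversioned_so is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def ensure_unversioned_so(lib_dir, name="libtensorflow_framework.so"):
    """Copy `<name>.<N>` to `<name>` inside lib_dir if `<name>` is missing.

    Returns the path of the unversioned file, or None when no versioned
    library was found in lib_dir.
    """
    lib_dir = Path(lib_dir)
    target = lib_dir / name
    if target.exists():
        return target
    candidates = sorted(lib_dir.glob(name + ".*"))
    if not candidates:
        return None
    shutil.copy2(candidates[0], target)
    return target


# In the notebook, the directory would be the one used above:
#   ensure_unversioned_so("/tensorflow-1.15.2/python3.7/tensorflow_core")
```

Returning None instead of raising makes it easy to print a clear message when the directory is wrong, which is the usual cause of a failed link step here.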

8. Set the environment variable (optional)

!export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"/tensorflow-1.15.2/python3.7/tensorflow_core/libtensorflow_framework.so"

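This adds the .so from the previous step to the library search path. I added it while troubleshooting; in theory it is unnecessary once the .so has been copied. Note that an export inside a ! line only lasts for that one command, and LD_LIBRARY_PATH expects directories rather than file paths, so the line as written probably has no effect.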
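
If the variable does turn out to be needed, a minimal sketch is to append the directory (not the file) to LD_LIBRARY_PATH through os.environ. The helper add_to_ld_library_path is a hypothetical name used only for illustration.

```python
import os


def add_to_ld_library_path(directory, env=os.environ):
    """Append `directory` to LD_LIBRARY_PATH in `env` unless already present."""
    current = env.get("LD_LIBRARY_PATH", "")
    parts = [p for p in current.split(":") if p]
    if directory not in parts:
        parts.append(directory)
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env["LD_LIBRARY_PATH"]


# In the notebook:
#   add_to_ld_library_path("/tensorflow-1.15.2/python3.7/tensorflow_core")
```

The dynamic loader reads this variable when a process starts, so the change applies to processes launched afterwards (such as the training script), not to the notebook kernel that is already running.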

9. Run all .sh files

%cd /content/pointSIFT/tf_utils/tf_ops/pointSIFT_op
!sh /content/pointSIFT/tf_utils/tf_ops/pointSIFT_op/tf_pointSIFT_compile.sh
%cd /content/pointSIFT/tf_utils/tf_ops/interpolation
!sh /content/pointSIFT/tf_utils/tf_ops/interpolation/tf_interpolate_compile.sh
%cd /content/pointSIFT/tf_utils/tf_ops/grouping/
!sh /content/pointSIFT/tf_utils/tf_ops/grouping/tf_grouping_compile.sh
%cd /content/pointSIFT/tf_utils/tf_ops/sampling
!chmod +x tf_sampling_compile.sh
!./tf_sampling_compile.sh
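
The four cells above follow one pattern: change into the op folder, then run its compile script. A rough sketch of the same loop in Python, assuming every script is named like *_compile.sh as in this article; both function names are hypothetical.

```python
import subprocess
from pathlib import Path


def find_compile_scripts(ops_root):
    """Return every *_compile.sh below ops_root, sorted for a stable order."""
    return sorted(Path(ops_root).rglob("*_compile.sh"))


def run_compile_scripts(ops_root):
    """Run each script with `sh` from its own folder; map name -> exit code."""
    results = {}
    for script in find_compile_scripts(ops_root):
        # The scripts use relative file names, so the working directory
        # must be the folder that contains the script.
        proc = subprocess.run(["sh", script.name], cwd=script.parent)
        results[script.name] = proc.returncode
    return results


# In the notebook:
#   print(run_compile_scripts("/content/pointSIFT/tf_utils/tf_ops"))
```

Any non-zero exit code in the returned dictionary points at the script whose paths still need fixing.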

Note that if the scripts were not modified in step 6, compilation fails with errors such as a missing file or an error exit, as shown below:

/content/pointSIFT/tf_utils/tf_ops/pointSIFT_op
/content/pointSIFT/tf_utils/tf_ops/pointSIFT_op/tf_pointSIFT_compile.sh: 2: /content/pointSIFT/tf_utils/tf_ops/pointSIFT_op/tf_pointSIFT_compile.sh: /usr/local/cuda-8.0/bin/nvcc: not found
g++: error: pointSIFT_g.cu.o: No such file or directory

If you see this, check the paths in the relevant script files carefully.

A successful build still prints warnings, but they do not affect the result. Successful output looks like this:

/content/pointSIFT/tf_utils/tf_ops/pointSIFT_op
main.cpp: In lambda function:
main.cpp:22:48: warning: ignoring return value of ‘tensorflow::Status tensorflow::shape_inference::InferenceContext::WithRank(tensorflow::shape_inference::ShapeHandle, tensorflow::int64, tensorflow::shape_inference::ShapeHandle*)’, declared with attribute warn_unused_result [-Wunused-result]
             c->WithRank(c->input(0), 3, &dims1);
                                                ^
In file included from main.cpp:8:0:
/tensorflow-1.15.2/python3.7/tensorflow_core/include/tensorflow/core/framework/shape_inference.h:394:10: note: declared here
   Status WithRank(ShapeHandle shape, int64 rank,
          ^~~~~~~~
main.cpp:24:48: warning: ignoring return value of ‘tensorflow::Status tensorflow::shape_inference::InferenceContext::WithRank(tensorflow::shape_inference::ShapeHandle, tensorflow::int64, tensorflow::shape_inference::ShapeHandle*)’, declared with attribute warn_unused_result [-Wunused-result]
             c->WithRank(c->input(1), 3, &dims2);
                                                ^
In file included from main.cpp:8:0:
/tensorflow-1.15.2/python3.7/tensorflow_core/include/tensorflow/core/framework/shape_inference.h:394:10: note: declared here
   Status WithRank(ShapeHandle shape, int64 rank,
          ^~~~~~~~
main.cpp: In lambda function:
main.cpp:35:47: warning: ignoring return value of ‘tensorflow::Status tensorflow::shape_inference::InferenceContext::WithRank(tensorflow::shape_inference::ShapeHandle, tensorflow::int64, tensorflow::shape_inference::ShapeHandle*)’, declared with attribute warn_unused_result [-Wunused-result]
             c->WithRank(c->input(0), 3, &dim1); // batch_size * npoint * 3
                                               ^
In file included from main.cpp:8:0:
/tensorflow-1.15.2/python3.7/tensorflow_core/include/tensorflow/core/framework/shape_inference.h:394:10: note: declared here
   Status WithRank(ShapeHandle shape, int64 rank,
          ^~~~~~~~
main.cpp: In lambda function:
main.cpp:46:47: warning: ignoring return value of ‘tensorflow::Status tensorflow::shape_inference::InferenceContext::WithRank(tensorflow::shape_inference::ShapeHandle, tensorflow::int64, tensorflow::shape_inference::ShapeHandle*)’, declared with attribute warn_unused_result [-Wunused-result]
             c->WithRank(c->input(0), 3, &dim1); // batch_size * npoint * 3
                                               ^
In file included from main.cpp:8:0:
/tensorflow-1.15.2/python3.7/tensorflow_core/include/tensorflow/core/framework/shape_inference.h:394:10: note: declared here
   Status WithRank(ShapeHandle shape, int64 rank,
          ^~~~~~~~
main.cpp: In lambda function:
main.cpp:57:47: warning: ignoring return value of ‘tensorflow::Status tensorflow::shape_inference::InferenceContext::WithRank(tensorflow::shape_inference::ShapeHandle, tensorflow::int64, tensorflow::shape_inference::ShapeHandle*)’, declared with attribute warn_unused_result [-Wunused-result]
             c->WithRank(c->input(0), 3, &dim1); // batch_size * npoint * 3
                                               ^
In file included from main.cpp:8:0:
/tensorflow-1.15.2/python3.7/tensorflow_core/include/tensorflow/core/framework/shape_inference.h:394:10: note: declared here
   Status WithRank(ShapeHandle shape, int64 rank,
          ^~~~~~~~

If an error occurs, the most likely causes are that the paths in the .sh files were not updated or that the .so file was not copied.
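
A quick way to see which builds failed is to check each op folder for a compiled library. This is a sketch under the assumption that every compile script writes a .so file into its own folder, as tf_grouping_compile.sh does with tf_grouping_so.so; folders_missing_so is a hypothetical helper.

```python
from pathlib import Path


def folders_missing_so(ops_root):
    """Return names of op folders that hold a *_compile.sh but no *.so file."""
    missing = []
    for folder in sorted(p for p in Path(ops_root).iterdir() if p.is_dir()):
        has_script = any(folder.glob("*_compile.sh"))
        has_library = any(folder.glob("*.so"))
        if has_script and not has_library:
            missing.append(folder.name)
    return missing


# In the notebook:
#   print(folders_missing_so("/content/pointSIFT/tf_utils/tf_ops"))
```

An empty list means every folder with a compile script also contains a shared library.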

10. Download the dataset

%cd /content
!wget --no-check-certificate https://shapenet.cs.stanford.edu/media/scannet_data_pointnet2.zip


11. Unzip the dataset

%cd /content/pointSIFT/
!unzip /content/scannet_data_pointnet2.zip


12. Run the code

%cd /content/pointSIFT
!python train_and_eval_scannet.py --batch_size 8

The output looks like this:

/content/pointSIFT
train size 1201 and test size 312
WARNING:tensorflow:From /content/pointSIFT/models/pointSIFT_pointnet.py:10: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From train_and_eval_scannet.py:164: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

WARNING:tensorflow:From train_and_eval_scannet.py:94: The name tf.train.exponential_decay is deprecated. Please use tf.compat.v1.train.exponential_decay instead.

WARNING:tensorflow:From train_and_eval_scannet.py:100: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

WARNING:tensorflow:From /content/pointSIFT/tf_utils/pointSIFT_util.py:98: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From /content/pointSIFT/tf_utils/tf_util.py:21: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /content/pointSIFT/tf_utils/pointSIFT_util.py:252: calling reduce_max_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/util/deprecation.py:574: calling conv1d (from tensorflow.python.ops.nn_ops) with data_format=NHWC is deprecated and will be removed in a future version.
Instructions for updating:
`NHWC` for data_format is deprecated, use `NWC` instead
WARNING:tensorflow:From /content/pointSIFT/tf_utils/pointSIFT_util.py:341: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /content/pointSIFT/tf_utils/tf_util.py:613: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From train_and_eval_scannet.py:197: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.

build graph in gpu 0
WARNING:tensorflow:From /content/pointSIFT/models/pointSIFT_pointnet.py:72: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/losses/losses_impl.py:121: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /content/pointSIFT/models/pointSIFT_pointnet.py:74: The name tf.add_to_collection is deprecated. Please use tf.compat.v1.add_to_collection instead.

WARNING:tensorflow:From train_and_eval_scannet.py:179: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

WARNING:tensorflow:From train_and_eval_scannet.py:216: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

WARNING:tensorflow:From train_and_eval_scannet.py:218: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From train_and_eval_scannet.py:221: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From train_and_eval_scannet.py:226: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-11-03 03:02:46.702606: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299995000 Hz
2021-11-03 03:02:46.704836: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560f203f4840 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-11-03 03:02:46.704873: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-11-03 03:02:46.761618: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-11-03 03:02:46.974400: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-03 03:02:46.975314: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560f203f52c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-11-03 03:02:46.975351: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2021-11-03 03:02:46.975731: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-03 03:02:46.976454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
2021-11-03 03:02:46.983310: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-11-03 03:02:47.178342: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-11-03 03:02:47.203608: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-11-03 03:02:47.262938: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-11-03 03:02:47.487159: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-11-03 03:02:47.521078: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-11-03 03:02:47.987856: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-11-03 03:02:47.988116: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-03 03:02:47.989027: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-03 03:02:47.989755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2021-11-03 03:02:47.990029: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-11-03 03:02:47.991877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-03 03:02:47.991915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2021-11-03 03:02:47.991945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2021-11-03 03:02:47.992562: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-03 03:02:47.993393: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-03 03:02:47.996088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10813 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
WARNING:tensorflow:From train_and_eval_scannet.py:227: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

WARNING:tensorflow:From train_and_eval_scannet.py:229: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

2021-11-03 03:02:57.247497: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 16777216 exceeds 10% of system memory.
2021-11-03 03:02:57.281148: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 16777216 exceeds 10% of system memory.
2021-11-03 03:02:57.297586: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 16777216 exceeds 10% of system memory.
2021-11-03 03:02:57.313636: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 16777216 exceeds 10% of system memory.
2021-11-03 03:02:57.330606: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 16777216 exceeds 10% of system memory.
2021-11-03 03:03:29.105629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-11-03 03:03:35.225050: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
epoch 1 , loss is 16.904939 take 320.285 s
epoch 2 , loss is 14.630943 take 280.282 s

Training was still in progress at this point; it takes a long time to finish.
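
Because a run takes this long, it can help to pull the per-epoch numbers out of the log instead of scrolling through the warnings. A small sketch, assuming the epoch lines keep the format shown above; parse_epochs is a hypothetical helper.

```python
import re

# Matches lines such as "epoch 1 , loss is 16.904939 take 320.285 s".
EPOCH_LINE = re.compile(
    r"epoch\s+(\d+)\s*,\s*loss is\s+([0-9.]+)\s+take\s+([0-9.]+)\s*s")


def parse_epochs(log_text):
    """Return a list of (epoch, loss, seconds) tuples found in log_text."""
    return [(int(epoch), float(loss), float(seconds))
            for epoch, loss, seconds in EPOCH_LINE.findall(log_text)]


log = """epoch 1 , loss is 16.904939 take 320.285 s
epoch 2 , loss is 14.630943 take 280.282 s"""
for epoch, loss, seconds in parse_epochs(log):
    print(epoch, loss, seconds)
```

With the two epochs logged here, the loss drops from 16.904939 to 14.630943, at roughly five minutes per epoch on the Tesla K80.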

Conclusion: there really are a lot of pitfalls, and the incompatibilities between TensorFlow versions are a real nuisance. PointNet has a PyTorch version, which is much more convenient to reproduce. I uploaded the modified code and a notebook that runs as-is to GitHub. If you are using Colab, you should be able to run it directly.
My GitHub repository

Origin blog.csdn.net/rglkt/article/details/121114296