A single frame scanned by the Azure Kinect DK camera suffers a certain amount of distortion and loss of information, so I want to try multi-frame reconstruction and see how well it works.
non-real-time reconstruction
The main reference here is the article "Azure Kinect DK realizes 3D reconstruction (PC non-real-time version)".
The code provided by the blogger lucky li runs through, and the results are decent. The overall pipeline is: record a video of the object to be reconstructed, decompose it frame by frame into color and depth images, perform feature matching between frames, and finally reconstruct the point cloud. I made some adjustments to the blogger's code: the original recording window only displays the color and depth images, but the data the Kinect captures becomes distorted or missing when the camera is too close, so displaying a point cloud instead makes it easy to judge whether the camera distance is suitable.
First copy the pykinect_azure folder from the following GitHub repository into the open3d_reconstruction/sensors folder of the blogger's code: https://github.com/struggler176393/Kinect_pointcloud
open3d_reconstruction/sensors/azure_kinect_recorder.py is modified as follows:
```python
import sys
sys.path.insert(1, './')

import argparse
import datetime
import open3d as o3d
import numpy as np
import pykinect_azure as pykinect
from pykinect_azure.utils import Open3dVisualizer


class RecorderWithCallback:

    def __init__(self, config, device, filename, align_depth_to_color):
        # Global flags
        self.flag_exit = False
        self.flag_record = False
        self.filename = filename
        self.open3dVisualizer = Open3dVisualizer()
        self.align_depth_to_color = align_depth_to_color

        self.recorder = o3d.io.AzureKinectRecorder(config, device)
        if not self.recorder.init_sensor():
            raise RuntimeError('Failed to connect to sensor')

    def escape_callback(self, vis):
        self.flag_exit = True
        if self.recorder.is_record_created():
            print('Recording finished.')
        else:
            print('Nothing has been recorded.')
        return False

    def space_callback(self, vis):
        if self.flag_record:
            print('Recording paused. '
                  'Press [Space] to continue. '
                  'Press [ESC] to save and exit.')
            self.flag_record = False
        elif not self.recorder.is_record_created():
            if self.recorder.open_record(self.filename):
                print('Recording started. '
                      'Press [SPACE] to pause. '
                      'Press [ESC] to save and exit.')
                self.flag_record = True
        else:
            print('Recording resumed, video may be discontinuous. '
                  'Press [SPACE] to pause. '
                  'Press [ESC] to save and exit.')
            self.flag_record = True
        return False

    def run(self):
        glfw_key_escape = 256
        glfw_key_space = 32
        self.open3dVisualizer.vis.register_key_callback(glfw_key_escape,
                                                        self.escape_callback)
        self.open3dVisualizer.vis.register_key_callback(glfw_key_space,
                                                        self.space_callback)

        print("Recorder initialized. Press [SPACE] to start. "
              "Press [ESC] to save and exit.")

        while not self.flag_exit:
            self.rgbd = self.recorder.record_frame(self.flag_record,
                                                   self.align_depth_to_color)
            if self.rgbd is None:
                continue

            # Rebuild an RGBD image and back-project it to a point cloud, so
            # the preview window shows geometry instead of the raw color and
            # depth images.
            color_image = self.rgbd.color
            depth_image = self.rgbd.depth
            rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(
                color_image, depth_image, convert_rgb_to_intensity=False)

            # Intrinsics of my camera at 720p; replace fx/fy/cx/cy with the
            # calibration values of your own device.
            intrinsic = o3d.camera.PinholeCameraIntrinsic(
                o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault)
            intrinsic.set_intrinsics(
                width=1280, height=720,
                fx=605.805115, fy=605.625549, cx=641.717163, cy=363.225800)
            self.point_cloud = o3d.geometry.PointCloud.create_from_rgbd_image(
                rgbd_image, intrinsic)
            self.open3dVisualizer(self.point_cloud.points,
                                  self.point_cloud.colors)

        self.recorder.close_record()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Azure kinect mkv recorder.')
    parser.add_argument('--config', type=str, help='input json kinect config')
    parser.add_argument('--output', type=str, help='output mkv filename')
    parser.add_argument('--list',
                        action='store_true',
                        help='list available azure kinect sensors')
    parser.add_argument('--device',
                        type=int,
                        default=0,
                        help='input kinect device id')
    # store_false: alignment is on by default, and passing -a turns it off.
    parser.add_argument('-a',
                        '--align_depth_to_color',
                        action='store_false',
                        help='disable aligning depth image to color '
                             '(aligned by default)')
    args = parser.parse_args()

    if args.list:
        o3d.io.AzureKinectSensor.list_devices()
        exit()

    if args.config is not None:
        config = o3d.io.read_azure_kinect_sensor_config(args.config)
    else:
        config = o3d.io.AzureKinectSensorConfig()

    if args.output is not None:
        filename = args.output
    else:
        filename = '{date:%Y-%m-%d-%H-%M-%S}.mkv'.format(
            date=datetime.datetime.now())
    print('Prepare writing to {}'.format(filename))

    device = args.device
    if device < 0 or device > 255:
        print('Unsupported device id, fall back to 0')
        device = 0

    r = RecorderWithCallback(config, device, filename,
                             args.align_depth_to_color)
    r.run()
```
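To record, run something like `python sensors/azure_kinect_recorder.py --output dataset/name.mkv` from the open3d_reconstruction folder, press [SPACE] to start and [ESC] to save and exit, and use the point cloud preview to check that the camera distance is suitable before recording.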
Since the object I want to reconstruct is relatively small, after running `python sensors/azure_kinect_mkv_reader.py --input dataset/name.mkv --output dataset/name` I need to modify the resulting open3d_reconstruction/dataset/name/config.json file and reduce the value of the tsdf_cubic_size parameter (it took me quite a while to find it). The smaller this value, the denser the point cloud in the final scene/integrated.ply file. I tried 0.5 and 0.3 to compare the effect; at 0.2 and below my computer could not handle it and killed the process.
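For reference, a minimal sketch of that edit in Python, assuming the config file sits at dataset/name/config.json and uses the standard Open3D reconstruction-system keys:

```python
import json

# Path from the step above; adjust to your own dataset folder.
config_path = 'dataset/name/config.json'

with open(config_path) as f:
    config = json.load(f)

# Shrink the TSDF volume around a small object; smaller values give a
# denser integrated point cloud (Open3D's default config uses 3.0).
config['tsdf_cubic_size'] = 0.5

with open(config_path, 'w') as f:
    json.dump(config, f, indent=4)
```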
Some run-time data (from a 7-second recording):

- Making fragments: 0:06:34.716390
- Register fragments: 0:00:00.016120
- Refine registration: 0:00:00.015653

With tsdf_cubic_size=0.5:

- Integrate frames: 0:00:56.776975
- Point cloud result:

With tsdf_cubic_size=0.3:

- Integrate frames: 0:05:38.702845
- Point cloud result:
The results from the two parameter values are basically the same, but the reconstruction is still quite time-consuming and far from real-time. About 71 RGB and depth frame pairs go into the reconstruction, so one way to speed things up is to reduce the number of frames, e.g. by keeping only every k-th pair, as in the sketch below.
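This is a quick sketch under the assumption that the mkv reader wrote matching numbered files into dataset/name/color and dataset/name/depth:

```python
import os

# Assumed layout: the mkv reader writes numbered color/ and depth/
# folders under the output directory; drop the same frames from both.
dataset = 'dataset/name'
stride = 3  # keep every 3rd frame pair

for sub in ('color', 'depth'):
    folder = os.path.join(dataset, sub)
    for i, name in enumerate(sorted(os.listdir(folder))):
        if i % stride != 0:
            os.remove(os.path.join(folder, name))
```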
In addition, multi-frame reconstruction actually seems to sacrifice a certain amount of accuracy. Here is a single-frame scanned point cloud for comparison: apart from the missing depth information in some places, the single-frame point cloud is actually better in terms of accuracy. Multi-frame reconstruction seems better suited to reconstructing large environments.
real-time reconstruction
The main reference here is the article "Azure Kinect DK realizes 3D reconstruction (jetson real-time version)".
After the ROS workspace is built and compiled, you can see the result:
The approximate shape of the object can basically be reconstructed, but for some reason there are a lot of noise points radiating out from the camera position, which makes the point cloud a bit ugly; a denoising pass can be applied afterwards.
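That later denoising pass could be done with Open3D's statistical outlier removal; a minimal sketch, with the input file name assumed:

```python
import open3d as o3d

# Assumed file name; point this at the saved noisy cloud.
pcd = o3d.io.read_point_cloud('scene/integrated.ply')

# Drop points whose mean distance to their 20 nearest neighbors is more
# than 2 standard deviations above average; this tends to remove stray
# points like the ones radiating from the camera position.
filtered, ind = pcd.remove_statistical_outlier(nb_neighbors=20,
                                               std_ratio=2.0)

o3d.io.write_point_cloud('scene/integrated_denoised.ply', filtered)
o3d.visualization.draw_geometries([filtered])
```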