Multi-frame environment reconstruction with the Azure Kinect DK (Open3D)

  A single frame scanned by the Azure Kinect DK camera suffers from some distortion and information loss, so I wanted to try multi-frame reconstruction and see how it performs.

Non-real-time reconstruction

  The main reference here is the article "Azure Kinect DK realizes 3D reconstruction (PC non-real-time version)".
  The code provided by the blogger lucky li runs without problems, and the result is decent. The overall flow of the code is: record a video of the object to be reconstructed, decompose it frame by frame into color and depth images, perform feature matching between frames, and finally reconstruct the point cloud. I made some adjustments to the blogger's code here, because the visualization window shown while recording only displays the color and depth images, and the data collected by the Kinect camera is distorted or missing when the camera is too close; displaying the point cloud instead makes it easy to judge whether the camera distance is suitable.
  First, copy the pykinect_azure folder from this GitHub repository into the open3d_reconstruction/sensors folder of the blogger's code: https://github.com/struggler176393/Kinect_pointcloud
  open3d_reconstruction/sensors/azure_kinect_recorder.py is then modified as follows:

import sys
sys.path.insert(1, './')

import argparse
import datetime
import open3d as o3d
import numpy as np
import pykinect_azure as pykinect
from pykinect_azure.utils import Open3dVisualizer

class RecorderWithCallback:

    def __init__(self, config, device, filename, align_depth_to_color):
        # Global flags
        self.flag_exit = False
        self.flag_record = False
        self.filename = filename
        self.open3dVisualizer = Open3dVisualizer()

        self.align_depth_to_color = align_depth_to_color
        self.recorder = o3d.io.AzureKinectRecorder(config, device)
        if not self.recorder.init_sensor():
            raise RuntimeError('Failed to connect to sensor')

    def escape_callback(self, vis):
        self.flag_exit = True
        if self.recorder.is_record_created():
            print('Recording finished.')
        else:
            print('Nothing has been recorded.')
        return False

    def space_callback(self, vis):
        if self.flag_record:
            print('Recording paused. '
                  'Press [Space] to continue. '
                  'Press [ESC] to save and exit.')
            self.flag_record = False

        elif not self.recorder.is_record_created():
            if self.recorder.open_record(self.filename):
                print('Recording started. '
                      'Press [SPACE] to pause. '
                      'Press [ESC] to save and exit.')
                self.flag_record = True

        else:
            print('Recording resumed, video may be discontinuous. '
                  'Press [SPACE] to pause. '
                  'Press [ESC] to save and exit.')
            self.flag_record = True

        return False

    def run(self):
        glfw_key_escape = 256
        glfw_key_space = 32
        self.open3dVisualizer.vis.register_key_callback(glfw_key_escape, self.escape_callback)
        self.open3dVisualizer.vis.register_key_callback(glfw_key_space, self.space_callback)
        print("Recorder initialized. Press [SPACE] to start. "
              "Press [ESC] to save and exit.")

        vis_geometry_added = False
        while not self.flag_exit:
            self.rgbd = self.recorder.record_frame(self.flag_record,
                                                   self.align_depth_to_color)
            if self.rgbd is None:
                continue

            color_image = self.rgbd.color
            depth_image = self.rgbd.depth

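            # Fuse the captured color and depth frames into an RGBD image,
            # then back-project it to a point cloud for the live preview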
            rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(
                color_image, depth_image,
                convert_rgb_to_intensity=False)
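            # Color camera intrinsics at 720p; these values come from the
            # blogger's device and should be replaced with your own calibration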
            intrinsic = o3d.camera.PinholeCameraIntrinsic(
                o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault)
            intrinsic.set_intrinsics(
                width=1280, height=720, fx=605.805115, fy=605.625549, cx=641.717163, cy=363.225800)

            self.point_cloud = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd_image, intrinsic)

            self.open3dVisualizer(self.point_cloud.points, self.point_cloud.colors)

        self.recorder.close_record()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Azure kinect mkv recorder.')
    parser.add_argument('--config', type=str, help='input json kinect config')
    parser.add_argument('--output', type=str, help='output mkv filename')
    parser.add_argument('--list',
                        action='store_true',
                        help='list available azure kinect sensors')
    parser.add_argument('--device',
                        type=int,
                        default=0,
                        help='input kinect device id')
    parser.add_argument('-a',
                        '--align_depth_to_color',
                        action='store_false',
                        help='disable aligning the depth image to the color '
                             'image (alignment is on by default)')
    args = parser.parse_args()

    if args.list:
        o3d.io.AzureKinectSensor.list_devices()
        exit()

    if args.config is not None:
        config = o3d.io.read_azure_kinect_sensor_config(args.config)
    else:
        config = o3d.io.AzureKinectSensorConfig()

    if args.output is not None:
        filename = args.output
    else:
        filename = '{date:%Y-%m-%d-%H-%M-%S}.mkv'.format(
            date=datetime.datetime.now())
    print('Prepare writing to {}'.format(filename))

    device = args.device
    if device < 0 or device > 255:
        print('Unsupported device id, fall back to 0')
        device = 0

    r = RecorderWithCallback(config, device, filename,
                             args.align_depth_to_color)
    r.run()

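  To record a video with the modified script, I run something like the following (the output path is just an example; the flags come from the argparse definitions above):

python sensors/azure_kinect_recorder.py --output dataset/name.mkv

  A point cloud preview window opens; [SPACE] starts or pauses recording and [ESC] saves and exits, as printed by the key callbacks.
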
  Since the object I want to reconstruct is relatively small, after running python sensors/azure_kinect_mkv_reader.py --input dataset/name.mkv --output dataset/name I need to modify the resulting open3d_reconstruction/dataset/name/config.json file and reduce the value of the tsdf_cubic_size parameter (it took quite a while to find). The smaller this value, the denser the final scene/integrated.ply point cloud. I tried tsdf_cubic_size at 0.5 and 0.3 to compare the results; at 0.2 and below my computer could not cope and killed the process.
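
  Instead of editing the file by hand, the parameter can also be changed with a small script. A minimal sketch, assuming the config sits at the path produced by the mkv reader step above:

import json

# Path written by azure_kinect_mkv_reader.py (example dataset name)
config_path = 'dataset/name/config.json'

with open(config_path) as f:
    config = json.load(f)

# A smaller TSDF cube packs the voxel budget into a smaller region,
# giving a denser reconstruction of a small object
config['tsdf_cubic_size'] = 0.5

with open(config_path, 'w') as f:
    json.dump(config, f, indent=4)
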
  Some runtime data (from a recorded 7-second video):

- Making fragments    0:06:34.716390
- Register fragments  0:00:00.016120
- Refine registration 0:00:00.015653

tsdf_cubic_size = 0.5

- Integrate frames    0:00:56.776975

Point cloud result:
[Figure: reconstructed point cloud]

tsdf_cubic_size = 0.3

- Integrate frames    0:05:38.702845

Point cloud result:
[Figure: reconstructed point cloud]

  • The results obtained with the two parameter values are basically the same, but the reconstruction is still quite time-consuming and far from real-time. Since about 71 frames of RGB and depth images were involved in the reconstruction, one way to speed things up is to reduce the frame rate (see the sketch after this comparison). In addition, multi-frame reconstruction actually seems to sacrifice a certain amount of accuracy. Here is a single-frame scanned point cloud for comparison:
    [Figure: single-frame point cloud]
      Apart from missing depth information in some places, the single-frame point cloud is actually better in terms of accuracy; multi-frame reconstruction seems better suited to reconstructing large environments.
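
  One simple way to cut the frame count mentioned above is to thin out the extracted color/depth pairs before running the reconstruction pipeline. A minimal sketch, assuming the mkv reader wrote matching images into color/ and depth/ subfolders of the dataset (the folder layout and file extensions are assumptions; adjust them to what the reader actually produced):

import os
import glob

dataset = 'dataset/name'  # example dataset folder
keep_every = 3            # keep every 3rd frame, delete the rest

for subdir, ext in [('color', 'jpg'), ('depth', 'png')]:
    files = sorted(glob.glob(os.path.join(dataset, subdir, '*.' + ext)))
    for i, path in enumerate(files):
        if i % keep_every != 0:
            os.remove(path)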

Real-time reconstruction

  The main reference here is the article "Azure Kinect DK realizes 3D reconstruction (jetson real-time version)".
  After the ROS workspace is built and compiled, you can see the result:
[Figure: real-time reconstruction result]
  The approximate shape of the object can basically be reconstructed, but for some reason there are many noise points radiating out from the camera position, which makes the point cloud a bit ugly; a denoising pass can be applied afterwards.
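
  For the later denoising pass, Open3D's statistical outlier removal is one candidate. A minimal sketch on a saved point cloud (the file name is an example, and nb_neighbors/std_ratio would need tuning to the actual noise level):

import open3d as o3d

pcd = o3d.io.read_point_cloud('scene/integrated.ply')  # example path

# Drop points whose mean distance to their nearest neighbors is
# abnormally large compared to the rest of the cloud
pcd_clean, inlier_idx = pcd.remove_statistical_outlier(nb_neighbors=20,
                                                       std_ratio=2.0)

o3d.visualization.draw_geometries([pcd_clean])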

Original post: blog.csdn.net/astruggler/article/details/128659593