Tips for training NeRF models

Original: instant-ngp/nerf_dataset_tips

If you are not yet familiar with NeRF, it is recommended to first learn the basics of NeRF-based 3D content generation.

The initial camera parameters must be provided in transforms.json, in a format compatible with NeRF: Neural Radiance Fields. To this end, we provide scripts/colmap2nerf.py to facilitate this task. It can process a video or a sequence of images and extract the necessary camera data using the open-source COLMAP structure-from-motion software.

The training process is very picky about the data. To obtain good results, the dataset must not contain mislabeled poses and must not contain blurred frames (neither motion blur nor out-of-focus blur). This article tries to give some suggestions. A good rule of thumb is that if your model has not converged within 20 seconds, training for longer will not produce much better results; most of the convergence happens within the first few seconds. We therefore recommend tweaking the data early on until you get satisfactory results.

The most common problem with datasets is an incorrect scale or offset of the camera positions; see below for more details. Another common problem is too few images, or images with inaccurate camera parameters (for example, when COLMAP fails). In that case, you may need to acquire more images or tune the camera-pose estimation process, which is beyond the scope of this article.

Existing datasets

By default, instant-ngp's NeRF implementation only marches rays through the unit bounding box from [0,0,0] to [1,1,1]. The data loader reads the camera transformation matrices from the input JSON file, scales the positions by 0.33, and offsets them by [0.5, 0.5, 0.5] in order to map the origin of the input data to the center of this cube. These scaling factors were chosen to fit the synthetic datasets in the original NeRF paper, as well as the output of our scripts/colmap2nerf.py script.
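To make the mapping concrete, here is a minimal sketch of what that default transformation does to a camera position (illustrative Python only, not the actual nerf_loader.cu code):

import numpy as np

# Sketch of the loader's default mapping: scale camera positions by 0.33
# and offset by [0.5, 0.5, 0.5] so the origin of the input data lands at
# the center of the unit cube.
def map_to_unit_cube(position, scale=0.33, offset=(0.5, 0.5, 0.5)):
    return scale * np.asarray(position) + np.asarray(offset)

print(map_to_unit_cube([1.0, 0.0, 0.0]))  # a camera 1 unit from the origin -> [0.83, 0.5, 0.5]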

You should check the alignment of the cameras to this bounding box by enabling both "Visualize cameras" and "Visualize unit cube" in the "Debug visualization" rollup of the UI, as follows:

For natural scenes where the background is visible outside the unit cube, it is necessary to set the aabb_scale parameter in the transforms.json file to a power of 2 up to 128 (i.e. 1, 2, 4, 8, ..., 128). See data/nerf/fox/transforms.json for an example.

The effect can be seen in the image below:

The cameras are still somewhat centered on the "object of interest" within the unit cube; however, the aabb_scale parameter (here set to 16) causes the NeRF implementation to trace rays out to a larger bounding box (of side length 16) that contains the background elements.

Adjusting existing datasets

If you already have a transforms.json dataset, it should be centered and at a scale similar to the original NeRF synthetic datasets. If you load it into instant-ngp and find that it is not converging, the first thing to check is the position of the cameras relative to the unit cube, using the debug features described above. If the dataset does not primarily fall within the unit cube, it is worth moving it there. You can do this by tweaking the transforms themselves, or by adding global parameters to the outer scope of the JSON.

You can set any of the following parameters; the values listed are the defaults.

{
	"aabb_scale": 16,
	"scale": 0.33,
	"offset": [0.5, 0.5, 0.5],
	...	
}
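If your cameras fall well outside the unit cube, one simple heuristic for deriving these values from the camera positions is sketched below. This is hedged convenience code, not part of instant-ngp, and it assumes the loader applies new_position = scale * position + offset, as described above:

import json
import numpy as np

# Heuristic sketch: choose "scale" and "offset" so that all camera
# positions land inside the unit cube after the loader's mapping.
with open("transforms.json") as f:
    data = json.load(f)

positions = np.array([np.array(fr["transform_matrix"])[:3, 3]
                      for fr in data["frames"]])
center = positions.mean(axis=0)
radius = np.linalg.norm(positions - center, axis=1).max()

data["scale"] = float(0.5 / radius)  # fit all cameras within the cube
data["offset"] = (0.5 - data["scale"] * center).tolist()

with open("transforms.json", "w") as f:
    json.dump(data, f, indent=4)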

See nerf_loader.cu for implementation details and other options.

Prepare a new NeRF dataset

Make sure you have COLMAP installed and available in your PATH. If you are using a video file as input, also make sure FFmpeg is installed and available in your PATH. To check that this is the case, from a terminal window you should be able to run colmap and ffmpeg -? and see some help text from each.
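A quick way to verify that both tools are reachable (a small convenience sketch, not part of the repository's scripts):

import shutil

# Check that COLMAP and FFmpeg are on the PATH before running colmap2nerf.py.
for tool in ("colmap", "ffmpeg"):
    path = shutil.which(tool)
    print(f"{tool}: {path or 'NOT FOUND -- install it or fix your PATH'}")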

If training from a video file, run the scripts/colmap2nerf.py script from the folder containing the video, with the following suggested parameters:

data-folder$ python [path-to-instant-ngp]/scripts/colmap2nerf.py --video_in <filename of video> --video_fps 2 --run_colmap --aabb_scale 16

The above assumes a single video file as input, from which frames are extracted at the specified frame rate (2). It is recommended to choose a frame rate that produces approximately 50-150 images; so for a one-minute video, --video_fps 2 is a good choice.
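A back-of-the-envelope way to pick the frame rate (an illustrative helper, not part of the script):

# Pick a --video_fps value that yields roughly 50-150 extracted frames.
def suggest_fps(duration_seconds, target_frames=100):
    return max(1, round(target_frames / duration_seconds))

print(suggest_fps(60))  # one-minute video -> 2, matching --video_fps 2 above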

For training from images, put them in a folder called images and use the appropriate options like this:

data-folder$ python [path-to-instant-ngp]/scripts/colmap2nerf.py --colmap_matcher exhaustive --run_colmap --aabb_scale 16

The script will run FFmpeg and/or COLMAP as necessary, then perform a conversion step into the required transforms.json format, which is written to the current directory.

By default, the script invokes COLMAP with a "sequential matcher", which is suitable for images taken along a smoothly varying camera path, as in a video. The exhaustive matcher is more appropriate if the images are in no particular order, as in the image example above. For more options, run the script with --help. For more advanced uses of COLMAP or for challenging scenes, please refer to the COLMAP documentation; you may need to modify the scripts/colmap2nerf.py script itself.

The aabb_scale parameter is the most important instant-ngp-specific parameter. It specifies the extent of the scene, and defaults to 1; that is, the scene is scaled so that the average distance of the camera positions from the origin is 1 unit. For small synthetic scenes (such as the original NeRF dataset), the default aabb_scale of 1 is ideal and yields the fastest training. The NeRF model assumes that the training images are entirely explainable by the scene contained within this bounding box. However, for natural scenes where the background extends beyond this bounding box, the NeRF model will struggle and may produce hallucinatory "floaters" at the boundaries of the box. By setting aabb_scale to a larger power of 2 (up to 128), the NeRF model will trace rays out to a correspondingly larger bounding box. Note that this can slightly impact training speed. When in doubt, for natural scenes, start with 16 and reduce the value as much as possible. The value can be edited directly in the transforms.json output file, without re-running the scripts/colmap2nerf.py script.
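Since only this one value needs to change, a small sketch like the following (hypothetical convenience code, not part of instant-ngp) is enough to edit it in place:

import json

# Edit aabb_scale in an existing transforms.json without re-running
# scripts/colmap2nerf.py. The value should be a power of 2 (1, 2, ..., 128).
with open("transforms.json") as f:
    data = json.load(f)
data["aabb_scale"] = 8  # e.g. shrink from 16 once the scene allows it
with open("transforms.json", "w") as f:
    json.dump(data, f, indent=4)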

Assuming success, you can now train a NeRF model as follows, starting from the instant-ngp folder:

instant-ngp$ ./build/testbed --mode nerf --scene [path to training data folder containing transforms.json]

NeRF Training Data Hints

The NeRF model trains best with 50-150 images that exhibit minimal scene motion, motion blur, or other blurring artifacts. The quality of the reconstruction depends on COLMAP being able to extract accurate camera parameters from the images. See the previous section for how to verify this.

The colmap2nerf.py script assumes that the training images all point approximately at a shared point of interest, which it places at the origin. This point is found by taking a weighted average of the points of closest approach between the rays through the center pixel of all pairs of training images. In practice, this means the script works best when the training images are captured pointing at the object of interest, although they need not complete a full 360-degree view of it. As explained above, if aabb_scale is set to a number greater than 1, any visible background behind the object of interest will still be reconstructed.
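For intuition, the geometric building block here is the point of closest approach between two camera rays. A minimal sketch follows (illustrative only; the actual script additionally weights and averages over all image pairs):

import numpy as np

# Midpoint of the closest segment between two rays, each given by an
# origin o and a unit direction d.
def closest_approach(o1, d1, o2, d2):
    n = np.cross(d1, d2)
    denom = np.dot(n, n)
    if denom < 1e-12:  # near-parallel rays: no reliable point
        return None
    t1 = np.dot(np.cross(o2 - o1, d2), n) / denom
    t2 = np.dot(np.cross(o2 - o1, d1), n) / denom
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))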


Origin: blog.csdn.net/minstyrain/article/details/124904045