Start using 3D images now

1. Description

This story introduces the use of 3D data to train machine learning models. In particular, we discuss the 3D version of the MNIST dataset available on Kaggle, and show how to use Keras to train a model to recognize 3D digits.

3D data is everywhere. Since we want to build AI that interacts with our physical world, it makes perfect sense to use 3D data to train our models.

2. Where does the 3D data come from?

Now look at the objects around you. They are 3D entities occupying a 3D space, and you, also a 3D entity, are among them. If everything in the room is static, we can model the environment as 3D spatial data.

Building scans, one source of 3D data

3D data comes from various sources, such as 2D image sequences and 3D scanner output. In this story, we process a synthetically generated 3D point-cloud version of the popular MNIST dataset.

3. 3D MNIST dataset

In case you didn't already know, MNIST is a famous collection of 2D images of handwritten digits. Each element in MNIST is a small 28x28 grayscale image. In this story, we will use the 3D version of MNIST:

Raw digits in MNIST

Modified 3D version

This dataset can be generated using this Jupyter notebook.
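
For reference, the canonical 2D MNIST can be loaded directly through Keras:

from tensorflow.keras.datasets import mnist

# 60,000 training and 10,000 test images of 28x28 grayscale digits.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)  # (60000, 28, 28)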

The 3D images in the augmented 3D MNIST are obtained from the original 2D MNIST images by applying a set of transformations (a combined code sketch follows the list below):

1 - Dilation: stacking the same digit image N times to obtain a 3D volume from a 2D digit.

Dilated version of a handwritten digit 3

2 - Noise: apply significant Gaussian noise to each 3D point.

Same image with Gaussian noise

3 - Colorization: records in MNIST are grayscale images. To make things more challenging, let's convert them to include random colors.

4 - Rotation: once the digits are 3D objects, we can rotate them arbitrarily.

Same image with a different rotation
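
Putting the four transformations together, here is a minimal illustrative sketch; the function name and parameters are hypothetical, and the real generation code (including resampling to 16x16x16 cubes) lives in the notebook linked above:

import numpy as np
from scipy.ndimage import rotate

def digit_2d_to_3d(img2d, depth=16, noise_sigma=0.1, angles=(0.0, 0.0, 0.0)):
    # Hypothetical helper illustrating the four transformations described above.
    # 1. Dilation: stack the 2D digit `depth` times to get a 3D volume.
    vol = np.stack([img2d] * depth, axis=-1).astype(np.float32)
    # 2. Noise: add Gaussian noise to every voxel.
    vol = vol + np.random.normal(0.0, noise_sigma, vol.shape)
    # 3. Colorization: tint the grayscale volume with a random RGB color.
    vol = vol[..., np.newaxis] * np.random.rand(3)   # shape (H, W, depth, 3)
    # 4. Rotation: rotate the volume around each spatial axis pair.
    for axes, angle in zip([(0, 1), (0, 2), (1, 2)], angles):
        vol = rotate(vol, angle, axes=axes, reshape=False, order=1)
    return vol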

More details on the 3D MNIST dataset can be found on Kaggle. Now, let's jump straight into the step-by-step process:

4. Get and load data

First things first: download the dataset file from Kaggle. Unzip the file to get 3d-mnist.h5. Then, load the dataset:
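
A minimal loading sketch using h5py; the key names are assumptions based on the train_x/test_x naming used below, so verify them with list(f.keys()):

import h5py
import numpy as np
from tensorflow.keras.utils import to_categorical

# Read the train/test arrays from the HDF5 file.
# Key names are assumed; check them against your copy of the file.
with h5py.File('3d-mnist.h5', 'r') as f:
    train_X_3D = np.array(f['train_x'])
    train_y = np.array(f['train_y'])
    test_X_3D = np.array(f['test_x'])
    test_y = np.array(f['test_y'])

# One-hot encode labels for categorical cross-entropy (skip if already one-hot).
train_y = to_categorical(train_y, 10)
test_y = to_categorical(test_y, 10)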

In short, each record in train_x or test_x is a 16x16x16 cube. Each cube holds the voxelized point cloud of a digit. You can easily extract any record from the dataset:
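
For example, here is a quick sketch that extracts one cube and renders its occupied voxels with matplotlib; the channel handling is an assumption, since the colorized cubes may carry an extra RGB axis:

import matplotlib.pyplot as plt

# Take the 3rd record (index 2) from the training set.
sample = train_X_3D[2]
print(sample.shape)  # (16, 16, 16), or (16, 16, 16, 3) with color channels

# Treat any non-zero value as an occupied voxel.
occupied = sample.any(axis=-1) if sample.ndim == 4 else sample > 0
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.voxels(occupied)
plt.show()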

        The result is as follows:

In fact, this is an augmented 3D version of the 3rd element in MNIST:

Now that we have loaded the dataset, we can use it to train our model.

4.1 Define the model

We wish to train a model to recognize 3D representations of digits in these cubes. The model used to recognize handwritten digits in the canonical 2D version of MNIST is not suitable for the 3D dataset. To process 3D data, we need 3D operations, such as 3D convolutions and 3D max pooling, and Keras supports exactly these layer types.

Defining a 3D model to handle our 3D data is really simple:
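
Here is a minimal sketch of what define_model() could look like, assuming 16x16x16 inputs with 3 color channels; the layer sizes are illustrative, not necessarily the exact architecture from the original post:

from tensorflow.keras import layers, models

def define_model():
    # A small 3D CNN: two Conv3D/MaxPooling3D stages, then a dense classifier.
    model = models.Sequential([
        layers.Input(shape=(16, 16, 16, 3)),       # 16x16x16 cubes, RGB channels
        layers.Conv3D(32, kernel_size=3, activation='relu'),
        layers.MaxPooling3D(pool_size=2),
        layers.Conv3D(64, kernel_size=3, activation='relu'),
        layers.MaxPooling3D(pool_size=2),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax'),    # one output per digit class
    ])
    return model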

It's a very simple model, but it does the job. Remember, you can get the full source code here.

4.2 Train the model

Let's train the model using stochastic gradient descent. Feel free to use another optimizer of your liking (Adam, RMSProp, etc.):

import tensorflow

model = define_model()
model.compile(loss=tensorflow.keras.losses.categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              metrics=['accuracy'])
history = model.fit(train_X_3D, train_y, batch_size=32, epochs=4, verbose=1, validation_split=0.2)

I just ran this code and this is my output:

4.3 Training Results

This is our first trial. After only 4 epochs, we achieved 96.34% accuracy on the validation set! Of course, this performance can be better understood with a proper analysis of the confusion matrix, but at least on the first run these results are encouraging!
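
As a quick follow-up, the confusion matrix can be computed with scikit-learn; a sketch, assuming the one-hot labels from the loading step:

import numpy as np
from sklearn.metrics import confusion_matrix

# Predicted vs. true digit classes on the test set.
pred = np.argmax(model.predict(test_X_3D), axis=1)
true = np.argmax(test_y, axis=1)
print(confusion_matrix(true, pred))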

Note that the validation loss was still decreasing after 4 epochs; apparently this training run finished earlier than necessary. Next time, we might set a higher number of epochs and use a more principled stopping condition.
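
For instance, Keras's EarlyStopping callback can handle the stopping condition for us (the patience value here is arbitrary):

from tensorflow.keras.callbacks import EarlyStopping

# Stop when validation loss stops improving and keep the best weights seen.
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
history = model.fit(train_X_3D, train_y, batch_size=32, epochs=30,
                    verbose=1, validation_split=0.2, callbacks=[early_stop])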

Let's see how it performs on the test data!

4.4 Evaluate the model

        Here's how we'll check performance:

score = model.evaluate(test_X_3D, test_y, verbose=0)
print('Test accuracy: %.2f%% Test loss: %.3f' % (score[1]*100, score[0])) 

        Here are our current results:

I have to say I was really surprised. This simple model achieves good performance even though the data is heavily modified by noise, rotation, and random colormaps.

Also, given the amount of data and the fact that we are not using a GPU, training is remarkably fast. Cool!

We could easily tune hyperparameters and try other optimizers to get better results. However, peak performance is not our goal here.

We learned how to use 3D convolutions, and we now know how to create a simple but effective CNN to process our 3D data.

5. Next steps

The next step is to train a model to recognize events in 4D data generated from a time series of 3D images. Stay tuned!
