Machine Learning Series - Part 1 - Perceptron Recognition of Handwritten Digits (Mnist Example Analysis)

Series catalog:

  1. Machine Learning Series - Part 0 - Development Tools and tensorflow Environment Construction  
  2. Machine Learning Series - Part 1 - Perceptron Recognition of Handwritten Digits (Mnist Example Analysis)
  3. Machine Learning Series - Part 2 - CNN Recognition of Handwritten Digits (Mnist Example Analysis)


      Part 0 set up the development environment. This article walks through recognizing handwritten digits (the MNIST example) with a perceptron in detail. I hope everyone can clearly understand the MNIST example and practice it following the steps in this article. Before you continue reading, it is best to already know the following topics (otherwise you will find yourself coming back to them sooner or later):

  1. Basic vector and matrix operations
  2. TensorFlow basics (session, graph, tensor, etc.)
  3. The perceptron
  4. Cross entropy
  5. Gradient descent


1. Image and data analysis

Running the MNIST example automatically downloads the following data files:


The train-*.gz files contain the training data and the t10k-*.gz files the test data. These are not raw picture files but binary files in a packed format, so here we focus on analyzing this data format.

1. Image file

      A handwritten-digit picture is a 28*28 grayscale image. Each pixel's value ranges from 0 to 255 (0 is black, 255 is white). The image file is laid out in this format:

 Magic value (32 bits) + number of images (32 bits) + image height (32 bits) + image width (32 bits) + all image data

(1) Magic value: the file identifier; the magic value of the train-images-idx3-ubyte file is 2051. All 32-bit header fields are stored big-endian (most significant byte first).

(2) All image data: each image is 28*28 = 784 uint8 values, so N images occupy N*784 uint8 bytes
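As a quick illustration of this layout (the helper name `parse_idx_images` is mine, and the byte buffer below is fabricated, not real MNIST data), the header can be read with Python's `struct` module:

```python
import struct

def parse_idx_images(data):
    """Parse an IDX image buffer: magic, count, rows, cols, then raw pixels."""
    magic, n, rows, cols = struct.unpack_from(">IIII", data, 0)  # big-endian
    assert magic == 2051, "not an idx3-ubyte image file"
    pixels = data[16:]          # pixel data starts right after the 16-byte header
    size = rows * cols
    # Split the flat pixel stream into one list of bytes per image.
    return [list(pixels[i * size:(i + 1) * size]) for i in range(n)]

# Tiny fabricated buffer: one 1x2 "image" with pixels 0 and 255.
buf = struct.pack(">IIII", 2051, 1, 1, 2) + bytes([0, 255])
print(parse_idx_images(buf))  # [[0, 255]]
```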

View the image file in hexadecimal, and the result is shown in the following figure:


2. Label file

The label file records, in the same order as the images, the actual digit (0-9) written in each image. The file's format is:

Magic value (32 bits) + number of labels (32 bits) + all label data

(1) The magic value of the train-labels-idx1-ubyte file is 2049

(2) All label data: each label is one uint8, so all labels together are N uint8 bytes
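The label layout can be read the same way (again, the helper name and the fabricated buffer are mine, not from the original code):

```python
import struct

def parse_idx_labels(data):
    """Parse an IDX label buffer: magic, count, then one uint8 per label."""
    magic, n = struct.unpack_from(">II", data, 0)  # big-endian 32-bit fields
    assert magic == 2049, "not an idx1-ubyte label file"
    return list(data[8:8 + n])  # label data starts after the 8-byte header

# Fabricated buffer with four labels, mirroring MNIST's first labels 5 0 4 1.
buf = struct.pack(">II", 2049, 4) + bytes([5, 0, 4, 1])
print(parse_idx_labels(buf))  # [5, 0, 4, 1]
```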

View the label file in hexadecimal, and the result is shown in the following figure:


As you can see above, the label data starts right after ea60 (hexadecimal for 60000, the number of labels), and the first four labels are 5, 0, 4, 1. I restored the first four images to pictures:

The code to restore the image is as follows:
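A minimal dependency-free version of such a restore script, assuming the IDX layout described above and writing PGM files instead of PNG (the `digit_*.pgm` names are illustrative):

```python
import struct

def save_pgm(pixels, rows, cols, path):
    """Write one grayscale image as a binary PGM file (viewable in most tools)."""
    header = f"P5 {cols} {rows} 255\n".encode("ascii")
    with open(path, "wb") as f:
        f.write(header + bytes(pixels))

def restore_first_images(idx_path, count=4):
    """Dump the first `count` images of an idx3-ubyte file to PGM files."""
    with open(idx_path, "rb") as f:
        data = f.read()
    magic, n, rows, cols = struct.unpack_from(">IIII", data, 0)
    assert magic == 2051, "not an idx3-ubyte image file"
    size = rows * cols
    for i in range(min(count, n)):
        img = data[16 + i * size:16 + (i + 1) * size]
        save_pgm(img, rows, cols, f"digit_{i}.pgm")

# Example usage (after unzipping the downloaded file):
# restore_first_images("train-images-idx3-ubyte")
```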


Through analyzing and restoring the file data, we know clearly what the data format is, so we will not be confused when handling the data while coding. Later in the article we will also use our own handwritten digits to verify the model's accuracy.


2. MNIST example analysis

1. File import

mnist_reader is a module I wrote to read the handwritten digit images I made myself (its code is given in Part 3):



2. Training process


(1) Read data

   Setting the one_hot parameter to True means each label is encoded as a 10-element vector with a 1 in the position of the digit. The 28*28 pixel grid of each picture is also flattened into a one-dimensional [784] array, so every pixel becomes an independent feature input and the model only analyzes each pixel's influence on the prediction; the structural information of the image is lost. Since this is just an exercise, it's enough to know what the problem is, and I won't discuss the pros and cons of this approach for now.
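One-hot encoding is easy to sketch by hand (the helper name is mine):

```python
def one_hot(label, num_classes=10):
    """Encode a digit label as a 10-element indicator vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

print(one_hot(5))  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```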

(2) Model construction

    Why is the weight an array of shape [784, 10]? Because each picture is a 784-element array, and each picture may be any one of the ten digits 0-9, so the prediction for each picture has 10 components, corresponding to the probabilities that the picture is 0, 1, ..., 9. Likewise, the bias b has the same dimension as the output.

   The x variable holds one batch of images during batch training, so its number of rows is left unspecified for the system to infer automatically.

   The y_ variable holds that batch's label data, corresponding one-to-one with x.

   The y variable is the output of our model function f(x) = wx + b passed through the softmax function.
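The model y = softmax(wx + b) can be sketched framework-free; the dimensions below are tiny made-up ones rather than the real 784 and 10:

```python
import math

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(z)                          # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W, b):
    """y = softmax(xW + b) for one flattened image x."""
    scores = [sum(xi * wij for xi, wij in zip(x, col)) + bj
              for col, bj in zip(W, b)]
    return softmax(scores)

# Toy example: 3 "pixels", 2 classes. W is stored class-major here.
W = [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]]
b = [0.0, 0.0]
probs = forward([1.0, 0.0, 1.0], W, b)
print(sum(probs))  # 1.0
```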

(3) Initialize session

  TensorFlow variables must be initialized (in TensorFlow 1.x, by running tf.global_variables_initializer()) before they can be used.

(4) Model training

The model is trained for 1000 iterations with a batch of 100 images each. There is no fixed rule for choosing the batch size: too small and the trained model's predictions will be unsatisfactory; too large and training takes a long time. You need to set the batch size and the number of iterations according to how many samples you have. The main consideration is choosing these two values reasonably so that, when samples are limited, you reduce the risk of gradient descent finding a local minimum rather than the global minimum.
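The loop above (repeatedly feeding a batch and taking a gradient step) can be imitated without TensorFlow. The sketch below trains a tiny softmax classifier with mini-batch gradient descent on synthetic, linearly separable data; all names and dimensions are made up for illustration, not taken from the article's code:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "images": 200 samples with 4 features, class = sign of x0 + x1.
X = rng.normal(size=(200, 4))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)
Y = np.eye(2)[labels]                  # one-hot labels, like one_hot=True

W = np.zeros((4, 2))
b = np.zeros(2)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr, batch = 0.5, 50
for step in range(300):
    idx = rng.integers(0, len(X), size=batch)   # draw one mini-batch
    xb, yb = X[idx], Y[idx]
    p = softmax(xb @ W + b)
    grad = p - yb                  # gradient of cross-entropy w.r.t. logits
    W -= lr * xb.T @ grad / batch
    b -= lr * grad.mean(axis=0)

acc = (softmax(X @ W + b).argmax(axis=1) == labels).mean()
print(f"toy accuracy: {acc:.2f}")
```

Because the toy data is linearly separable, the accuracy climbs close to 1.0; on MNIST the same single-layer model tops out around 91-92%, as shown below.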

(5) Model evaluation

The rule for model evaluation is: take the digit with the highest predicted probability, compare it with the actual value, and count the proportion of correct predictions. For ease of understanding, suppose nine samples are evaluated; the following code would output:

correct_prediction=tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)) ->

           [ True True True False False True False False False]

tf.reduce_mean(tf.cast(correct_prediction, "float")) -> 0.44444445
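The same accuracy computation can be checked by hand with the nine results above:

```python
# The nine comparison results from the example evaluation.
correct = [True, True, True, False, False, True, False, False, False]

# Casting booleans to floats and averaging is just the fraction of Trues.
accuracy = sum(correct) / len(correct)
print(accuracy)  # 0.4444444444444444 (TensorFlow prints the float32 0.44444445)
```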

According to the test samples, the model's accuracy reaches about 91%:

training prediction: 0.9138 

training done

Then, by repeatedly adjusting the batch size and the number of iterations, I found the best accuracy stays between 91% and 92%, so the model's ceiling can be taken to be about 92%.

(6) Model save

Saving a model produces the following 4 files:


checkpoint: records which model checkpoints are in the current directory

mnist.data-00000-of-00001: all parameter values of the model (such as w and b)

mnist.index: an index mapping each variable name to its location in the .data file

mnist.meta: the model's graph structure information

3. Model use

1. Make your own pictures of handwritten numbers

Looking at the pictures restored at the end of the first section, you can see they have a black background (most pixels are 0) while the digit strokes are white (pixel values mostly 128-255), so I made my own pictures to match, in order to recognize digits with the trained model. Earlier in my own learning, following the example in the tensorflow Chinese community http://www.tensorfly.cn/tfdoc/tutorials/mnist_download.html, I used black digits on a white background, and the handwritten characters could not be recognized at all.

To fix this, just follow these two steps to make pictures like the MNIST examples:



Finally, save it as a png image.
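Whatever tool you use, one transformation is certainly needed when starting from a black-on-white image: inverting the pixels so the background becomes 0, as in MNIST. A one-line sketch:

```python
def invert(pixels):
    """Map black-on-white pixels to MNIST-style white-on-black."""
    return [255 - p for p in pixels]

print(invert([255, 255, 0, 128]))  # [0, 0, 255, 127]
```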

Below are the pictures I made, covering the digits 0 to 9. The naming rule is digit + serial number, i.e. the first character of the file name identifies which digit is written in the picture. The advantage is that no label file is needed; the label can be parsed automatically when reading the picture:

         



2. Load image data

The key to loading the image data is understanding the data format used by MNIST. I won't explain this again; you can debug it yourself and inspect the data in memory. The code that reads the images is as follows:
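As a sketch of the idea (the helper names are mine, and the example file name follows the digit + serial number rule above; decoding the PNG pixels themselves would be done with an image library such as Pillow):

```python
import os

def label_from_filename(path):
    """Per the naming rule, the first character of the file name is the label."""
    return int(os.path.basename(path)[0])

def to_mnist_input(pixels):
    """Flatten 28*28 uint8 pixels into the [784] float array the model expects,
    scaled to the 0..1 range like the MNIST reader does."""
    return [p / 255.0 for p in pixels]

print(label_from_filename("digits/5_01.png"))  # 5
```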


3. Identify

(1) Use the trained model


(2) You can test all the images at once rather than specifying a digit; here I test each digit separately to see the accuracy for each.


(3) Results

(4) Analysis of results

Seeing the prediction results above, are you surprised? They are a far cry from the 91% accuracy in the training evaluation. What is the reason? Looking back at the handwritten pictures I made, you will find they differ a lot from the data that ships with the example: when making them I deliberately wrote the digits at different sizes and positions.

Look again at point (1) "Read data" in the "MNIST example analysis" section: converting the picture into a [784] array throws away the picture's structural information and simply analyzes each pixel's contribution. When the digits written in the 28*28 area vary a lot in size and position, they naturally cannot be recognized.

4. Summary

So far, we have completed the whole training and prediction process of recognizing handwritten digits with a perceptron. For a novice, the training and testing code itself is not difficult; the difficulty is understanding the knowledge points listed at the start. If you have fully understood the meaning of every line of code in this article, then congratulations: the Hello World of machine learning is complete.

