Recently I have been reading Mr. He Zhiyuan's book "21 Projects to Play Deep Learning". I liked it, so I am taking some notes here.
MNIST handwritten digit recognition is the usual introductory tutorial for deep learning. This article has two parts: softmax regression, and classification with a two-layer convolutional network.
- softmax regression
- Read MNIST data

    # Import the input_data module from tensorflow.examples.tutorials.mnist.
    # This is a helper that TensorFlow prepared in advance for its MNIST tutorials.
    from tensorflow.examples.tutorials.mnist import input_data

    # Read MNIST data from MNIST_data/. The download runs automatically when the data does not exist.
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

    # Check the size of the training data
    print(mnist.train.images.shape)  # (55000, 784)
    print(mnist.train.labels.shape)  # (55000, 10)

    # Check the size of the validation data
    print(mnist.validation.images.shape)  # (5000, 784)
    print(mnist.validation.labels.shape)  # (5000, 10)

    # Check the size of the test data
    print(mnist.test.images.shape)  # (10000, 784)
    print(mnist.test.labels.shape)  # (10000, 10)

    # Print the vector representation of the 0th image
    print(mnist.train.images[0, :])

    # Print the label of the 0th image
    print(mnist.train.labels[0, :])

The original MNIST dataset has 60,000 training images and 10,000 test images. TensorFlow further splits the 60,000 training images into 55,000 training images and 5,000 validation images, so the mnist object has three parts: mnist.train is the training set, mnist.validation is the validation set, and mnist.test is the test set.
- Save MNIST data as pictures

    # coding: utf-8
    from tensorflow.examples.tutorials.mnist import input_data
    import scipy.misc
    import os

    # Read the MNIST dataset. If it does not exist, it is downloaded first.
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

    # We save the original images in the MNIST_data/raw/ folder.
    # If this folder does not exist, it is created automatically.
    save_dir = 'MNIST_data/raw/'
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    # Save the first 20 images
    for i in range(20):
        # Note that mnist.train.images[i, :] is the i-th image (numbering starts from 0)
        image_array = mnist.train.images[i, :]
        # An MNIST image in TensorFlow is a 784-dimensional vector; restore it to a 28x28 image.
        image_array = image_array.reshape(28, 28)
        # Saved files are named mnist_train_0.jpg, mnist_train_1.jpg, ..., mnist_train_19.jpg
        filename = save_dir + 'mnist_train_%d.jpg' % i
        # Convert image_array to an image with scipy.misc.toimage, then call save.
        # Note: scipy.misc.toimage was removed in SciPy 1.2, so this line needs an older SciPy.
        scipy.misc.toimage(image_array, cmin=0.0, cmax=1.0).save(filename)

    print('Please check: %s ' % save_dir)

Each image is 28×28 pixels, so a single sample is a 784-dimensional vector (28 × 28 = 784).
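To make the 784-to-28×28 mapping concrete, here is a minimal sketch that uses plain Python lists instead of the real MNIST arrays. The helper names `to_image` and `to_vector` are hypothetical, and the sketch assumes row-major ordering, which is what NumPy's `reshape(28, 28)` uses by default.

```python
def to_image(vector, height=28, width=28):
    """Split a flat list of height*width pixels into rows (row-major order)."""
    if len(vector) != height * width:
        raise ValueError("expected %d values, got %d" % (height * width, len(vector)))
    return [vector[r * width:(r + 1) * width] for r in range(height)]


def to_vector(image):
    """Flatten a list of rows back into one flat list."""
    return [pixel for row in image for pixel in row]


# Hypothetical stand-in for one MNIST sample: 784 increasing values in [0, 1].
flat = [i / 783.0 for i in range(784)]
image = to_image(flat)

print(len(image), len(image[0]))        # 28 28
# The pixel at row r, column c sits at index r * 28 + c of the flat vector.
print(image[3][5] == flat[3 * 28 + 5])  # True
print(to_vector(image) == flat)         # True
```

Flattening discards the explicit 2D layout, which is why the convolutional network later in this article reshapes the input back to 28×28 before convolving.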
- One-hot representation of image labels

    # coding: utf-8
    from tensorflow.examples.tutorials.mnist import input_data
    import numpy as np

    # Read the MNIST dataset. If it does not exist, it is downloaded first.
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

    # Look at the labels of the first 20 training images
    for i in range(20):
        # Get the one-hot representation, like (0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
        one_hot_label = mnist.train.labels[i, :]
        # np.argmax gives the original label directly,
        # because only one position is 1 and the others are 0
        label = np.argmax(one_hot_label)
        print('mnist_train_%d.jpg label: %d' % (i, label))

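The script above relies on np.argmax to turn a one-hot vector back into a digit. As a dependency-free sketch of that round trip, the following uses two hypothetical helpers, `one_hot` and `argmax`, written in plain Python so it runs without downloading the dataset.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label %d is outside 0..%d" % (label, num_classes - 1))
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


def argmax(values):
    """Index of the largest value (the first one on ties, like np.argmax)."""
    return max(range(len(values)), key=lambda i: values[i])


encoded = one_hot(3)
print(encoded)          # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(argmax(encoded))  # 3
```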
The so-called one-hot representation is a "one-bit-effective" encoding. What does that mean? In this program the image labels are 0 to 9, but we use a 10-dimensional vector to represent these 10 categories, and each category occupies its own position. At any time exactly one position of the one-hot vector is 1 and the others are 0. For example, 0 is represented as (1,0,0,0,0,0,0,0,0,0), 1 as (0,1,0,0,0,0,0,0,0,0), and 9 as (0,0,0,0,0,0,0,0,0,1). In general, N categories can be represented by N-dimensional vectors.
- softmax regression

    # coding: utf-8
    # Import TensorFlow.
    # "import tensorflow as tf" is the customary way to import TensorFlow; please remember it.
    import tensorflow as tf
    # Import the MNIST tutorial module
    from tensorflow.examples.tutorials.mnist import input_data

    # As before, read in the MNIST data
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

    # Create x, a placeholder representing the images to be recognized
    x = tf.placeholder(tf.float32, [None, 784])

    # W is a parameter of the Softmax model; it maps a 784-dimensional input to a 10-dimensional output.
    # In TensorFlow, model parameters are represented by tf.Variable.
    W = tf.Variable(tf.zeros([784, 10]))
    # b is another parameter of the Softmax model, generally called the "bias".
    b = tf.Variable(tf.zeros([10]))

    # y = softmax(Wx + b); y is the output of the model
    y = tf.nn.softmax(tf.matmul(x, W) + b)
    # y_ is the actual image label, again represented as a placeholder.
    y_ = tf.placeholder(tf.float32, [None, 10])

    # At this point we have two important Tensors: y and y_.
    # y is the output of the model and y_ is the actual image label; remember that y_ is one-hot.
    # Next we construct the cross entropy loss from y and y_.
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y)))

    # With the loss, we can use gradient descent to optimize the model's parameters (W and b)
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

    # Create a Session. The optimization step train_step can only be run in a Session.
    sess = tf.InteractiveSession()
    # All variables must be initialized and have memory allocated before running.
    tf.global_variables_initializer().run()
    print('start training...')

    # Do 2000 steps of gradient descent
    for _ in range(2000):
        # Take 100 training samples from mnist.train.
        # batch_xs is the image data of shape (100, 784), batch_ys the labels of shape (100, 10).
        # batch_xs and batch_ys correspond to the two placeholders x and y_.
        batch_xs, batch_ys = mnist.train.next_batch(100)
        # Run train_step in the Session, passing in the placeholder values at run time
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

    # Whether each prediction is correct
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    # Compute the prediction accuracy; both of these are Tensors
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    # Run the Tensor in the Session to get its value.
    # This is the accuracy of the final model on the test set.
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))  # 0.9185

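The accuracy at the end of the script is the fraction of samples whose predicted class (the argmax of y) equals the true class (the argmax of y_). The sketch below reproduces that computation in plain Python on made-up three-class data; the helper names and the numbers are illustrative assumptions, not output from the model.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    """Fraction of rows where the predicted class equals the one-hot label's class."""
    correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(correct) / float(len(correct))


# Made-up model outputs (each row sums to 1) and one-hot labels for four samples.
predictions = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
labels = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]

print(accuracy(predictions, labels))  # 0.75
```

Only the third sample is misclassified (predicted class 2, true class 1), so three of four predictions are correct.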
Softmax regression is a linear multi-class classifier. It generalizes logistic regression, which handles only two classes; both models turn the score of each category into a reasonable probability value. For example, suppose a sample gets the scores (a, b, c) for three categories. The softmax function converts them to (e^a/(e^a+e^b+e^c), e^b/(e^a+e^b+e^c), e^c/(e^a+e^b+e^c)). That is, the probability that the sample belongs to the first category is e^a/(e^a+e^b+e^c), the probability of the second category is e^b/(e^a+e^b+e^c), and the probability of the third category is e^c/(e^a+e^b+e^c). The three probabilities sum to exactly 1.
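The three-category example can be checked numerically. The following is a minimal sketch in plain Python; the function names are hypothetical. Subtracting the largest score before exponentiating is an addition of mine for numerical stability and does not change the result. `cross_entropy` mirrors the expression -sum(y_ * log(y)) from the training script, applied to a single sample.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # assumption: shift for stability; cancels out in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, probs):
    """Cross entropy -sum(t * log(p)) for one sample with a one-hot label."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


scores = [1.0, 2.0, 3.0]  # the (a, b, c) of the text, with made-up values
probs = softmax(scores)

print([round(p, 4) for p in probs])               # roughly [0.09, 0.2447, 0.6652]
print(abs(sum(probs) - 1.0) < 1e-9)               # True
print(round(cross_entropy([0, 0, 1], probs), 4))  # roughly 0.4076
```

The loss is small when the model assigns high probability to the true category and grows without bound as that probability approaches 0.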
Takeaways:
1) tf.placeholder() creates placeholders, which usually hold the sample data and labels fed in at run time; they can be regarded as formal parameters.
2) tf.Variable() creates variables that store the parameters of the model.
- Two-layer convolutional network classification

    # coding: utf-8
    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data


    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)


    def bias_variable(shape):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)


    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')


    if __name__ == '__main__':
        # Read data
        mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
        # x is the placeholder for the training images, y_ the placeholder for their labels
        x = tf.placeholder(tf.float32, [None, 784])
        y_ = tf.placeholder(tf.float32, [None, 10])

        # Restore each image from a 784-dimensional vector to a 28x28 matrix
        x_image = tf.reshape(x, [-1, 28, 28, 1])

        # First convolutional layer
        W_conv1 = weight_variable([5, 5, 1, 32])
        b_conv1 = bias_variable([32])
        h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
        h_pool1 = max_pool_2x2(h_conv1)

        # Second convolutional layer
        W_conv2 = weight_variable([5, 5, 32, 64])
        b_conv2 = bias_variable([64])
        h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
        h_pool2 = max_pool_2x2(h_conv2)

        # Fully connected layer; the output is a 1024-dimensional vector
        W_fc1 = weight_variable([7 * 7 * 64, 1024])
        b_fc1 = bias_variable([1024])
        h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
        h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
        # Dropout; keep_prob is a placeholder, 0.5 for training and 1 for testing
        keep_prob = tf.placeholder(tf.float32)
        h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

        # Map the 1024-dimensional vector to 10 dimensions, one per category
        W_fc2 = weight_variable([1024, 10])
        b_fc2 = bias_variable([10])
        y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

        # Instead of applying Softmax first and then computing the cross entropy,
        # use tf.nn.softmax_cross_entropy_with_logits to compute it directly
        cross_entropy = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
        # Also define train_step
        train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

        # Define the accuracy
        correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        # Create the Session and initialize the variables
        sess = tf.InteractiveSession()
        sess.run(tf.global_variables_initializer())

        # Train for 20000 steps
        for i in range(20000):
            batch = mnist.train.next_batch(50)
            # Report accuracy on the current training batch every 100 steps
            if i % 100 == 0:
                train_accuracy = accuracy.eval(feed_dict={
                    x: batch[0], y_: batch[1], keep_prob: 1.0})
                print("step %d, training accuracy %g" % (i, train_accuracy))
            train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

        # Report the accuracy on the test set after training
        print("test accuracy %g" % accuracy.eval(feed_dict={
            x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

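The script feeds keep_prob 0.5 during training and 1.0 during evaluation. As a rough sketch of what dropout does with that value, the plain-Python function below keeps each element with probability keep_prob and scales the kept elements by 1/keep_prob, which is the behavior tf.nn.dropout documents. The function name and the sample activations are made up for illustration.

```python
import random


def dropout(values, keep_prob, rng=random):
    """Keep each value with probability keep_prob, scaled by 1/keep_prob; zero the rest."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1], got %r" % keep_prob)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # seeded generator so repeated runs drop the same elements
activations = [1.0, 2.0, 3.0, 4.0]

# Training: about half of the elements become 0, the rest are doubled.
print(dropout(activations, 0.5, rng))
# Evaluation: keep_prob = 1.0 leaves the activations unchanged.
print(dropout(activations, 1.0, rng))  # [1.0, 2.0, 3.0, 4.0]
```

Scaling the surviving elements keeps the expected value of each activation the same in training and evaluation, which is why no extra rescaling is needed when keep_prob is set to 1.0 at test time.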
Four helper functions are defined. weight_variable returns a variable of the given shape, initialized from a truncated normal distribution with standard deviation 0.1. bias_variable returns a variable of the given shape with every element initialized to 0.1. These two functions are used to create the convolution kernels and the biases. conv2d applies a convolution with stride 1 and 'SAME' padding, and max_pool_2x2 applies 2×2 max pooling with stride 2.
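The 7 * 7 * 64 in the fully connected layer follows from the shapes in the code above. With 'SAME' padding the spatial output size is ceil(input / stride), so the stride-1 convolutions keep the size and each stride-2 pooling halves it. The sketch below redoes that arithmetic and counts the trainable parameters; the helper name `same_output_size` is hypothetical.

```python
import math


def same_output_size(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))


size = 28
size = same_output_size(size, 1)  # conv1, stride 1 -> 28
size = same_output_size(size, 2)  # pool1, stride 2 -> 14
size = same_output_size(size, 1)  # conv2, stride 1 -> 14
size = same_output_size(size, 2)  # pool2, stride 2 -> 7
flat = size * size * 64           # 64 channels after the second convolution
print(size, flat)                 # 7 3136

# Weights plus biases per layer, using the shapes passed to weight_variable/bias_variable.
params = {
    "conv1": 5 * 5 * 1 * 32 + 32,
    "conv2": 5 * 5 * 32 * 64 + 64,
    "fc1": flat * 1024 + 1024,
    "fc2": 1024 * 10 + 10,
}
print(params["fc1"])         # 3212288
print(sum(params.values()))  # 3274634
```

Nearly all of the parameters sit in the first fully connected layer, which is also the layer the script regularizes with dropout.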