【ZJU-Machine Learning】Convolutional Neural Network-LeNet

The concept of convolutional neural networks

From manually designing convolution kernels to automatically learning convolution kernels.
What is a convolution kernel?
We learned many transformations in "Signals and Systems", such as wavelet transform, Fourier transform, etc.

For the Fourier transform:
The kernel of the Fourier transform acts on the signal f(t): the kernel is multiplied with the signal point by point and the products are then summed (integrated).
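
Written out explicitly (a standard formula, added here for reference), the Fourier transform multiplies f(t) by the kernel e^{-j\omega t} and integrates:

F(\omega) = \int_{-\infty}^{+\infty} f(t) \, e^{-j\omega t} \, dt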

Studying these transforms is, in effect, designing a convolution kernel by hand.
In image processing, a convolution kernel is combined with the image to produce a feature; by using multiple convolution kernels we extract multiple features.

So-called convolution therefore first extracts features from the image and outputs feature maps; these features are then fed into the neural network (the fully connected layers).
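
A minimal NumPy sketch of this idea (my own illustration, not part of the original notes), using a hand-designed vertical-edge kernel in place of a learned one:

import numpy as np

def conv2d_valid(image, kernel):
    # Naive 2-D convolution ('valid' mode, cross-correlation as used in CNNs):
    # multiply the kernel with each image patch and sum the products.
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

# A hand-designed kernel (vertical-edge detector); a CNN learns such kernels instead.
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])
image = np.zeros((8, 8))
image[:, 4:] = 1.0                       # an image containing a vertical edge
feature_map = conv2d_valid(image, edge_kernel)
print(feature_map.shape)                 # (6, 6): one feature value per position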

Terminology

Key terms: convolution kernel (filter), stride, zero padding, feature map.

The relationship between stride and feature map size

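For an N × N input and an F × F kernel with stride S (no padding), the output feature map size per spatial dimension is (a standard formula, spelled out here for reference):

N_{out} = \lfloor (N - F) / S \rfloor + 1

For example, a 5 × 5 kernel with S = 1 on a 14 × 14 input gives 14 - 5 + 1 = 10, i.e. a 10 × 10 feature map, which is exactly step 3 of LeNet below.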

zero padding

When the stride is greater than 1 (or the kernel does not fit the input evenly), the kernel may never reach the pixels at the border, so they cannot take part in the computation. Padding the border with zeros keeps these pixels from being wasted and also gives control over the output size.
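
With zero padding of P pixels on each side, the output-size formula above becomes (again a standard result, added for completeness):

N_{out} = \lfloor (N - F + 2P) / S \rfloor + 1

With F = 5, S = 1 and P = 2 (what TensorFlow's 'SAME' padding chooses), a 28 × 28 input stays 28 × 28, which is what the first convolution layer in the implementation below relies on.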

weight sharing

Image convolution can be regarded as a fully connected network with weight sharing: every output unit is connected only to a local patch of the input, and all output units reuse the same small set of weights (the kernel).

Written out explicitly, a convolution operation is equivalent to a fully connected network in which many weights are tied together (shared) and all connections outside the kernel's window are zero, as the sketch below illustrates.
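
A minimal NumPy sketch of this equivalence (my own illustration, not from the notes): a 1-D convolution with a 3-tap kernel gives exactly the same output as multiplying the input by a sparse matrix whose rows all reuse the same three shared weights.

import numpy as np

x = np.array([1., 2., 3., 4., 5., 6.])   # input signal
w = np.array([0.5, 1.0, -0.5])           # 3-tap kernel (the shared weights)

# Convolution view: slide the kernel, multiply and sum at each position.
conv_out = np.array([np.dot(w, x[i:i + 3]) for i in range(len(x) - 3 + 1)])

# Fully connected view: a weight matrix in which every row reuses the same three
# weights (weight sharing) and all other entries are zero (local connectivity).
W = np.zeros((len(x) - 3 + 1, len(x)))
for i in range(W.shape[0]):
    W[i, i:i + 3] = w
fc_out = W @ x

print(np.allclose(conv_out, fc_out))      # True: the two views are identical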

LeNet

LeNet-5 pipeline: convolution → average pooling (ReLU) → convolution → average pooling → three fully connected layers.
First step

Apply 6 convolution kernels of 5 × 5 (stride 1) to the input image, producing 6 feature maps of 28 × 28 (in the TensorFlow implementation below this is done with 'SAME' padding on the 28 × 28 MNIST input).

Second step: average pooling and nonlinear transformation (ReLU)

Average each 2 × 2 region (giving 6 feature maps of 14 × 14), then apply the ReLU transformation.

When backpropagating through the average pooling, each of the four input neurons simply receives 1/4 of the partial derivative arriving at the pooled output.
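
A small NumPy sketch of this 2 × 2 average-pooling gradient rule (my own illustration): in the backward pass every input cell gets one quarter of the gradient of the output it was averaged into.

import numpy as np

def avg_pool_2x2_forward(x):
    # 2x2 average pooling with stride 2 on an (H, W) feature map.
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def avg_pool_2x2_backward(grad_out):
    # Each input cell receives 1/4 of the gradient of its pooled output.
    return np.repeat(np.repeat(grad_out, 2, axis=0), 2, axis=1) / 4.0

x = np.arange(16, dtype=float).reshape(4, 4)
y = avg_pool_2x2_forward(x)      # shape (2, 2)
grad_y = np.ones_like(y)         # pretend dL/dy = 1 everywhere
grad_x = avg_pool_2x2_backward(grad_y)
print(grad_x)                    # every entry is 0.25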

Third step

Apply 16 convolution kernels of 5 × 5 × 6 with stride 1 to the 14 × 14 × 6 feature maps, producing 16 feature maps of 10 × 10 (14 - 5 + 1 = 10, i.e. no padding).

Fourth step

Average pooling over 2 × 2 regions again, giving 16 feature maps of 5 × 5.

Fifth step

Flatten the 16 × 5 × 5 = 400 values and feed them into the fully connected layers (120 → 84 → 10 in the implementation below).


It can be seen that the training time of the whole network is dominated by the convolutional layers (time complexity), while the number of parameters is dominated by the fully connected layers (space complexity).
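
A quick parameter count (my own arithmetic, using the layer sizes given above) makes the second half of this claim concrete:

# Parameters (weights + biases) per layer for the LeNet-5 variant in these notes.
conv1 = 5 * 5 * 1 * 6 + 6        # 156
conv2 = 5 * 5 * 6 * 16 + 16      # 2416
fc1   = 400 * 120 + 120          # 48120  (400 = 16 * 5 * 5)
fc2   = 120 * 84 + 84            # 10164
fc3   = 84 * 10 + 10             # 850
print(conv1 + conv2, fc1 + fc2 + fc3)    # 2572 59134

The fully connected layers hold roughly 96% of the parameters, while the convolutional layers dominate the running time because each kernel is applied at every spatial position.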

Note: every linear transformation (convolution or fully connected layer) is followed by a ReLU; only the final output layer feeds into a softmax instead (see the code below).

TensorFlow implementation of LeNet-5

Layer 1 (CONV1) and Layer 2 (AVG_POOL1)

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)  # adjust path as needed

# Helpers (not shown in the original notes; standard TensorFlow-tutorial versions).
def weight_variable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))
def bias_variable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape))
def conv2d(x, W, padding_method='SAME'):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding=padding_method)
def avg_pool_2x2(x, padding_method='SAME'):
    return tf.nn.avg_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding=padding_method)

sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

W_conv1 = weight_variable([5, 5, 1, 6])   # 6 kernels of 5x5x1
b_conv1 = bias_variable([6])

x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1, 'SAME') + b_conv1)   # 28x28x6
h_pool1 = avg_pool_2x2(h_conv1)                                    # 14x14x6

Layer 3 (CONV2) and Layer 4 (AVG_POOL2)

W_conv2 = weight_variable([5, 5, 6, 16])   # 16 kernels of 5x5x6
b_conv2 = bias_variable([16])

# 'VALID' padding: 14 - 5 + 1 = 10, matching the 10x10x16 maps described above.
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2, 'VALID') + b_conv2)
h_pool2 = avg_pool_2x2(h_conv2)            # 5x5x16

Three fully connected layers

W_fc1 = weight_variable([5 * 5 * 16, 120])
b_fc1 = bias_variable([120])

h_pool2_flat = tf.reshape(h_pool2, [-1, 5*5*16])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fc2 = weight_variable([120, 84])
b_fc2 = bias_variable([84])
h_fc2 = tf.nn.relu(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
h_fc2_drop = tf.nn.dropout(h_fc2, keep_prob)

W_fc3 = weight_variable([84, 10])
b_fc3 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc2_drop, W_fc3) + b_fc3)

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
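
As a side note (my addition, not in the original notes): computing softmax and log separately like this can be numerically unstable; a common alternative in TensorFlow 1.x is to hand the raw logits to a loss op that applies softmax internally:

# Alternative, numerically more stable loss built from the raw logits.
logits = tf.matmul(h_fc2_drop, W_fc3) + b_fc3
stable_cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
# train_step could then be built from stable_cross_entropy instead.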

sess.run(tf.global_variables_initializer())

for i in range(10000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    # The training step runs on every iteration, not only every 100th one.
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

Caffe implementation of LeNet-5

There are three ways to use Caffe to train neural networks:

  • command line interface
  • Python interface
  • Matlab interface

  • Command line interface: caffe.cpp in Caffe's tools folder already implements the necessary training logic (parameter updates, saving model snapshots, and so on). After compiling Caffe, we only need to call the resulting executable and pass it the solver file that configures training.
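
For comparison, a minimal sketch of the Python interface (my own example; the solver path is a placeholder and would need to match your setup):

import caffe

caffe.set_mode_gpu()                                   # or caffe.set_mode_cpu()
solver = caffe.get_solver('examples/mnist/lenet_solver.prototxt')
solver.solve()                                         # runs the full training loop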

File structure

1) create_lmdb.sh

2) compute_mean.sh

3) train_lenet.sh

4) lenet_solver.prototxt

5) lenet_train_test.prototxt

6) test_lenet.sh

Main code implementation

  1. create_lmdb.sh

DATA=/home/hty/caffe-master/examples/mnist
BUILD=/home/hty/caffe-master/build/tools
 
rm -rf $DATA/mnist_train_lmdb
rm -rf $DATA/mnist_test_lmdb
 
$BUILD/convert_imageset --shuffle \
--resize_height=28 --resize_width=28 \
$DATA/    \
$DATA/training.txt  $DATA/mnist_train_lmdb
 
$BUILD/convert_imageset --shuffle \
--resize_height=28 --resize_width=28 \
$DATA/    \
$DATA/testing.txt  $DATA/mnist_test_lmdb

  2. compute_mean.sh

#!/usr/bin/env sh
# This script computes the mean image of the MNIST training lmdb
# and writes it to mean.binaryproto.
set -e
 
DATA=/home/hty/caffe-master/examples/mnist
BUILD=/home/hty/caffe-master/build/tools
 
rm -rf $DATA/mean.binaryproto
 
$BUILD/compute_image_mean $DATA/mnist_train_lmdb $DATA/mean.binaryproto $@

  3. train_lenet.sh

#!/usr/bin/env sh
set -e
 
BUILD=/home/hty/caffe-master/build/tools
DATA=/home/hty/caffe-master/examples/mnist
$BUILD/caffe train --solver=$DATA/lenet_solver.prototxt $@

  4. lenet_solver.prototxt

net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.0
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_rmsprop"
# solver mode: CPU or GPU
solver_mode: GPU
type: "RMSProp"
rms_decay: 0.98

Introduction to lr_policy

fixed: always return base_lr.  
step: return base_lr * gamma ^ (floor(iter / step))  
exp: return base_lr * gamma ^ iter  
inv: return base_lr * (1 + gamma * iter) ^ (- power)  
multistep: similar to step but it allows non-uniform steps defined by stepvalue  
poly: the effective learning rate follows a polynomial decay, to be zero by max_iter. return base_lr * (1 - iter/max_iter) ^ power  
sigmoid: the effective learning rate follows a sigmoid decay. return base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))  
where base_lr, max_iter, gamma, step, stepvalue and power are defined  in the solver parameter protocol buffer, and iter is the current iteration. 
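
For intuition (my own sketch, plugging in the values from lenet_solver.prototxt above), the 'inv' policy used here decays the learning rate like this:

# 'inv' learning-rate policy with base_lr = 0.01, gamma = 0.0001, power = 0.75.
def inv_lr(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    return base_lr * (1.0 + gamma * iteration) ** (-power)

for it in (0, 1000, 5000, 10000):
    print(it, round(inv_lr(it), 6))
# 0 0.01, 1000 0.00931, 5000 0.007378, 10000 0.005946
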
  5. lenet_train_test.prototxt

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 6
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
  include {
    phase: TRAIN
  }
}

  6. test_lenet.sh


#!/usr/bin/env sh
set -e
 
BUILD=/home/hty/caffe-master/build/tools
DATA=/home/hty/caffe-master/examples/mnist
$BUILD/caffe test -model $DATA/lenet_train_test.prototxt -weights $DATA/lenet_iter_10000.caffemodel -iterations 100 $@

Advantages and Disadvantages of Caffe

Advantages of Caffe

  • Very well suited to convolutional neural networks for image recognition
  • Many pre-trained models are available
  • Little code needs to be written
  • There is relatively little encapsulation, so the source code is easy to understand and modify
  • Trained parameters can easily be exported and used from other programs (e.g. C code)
  • Suitable for industrial applications

Disadvantages of Caffe

  • Because it was developed specifically for convolutional neural networks, its structure is inflexible and hard to apply to other kinds of models
  • Network definition is relatively rigid: every layer must be written out explicitly
  • Not every detail can be adjusted without modifying the source code

Origin: blog.csdn.net/qq_45654306/article/details/113395281