Deep Learning and Computer Vision in Practice (2): Implementing a Neural Network with Caffe

Implementing a neural network with Caffe

Goal: classify linearly inseparable data points in a plane

Environment: installing Caffe

Caffe is a framework with many dependencies. Install the dependency packages first, and then install Caffe itself.

First, go to the directory where you want to install Caffe and run:

>> git clone https://github.com/BVLC/caffe.git

Then go into the caffe directory and copy Makefile.config.example to Makefile.config:

>> cp Makefile.config.example Makefile.config

As with MXNet, Makefile.config is the build configuration file, where the compilation options are set. It is mainly used to configure CUDA, cuDNN and the BLAS library.

By default, the line CPU_ONLY := 1 is commented out, which builds Caffe with GPU support. If you are installing Caffe on a machine without an NVIDIA GPU, uncomment it to build a CPU-only version.

There is also the option WITH_PYTHON_LAYER := 1, which enables defining neural-network layers in Python, somewhat like defining custom layers in MXNet. If you want to use Python layers, or Caffe-based projects that rely on them such as Ross Girshick's py-faster-rcnn, you need to uncomment this line.

After configuring Makefile.config, you can start the build. Run the following commands in sequence:

>> make pycaffe -j

>> make all -j

>> make test -j

Because Caffe has many dependencies, some dynamic libraries may not be found during the build. In that case, add the directory that contains them to LD_LIBRARY_PATH; if a header file cannot be found, add its directory to CPLUS_INCLUDE_PATH. For example, if the HDF5 library cannot be found, you can run:

>> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/hdf5/serial

To import Caffe directly from Python, you also need to add Caffe's python directory to PYTHONPATH:

>> export PYTHONPATH=$PYTHONPATH:/path/to/caffe/python

Basic concepts of Caffe: in essence, Caffe works much like the symbolic (Symbol) API in MXNet. You first define the computation, and then combine it with data to train and use the model. The most basic computing unit in Caffe is the layer; every computation is built from layers, whether they are Caffe's predefined layers or user-defined ones. The most common way to use Caffe is to describe the network structure in protobuf text format rather than in code, and then train it on data. In other words, training a model with Caffe does not require writing code, which reflects the expressiveness and modularity of Caffe's design philosophy.
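To make the "define first, then run" idea concrete, here is a toy, pure-Python sketch in which layers are wired together only by the names of their input (bottom) and output (top) blobs. The helper make_net and the (name, function, bottoms, tops) tuples are hypothetical and only mimic the structure of a Caffe network; this is not Caffe's API.

```python
import math

def make_net(layers):
    """Hypothetical helper: layers is a list of (name, fn, bottoms, tops) tuples.
    Nothing is computed here; it only returns a function that runs the layers in order."""
    def forward(inputs):
        blobs = dict(inputs)                      # blob name -> value
        for name, fn, bottoms, tops in layers:
            outputs = fn(*[blobs[b] for b in bottoms])
            if len(tops) == 1:
                outputs = (outputs,)
            for top, value in zip(tops, outputs):
                blobs[top] = value                # each output becomes a named blob
        return blobs
    return forward

# 1) Define the computation: layers are connected only through blob names
net = make_net([
    ("scale", lambda x: 2.0 * x, ["data"], ["scaled"]),
    ("sigmoid", lambda z: 1.0 / (1.0 + math.exp(-z)), ["scaled"], ["out"]),
])

# 2) Run it: feed a value into the "data" blob
blobs = net({"data": 0.0})
print(blobs["out"])  # 0.5
```

A prototxt file plays the role of the list passed to make_net: it only describes the layers and their connections, and the framework does the running.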

Besides the API documentation, the official Caffe website lists the commonly used layers and their parameters at: http://caffe.berkeleyvision.org/tutorial/layers.html

By combining layers you build a network (Net). You then define the data and a Solver, which optimizes the network with gradient descent, and with that a model can be trained.

In Caffe, data is stored in a class called Blob, which is essentially a multi-dimensional array stored contiguously in memory. For example, a batch of images is stored as a four-dimensional array whose dimensions are batch size, number of channels, image height and image width.
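Because a Blob is contiguous, every element (n, c, h, w) lives at a fixed offset in one flat buffer. The sketch below computes that offset for a row-major N x C x H x W layout; the expression mirrors the one used by Caffe's Blob::offset, but blob_offset itself is only an illustrative helper.

```python
def blob_offset(shape, n, c=0, h=0, w=0):
    """Flat index of element (n, c, h, w) in a contiguous, row-major N x C x H x W array."""
    N, C, H, W = shape
    if not (0 <= n < N and 0 <= c < C and 0 <= h < H and 0 <= w < W):
        raise IndexError("index out of range")
    return ((n * C + c) * H + h) * W + w

shape = (2, 3, 4, 5)  # batch size, channels, height, width
print(blob_offset(shape, 0, 0, 0, 1))  # 1: next pixel in the same row
print(blob_offset(shape, 0, 1, 0, 0))  # 20: next channel starts after one 4 x 5 plane
print(blob_offset(shape, 1, 0, 0, 0))  # 60: next image starts after 3 x 4 x 5 values
print(blob_offset(shape, 1, 2, 3, 4))  # 119: last element (2*3*4*5 - 1)
```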

Step 1: Prepare the data. Caffe's support for non-image data is limited, so here the HDF5 format is used to store the generated coordinates and their labels. Note that the HDF5Data layer does not read HDF5 data directly: its source is a text file listing HDF5 files, one per line. So in addition to data.h5, we also generate data_h5.txt as the data source of the HDF5Data layer. The code is as follows:

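import pickle
import numpy as np
import h5py

# Load the 2-D sample coordinates and their class labels
with open('data.pkl', 'rb') as f:
    samples, labels = pickle.load(f)
sample_size = len(labels)
# Caffe blobs hold floating-point values, so store both datasets as float32
samples = np.array(samples, dtype=np.float32).reshape((sample_size, 2))
labels = np.array(labels, dtype=np.float32).reshape((sample_size, 1))

# Dataset names must match the top blobs of the HDF5Data layer ("data" and "label")
h5_filename = 'data.h5'
with h5py.File(h5_filename, 'w') as h:
    h.create_dataset('data', data=samples)
    h.create_dataset('label', data=labels)

# The HDF5Data layer reads a text file that lists HDF5 files, one path per line
with open('data_h5.txt', 'w') as f:
    f.write(h5_filename + '\n')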
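The conversion script expects a data.pkl file containing a (samples, labels) tuple, produced in the previous part of this series. If you do not have it, the following sketch writes a stand-in file. It is an assumption, not the original data: the XOR-style labelling rule, the sample count of 41 (chosen to match batch_size in train.prototxt) and the function names are all hypothetical.

```python
import pickle
import random

def make_xor_data(n=41, seed=0):
    """Hypothetical stand-in data: n points in the unit square, labelled 1 when
    exactly one coordinate is above 0.5 (an XOR pattern, not linearly separable)."""
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        samples.append([x, y])
        labels.append(int((x > 0.5) != (y > 0.5)))
    return samples, labels

def save_pickle(path, samples, labels):
    # Same (samples, labels) tuple layout that the conversion script unpickles
    with open(path, 'wb') as f:
        pickle.dump((samples, labels), f)

if __name__ == '__main__':
    save_pickle('data.pkl', *make_xor_data())
```

Any other linearly inseparable 2-D dataset in the same tuple format would work equally well with the rest of the pipeline.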

Step 2: Define the training network in train.prototxt.

name: "SimpleMLP"
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "data_h5.txt"
    batch_size: 41
  }
}

layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "data"
  top: "fc1"
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "uniform"
    }
  }
}

layer{
  name: "sigmoid1"
  type: "Sigmoid"
  bottom: "fc1"
  top: "sigmoid1"
}

layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "sigmoid1"
  top: "fc2"
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "uniform"
    }
  }
}

layer{
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc2"
  bottom: "label"
  top: "loss"
}

The definition is short and the meaning of each layer is fairly clear: name is the name of the layer, type specifies its type, bottom names the input blob(s) and top names the output blob(s). Each layer also takes parameters specific to its type; for the fully connected InnerProduct layer, for example, num_output is the number of outputs, i.e. the number of hidden units. Overall, this prototxt file first defines an HDF5Data layer that reads data from the HDF5 files; what it reads is output, in order, as the data and label blobs.
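For intuition about what these layers compute, the NumPy sketch below reproduces the forward pass of train.prototxt from the data blob up to fc2, on random stand-in data. It assumes Caffe's InnerProduct convention of storing weights as a (num_output, num_input) matrix, so y = x W^T + b; the inputs and helper names are illustrative only.

```python
import numpy as np

def inner_product(x, W, b):
    # InnerProduct weights have shape (num_output, num_input): y = x W^T + b
    return x @ W.T + b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
data = rng.random((41, 2))                       # stand-in "data" blob: 41 points in the plane
W1, b1 = rng.uniform(0, 1, (2, 2)), np.zeros(2)  # fc1: uniform weight_filler, zero bias
W2, b2 = rng.uniform(0, 1, (2, 2)), np.zeros(2)  # fc2: same initialisation

fc1 = inner_product(data, W1, b1)                # shape (41, 2): 2 hidden units
sigmoid1 = sigmoid(fc1)                          # element-wise activation
fc2 = inner_product(sigmoid1, W2, b2)            # shape (41, 2): one score per class
print(fc2.shape)  # (41, 2)
```

The fc2 scores are what the final SoftmaxWithLoss layer receives together with the label blob.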

Because this simple example uses all the data for training, there is only one HDF5Data layer, with phase TRAIN, and batch_size equals the total number of samples (41). Apart from that, the network contains only fully connected InnerProduct layers and a Sigmoid activation layer. One parameter worth mentioning is weight_filler. MXNet initializes network parameters with uniformly distributed random numbers by default, so nothing has to be specified there. In Caffe, the default filler is a constant 0, which makes this example hard to converge. We therefore set it explicitly to uniform, which by default draws random numbers between 0 and 1 to initialize the layer's weights. Besides weight_filler there is also bias_filler for initializing the biases; since its default value does not hurt convergence, it is omitted in this example.
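One way to see why an all-zero start is a problem: when every weight is 0, both hidden units compute the same value and receive the same gradient, so they stay identical forever and the hidden layer behaves like a single unit. The NumPy sketch below checks this on hypothetical XOR-style data. It uses plain gradient descent rather than Caffe's momentum solver, and train_step is an illustrative helper, not part of Caffe.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(params, x, y, lr=0.15):
    """One plain gradient-descent step on the 2-2-2 MLP with softmax cross-entropy loss."""
    W1, b1, W2, b2 = params
    h = sigmoid(x @ W1.T + b1)                  # fc1 + sigmoid1
    z = h @ W2.T + b2                           # fc2
    p = np.exp(z - z.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)           # softmax
    n = len(y)
    dz = p.copy()
    dz[np.arange(n), y] -= 1.0
    dz /= n                                     # d(loss)/d(fc2), loss averaged over the batch
    dW2, db2 = dz.T @ h, dz.sum(axis=0)
    da = (dz @ W2) * h * (1.0 - h)              # back through fc2 and the sigmoid
    dW1, db1 = da.T @ x, da.sum(axis=0)
    return (W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2)

rng = np.random.default_rng(0)
x = rng.random((41, 2))
y = ((x[:, 0] > 0.5) ^ (x[:, 1] > 0.5)).astype(int)  # hypothetical XOR-style labels

zero_init = (np.zeros((2, 2)), np.zeros(2), np.zeros((2, 2)), np.zeros(2))
uniform_init = (rng.random((2, 2)), np.zeros(2), rng.random((2, 2)), np.zeros(2))
for _ in range(100):
    zero_init = train_step(zero_init, x, y)
    uniform_init = train_step(uniform_init, x, y)

# Row j of W1 holds hidden unit j's incoming weights
print(np.allclose(zero_init[0][0], zero_init[0][1]))        # True: the units never diverge
print(np.allclose(uniform_init[0][0], uniform_init[0][1]))
```

With the uniform filler the two rows start out different, so the hidden units can learn different features, which is what the weight_filler setting buys.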

Finally, the SoftmaxWithLoss layer is used to calculate the loss function.
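Written out, SoftmaxWithLoss turns the fc2 outputs z_n of each of the N samples in a batch into class probabilities p_n and averages the negative log-probability of the correct label y_n. With Caffe's default normalization and no ignored labels, the divisor is simply the batch size N (41 here):

```latex
p_{n,k} = \frac{e^{z_{n,k}}}{\sum_{j} e^{z_{n,j}}}, \qquad
L = -\frac{1}{N} \sum_{n=1}^{N} \log p_{n,\,y_n}
```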

Caffe comes with a script for visualizing network structures, located at caffe/python/draw_net.py. Run:

>> python /path/to/caffe/python/draw_net.py train.prototxt mlp_train.png --rankdir BT

This draws the network structure from bottom to top and saves it to mlp_train.png. The script requires the pydot package and Graphviz.

Step 3: After defining the network structure, you also need to define solver.prototxt, which configures the gradient descent solver:

net: "train.prototxt"
base_lr: 0.15
lr_policy: "fixed"
display: 100
max_iter: 2000
momentum: 0.95
snapshot_prefix: "simple_mlp"
solver_mode: CPU

The net parameter specifies the network to train, train.prototxt. base_lr is the learning rate, set to 0.15 here. lr_policy is the policy for changing the learning rate during training; fixed means it stays constant. display is the interval, in iterations, at which training progress such as the current loss is printed. max_iter is the maximum number of training iterations; training stops when it is reached, and a snapshot of the current model parameters and of the solver state is saved. momentum, as the name suggests, is the momentum coefficient, and snapshot_prefix is the file-name prefix of the saved model and solver-state files. solver_mode selects CPU or GPU training; this example uses the CPU.
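These settings map onto a short update loop. The sketch below follows the momentum SGD rule given in the Caffe solver documentation (V ← momentum · V − lr · ∇L(W), then W ← W + V) with lr_policy "fixed", applied to a hypothetical one-parameter quadratic instead of the real network. It also shows how snapshot_prefix and the iteration count form the snapshot file names. Weight decay and the other solver options are left out.

```python
def learning_rate(iteration, base_lr=0.15, lr_policy="fixed"):
    # Only the "fixed" policy from solver.prototxt is sketched: the rate never changes.
    if lr_policy == "fixed":
        return base_lr
    raise NotImplementedError(lr_policy)

def sgd_momentum(grad, w, base_lr=0.15, momentum=0.95, max_iter=2000):
    """Momentum SGD: V <- momentum * V - lr * grad(W), then W <- W + V."""
    v = 0.0
    for it in range(max_iter):
        v = momentum * v - learning_rate(it, base_lr) * grad(w)
        w = w + v
    return w

def snapshot_names(prefix="simple_mlp", iteration=2000):
    # Snapshot files are named <snapshot_prefix>_iter_<iteration>.<extension>
    stem = f"{prefix}_iter_{iteration}"
    return stem + ".caffemodel", stem + ".solverstate"

# Hypothetical objective f(w) = (w - 3)^2 with gradient 2 (w - 3); minimum at w = 3
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w=0.0)
print(round(w_final, 6))  # 3.0
print(snapshot_names())   # ('simple_mlp_iter_2000.caffemodel', 'simple_mlp_iter_2000.solverstate')
```

The .caffemodel name produced this way is the weight file loaded for inference in step 5.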

Step 4: Start training with the following command:

>> /path/to/caffe/build/tools/caffe train -solver solver.prototxt

During training, the current iteration number and the corresponding loss value are printed every 100 iterations, as set by display.

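Alternatively, you can train with the following Python script; the result is the same as with the command above.

import sys
sys.path.append('/opt/caffe/python')  # path to Caffe's python directory
import caffe

caffe.set_mode_cpu()
# Build the solver from solver.prototxt (which points to train.prototxt)
# and run all max_iter iterations; a snapshot is saved at the end
solver = caffe.SGDSolver('solver.prototxt')
solver.solve()

# solver.net is still the training network: its first layer is HDF5Data,
# so forward() reads a batch from data.h5 and returns only the loss
net = solver.net
output = net.forward()
print(output)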

Step 5: Modify the network for inference.

However, the model cannot be used for prediction right after the training in step 4. The print on the last line of the Python script in step 4 outputs:

{'loss': array(0.004321129061281681, dtype=float32)}

This is a dictionary whose key is the name of the last layer's output blob and whose value is the loss.

The reason is that train.prototxt defines a network for training. For inference, the structure needs some changes, mainly to the input and output parts; the weight initialization settings are no longer needed, because the weights are loaded from the trained model. The complete content is as follows:

name: "SimpleMLP"
input: "data"
input_shape {
  dim: 1
  dim: 2
}

layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "data"
  top: "fc1"
  inner_product_param {
    num_output: 2
  }
}

layer{
  name: "sigmoid1"
  type: "Sigmoid"
  bottom: "fc1"
  top: "sigmoid1"
}

layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "sigmoid1"
  top: "fc2"
  inner_product_param {
    num_output: 2
  }
}

layer{
  name: "softmax"
  type: "Softmax"
  bottom: "fc2"
  top: "prob"
}

The HDF5Data layer is gone. Instead, input specifies the name of the input blob and input_shape its shape (here a single 2-D point). The last layer has become Softmax, and its top blob, named prob, holds the probability of each class. Save this inference network as test.prototxt, then use the following Python code to run the trained model for inference and visualize the result.

import sys
import pickle
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3-D projection on older Matplotlib
sys.path.append('/opt/caffe/python')
import caffe

caffe.set_mode_cpu()
# Inference network plus the weights saved at iteration 2000
net = caffe.Net('test.prototxt', 'simple_mlp_iter_2000.caffemodel', caffe.TEST)

with open('data.pkl', 'rb') as f:
    samples, labels = pickle.load(f)
samples = np.array(samples)
labels = np.array(labels)

def predict(point):
    """Return the predicted probability that a 2-D point belongs to class 1."""
    net.blobs['data'].data[...] = np.reshape(point, (1, 2))
    output = net.forward()
    return float(output['prob'][0][1])

# Evaluate the network on a grid with spacing 0.05 covering [0, 1] x [0, 1]
X = np.arange(0, 1.05, 0.05)
Y = np.arange(0, 1.05, 0.05)
X, Y = np.meshgrid(X, Y)
grids = np.array([[X[i][j], Y[i][j]] for i in range(X.shape[0])
                                     for j in range(X.shape[1])])
grid_probs = np.array([predict(grid) for grid in grids]).reshape(X.shape)

fig = plt.figure('Sample Surface')
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, grid_probs, alpha=0.15, color='k', rstride=2, cstride=2, lw=0.5)

# Predicted class-1 probability of each training sample, split by true label
samples0 = samples[labels == 0]
samples0_probs = [predict(sample) for sample in samples0]
samples1 = samples[labels == 1]
samples1_probs = [predict(sample) for sample in samples1]
ax.scatter(samples0[:, 0], samples0[:, 1], samples0_probs, c='b', marker='^', s=50)
ax.scatter(samples1[:, 0], samples1[:, 1], samples1_probs, c='r', marker='o', s=50)
plt.show()
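The list comprehension that builds grids walks the meshgrid row by row. The same array can be produced with NumPy in a single call; the sketch below checks that both versions give identical results.

```python
import numpy as np

X, Y = np.meshgrid(np.arange(0, 1.05, 0.05), np.arange(0, 1.05, 0.05))

# Loop version, as in the inference script: row by row, column by column
grids_loop = np.array([[X[i][j], Y[i][j]] for i in range(X.shape[0])
                                          for j in range(X.shape[1])])
# Vectorized version: ravel() flattens in the same row-major order
grids_vec = np.column_stack([X.ravel(), Y.ravel()])
print(np.array_equal(grids_loop, grids_vec))  # True
```

Going one step further, resizing the input blob with pycaffe's Blob.reshape followed by Net.reshape should let the whole grid pass through the network in one forward call instead of a Python loop; that variant is not shown here.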

Origin blog.csdn.net/Fan0920/article/details/107532419