I've been learning Caffe recently, so let me summarize its basic concepts. The most fundamental concepts in Caffe are blobs, layers, and nets. A Caffe model is a defined network, described from bottom to top (i.e., from the input data layer to the final loss layer). The data flowing through the model is stored in a structure called a blob, which is a standard array and the unified memory interface for the whole framework.
In computer vision, most data is a four-dimensional (4D) array (though blobs can also be used for non-image applications). For batches of image data, the conventional blob dimensions are (N, K, H, W): number of images N, number of channels K, height H, and width W.
A blob stores both data and diff: data holds the values passed forward through the network, and diff holds the gradients computed by the network. Either may live on the CPU or the GPU, and there are two ways to access them (const and mutable): the const accessor cannot change the values, while the mutable one can. The code is as follows (this is the CPU accessor for data; the GPU accessors, and those for diff, are analogous):
const Dtype* cpu_data() const;
Dtype* mutable_cpu_data();
Blob data moves seamlessly between CPU and GPU, as the annotated code below illustrates.
// Assuming that data are on the CPU initially, and we have a blob.
const Dtype* foo;
Dtype* bar;
foo = blob.gpu_data(); // data copied cpu->gpu.
foo = blob.cpu_data(); // no data copied since both have up-to-date contents.
bar = blob.mutable_gpu_data(); // no data copied.
// ... some operations ...
bar = blob.mutable_gpu_data(); // no data copied when we are still on GPU.
foo = blob.cpu_data(); // data copied gpu->cpu, since the gpu side has modified the data
foo = blob.gpu_data(); // no data copied since both have up-to-date contents
bar = blob.mutable_cpu_data(); // still no data copied.
bar = blob.mutable_gpu_data(); // data copied cpu->gpu.
bar = blob.mutable_cpu_data(); // data copied gpu->cpu.
As the comments show, the data is actually copied between CPU and GPU only four times over this whole sequence; every other call simply reuses whichever copy is already up to date.
Next, let's talk about layers. Every layer has three basic operations: setup (initialization), forward (forward computation), and backward (back-propagation, i.e., computing gradients).
Setup: Initialize layers and connections when the model is initialized;
Forward: Calculate the output given the input of the bottom layer and pass it to the top layer;
Backward: Given the gradient value of the output of the top layer, calculate the gradient value with respect to the input, and pass it to the bottom layer. Layers with parameters compute gradient values with respect to the parameters and store them internally.
A net, in turn, is a computation graph formed by connecting many layers into a directed acyclic graph (DAG). A typical net starts with a data layer that loads data from disk and ends with a loss layer. Such a network can perform tasks like classification or reconstruction.
Below is a simple logistic regression classifier, defined in prototxt as follows:
name: "LogReg"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "input_leveldb"
    batch_size: 64
  }
}
layer {
  name: "ip"
  type: "InnerProduct"
  bottom: "data"
  top: "ip"
  inner_product_param {
    num_output: 2
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip"
  bottom: "label"
  top: "loss"
}
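Reading it bottom to top: the Data layer produces the "data" and "label" blobs, the InnerProduct layer maps "data" to two outputs (one per class), and SoftmaxWithLoss combines the predictions with "label" into the final loss. At test time one often also wants a classification accuracy; a sketch of such an addition using Caffe's standard Accuracy layer (the layer and blob names here are illustrative):

```
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
```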