Caffe related

The full name of Caffe is Convolutional Architecture for Fast Feature Embedding. It is a clean and efficient open-source deep learning framework. Its core language is C++, it offers command-line, Python, and MATLAB interfaces, it can run on both the CPU and the GPU, and it is released under the BSD 2-Clause license.

One of the main reasons deep learning is so popular is that it can learn useful features from data on its own, which is especially valuable in settings where you do not know how to design features by hand, such as images and speech.

The design of Caffe: Caffe basically follows a simple assumption about neural networks: all computation is expressed in the form of layers. A layer takes some data as input and outputs the result of some computation. Convolution, for example, takes an image as input, convolves it with the layer's parameters (the filters), and outputs the result of the convolution. Each layer needs to perform two computations: forward computes the output from the input, and backward computes the gradient with respect to the input from the gradient passed down from above. Once these two functions are implemented, many layers can be connected into a network. What this network does is take our input data (images, audio, or whatever) and compute the output we need (such as a predicted label). During training we can compute the loss and the gradients from the known labels, and then use the gradients to update the parameters of the network. That is the basic workflow of Caffe.
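As a minimal sketch of this forward/backward flow through the Python interface, assuming a network definition and trained weights already exist (the file names below are placeholders, not files shipped with Caffe):

```python
import numpy as np
import caffe

# Placeholder file names; substitute your own network definition and weights.
net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)

# Forward: fill the input blob and compute every layer's output.
net.blobs['data'].data[...] = np.random.rand(*net.blobs['data'].data.shape)
out = net.forward()  # dict mapping output blob names to arrays

# Backward: seed the gradient at the top and propagate it back toward the input.
# (For a deploy net without a loss layer, the prototxt may need
# "force_backward: true" for bottom gradients to be computed.)
net.blobs[net.outputs[0]].diff[...] = 1.0
net.backward()
```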

Basically, the easiest way to get started with Caffe is to first write your data in a format Caffe understands, then design a network, and then use the solver that Caffe provides to optimize it. If your data is images, you can start from an existing network such as AlexNet or GoogLeNet and do fine-tuning. If your data is a bit different, say a plain float vector, you may need some custom configuration; Caffe's logistic regression example may be helpful.

Fine-tuning: the idea of fine-tuning is that a network trained on a data set as large as ImageNet should also be good at other tasks, so we can take the pretrained network and retrain only its last few layers. Retraining means, for example, that where I used to classify the 1000 ImageNet categories, I now only want to tell whether an image is a dog or a cat, or whether it contains a license plate, so I can replace the final softmax classifier, turning a 4096*1000 classifier into a 4096*2 classifier. This strategy is very useful in practice, so we often pretrain a network on ImageNet first, because we know roughly how training on ImageNet behaves.
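A minimal pycaffe fine-tuning sketch, assuming you have already prepared a solver prototxt for the new two-class task whose net renames the final classifier layer (so its weights get re-initialized); the file names are placeholders:

```python
import caffe

# Placeholder paths: a solver definition for the new task and weights
# pretrained on ImageNet.
solver = caffe.SGDSolver('finetune_solver.prototxt')

# Copy pretrained weights by layer name: layers whose names match keep
# their weights, the renamed final classifier starts from fresh initialization.
solver.net.copy_from('bvlc_reference_caffenet.caffemodel')

solver.solve()  # run the optimization defined in the solver prototxt
```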

 Caffe can be used in vision, speech recognition, robotics, neuroscience and astronomy.

 Caffe provides a complete toolkit for training, testing, fine-tuning and deploying models.

 Highlights of Caffe:

(1) Modularity: Caffe was designed to be as modular as possible from the start, making it easy to extend it with new data formats, network layers, and loss functions.

(2) Separation of representation and implementation: Caffe model definitions are written as configuration files in the Protocol Buffer language. Caffe supports network architectures in the form of arbitrary directed acyclic graphs. Caffe allocates exactly as much memory as the network needs, and switching between CPU and GPU is done through a single function call.

(3) Test coverage: in Caffe, every single module has a corresponding test.

(4) Python and MATLAB interfaces: Caffe provides both Python and MATLAB interfaces.

(5) Pretrained reference models: Caffe provides reference models for vision projects. These models are for academic and non-commercial use only; their license is not BSD.

 Caffe architecture:

(1) Data storage: Caffe stores and communicates data as 4-dimensional arrays called blobs. Blobs provide a unified memory interface for batches of images (or other data), parameters, and parameter updates. Models are stored on disk as Google Protocol Buffers. Large data sets are stored in LevelDB databases.

(2) Layers: a Caffe layer is the essence of a neural network layer: it takes one or more blobs as input and produces one or more blobs as output. For the operation of the network as a whole, a layer has two key responsibilities: forward propagation, which takes the inputs and produces the outputs, and backward propagation, which takes the gradient with respect to the output and computes the gradients with respect to the parameters and the inputs. Caffe provides a complete set of layer types.

(3) Networks and run modes: Caffe keeps track of the entire directed acyclic graph of layers to ensure correct forward and backward passes. A Caffe model is an end-to-end machine learning system. A typical network begins with a data layer and ends with a loss layer. Through a single switch, the network runs on the CPU or the GPU, and the layers produce the same results on either device.
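In the Python interface that single switch is a global mode setting; a small sketch (the device id is just an example):

```python
import caffe

# Run every layer on the CPU ...
caffe.set_mode_cpu()

# ... or flip the switch to the GPU; the layers produce the same results.
caffe.set_device(0)   # example device id
caffe.set_mode_gpu()
```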

(4) Training a network: Caffe trains a model with the fast, standard stochastic gradient descent algorithm.

In Caffe, fine-tuning is a standard method for adapting existing models to new architectures or data. For a new task, Caffe fine-tunes the old model weights and initializes the new weights as needed.

Blobs, Layers, and Nets: a deep network is a compositional model expressed as a collection of interconnected layers that operate on chunks of data. In its own model schema, Caffe defines a network layer by layer. The network defines the entire model from bottom to top, from the input data to the loss layer. As data flows forward and backward through the network, Caffe stores, communicates, and manipulates the information as blobs. The blob is the framework's standard array and unified memory interface, and it holds data, parameters, and losses. The layer comes next as the foundation of both model and computation; it is the basic unit of the network. The net follows as the connection and collection of layers, forming the whole network. Blobs describe in detail how information is stored and communicated within and across layers and nets. The solver is what optimizes a net.

Blob storage and communication: a blob is a wrapper around the actual data being processed and passed along by Caffe. Blobs also provide synchronization between the CPU and the GPU. Mathematically, a blob is an N-dimensional array stored in contiguous memory.

Caffe stores and communicates data through blobs. Blobs provide a unified memory interface for holding data, for example batches of images, model parameters, and derivatives for optimization.

Blobs hide the overhead of mixed CPU/GPU computation by synchronizing from the CPU host to the GPU device as needed. Memory on the host and device is allocated on demand.

For batches of image data, the conventional blob dimensions are number of images N x number of channels K x image height H x image width W. In memory layout, blobs are row-major, so the last/rightmost dimension changes fastest. For example, in a 4D blob, the value at index (n, k, h, w) is physically located at index ((n * K + k) * H + h) * W + w. Blobs also work for non-image applications, for example as 2D blobs.
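A tiny sketch to verify the row-major formula against a C-ordered array (the dimensions are arbitrary examples, with NumPy standing in for the blob's flat memory):

```python
import numpy as np

N, K, H, W = 2, 3, 4, 5                               # example blob dimensions
blob = np.arange(N * K * H * W).reshape(N, K, H, W)   # C-ordered, like a blob

def offset(n, k, h, w):
    # Row-major offset used for blob indexing: ((n * K + k) * H + h) * W + w
    return ((n * K + k) * H + h) * W + w

n, k, h, w = 1, 2, 3, 4
assert blob[n, k, h, w] == blob.ravel()[offset(n, k, h, w)]
```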

 The parameter blob size changes according to the type and configuration of the current layer.

A blob stores two chunks of memory, data and diff. The former is the normal data propagated in the forward pass, and the latter is the gradient computed by the network.

A blob uses the SyncedMem class to synchronize values between the CPU and GPU in order to hide the details of synchronization and minimize data transfer.

Layer computation and connections: the layer is the essence of the model and the basic unit of computation. Layers convolve filters, pool, take inner products, apply nonlinearities such as sigmoid and other elementwise transformations, normalize, load data, and compute losses.

Each layer type defines three critical computations: setup, forward, and backward. (1) Setup: initialize the layer and its connections once, when the model is initialized. (2) Forward: given input from the bottom, compute the output and send it to the top. (3) Backward: given the gradient with respect to the top output, compute the gradient with respect to the input and send it to the bottom.

There are two implementations of the forward and backward functions, one for the CPU and one for the GPU.

The definition of a Caffe layer consists of two parts: layer attributes and layer parameters.

Each layer takes input from some 'bottom' blobs and produces output in some 'top' blobs, as sketched below.
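Caffe also allows layers written in Python (the "Python" layer type, available when Caffe is built with Python layer support). Below is a hedged sketch of a made-up scaling layer that shows the setup/reshape/forward/backward hooks and the bottom/top blobs; using it in a net also requires a corresponding Python layer entry in the prototxt:

```python
import caffe

class EltwiseScaleLayer(caffe.Layer):
    """Hypothetical example layer: multiply the bottom blob by a constant."""

    def setup(self, bottom, top):
        # Called once when the net is initialized.
        self.scale = 2.0

    def reshape(self, bottom, top):
        # The top blob has the same shape as the bottom blob.
        top[0].reshape(*bottom[0].data.shape)

    def forward(self, bottom, top):
        # Compute the output from the input.
        top[0].data[...] = self.scale * bottom[0].data

    def backward(self, top, propagate_down, bottom):
        # Given the gradient w.r.t. the top, compute the gradient w.r.t. the bottom.
        if propagate_down[0]:
            bottom[0].diff[...] = self.scale * top[0].diff
```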

Net definition and operation: the net jointly defines a function and its gradient by composition and differentiation. The composition of every layer's output computes the function for a given task, and the composition of every layer's backward computes the gradient from the loss to learn the task. Caffe models are end-to-end machine learning engines.

A net is a directed acyclic graph (DAG) of layers. A typical net starts with a data layer, which loads data from disk, and ends with a loss layer, which computes the objective for a task such as classification or reconstruction.

Model initialization is handled by Net::Init(). Initialization mainly does two things: it builds the entire DAG by creating the blobs and layers, and it calls the layers' SetUp() functions. It also does a series of other bookkeeping tasks, such as validating the correctness of the overall network architecture.

Model format: models are defined in the plaintext protocol buffer schema (prototxt), while learned models are serialized as binary protocol buffer (binaryproto) .caffemodel files. The model format is defined by the protobuf schema in caffe.proto.
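A small sketch of producing such a prototxt definition from Python with pycaffe's NetSpec; the layer choices and the LMDB path are examples only, not a prescribed architecture:

```python
import caffe
from caffe import layers as L, params as P

n = caffe.NetSpec()
# Example LMDB source path; replace with your own database.
n.data, n.label = L.Data(batch_size=64, backend=P.Data.LMDB,
                         source='train_lmdb', ntop=2)
n.ip1 = L.InnerProduct(n.data, num_output=10,
                       weight_filler=dict(type='xavier'))
n.loss = L.SoftmaxWithLoss(n.ip1, n.label)

# Serialize to the plaintext prototxt schema described above.
with open('example_train.prototxt', 'w') as f:
    f.write(str(n.to_proto()))
```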

Forward and Backward: the forward pass computes the inference, and the backward pass computes the gradients for learning.

The solver optimizes a model by first calling forward to obtain the output and the loss, then calling backward to generate the gradients of the model, and then incorporating the gradients into weight updates that minimize the loss. This division of labor between the Solver, the Net, and the Layers keeps Caffe modular and open to development.

Loss: in Caffe, as in most machine learning, learning is driven by a loss function (also called an error, cost, or objective function). A loss function specifies the goal of learning by mapping a parameter setting (for example, the current network weights) to a scalar value. Therefore, the goal of learning is to find a setting of the weights that minimizes the loss function.

In Caffe, the loss is computed by the forward pass of the network. Each layer takes a set of input blobs (bottom) and produces a set of output blobs (top). Some of these outputs may be used in the loss function. For classification tasks, a typical choice of loss function is the SoftmaxWithLoss layer.

Loss weights: for nets that produce a loss through multiple layers, loss weights can be used to specify their relative importance.

By convention, Caffe layer types with the "Loss" suffix contribute to the loss function, while other layers are assumed to be used purely for intermediate computation. However, any layer can be made to contribute to the loss by adding a loss_weight field to its layer definition, as in the sketch below.
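A hedged NetSpec sketch of combining two loss terms with different weights; the layer names, dummy inputs, and the 0.1 weight are arbitrary examples:

```python
import caffe
from caffe import layers as L

n = caffe.NetSpec()
# Dummy inputs so the sketch is self-contained (shapes are arbitrary).
n.data = L.DummyData(shape=dict(dim=[8, 4]))
n.label = L.DummyData(shape=dict(dim=[8]))
n.pred = L.InnerProduct(n.data, num_output=2)
n.recon = L.InnerProduct(n.data, num_output=4)

# Loss-suffixed layers get a loss_weight of 1 by default.
n.cls_loss = L.SoftmaxWithLoss(n.pred, n.label)
# Any layer can contribute to the objective via loss_weight; here a
# reconstruction term is folded in with a smaller relative importance.
n.recon_loss = L.EuclideanLoss(n.recon, n.data, loss_weight=0.1)

print(n.to_proto())
```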

In Caffe, the final loss is computed by summing the total weighted loss over the network.

Solver: the solver orchestrates model optimization by coordinating the network's forward inference and backward gradients to form parameter updates that try to improve the loss. The responsibilities of learning are divided between the Solver, which supervises the optimization and generates parameter updates, and the Net, which yields the loss and the gradients.

Caffe solver methods: Stochastic Gradient Descent (type: "SGD"); AdaDelta (type: "AdaDelta"); Adaptive Gradient (type: "AdaGrad"); Adam (type: "Adam"); Nesterov's Accelerated Gradient (type: "Nesterov"); RMSprop (type: "RMSProp").

The role of the Solver: the solver is what optimizes a net. It (1) scaffolds the optimization bookkeeping and creates the training network for learning and the test network(s) for evaluation; (2) iteratively optimizes by calling forward/backward and updating parameters; (3) periodically evaluates the test networks; (4) snapshots the model and solver state throughout the optimization.

In each iteration, the solver: (1) calls network forward to compute the output and loss; (2) calls network backward to compute the gradients; (3) incorporates the gradients into parameter updates according to the solver method; (4) updates the solver state according to the learning rate, history, and method. Through these steps, all the weights are taken from initialization to the learned model.
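A hedged sketch of driving these iterations from Python, assuming a solver prototxt (placeholder name below) that names the training net, the learning rate, and the solver type:

```python
import caffe

caffe.set_mode_cpu()

# Placeholder solver definition file.
solver = caffe.SGDSolver('example_solver.prototxt')

# Each call to step() performs the per-iteration cycle described above:
# forward (output + loss), backward (gradients), parameter update,
# and solver-state update.
solver.step(100)   # run 100 iterations

# Inspect the training net's loss, assuming its loss blob is named 'loss'.
print(solver.net.blobs['loss'].data)
```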

 Like Caffe models, Caffe solvers can also run in CPU or GPU mode.

 The solver method deals with the overall optimization problem of minimizing loss.

The actual weight update is generated by the solver and then applied to the net parameters.

Layer catalogue: to create a Caffe model, you define the model architecture in a prototxt file (a protocol buffer definition file). Caffe layers and their parameters are defined in the project's protocol buffer definition file, caffe.proto.

 Vision Layers: Vision layers usually take images as input and produce other images as output:

(1) Convolution: the convolution layer convolves the input image with a set of learnable filters, each producing one feature map in the output; (2) Pooling; (3) Local Response Normalization (LRN); (4) im2col.

Loss Layers: loss layers drive learning by comparing an output to a target and assigning a cost to be minimized. The loss itself is computed by the forward pass, and the gradient with respect to the loss is computed by the backward pass:

(1) Softmax (SoftmaxWithLoss); (2) Sum-of-Squares / Euclidean (EuclideanLoss); (3) Hinge / Margin (HingeLoss); (4) Sigmoid Cross-Entropy (SigmoidCrossEntropyLoss); (5) Infogain (InfogainLoss); (6) Accuracy and Top-k.

Activation / Neuron Layers: in general, activation/neuron layers are elementwise operations that take one bottom blob and produce one top blob of the same size:

(1) ReLU / Rectified-Linear and Leaky-ReLU (ReLU); (2) Sigmoid (Sigmoid); (3) TanH / Hyperbolic Tangent (TanH); (4) Absolute Value (AbsVal); (5) Power (Power); (6) BNLL (BNLL).

Data Layers: data enters Caffe through data layers, which sit at the bottom of the network. The data can come from an efficient database (LevelDB or LMDB), directly from memory, or, when efficiency is not critical, from HDF5 files or common image formats on disk:

(1) Database (Data); (2) In-Memory (MemoryData); (3) HDF5 Input (HDF5Data); (4) HDF5 Output (HDF5Output); (5) Images (ImageData); (6) Windows (WindowData); (7) Dummy (DummyData).

Common Layers: (1) Inner Product (InnerProduct); (2) Splitting (Split); (3) Flattening (Flatten); (4) Reshape (Reshape); (5) Concatenation (Concat); (6) Slicing (Slice); (7) Elementwise Operations (Eltwise); (8) Argmax (ArgMax); (9) Softmax (Softmax); (10) Mean-Variance Normalization (MVN).

Data: in Caffe, data is stored in blobs. Data layers load input and save output by converting between blobs and other formats. Common transformations such as mean subtraction and feature scaling are done by configuring the data layer, as in the sketch below. Supporting a new input type requires writing a new data layer.
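A hedged NetSpec sketch of an LMDB data layer configured with mean subtraction and feature scaling through transform_param; the database path, mean file, and values are examples only:

```python
import caffe
from caffe import layers as L, params as P

n = caffe.NetSpec()
# Example database path and transformation values.
n.data, n.label = L.Data(
    batch_size=32,
    backend=P.Data.LMDB,
    source='train_lmdb',
    transform_param=dict(
        scale=1.0 / 255,               # feature scaling
        mean_file='mean.binaryproto'   # mean subtraction
    ),
    ntop=2)

print(n.to_proto())
```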
 

