Foreword
Deep learning frameworks are evolving very quickly (some of the code in this article may have changed slightly since it was written). This article mainly records lecture notes on PyTorch, Caffe, and TensorFlow, with the focus on PyTorch.
In the previous theory class, we learned that to train a neural network it is best to first draw the network's computation graph and then work out the forward and backward propagation. At the same time, we must consider low-level optimizations such as loading data in parallel, training the network in parallel, and computing on the GPU. A neural network framework makes all of this much simpler:
A basic framework should achieve two things: let us write the forward pass as easily as with Numpy, and automatically compute the gradients for back-propagation from the forward computation graph we write. For example:
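The original code listing is not reproduced here; as a stand-in, here is a minimal plain-Numpy two-layer network with a hand-derived backward pass. Everything in the backward section below is exactly the work such a framework is meant to automate:

```python
import numpy as np

N, D_in, H, D_out = 64, 1000, 100, 10
x, y = np.random.randn(N, D_in), np.random.randn(N, D_out)
w1, w2 = np.random.randn(D_in, H), np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # forward pass
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()

    # backward pass, derived by hand -- a framework computes this for us
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu * (h > 0)
    grad_w1 = x.T.dot(grad_h)

    # gradient descent update
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
```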
1. TensorFlow (static graph)
The main process is divided into the following parts:
- Define the computation graph for the forward pass (nothing is computed at this point);
- Run the graph (in a session) to obtain the loss and the gradients;
- Use the gradients to update the weights, and repeat.
Putting these steps together, we can then write the training code:
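The original code listing is not reproduced here; the following is a minimal sketch of these steps assuming the TF 1.x API, with both the data and the weights fed in through placeholders:

```python
import numpy as np
import tensorflow as tf

N, D, H = 64, 1000, 100

# 1. define the forward graph (nothing is computed yet)
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))
w1 = tf.placeholder(tf.float32, shape=(D, H))
w2 = tf.placeholder(tf.float32, shape=(H, D))

h = tf.maximum(tf.matmul(x, w1), 0)
y_pred = tf.matmul(h, w2)
loss = tf.reduce_sum((y_pred - y) ** 2)

# ask TF to add gradient nodes to the graph
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# 2./3. run the graph repeatedly and update the weights on the CPU side
learning_rate = 1e-5
with tf.Session() as sess:
    values = {
        x: np.random.randn(N, D),
        y: np.random.randn(N, D),
        w1: np.random.randn(D, H),
        w2: np.random.randn(H, D),
    }
    for t in range(50):
        loss_val, g1, g2 = sess.run([loss, grad_w1, grad_w2], feed_dict=values)
        values[w1] -= learning_rate * g1
        values[w2] -= learning_rate * g2
```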
However, there is a problem with the above approach: on every iteration we have to transfer the training data and the weights from the CPU to the GPU, and after each iteration copy the gradients and the loss back in order to update the parameters. When the weights and the data are large, these transfers are very time-consuming.
Therefore, we can make the weights internal variables of the computation graph, so that they live inside the graph for the whole computation and do not need to be shuttled back and forth, and we add the parameter-update operations to the graph as well.
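A sketch of this version, again assuming the TF 1.x API, with the weights stored as tf.Variable and the updates written as assign operations inside the graph:

```python
import numpy as np
import tensorflow as tf

N, D, H = 64, 1000, 100
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))
w1 = tf.Variable(tf.random_normal((D, H)))   # weights now live inside the graph
w2 = tf.Variable(tf.random_normal((H, D)))

h = tf.maximum(tf.matmul(x, w1), 0)
y_pred = tf.matmul(h, w2)
loss = tf.reduce_sum((y_pred - y) ** 2)

grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

learning_rate = 1e-5
# the update operations are part of the graph too
new_w1 = tf.assign(w1, w1 - learning_rate * grad_w1)
new_w2 = tf.assign(w2, w2 - learning_rate * grad_w2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for t in range(50):
        # note: fetching only `loss` means the assign ops never actually run
        loss_val = sess.run(loss, feed_dict=values)
```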
However, the above still has a problem: the loss is computed before the weights are updated, and in the end we only fetch the loss. TensorFlow is smart enough to execute only the parts of the graph needed to produce the requested outputs, so the weight updates would never run.
One solution is to explicitly fetch the new weights, but these are often very large tensors, which would again move a lot of data between the CPU and the GPU. A small trick is to add a dummy node to the graph that depends on the weight updates but returns no large data; then we simply fetch the loss together with this dummy node.
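Continuing the sketch above, tf.group can serve as such a dummy node; fetching it forces the updates to run while returning nothing large:

```python
# a dummy node that depends on both assign ops but returns no big tensor
updates = tf.group(new_w1, new_w2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for t in range(50):
        # fetching `updates` forces the weight updates to execute;
        # only the scalar loss is copied back to the CPU
        loss_val, _ = sess.run([loss, updates], feed_dict=values)
```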
Of course, we can also use TensorFlow's built-in optimizers (tf.train.Optimizer), which perform all of the above steps for us automatically.
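With a built-in optimizer, the gradient and update nodes are added for us; a sketch on the same graph:

```python
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train_step = optimizer.minimize(loss)   # adds gradient + assign ops to the graph

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for t in range(50):
        loss_val, _ = sess.run([loss, train_step], feed_dict=values)
```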
TensorFlow also defines many higher-level APIs that wrap common network layers and loss functions. There is also a high-level API, Keras, which can use TensorFlow as its backend and wraps many of these operations.
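For comparison, a rough Keras sketch of the same kind of model (assuming the tf.keras API; layer sizes are made up):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation='relu', input_shape=(1000,)),
    tf.keras.layers.Dense(1000),
])
model.compile(optimizer=tf.keras.optimizers.SGD(1e-5), loss='mse')
# model.fit(x_train, y_train, epochs=10, batch_size=64)  # given some training data
```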
For visualization, TF has a tool called TensorBoard that lets us visualize the training process and the network structure.
2. PyTorch (dynamic graph)
PyTorch differs from TensorFlow in that it uses dynamic computation graphs. Internally it defines three levels of abstraction: the Tensor (an imperative ndarray that can run on the GPU), the Variable (a node in the computation graph that stores data and gradients), and the Module (a network layer that may hold learnable weights).
It is called a dynamic graph because, unlike TF, we do not have to explicitly define the computation graph first and then feed in training data; instead, the graph is rebuilt on the fly every time the computation runs.
For example, we can use PyTorch to define and train a two-layer neural network:
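The original listing is not reproduced here; below is a minimal sketch of the two-layer network written directly with torch Tensors and a hand-written backward pass, adapted to the current API:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

learning_rate = 1e-6
for t in range(500):
    # forward: looks just like the Numpy version, but can run on the GPU
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # backward: still written by hand in this version
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
```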
Automatic differentiation (Autograd)
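With autograd, PyTorch builds the graph and computes the gradients for us. The original lecture code wrapped Tensors in Variables; in current PyTorch a Tensor created with requires_grad=True plays that role, so the sketch below is adapted accordingly:

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # the forward pass builds the dynamic graph
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # backward pass: gradients end up in w1.grad and w2.grad
    loss.backward()

    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
```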
Here:
- A PyTorch Variable is a node in a computational graph
- x.data is a Tensor
- x.grad is a Variable of gradients (same shape as x.data)
- x.grad.data is a Tensor of gradients
Of course, we can also define our own back-propagation function, which is equivalent to adding a new gate (node) to the graph; we only need to implement its forward and backward passes, and PyTorch can then differentiate through it automatically.
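As an illustration, a sketch that re-implements ReLU as a custom autograd Function (purely for demonstration, since nn already provides ReLU):

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # save what the backward pass will need
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x < 0] = 0
        return grad_x

x = torch.randn(5, requires_grad=True)
y = MyReLU.apply(x).sum()
y.backward()          # uses our backward() to fill x.grad
```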
High-level wrapper (nn)
Similar to Keras's wrapping of TF, PyTorch also has a higher-level API, the nn module, which defines many commonly used network layers and loss functions.
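A minimal sketch of the same two-layer network built from nn layers and trained with an nn loss function:

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x, y = torch.randn(N, D_in), torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        # still updating the parameters by hand here
        for param in model.parameters():
            param -= learning_rate * param.grad
```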
Optimizer (optim)
The torch.optim module provides common optimizers (such as SGD and Adam) that update the parameters for us.
Defining our own layers (nn.Module)
Of course, we can also define our own network structure by inheriting from nn.Module, so that it can be used just like the layers provided by nn.
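To illustrate both points, here is a minimal sketch (the class name TwoLayerNet is just an example) of a custom nn.Module trained with an optimizer from torch.optim:

```python
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, d_in, h, d_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(d_in, h)
        self.linear2 = torch.nn.Linear(h, d_out)

    def forward(self, x):
        # the forward pass is ordinary Python; the backward pass comes from autograd
        return self.linear2(self.linear1(x).clamp(min=0))

N, D_in, H, D_out = 64, 1000, 100, 10
x, y = torch.randn(N, D_in), torch.randn(N, D_out)

model = TwoLayerNet(D_in, H, D_out)
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for t in range(500):
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # the optimizer updates all parameters for us
```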
Data loader (DataLoader)
PyTorch also provides a convenient interface, the DataLoader. It lets us load data in parallel; we only need to define our own dataset class (Dataset) first.
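A minimal sketch, with a made-up RandomDataset class, of how a Dataset plugs into a DataLoader:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RandomDataset(Dataset):
    def __init__(self, n=1024, d_in=1000, d_out=10):
        self.x = torch.randn(n, d_in)
        self.y = torch.randn(n, d_out)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

# worker processes load and shuffle mini-batches in parallel
loader = DataLoader(RandomDataset(), batch_size=64, shuffle=True, num_workers=2)

for xb, yb in loader:
    pass  # run one training step per mini-batch here
```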
Pre-trained models (torchvision)
torchvision contains common network architectures and many pre-trained models for computer vision.
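A short sketch of loading a pre-trained model from torchvision (pretrained=True is the older argument name; newer torchvision versions use weights= instead):

```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
model.eval()                       # inference mode

x = torch.randn(1, 3, 224, 224)    # a dummy ImageNet-sized input
with torch.no_grad():
    scores = model(x)              # (1, 1000) class scores
```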
Visualization
Similar to TF's TensorBoard, PyTorch also has a visualization tool, Visdom, although it does not support visualizing the computation graph structure.
3. Dynamic graph and static graph
Earlier I said that TF is based on static graphs, so the computation graph has to be declared explicitly up front, while PyTorch is based on dynamic graphs and constructs the graph as it computes.
- In comparison, static graphs can be optimized ahead of time because the framework knows the whole graph structure in advance (for example, by fusing adjacent operations such as a convolution followed by a ReLU).
- Another advantage is that once a static graph is defined it rarely changes, so we can serialize it and store it on disk. After that, it can be loaded and reused directly without the original code (that is, at deployment time we only need the graph, not the training code).
- Dynamic graphs, on the other hand, look simpler in many scenarios because there is no separate graph-definition step. PyTorch is more Pythonic, and control flow inside the network is written just like ordinary Python and Numpy code (see the sketch after this list). In TF, by contrast, control flow has to be added to the static graph with dedicated APIs such as tf.cond and tf.while_loop.
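As a small illustration of this point, here is a toy PyTorch module (the name DynamicNet is made up) whose forward pass uses an ordinary Python loop of random length, so each call builds a different graph:

```python
import random
import torch

class DynamicNet(torch.nn.Module):
    def __init__(self, d, h):
        super().__init__()
        self.input = torch.nn.Linear(d, h)
        self.hidden = torch.nn.Linear(h, h)
        self.output = torch.nn.Linear(h, 1)

    def forward(self, x):
        h = self.input(x).clamp(min=0)
        # plain Python control flow: reuse the hidden layer 0 to 3 times
        for _ in range(random.randint(0, 3)):
            h = self.hidden(h).clamp(min=0)
        return self.output(h)

net = DynamicNet(10, 32)
out = net(torch.randn(4, 10))   # each call may build a different graph
```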
4. Caffe / Caffe2
Unlike the previous frameworks, Caffe's core is written in C++, which makes it well suited for production deployment.
We rarely need to write code; instead, we mostly edit configuration files:
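The original configuration files are not reproduced here; as a rough illustration, a layer definition in Caffe's prototxt format looks something like this (the layer names and sizes below are made up):

```
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "data"
  top: "fc1"
  inner_product_param {
    num_output: 100
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "fc1"
  top: "fc1"
}
```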