[Lecture 8] Deep Learning Software


Foreword

Deep learning frameworks are evolving very quickly (some of the code in this article has already changed slightly). This article mainly records the lecture's coverage of PyTorch, Caffe, and TensorFlow, with the emphasis on PyTorch.

From the earlier theory lectures, we know that to train a neural network it helps to draw the network's computation graph and then work out the forward and backward passes. At the same time, we have to care about low-level efficiency: loading data in parallel, training across multiple devices, and running the computation on the GPU. A deep learning framework makes all of this much simpler:
[Slide: what deep learning frameworks take care of for us]
What a basic framework gives us: computation that we can write just as we would with NumPy, plus the ability to automatically compute the gradients for backpropagation from the forward computation graph. For example:
[Slides: a computation graph written in NumPy with hand-derived gradients vs. the same graph in a framework with automatic gradients]
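As a minimal sketch of that idea (the shapes and variable names here are illustrative, not taken from the slides): first the small graph a = x*y, b = a + z written with NumPy and gradients derived by hand, then the same computation where the framework (PyTorch in this sketch) fills in the gradients automatically.

```python
import numpy as np
import torch

N, D = 3, 4

# NumPy: forward pass written node by node, backward pass derived by hand
x = np.random.randn(N, D)
y = np.random.randn(N, D)
z = np.random.randn(N, D)

a = x * y                    # node a = x * y
b = a + z                    # node b = a + z
c = np.sum(b)                # scalar output

grad_c = 1.0
grad_b = grad_c * np.ones((N, D))
grad_a = grad_b.copy()
grad_z = grad_b.copy()
grad_x = grad_a * y
grad_y = grad_a * x

# PyTorch: the same graph, but backpropagation is automatic
xt = torch.randn(N, D, requires_grad=True)
yt = torch.randn(N, D, requires_grad=True)
zt = torch.randn(N, D, requires_grad=True)

c = (xt * yt + zt).sum()
c.backward()                 # xt.grad, yt.grad, zt.grad now hold the gradients
```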

1. TensorFlow (static graph)

The main process is divided into the following parts:

  • Define the computation graph for the forward pass (nothing is actually computed at this point);
  • Run the graph (in a session) to obtain the loss and the gradients;
  • Use the gradients to update the weights, then repeat.

[Slide: TensorFlow code defining the forward computation graph]
Then add the training code:
[Slide: TensorFlow training loop]
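The lecture used the TensorFlow 1.x API; the following is a minimal sketch of that pattern (sizes and names are illustrative): first the graph is defined, then a session runs it, feeding NumPy arrays for the placeholders and using the returned gradients to update the weights.

```python
import numpy as np
import tensorflow as tf   # TensorFlow 1.x style API

N, D, H = 64, 1000, 100

# 1) Define the computation graph (nothing is computed yet)
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))
w1 = tf.placeholder(tf.float32, shape=(D, H))
w2 = tf.placeholder(tf.float32, shape=(H, D))

h = tf.maximum(tf.matmul(x, w1), 0)                       # ReLU(x @ w1)
y_pred = tf.matmul(h, w2)
loss = tf.reduce_mean(tf.reduce_sum((y_pred - y) ** 2, axis=1))
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# 2) Run the graph: feed NumPy arrays, fetch the loss and gradients
with tf.Session() as sess:
    values = {
        x: np.random.randn(N, D),
        y: np.random.randn(N, D),
        w1: np.random.randn(D, H),
        w2: np.random.randn(H, D),
    }
    # 3) Training loop: update the weights on the CPU side, then repeat
    learning_rate = 1e-5
    for step in range(50):
        loss_val, g1, g2 = sess.run([loss, grad_w1, grad_w2], feed_dict=values)
        values[w1] -= learning_rate * g1
        values[w2] -= learning_rate * g2
```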
However, there is a problem with this approach: on every iteration we copy the training data and weights from the CPU to the GPU, and after the iteration we copy the gradients and loss back to update the parameters. When the weights and data are large, these transfers are very time-consuming.
Therefore, we can make the weights internal variables of the computation graph, so that they live inside the graph for the whole computation and do not need to be copied back and forth, and we add the parameter-update operations to the graph as well.
[Slide: weights stored as variables inside the graph, with update ops added]
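A sketch of the same network with the weights kept inside the graph as tf.Variable and the update step added as tf.assign ops (still TF 1.x style, illustrative sizes):

```python
import numpy as np
import tensorflow as tf   # TensorFlow 1.x style API

N, D, H = 64, 1000, 100
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))

# Weights live inside the graph and persist between calls to sess.run
w1 = tf.Variable(tf.random_normal((D, H)))
w2 = tf.Variable(tf.random_normal((H, D)))

h = tf.maximum(tf.matmul(x, w1), 0)
y_pred = tf.matmul(h, w2)
loss = tf.reduce_mean(tf.reduce_sum((y_pred - y) ** 2, axis=1))
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# Update operations are part of the graph as well
learning_rate = 1e-5
new_w1 = tf.assign(w1, w1 - learning_rate * grad_w1)
new_w2 = tf.assign(w2, w2 - learning_rate * grad_w2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for step in range(50):
        loss_val = sess.run(loss, feed_dict=values)   # problem: the assign ops never run!
```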
However, this still has a problem: the loss is computed before the weights are updated, and in the end we only ask for the loss. TensorFlow is smart enough to execute only the parts of the graph needed for the requested outputs, so the weight updates never run.
One solution is to explicitly fetch the new weights, but these are often very large tensors, which again moves a lot of data between the CPU and GPU. A small trick is to add a dummy node to the graph: it depends on the weight-update operations but returns no large data, so we simply fetch the loss and this dummy node together.
[Slide: dummy update node added to the graph]
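A sketch of that trick with tf.group, continuing the previous sketch (new_w1, new_w2, loss, x, y as defined there): the grouped node depends on both assign ops but returns no large tensor, so fetching it together with the loss forces the updates to actually run.

```python
# Dummy node: depends on both update ops, returns nothing large
updates = tf.group(new_w1, new_w2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for step in range(50):
        # Fetching `updates` alongside `loss` makes the assigns execute
        loss_val, _ = sess.run([loss, updates], feed_dict=values)
```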
Of course, we can also use one of TensorFlow's built-in optimizers to perform the steps above automatically.
[Slide: using a built-in TensorFlow optimizer]
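Equivalently, a built-in optimizer can add the gradient computation and update ops to the graph for us. A TF 1.x style sketch, continuing from the variable-based graph above (loss, x, y as defined there):

```python
optimizer = tf.train.GradientDescentOptimizer(1e-5)
updates = optimizer.minimize(loss)   # adds gradient + assign ops to the graph

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for step in range(50):
        loss_val, _ = sess.run([loss, updates], feed_dict=values)
```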
Of course, TensorFlow also provides many higher-level APIs that wrap common network layers and loss functions. There is also the high-level API Keras, whose backend can be TensorFlow and which wraps many of these operations.
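A minimal Keras sketch (the exact import path and arguments depend on the Keras/TensorFlow version; sizes are illustrative):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

N, D, H = 64, 1000, 100

model = Sequential([
    Dense(H, activation='relu', input_shape=(D,)),
    Dense(D),
])
model.compile(optimizer='sgd', loss='mse')

x = np.random.randn(N, D)
y = np.random.randn(N, D)
model.fit(x, y, epochs=10, batch_size=N)   # the training loop is handled for us
```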


TF also has a visualization tool, TensorBoard, which lets us visualize the training process and the network structure.
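A TF 1.x style sketch of logging for TensorBoard, reusing loss, updates, and values from the sketches above (the summary API changed in TF 2): scalar summaries and the graph are written to a log directory, and `tensorboard --logdir ./logs` then serves the visualization.

```python
tf.summary.scalar('loss', loss)
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('./logs', tf.get_default_graph())  # also logs the graph

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(50):
        summary, loss_val, _ = sess.run([merged, loss, updates], feed_dict=values)
        writer.add_summary(summary, step)
```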

2. PyTorch (dynamic graphs)

PyTorch differs from TensorFlow in that it uses dynamic computation graphs. Internally it defines three levels of abstraction (Tensor, Variable, and Module):
[Slide: PyTorch's three levels of abstraction]
It is called a dynamic graph framework because, unlike TF, it does not require explicitly defining the computation graph first and then feeding in training data; instead, the computation graph is built anew each time the computation runs.
For example, we can use PyTorch to define and train a two-layer neural network:
[Slide: two-layer network in PyTorch with a hand-written backward pass]
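A sketch in the spirit of the lecture's first PyTorch example (illustrative sizes): plain Tensors, with both the forward pass and the backward pass written out by hand.

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # Backward pass, derived by hand
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Gradient descent step
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
```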
Automatic differentiation (Autograd)
[Slide: the same two-layer network using autograd]
where:

  • A PyTorch Variable is a node in a computational graph
  • x.data is a Tensor
  • x.grad is a Variable of gradients (same shape as x.data)
  • x.grad.data is a Tensor of gradients
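A sketch of the autograd version. The lecture wrapped Tensors in Variables; in current PyTorch the two have been merged, so a Tensor created with requires_grad=True plays the Variable's role here.

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    loss.backward()                      # autograd fills w1.grad and w2.grad

    with torch.no_grad():                # update outside the graph
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
```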

Of course, we can also define our own backward function, which is equivalent to adding a gate to the computation graph: we only need to implement its forward and backward passes, and PyTorch can then differentiate through it automatically.
[Slide: defining a custom autograd function]
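A sketch of a custom autograd function, here a hand-written ReLU (the class name is illustrative): we implement forward and backward once, and PyTorch can then backpropagate through it.

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)         # remember the input for the backward pass
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x < 0] = 0                # gradient is zero where the input was negative
        return grad_x

# Usage: call through .apply so autograd records the gate
x = torch.randn(5, requires_grad=True)
y = MyReLU.apply(x)
y.sum().backward()                        # x.grad is computed via our backward()
```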
Higher-level wrapper (nn)

Similar to Keras's wrapping of TF, PyTorch also has a higher-level API, the nn module, which defines many of the network layers and loss functions we commonly use.
[Slide: two-layer network built from nn layers]
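A sketch of the two-layer network built from nn layers (illustrative sizes), with a loss function from nn and a hand-written update step:

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
```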
The Optimizer
[Slide: training with an optimizer from torch.optim]
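A sketch of the same loop using an optimizer from torch.optim instead of the hand-written update (model, loss_fn, x, y as in the previous sketch):

```python
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    optimizer.zero_grad()   # clear old gradients
    loss.backward()         # compute new gradients
    optimizer.step()        # let the optimizer update all parameters
```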
Defining your own network layers
Of course, we can also define our own network structure by inheriting from nn.Module, so that it can be used just like the layers provided by nn.
[Slide: a custom nn.Module subclass]
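A sketch of a custom module (the class name is illustrative): subclass nn.Module, register the sub-layers in __init__, and describe the forward pass in forward; autograd handles the backward pass.

```python
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h_relu = self.linear1(x).clamp(min=0)
        return self.linear2(h_relu)

model = TwoLayerNet(D_in=1000, H=100, D_out=10)   # usable like any nn layer
```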
Data loader (DataLoader)

Of course, PyTorch also has a convenient interface for this: the data loader (DataLoader). It lets us load data in parallel; we only need to define our own dataset class (a Dataset) first.
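A sketch using TensorDataset as the dataset (any class implementing __len__ and __getitem__ works) and a DataLoader for batched, shuffled, multi-worker loading; model, loss_fn, and optimizer are reused from the sketches above.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

x = torch.randn(640, 1000)
y = torch.randn(640, 10)

dataset = TensorDataset(x, y)                     # any Dataset subclass works here
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=2)

for epoch in range(10):
    for x_batch, y_batch in loader:               # batches are loaded in parallel
        y_pred = model(x_batch)
        loss = loss_fn(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```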

Pretrained models

torchvision contains common network architectures and many pretrained models for computer vision.
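A sketch of loading pretrained models (the pretrained= argument is the older torchvision API; newer versions use a weights= argument instead):

```python
import torchvision.models as models

resnet18 = models.resnet18(pretrained=True)   # downloads ImageNet weights
alexnet = models.alexnet(pretrained=True)
resnet18.eval()                               # switch to inference mode
```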

Visualization

Similar to TF's TensorBoard, PyTorch also has a visualization tool, Visdom, but it does not yet support visualizing the computation graph.

3. Dynamic graph and static graph

As mentioned earlier, TF is based on static graphs: it requires us to explicitly declare the computation graph first. PyTorch, by contrast, is based on dynamic graphs: it constructs the graph as the computation runs.

  • In comparison, static graphs can be optimized by the framework, because the whole graph structure is known in advance:
    [Slide: graph optimizations enabled by static graphs]
  • Another advantage is that once a static graph is defined, it rarely changes, so we can serialize it and store it on disk.
    Afterwards it can be loaded and reused directly, without the original code (that is, for deployment we only need the graph itself, not the training code).
  • Dynamic graphs look simpler in many scenarios because there is no separate step of defining the computation graph, and PyTorch code is more Pythonic: control flow inside the network is written just like ordinary Python and NumPy. In TF, by contrast, control flow has to be added to the static graph through special TF APIs (such as tf.cond and tf.while_loop); see the sketch after this list.
    [Slide: control flow in dynamic vs. static graphs]
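A sketch of that difference (names are illustrative): in PyTorch, an ordinary Python if decides which branch becomes part of this iteration's graph, whereas static-graph TF 1.x needs tf.cond to put both branches into the graph up front.

```python
import torch

def forward(x, w1, w2, prev_h):
    # Plain Python control flow: whichever branch runs is simply recorded
    # in the dynamic graph for this particular iteration.
    if prev_h.sum() > 0:
        h = x.mm(w1)
    else:
        h = x.mm(w2)
    return h.clamp(min=0)

# The TF 1.x static-graph equivalent would be built up front, roughly:
#   h = tf.cond(tf.reduce_sum(prev_h) > 0,
#               lambda: tf.matmul(x, w1),
#               lambda: tf.matmul(x, w2))
```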

4. Caffe / Caffe2

Unlike the frameworks above, Caffe's core is written in C++, which makes it well suited to product deployment.
[Slide: Caffe overview]
And you rarely write code: almost everything is done by editing configuration files (prototxt):
[Slide: example Caffe configuration file]

Suggestions and summary

[Slide: the course's recommendations on choosing a framework]

Source: blog.csdn.net/qq_41341454/article/details/105627915