A Detailed Comparison and Analysis of PaddlePaddle and TensorFlow

This article compares the PaddlePaddle and TensorFlow frameworks from five aspects: framework overview, system architecture, programming model, distributed architecture, and framework comparison. As deep learning frameworks developed by two major search-engine companies, the two have different emphases, yet both offer simple, elegant architectures and both are still evolving.

 

PaddlePaddle's ease of use, localization, and rapid business integration make it a very favorable weapon for domestic Internet companies; TensorFlow's flexibility leans more toward research, which makes it a big plus in the AI research field.

 

 

Framework Overview

 

 

PaddlePaddle's development began in 2013. Driven by the rapid growth of text, image, and voice training data in Baidu's advertising business, and by the algorithmic demands of Baidu's takeout, search, and autonomous-driving businesses, Baidu's deep learning lab built Paddle (PArallel Distributed Deep LEarning), a multi-machine, multi-GPU parallel training platform, on top of its earlier single-GPU training platform.

 

Since going open source, PaddlePaddle's design and positioning have centered on being "easy to use, efficient, flexible, and scalable," as its official website puts it: An Easy-to-use, Efficient, Flexible and Scalable Deep Learning Platform.

 

PaddlePaddle was open-sourced in September 2016, and its greatest feature is its positioning around ease of use. Many algorithms come fully packaged: not only the mainstream CV and NLP algorithms (such as VGG, ResNet, LSTM, and GRU), but also, in its models repository (https://github.com/PaddlePaddle/models), complete solutions for word vectors (including Hsigmoid-accelerated and noise-contrastive-estimation-accelerated word vector training), RNN language models, click-through-rate prediction, text classification, learning to rank (a core research problem in search engines and information retrieval), structured semantic models, named entity recognition, sequence-to-sequence learning, reading comprehension, question answering, image classification, object detection, scene text recognition, speech recognition, and other AI techniques.

 

Each of these solutions targets a specific technical scenario, so a developer only needs a rough understanding of how the source code works: by executing the commands in the official examples, substituting their own data, and adjusting a few hyperparameters, they can be up and running quickly. Because it does not expose an excessive Python interface to the user, it is relatively easy to understand and use. However, since the focus is on application rather than research or new functionality, modifying an algorithm requires working in the framework's underlying C++ implementation.

 

Its second major feature is distributed deployment: it is currently the only deep learning library with first-class Kubernetes support. This is described further in the "Distributed Architecture" section of this article.

 

TensorFlow's official website describes it as "An open-source software library for Machine Intelligence," that is, an open-source machine learning library.

 

 

Measured by user count and community activity, TensorFlow is the most popular AI engine. It provides the basic building blocks of deep learning, such as conv, pooling, and lstm operators. The figure below lists the operators TensorFlow supports:
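As a taste of these operators, here is a minimal sketch of a convolution followed by max pooling, assuming the TensorFlow 1.x API that was current when this article was written:

```python
import tensorflow as tf

# A batch of one 28x28 single-channel image, and a 3x3 conv kernel
# producing 8 feature maps.
x = tf.placeholder(tf.float32, [1, 28, 28, 1])
w = tf.Variable(tf.truncated_normal([3, 3, 1, 8], stddev=0.1))

conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
pool = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1],
                      strides=[1, 2, 2, 1], padding='SAME')
```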

 

In April 2016, TensorFlow 0.8 added support for distributed and multi-GPU computation. In June 2016, TensorFlow 0.9 improved support for mobile devices.

 

In February 2017, TensorFlow 1.0 was officially released, adding experimental Java and Go APIs, the dedicated XLA compiler, and the Debugger debugging tool; tf.transform, dedicated to data preprocessing, was released as well.

 

Around the same time, TensorFlow Serving was launched for model deployment, allowing algorithms to be deployed dynamically to production; tf.contrib.learn, with functionality similar to scikit-learn, was gradually improved; and TensorFlow Fold, for "dynamic graph computation," was released, which reviewers called "the first clear leader" in design concept.

 

Users can also run distributed training on Google's PaaS TensorFlow product, Cloud Machine Learning. There is now a full TensorFlow Model Zoo as well.

 

By January 2018, TensorFlow had reached version 1.5.0, fully opening up TensorFlow Lite for mobile applications and the Eager Execution dynamic graph mechanism, which surfaces runtime errors immediately and integrates with Python debugging tools.
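A minimal sketch of Eager Execution, assuming the contrib module in which it shipped during the TF 1.5 era:

```python
import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()  # must be called once at program startup

x = tf.constant([[1.0, 2.0]])
y = x * 3.0        # runs immediately; no graph building, no Session
print(y.numpy())   # -> [[3. 6.]]
```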

 

One of TensorFlow's highlights is its support for heterogeneous distributed computing across devices.

 

What does heterogeneous mean? In IT, it refers to systems built from differing components: heterogeneous networks (e.g., the Internet, where hardware and software products from different vendors form a unified, intercommunicating network) and heterogeneous databases (a collection of multiple database systems in which data sharing and transparent access can be achieved).

 

Here, heterogeneous devices means making computing cores such as CPUs and GPUs collaborate efficiently; compared with relying on the CPU alone, this yields higher performance and lower power consumption.
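In TensorFlow 1.x this collaboration is expressed through device placement: the user names the device, and the framework handles the rest. A minimal sketch (soft placement falls back to the CPU if no GPU is present):

```python
import tensorflow as tf

# Ask for the matmul to run on GPU 0; allow_soft_placement lets
# TensorFlow fall back to the CPU if no GPU is available.
with tf.device('/device:GPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)

config = tf.ConfigProto(allow_soft_placement=True,
                        log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(b))
```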

 

What does distributed mean? A distributed architecture helps us schedule and allocate computing resources (and even tolerate faults, such as a compute node crashing or running too slowly), so that models with tens or hundreds of millions of training examples can make effective use of machine resources.

 

Overall, there are several major AI frameworks today, and the two compared here share the features below. We follow "The Unreasonable Effectiveness of Recurrent Neural Networks," which reviews the capabilities an effective framework should have:

 

  • A tensor library that is CPU/GPU transparent and implements many operations (e.g., slicing, array and matrix operations). Transparent here means the framework takes care of how computation runs on different devices; the user only specifies which operation to perform on which device.

  • A completely separate code base, in a scripting language (ideally Python), that operates on tensors and implements all of deep learning, including forward/backward propagation and graph computation.

  • Easy sharing of pretrained models (such as Caffe's model zoo, TensorFlow's slim module, and PaddlePaddle's models module).

  • No compilation step. Deep learning is moving toward larger, more complex networks, so time spent compiling a complex graph multiplies quickly. Moreover, compilation sacrifices interpretability and the ability to debug effectively from logs.

 

 

So what advantages does PaddlePaddle offer developers?

 

First, ease of use. Compared with the relatively low-level Google TensorFlow, PaddlePaddle's accessibility is clear: it lets developers focus on the high-level parts of building deep learning models.

 

In addition, PaddlePaddle is open-sourced by the domestic giant Baidu: its localization closely matches Chinese users' habits, and it pays great attention to the usage scenarios and solutions of mainstream Internet technology.

 

Second, speed. As mentioned above, PaddlePaddle's code and design are more compact, so developing models with it can clearly save developers time. This makes PaddlePaddle well suited to industrial applications, especially scenarios requiring rapid development.

Since system architecture fundamentally determines the differences in framework design, let us now take a closer look.

 

 

System Architecture

 

 

The figure below shows TensorFlow's system architecture. From bottom to top it comprises the device and network layer, the data operation layer, the graph computation layer, the API layer, and the application layer, of which the device/network, data operation, and graph computation layers form TensorFlow's core.

 

Here is a bottom-up walk through the TensorFlow system architecture. The lowest layer is device management and network communication. The network communication layer includes gRPC (google Remote Procedure Call) and Remote Direct Memory Access (RDMA), both of which are needed for distributed computing. Device management includes TensorFlow's implementations on CPU, GPU, FPGA, and other devices; it provides the upper layers with a unified interface, so that they only deal with, say, convolution logic, without caring about how convolution is implemented on the hardware.

 

Above that is the data operation layer, which includes convolution functions, activation functions, and the like. Above it sits the graph computation layer, the core we need to understand, which contains the local and distributed implementations of the computation graph (including graph creation, compilation, optimization, and execution). Above these are the API layer and the application layer.

 

Probably for historical reasons, PaddlePaddle's overall architecture is somewhat similar to Caffe's: it is built around neural network functional layers, where a single layer encapsulates a number of complex operations. The figure below shows the functional layers it currently includes, such as FCN, CTC, BN, and LSTM.

 

It implements data reading (DataProvider), functional layers (Layers), optimization methods (Optimizer), evaluation (Evaluators), activation functions (Activation), and pooling (Pooling) as separate classes; building a neural network in PaddlePaddle is a matter of combining these layers into a whole network.

 

As shown below:
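Beyond the figure, here is a minimal code sketch of composing such layers, assuming the legacy paddle.v2 API of that generation (the topology loosely follows the official fit-a-line regression example):

```python
import paddle.v2 as paddle

paddle.init(use_gpu=False, trainer_count=1)

# Declare the data layers, stack a hidden layer, and define the cost.
x = paddle.layer.data(name='x', type=paddle.data_type.dense_vector(13))
y = paddle.layer.data(name='y', type=paddle.data_type.dense_vector(1))
hidden = paddle.layer.fc(input=x, size=64, act=paddle.activation.Relu())
pred = paddle.layer.fc(input=hidden, size=1, act=paddle.activation.Linear())
cost = paddle.layer.square_error_cost(input=pred, label=y)
```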

 

Meanwhile, beyond simply stacking layers, PaddlePaddle additionally provides a well-packaged construct, mixed_layer, which improves flexibility by forming its output from a combination of different inputs.

 

As shown below, it also encapsulates some commonly needed combinations, such as conv + batchnorm + pooling, greatly simplifying how neural networks are built: less code is needed to construct a network for a sophisticated algorithm, and after swapping in your own input data it runs smoothly.
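A minimal sketch of mixed_layer under the same legacy v2 API assumption; the two input layers here (emb_a, emb_b) are hypothetical placeholders:

```python
import paddle.v2 as paddle

paddle.init(use_gpu=False, trainer_count=1)

# Two hypothetical input layers to be mixed together.
emb_a = paddle.layer.data(name='a', type=paddle.data_type.dense_vector(32))
emb_b = paddle.layer.data(name='b', type=paddle.data_type.dense_vector(32))

# mixed_layer sums the results of several projections over its inputs.
with paddle.layer.mixed(size=128) as m:
    m += paddle.layer.full_matrix_projection(input=emb_a)
    m += paddle.layer.full_matrix_projection(input=emb_b)
```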

 

 

 

Programming Model

 

Paddle's current overall module structure is shown below:

 

 

  • Math is Paddle's mathematical computation module, in which the various matrix and vector types are implemented; their base class is BaseMatrix. This module is mainly divided into two parts.

  • MemoryHandle: the module for memory allocation and management. All of Paddle's computation operates on MemoryHandle. (A MemoryHandle is essentially an abstraction over a segment of memory, including its address and size; the memory may be allocated in ordinary host memory or on a GPU device.)

  • Matrix/Vector: where Paddle's computation logic is implemented. Essentially, each is a view over a MemoryHandle.

  • Matrices and vectors make up Parameters and Arguments, the parameters and the layer inputs/outputs of a neural network.

 

Parameters and Arguments together represent all the data and parameters of the neural network. A Parameter represents the connection weights between layers, while an Argument represents the input and output of each layer. In the figure, Parameters are the yellow connections, and Arguments are the inputs and outputs (Input, Output).

 

Parameters and Arguments store not only the parameter values but also gradients, momentum, and other information.

 

  • Layers use Parameters and Arguments to carry out computation.

 

PaddlePaddle's overall neural network is built from Layers. To support finer-grained network configuration, namely configuring ops and projections, Paddle provides MixedLayer.

 

Unlike other Layers, the inputs of a MixedLayer are not other Layers themselves but projections or operations over other Layers. The results of these projections and operations are summed to form the MixedLayer's output.

 

  • GradientMachine is the type that combines and invokes each layer of the neural network. It is a base class with the forward and backward functions common to neural networks, as well as the handling for single-threaded and multi-threaded multi-GPU execution.

 

GradientMachine is PaddlePaddle's abstraction of a neural network: a type that can compute gradients from data and turn parameters into computation results. A GradientMachine is typically used to compute one neural network topology.

 

Furthermore, according to the topology, a GradientMachine creates its parameters_; its forward function performs the feedforward computation of the network from the input args and the local parameters, and its backward function computes each parameter's gradient from the preceding feedforward result, storing the gradients back into parameters_.
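To make the contract concrete, here is a toy, hypothetical Python rendering of this forward/backward interface (the real GradientMachine is a C++ class inside Paddle; all names here are illustrative only):

```python
import numpy as np

class ToyGradientMachine:
    """A single linear layer standing in for a whole topology."""

    def __init__(self, in_dim, out_dim):
        self.w = np.random.randn(in_dim, out_dim) * 0.01  # parameters_
        self.w_grad = np.zeros_like(self.w)

    def forward(self, x):
        self.x = x                      # cache input for backward
        return x.dot(self.w)            # feedforward from args + parameters

    def backward(self, out_grad):
        # Gradient of each parameter, stored alongside the parameter.
        self.w_grad = self.x.T.dot(out_grad)
        return out_grad.dot(self.w.T)   # gradient w.r.t. the input

gm = ToyGradientMachine(4, 2)
y = gm.forward(np.ones((3, 4)))
gm.backward(np.ones_like(y))
```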

 

  • Trainer calls GradientMachine to compute the parameters' gradients.

  • ParameterUpdater: after GradientMachine has computed the gradients via forward and backward, ParameterUpdater invokes the update algorithm to update the parameters.

  • The network topology ultimately used by Trainer is generated by the Python program config_parser.py (the sketch after this list shows how this flow looks at the Python level).
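At the Python level, the legacy v2 API surfaces this Trainer/updater flow roughly as follows, reusing the cost from the earlier fit-a-line sketch; a sketch under the same legacy-API assumption:

```python
import paddle.v2 as paddle

# 'cost' is the square_error_cost output from the earlier sketch.
parameters = paddle.parameters.create(cost)          # from the topology
optimizer = paddle.optimizer.Momentum(momentum=0.9)  # the update algorithm
trainer = paddle.trainer.SGD(cost=cost,
                             parameters=parameters,
                             update_equation=optimizer)

# Train for a few passes over the built-in UCI housing reader.
trainer.train(
    reader=paddle.batch(paddle.dataset.uci_housing.train(), batch_size=32),
    feeding={'x': 0, 'y': 1},
    num_passes=5)
```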

 

TensorFlow performs computation with a dataflow graph: we first create the dataflow graph (also called the network structure). Let us look at each element of the dataflow graph shown in the figure.

 

The figure illustrates how TensorFlow works. It includes an input (input), reshaping (reshape), a ReLU layer (ReLu layer), a Logit layer (Logit layer), Softmax, cross entropy (cross entropy), a gradient (gradient), and SGD training (SGD Trainer); together they form a simple regression model.

 

The computation starts from the input and, after reshaping, proceeds layer by layer through forward propagation. The ReLU layer (the hidden layer) has two parameters, Wh1 and bh1, and applies the ReLU (Rectified Linear Units) activation function for nonlinearity before producing its output. The data then enters the Logit layer (the output layer), which learns the two parameters Wsm and bsm. Softmax computes the output probability distribution over the classes, and cross entropy measures the similarity between the two probability distributions (the source sample distribution and the output distribution).

 

Gradient computation then begins, requiring the parameters Wh1, bh1, Wsm, and bsm as well as the cross-entropy result. SGD training follows: the backpropagation process computes each layer's parameter gradients from top to bottom so they can be updated. In other words, the parameters are computed and updated in the order bsm, Wsm, bh1, Wh1.
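A minimal sketch of this graph in TF 1.x code; the shapes are illustrative, assuming 784-dimensional inputs and 10 classes:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.float32, [None, 10])

# ReLU (hidden) layer with parameters Wh1, bh1.
Wh1 = tf.Variable(tf.truncated_normal([784, 128], stddev=0.1))
bh1 = tf.Variable(tf.zeros([128]))
hidden = tf.nn.relu(tf.matmul(x, Wh1) + bh1)

# Logit (output) layer with parameters Wsm, bsm.
Wsm = tf.Variable(tf.truncated_normal([128, 10], stddev=0.1))
bsm = tf.Variable(tf.zeros([10]))
logits = tf.matmul(hidden, Wsm) + bsm

# Softmax + cross entropy, then SGD computes and applies the gradients.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
```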

 

As the name suggests, TensorFlow means "tensors flowing." A TensorFlow dataflow graph is a directed acyclic graph (DAG) composed of nodes (node) and edges (edge).

 

TensorFlow consists of two parts, Tensor and Flow: Tensor (tensor) represents the data on the edges of the dataflow graph, while Flow (flow) represents the operations that the graph's nodes perform on that data.

 

So, for distributed computing, how does each framework implement it, and what are their respective characteristics?

 

 

Distributed Architecture

 

 

PaddlePaddle's distributed architecture has two major components: the trainers and the parameter servers. The distributed training architecture is shown below:

 

Data shard: the training data for the neural network, cut into multiple shards, each used by one trainer.

 

Compute node (Trainer): after startup, each trainer reads its shard of the pre-split data, starts the "feed-forward" and "back-propagation" computation of the neural network, and communicates with the parameter servers. After training on a certain amount of data, it uploads the computed gradients (gradients) and then downloads the optimized, updated neural network parameters (parameters).

 

Parameter server (Parameter server): each parameter server stores only a portion of the full network's parameters. It receives the gradients uploaded from the compute nodes, completes the optimization update of its parameters, and sends the updated parameters back to each compute node.

 

When training with synchronous SGD, PaddlePaddle uses synchronization barriers (barrier) so that gradient submission and parameter updates proceed in lockstep. With asynchronous SGD, a trainer does not wait for all trainers to submit gradients before the parameters are updated, which greatly improves parallelism: the parameter servers do not depend on one another and receive gradients and update parameters in parallel; a parameter server does not wait for all compute nodes to submit gradients before starting the next step; and the compute nodes do not depend on one another either, training the model in parallel. As can be seen, although asynchronous SGD increases parallelism, it does not guarantee synchronized updates: the parameters a trainer reads from one parameter server at any moment may be staler than those on another, so compared with synchronous SGD the gradients carry noise.

 

Meanwhile, Paddle itself supports multiple ways of deploying and running distributed clusters, including fabric clusters, OpenMPI clusters, single-machine Kubernetes, distributed Kubernetes, and so on.

 

TensorFlow's distributed architecture consists of clients (client) and servers (server), and a server in turn comprises two components: a master node (master) and worker nodes (worker). We need to focus on the relationships among the client, the master node, and the worker nodes, and on how they interact.

 

The relationship between the client, the master node, and the worker nodes

 

Simply put, in TensorFlow the client contacts the master node through a session, and the actual work is carried out by the worker nodes. Each worker node occupies a device (TensorFlow's hardware abstraction for a specific computation, i.e., a CPU or GPU). In standalone mode, the client, master node, and worker nodes are all on the same server; in distributed mode, they may be on different servers. The figure below shows the relationship among the three.

 

 

1. Client

 

The client builds the TensorFlow computation graph and establishes a session layer for interacting with the cluster. Thus, any code that contains Session() is a client. Multiple clients can connect to a server at the same time, and one server can also serve multiple clients.

 

2. Server

 

A server is a process running a tf.train.Server instance; it is one task of a TensorFlow cluster (cluster) and is divided into a master node service (Master service, also called the master node) and a worker node service (Worker service, also called a worker node). A server consists of a master node process and several worker node processes, with the master node communicating with the worker node processes through a common interface. Single-machine multi-GPU and distributed execution share this structure; switching between them only requires changing the interface over which they communicate.
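A minimal sketch of starting such a server in TF 1.x; the cluster addresses are placeholders for illustration:

```python
import tensorflow as tf

# Describe the cluster: one parameter-server job and one worker job.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222"],
})

# Launch this process as worker 0 of the cluster.
server = tf.train.Server(cluster, job_name="worker", task_index=0)
server.join()  # block and serve until the process is killed
```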

 

3. Master node service

 

The master node service implements the tensorflow::Session interface. It connects to remote worker service processes through RPC and communicates with the tasks of those worker service processes. In a TensorFlow server, it is usually the process with task_index 0 of its job (job).
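From the client side, connecting a session to a given master service looks roughly like this (the gRPC address is a placeholder):

```python
import tensorflow as tf

c = tf.constant("Hello, distributed TensorFlow!")

# Attach the client session to the master service of a remote server.
with tf.Session("grpc://worker0.example.com:2222") as sess:
    print(sess.run(c))
```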

 

4. Worker node service

 

The worker node service implements the worker_service.proto interface and uses its local devices to compute a portion of the graph. On the TensorFlow server side, the worker nodes contain all of the worker-service logic. Each worker node manages one or more devices. Worker nodes can be separate processes on different local ports, or multiple processes across multiple servers.
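In graph code, work is assigned to particular worker nodes (and their devices) with device annotations; a sketch with placeholder job names:

```python
import tensorflow as tf

# Pin the variable to the parameter-server task and the matmul to a worker.
with tf.device("/job:ps/task:0"):
    w = tf.Variable(tf.ones([2, 2]))

with tf.device("/job:worker/task:1/device:GPU:0"):
    y = tf.matmul(w, w)
```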

The left side of the following figure shows standalone multi-GPU interaction; the right side shows distributed interaction.

 

 

 

Framework Comparison

 

Below is a comparison of PaddlePaddle and TensorFlow in terms of framework popularity, code, and stability. As can be seen, both frameworks are excellent in activity and stability, and their code quality is evenly matched.

Here are a few places where, in my view, PaddlePaddle shines:

 

  • An easier-to-use API with better encapsulation, enabling faster business integration;

  • Small memory footprint and high speed, because Paddle has also served many large, high-concurrency big-data scenarios inside Baidu and has accumulated rich industrial experience;

  • Localization support: it is the only deep learning framework with official Chinese documentation;

  • Many ready-made natural language processing applications, such as sentiment classification, neural machine translation, reading comprehension, and question answering, which are relatively simple to use;

  • PaddlePaddle supports multi-machine multi-GPU training and itself supports a variety of cluster deployment approaches.

 

Summary

This article has compared and illustrated PaddlePaddle and TensorFlow from several angles: framework overview, system architecture, programming model, distributed architecture, and framework comparison. All in all, being "easy to use, flexible, and efficient" is PaddlePaddle's biggest highlight: it is ideal for traditional Internet companies integrating AI modules, and its clear design also helps with learning; in addition, its distributed architecture supports multiple cluster environments and can be combined with business systems more easily. I believe that as more practical models are released, you will find it an easy-to-use algorithmic solution for your business.
