
Introduction to Keras

  Keras is a Python deep learning framework that makes it easy to define and train almost any kind of deep learning model. It was originally developed for researchers, with the goal of enabling rapid experimentation.

  Keras has the following important features:

(1) The same code can run seamlessly on CPU or GPU.
(2) It has a user-friendly API that makes it easy to quickly prototype deep learning models.
(3) It has built-in support for convolutional networks (for computer vision), recurrent networks (for sequence processing), and any combination of the two.
(4) It supports arbitrary network architectures: multi-input or multi-output models, layer sharing, model sharing, and so on. This means Keras can build essentially any deep learning model, from a generative adversarial network to a neural Turing machine.

  Keras is released under the permissive MIT license, which means it can be used freely in commercial projects. It is compatible with all current versions of Python (as of mid-2017, from Python 2.7 through Python 3.6).
  Keras has more than 200,000 users, including academic researchers and engineers at both startups and large companies, as well as graduate students and hobbyists. Google, Netflix, Uber, CERN, Yelp, Square, and hundreds of startups use Keras to solve a wide variety of problems.
  Keras is also a popular framework on Kaggle, the machine learning competition website: almost all recent deep learning competitions were won using Keras models (see Figure 1).

Figure 1 The trend of Google web search popularity for different deep learning frameworks

1. Keras, TensorFlow, Theano, and CNTK

  Keras is a model-level library: it provides high-level building blocks for developing deep learning models, but it does not itself handle low-level operations such as tensor manipulation and differentiation. Instead, it relies on a specialized, highly optimized tensor library to perform these operations; that library serves as the backend engine of Keras. Rather than choosing a single tensor library and tying the Keras implementation to it, Keras handles the problem in a modular way (see Figure 2), so several different backend engines can be plugged into Keras seamlessly. Currently, Keras has three backend implementations: TensorFlow, Theano, and the Microsoft Cognitive Toolkit (CNTK). In the future, Keras may be extended to support more deep learning engines.

Figure 2 The software and hardware stack of deep learning

   TensorFlow, CNTK, and Theano are among the main platforms for deep learning today. Theano was developed by the MILA lab at the Université de Montréal, TensorFlow by Google, and CNTK by Microsoft. Every piece of code you write with Keras can run on any of these backends without modification: you can seamlessly switch between them during development, which is often useful, for example when one backend turns out to be faster for a particular task. We recommend the TensorFlow backend as the default for most deep learning tasks, because it is the most widely adopted, the most scalable, and production-ready. Through TensorFlow (or Theano, or CNTK), Keras runs seamlessly on both CPU and GPU. On CPU, TensorFlow itself wraps a low-level tensor library called Eigen; on GPU, it wraps a highly optimized library of deep learning primitives called the NVIDIA CUDA Deep Neural Network library (cuDNN).
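
  As a concrete example of this switching, standalone Keras reads its backend from the ~/.keras/keras.json configuration file, and the KERAS_BACKEND environment variable overrides that setting. Here is a minimal sketch; note that the variable must be set before keras is imported for the first time:

import os

# Choose the backend before the first Keras import; valid values for
# this generation of Keras are "tensorflow", "theano" and "cntk".
os.environ['KERAS_BACKEND'] = 'theano'

from keras import backend as K  # prints e.g. "Using Theano backend." on import

print(K.backend())  # -> 'theano'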

2. Development with Keras: Overview

  The typical Keras workflow is:

(1) Define your training data: input tensors and target tensors.
(2) Define a network (or model) of layers that maps the inputs to the targets.
(3) Configure the learning process: choose a loss function, an optimizer, and the metrics to monitor.
(4) Iterate on the training data by calling the model's fit() method.

  There are two ways to define a model: using the Sequential class (only for linear stacks of layers, which is by far the most common network architecture) or using the functional API (for directed acyclic graphs of layers, which lets you build completely arbitrary architectures).
The following is a two-layer model defined by the Sequential class (note that we passed the expected shape of the input data to the first layer).

from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(32, activation='relu', input_shape=(784,)))  # the first layer must be told the input shape
model.add(layers.Dense(10, activation='softmax'))  # 10-way softmax output
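
  A quick way to sanity-check the result is model.summary(), which prints each layer with its output shape and parameter count (here 784 × 32 + 32 = 25,120 weights for the first layer and 32 × 10 + 10 = 330 for the second):

model.summary()  # prints the layer stack and its 25,450 trainable parameters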

  Below is the same model defined with the functional API.

input_tensor = layers.Input(shape=(784,))  # a symbolic input tensor
x = layers.Dense(32, activation='relu')(input_tensor)  # layers are applied like functions
output_tensor = layers.Dense(10, activation='softmax')(x)
model = models.Model(inputs=input_tensor, outputs=output_tensor)  # wrap inputs and outputs into a model
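
  To illustrate the extra flexibility the functional API buys — the multi-input architectures mentioned earlier, for instance — here is a hypothetical two-branch sketch (not from the book; the names and shapes are illustrative) that merges two inputs with a concatenation layer:

input_a = layers.Input(shape=(784,))
input_b = layers.Input(shape=(64,))
branch_a = layers.Dense(32, activation='relu')(input_a)
branch_b = layers.Dense(32, activation='relu')(input_b)
merged = layers.concatenate([branch_a, branch_b])  # join the two branches
output = layers.Dense(10, activation='softmax')(merged)
two_input_model = models.Model(inputs=[input_a, input_b], outputs=output)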

  With the functional API, you manipulate the data tensors that the model processes and apply layers to these tensors as if they were functions. Once the model architecture is defined, it no longer matters whether you used a Sequential model or the functional API; the subsequent steps are the same. The learning process is configured in the compilation step, where you specify the optimizer and loss function the model should use, as well as the metrics you want to monitor during training. The following is an example with a single loss function, which is by far the most common case.

from keras import optimizers

model.compile(optimizer=optimizers.RMSprop(lr=0.001),
              loss='mse',
              metrics=['accuracy'])
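
  For comparison, Keras also accepts string identifiers for built-in optimizers and losses; since 0.001 is RMSprop's default learning rate in this generation of Keras, the following one-liner is equivalent:

model.compile(optimizer='rmsprop', loss='mse', metrics=['accuracy'])  # same configuration via string identifiers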

  Finally, the learning process consists of passing NumPy arrays of input data (and the corresponding target data) to the model via the fit() method, much as in scikit-learn and other machine learning libraries.

model.fit(input_tensor, target_tensor, batch_size=128, epochs=10)
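
  To make the snippet runnable end to end, here is a minimal sketch with randomly generated stand-in data (the array names and shapes are illustrative, not from the book), matching the input_shape=(784,) and the 10-way softmax defined above:

import numpy as np
from keras.utils import to_categorical

# Hypothetical stand-in data: 1,000 samples with 784 features each,
# and one-hot targets over the 10 output classes.
train_data = np.random.random((1000, 784))
train_targets = to_categorical(np.random.randint(10, size=(1000,)), num_classes=10)

model.fit(train_data, train_targets, batch_size=128, epochs=10)
loss, accuracy = model.evaluate(train_data, train_targets)  # returns the loss and the monitored metrics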

  After mastering the basics of Keras, the questions that remain are: Which type of network architecture suits which type of problem? How do you choose the right learning configuration? How do you tune a model until it gives the results you want?

—— Excerpted from "Python Deep Learning"

Origin blog.csdn.net/ManWZD/article/details/108827542