Deep Learning Overview

1. Basic Concepts

1. A neural network is built from a large number of neurons, organized into layers and connected by weighted links. Each neuron either fires or does not, as decided by its activation function. With enough neurons and parameters, such a network can approximate arbitrarily complex functions. The human brain learns skills and knowledge by stimulating and adapting its neurons; artificial neural networks are machine learning's way of simulating that activity.

2. Deep learning: neurons are arranged in many layers, each expressing a different level of abstraction, which enables effects such as dimensionality reduction, feature selection, and labeling.

3. Neurons can learn: their weights and biases change in response to the environment, so the algorithm adjusts itself to the data.

4. Related paradigms: transfer learning and reinforcement learning (autonomous driving, game playing, etc.)

2. Features

1. Instead of designing the algorithm mathematically, parameters are adjusted through continuous training so that the network as a whole "accumulates experience" (pattern-based, intuitive thinking rather than formal reasoning).

2. What is learned is stored in the architecture and parameters of the network; in many cases no closed-form analytical expression of the learned function can be written down (hence the "AI black box"). Both the network structure and the parameter tuning rely more on engineering experience than on rigorous mathematical methods.

3. Basic assumption: an algorithm plus data beats a model you design by hand.

3. Activation function

1. Requirements: nonlinearity (otherwise the network as a whole stays linear); differentiability (needed for gradient-based optimization); monotonicity (so a single-layer network remains a convex problem); f(x) ≈ x near zero (so initial weights can simply be set to small random values); a bounded output range keeps training stable, while an unbounded range is computationally efficient but lets neurons "die" easily and calls for a smaller learning rate.

2. Perceptron

Output 1 if wx + b > 0, else 0. Its expressive power is at least that of Boolean algebra (a direct imitation of a brain neuron).

3. Sigmoid neurons (S-shaped)

sigmoid(z) = 1 / (1 + e^(−z)), where z = wx + b. Output values between 0 and 1 can be read as a likelihood, and the function is easy to differentiate.

4. tanh: tanh(x) = 2·sigmoid(2x) − 1; usually works better than sigmoid because its output is zero-centered.

5. ReLU (Rectified Linear Unit): f(z) = max(0, z); mitigates the vanishing-gradient problem across many layers.

6. PReLU (Parametric Rectified Linear Unit): adds a learnable slope parameter for negative inputs.

7. Maxout: more parameters still (takes the maximum over several linear functions); a minimal sketch of these activations follows below.
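Below is a minimal sketch, assuming NumPy, of the activation functions listed above; the demo inputs at the bottom are illustrative values only.

```python
import numpy as np

def perceptron_step(x, w, b):
    """Perceptron: output 1 if w.x + b > 0, else 0."""
    return float(np.dot(w, x) + b > 0)

def sigmoid(z):
    """Sigmoid: squashes z into (0, 1); convenient to differentiate."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """tanh(z) = 2*sigmoid(2z) - 1, a zero-centered relative of sigmoid."""
    return 2.0 * sigmoid(2.0 * z) - 1.0

def relu(z):
    """ReLU: max(0, z); helps against vanishing gradients in deep stacks."""
    return np.maximum(0.0, z)

def prelu(z, a=0.25):
    """PReLU: like ReLU, but with a (learnable) slope `a` for negative inputs."""
    return np.where(z > 0, z, a * z)

z = np.linspace(-3, 3, 7)
print(relu(z))                            # [0. 0. 0. 0. 1. 2. 3.]
print(np.allclose(tanh(z), np.tanh(z)))   # True: 2*sigmoid(2z) - 1 equals tanh(z)
print(perceptron_step([1.0, -1.0], [0.5, 0.5], 0.2))  # 1.0, since 0.0 + 0.2 > 0
```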

4. Network Model

1. Feedforward network

2. Convolutional networks (CNNs): exploit the spatial structure of images. Useful with very large datasets, many features, and complex classification tasks, e.g. image recognition, text-to-speech, drug discovery. Each neuron in the first hidden layer is connected only to a small region of the input neurons, so full connections are replaced by local, many-to-one connections (faster training).

Stride: the window shifts a few pixels to the right or down at each step

Premise: local receptive fields

Feature maps: shared weights and shared biases (convolution kernels, also called filters) make recognition invariant to image translation

Pooling: condenses the output of the feature maps (max-pooling, L2 pooling); see the convolution/pooling sketch at the end of this section

3. Recurrent neural networks (RNNs): closer to how the brain works; the network's behavior depends on time, which suits storing large amounts of ordered information. Examples: image classification and captioning, political sentiment analysis, speech recognition (distinguishing homophones in context) and natural language processing (understanding Python, sorting, translation, etc.)

Long short-term memory units (LSTMs): make RNNs much easier to train

4. Deep Belief Networks, Generative Models and Boltzmann Machines (Network + Probability)

Deep Belief Network (DBN)

5. Autoencoder

6. Generative Adversarial Networks (GANs)

https://www.leiphone.com/news/201702/NcdoDmmOn1RgeCIL.html
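To make the convolutional-network item above concrete, here is a minimal NumPy sketch of a single feature map: one shared 3×3 kernel (shared weights and bias) slides over the image with a fixed stride, and max-pooling then condenses the result. The image and kernel values are made up for illustration.

```python
import numpy as np

def conv2d(image, kernel, bias=0.0, stride=1):
    """One feature map: a single shared kernel (shared weights + shared bias)
    is applied to every local receptive field, stepping by `stride` pixels."""
    h, w = image.shape
    kh, kw = kernel.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]  # local receptive field
            out[i, j] = np.sum(patch * kernel) + bias
    return out

def max_pool(fmap, size=2):
    """Max-pooling: keep only the strongest response in each size x size block."""
    h, w = fmap.shape
    out = np.zeros((h // size, w // size))
    for i in range(h // size):
        for j in range(w // size):
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))        # toy 8x8 "image"
kernel = rng.standard_normal((3, 3))       # one shared 3x3 filter
fmap = np.maximum(0.0, conv2d(image, kernel, stride=1))  # convolution + ReLU
print(fmap.shape)            # (6, 6): each output neuron sees only a 3x3 patch
print(max_pool(fmap).shape)  # (3, 3): pooled summary of the feature map
```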

5. Methods of optimization and parameter adjustment

1. Basic principle: make small changes to the parameters and observe how the result changes (gradient-based, i.e. differentiation)

2. Gradient descent (batch gradient descent): like a ball rolling down a mountain; an iterative search for the minimum

3. Stochastic gradient descent (SGD): does not follow the exact steepest slope; the partial derivatives are computed on a small random subset of the data rather than all of it

4. Backpropagation (BP): an efficient method for computing the large number of partial derivatives

5. Causes of vanishing gradients in deep networks: random initialization, the sigmoid activation function, and the network structure

6. Regularization: add a penalty term to the loss function to avoid overfitting (see the sketch below)
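As a sketch of items 2, 3 and 6 above (assuming NumPy, with made-up linear-regression data), the snippet below contrasts full-batch gradient descent with mini-batch stochastic gradient descent and adds an L2 penalty term to the loss; the hand-written gradient stands in for what backpropagation would compute in a real network.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))                # toy inputs
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.standard_normal(200)  # toy targets

def grad(w, Xb, yb, lam=0.01):
    """Gradient of mean squared error plus an L2 penalty lam * ||w||^2."""
    err = Xb @ w - yb
    return 2 * Xb.T @ err / len(yb) + 2 * lam * w

# Batch gradient descent: every step uses all the data ("ball rolling downhill").
w = np.zeros(3)
for _ in range(200):
    w -= 0.1 * grad(w, X, y)

# Stochastic (mini-batch) gradient descent: each step uses a small random subset,
# giving a cheaper, noisier estimate of the slope.
w_sgd = np.zeros(3)
for _ in range(200):
    idx = rng.choice(len(y), size=16, replace=False)
    w_sgd -= 0.1 * grad(w_sgd, X[idx], y[idx])

print(np.round(w, 2), np.round(w_sgd, 2))  # both end up close to [2, -1, 0.5]
```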
