NLP Paper Close Reading (1) - Deep Learning


This paper was published in Nature in 2015 by the three giants of artificial intelligence. It set off a wave of AI enthusiasm and ushered in the era of deep learning, and the three authors went on to win the 2018 ACM Turing Award (announced in 2019).
This article follows the order of the paper and records its most important parts.

Foreword

Deep learning allows computational models composed of multiple processing layers to learn high-level, abstract representations of data while carrying out complex computation.

For decades, building a pattern-recognition or machine-learning system required considerable domain expertise and a great deal of feature engineering to hand-design feature extractors and achieve good performance. Deep learning removes the need for this manual feature-extraction work.

Deep learning is a form of representation learning.
The method is to compose simple but non-linear modules into multiple layers, so that their combination can express high-level, abstract representations of a problem.
With enough such transformations composed together, very complex functions can be represented and learned.

Deep learning is good at discovering intricate structure in high-dimensional data, and is therefore widely applied in science, business, and government.
It has achieved success in many domains.

In NLP, topic classification, sentiment analysis, and machine translation have all made considerable progress.

Supervised learning

Supervised learning is learning from labels: a model is trained on labeled data and then used to predict the labels of other samples.

The objective function can be viewed as a kind of hilly terrain over the high-dimensional space of weight values. The negative gradient vector points in the direction of steepest descent, taking the weights closer to a minimum of the error.

SGD, stochastic gradient descent, is called stochastic because each small set of examples gives a noisy estimate of the average gradient over all examples. This simple procedure usually finds a good set of weights.
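As an illustration, here is a minimal SGD sketch for linear regression in NumPy; the data, learning rate, and batch size are all assumptions made up for this example, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # toy inputs
true_w = np.array([2.0, -1.0, 0.5])           # assumed "true" weights
y = X @ true_w + 0.01 * rng.normal(size=200)  # targets with a little noise

w = np.zeros(3)
lr, batch = 0.1, 16
for epoch in range(50):
    idx = rng.permutation(len(X))             # shuffle for stochasticity
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        # Noisy estimate of the full-data gradient of the squared error
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad                        # step "downhill" along -gradient

print(w)  # should end up close to true_w
```

Each mini-batch gradient is noisy, yet the averaged-out descent still recovers a good set of weights.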

After training, the performance of the system is measured on a different set of samples (the test set). This tests the model's generalization ability: its ability to give sensible answers on samples it was never trained on.

The traditional approach requires hand-designing a good feature extractor, which demands considerable feature engineering and domain expertise.

Back-propagation algorithm

The essence of the back-propagation algorithm is the chain rule of derivatives.
A forward pass first computes the result of each layer up to the final output.
The error of that output is then propagated backward through the layers as derivatives (gradients) that are used to update the parameters.
Repeating this cycle until learning succeeds yields the final model.

The core idea of back-propagation: the derivative (gradient) of the objective function with respect to a layer's input can be computed by propagating backward the derivative with respect to that layer's output (which is the next layer's input).
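The chain-rule idea above can be sketched for a tiny two-layer network, computing the gradients by hand and checking one of them numerically; all sizes and values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3))          # batch of 4 inputs
t = rng.normal(size=(4, 2))          # targets
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 2))

# Forward pass: compute each layer's output up to the loss
h_pre = x @ W1
h = np.maximum(0, h_pre)             # ReLU non-linearity
y = h @ W2
loss = 0.5 * np.sum((y - t) ** 2)

# Backward pass: propagate dLoss/d(output) back through each layer
dy = y - t                           # dL/dy
dW2 = h.T @ dy                       # chain rule through the second linear layer
dh = dy @ W2.T                       # dL/dh = gradient w.r.t. layer-1 output
dh_pre = dh * (h_pre > 0)            # chain rule through ReLU
dW1 = x.T @ dh_pre                   # chain rule through the first linear layer

# Numerical check of one gradient entry via a finite difference
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
lp = 0.5 * np.sum((np.maximum(0, x @ W1p) @ W2 - t) ** 2)
print(abs((lp - loss) / eps - dW1[0, 0]))   # should be tiny
```

The finite-difference check confirms that the backward pass computes the same derivative the chain rule promises.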

The most commonly used non-linearity today is the ReLU function, which makes learning in a neural network faster.
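A quick sketch of why ReLU helps, compared against the sigmoid that was historically common: ReLU's gradient is 1 for any positive input, while the sigmoid's gradient vanishes for large inputs, which slows learning.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(relu(z))                        # [0. 0. 0. 1. 5.]
# ReLU gradient: exactly 1 wherever z > 0, so it does not shrink
print((z > 0).astype(float))          # [0. 0. 0. 1. 1.]
# Sigmoid gradient: nearly 0 at z = +/-5 (saturation), which slows learning
print(sigmoid(z) * (1 - sigmoid(z)))
```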

Convolutional neural networks

Advantages of CNNs:

  1. Fewer parameters
  2. Greater efficiency
  3. Lower computational complexity

Data comes in several array forms:
1D: signals and sequences, such as language
2D: images or audio spectrograms
3D: video or volumetric images

Four key ideas behind convolutional neural networks:

  1. Local connections
  2. Shared weights
  3. Pooling
  4. The use of many layers

Within one layer, all units in a feature map share the same filter bank, while different feature maps in the layer use different filter banks. There are two reasons:

  1. In array data such as images, local groups of values are highly correlated.
  2. The local statistics of images are invariant to location.

The role of the pooling layer is to merge semantically similar features into one, because the features that make up a motif can vary somewhat in their relative positions.
When the position of a feature in the input shifts slightly, pooling makes the resulting representation more robust to that change.
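The ideas of local connections, shared weights, and pooling can be sketched with a minimal (deliberately naive) 2-D convolution and 2x2 max pooling; the image and filter values are made up for illustration:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D convolution: the SAME kernel (shared weights)
    is applied to every LOCAL patch of the image."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """2x2 max pooling: merges nearby responses, making the result
    robust to small shifts of the feature's position."""
    H, W = fmap.shape
    cropped = fmap[:H - H % 2, :W - W % 2]
    return cropped.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
edge = np.array([[1.0, -1.0]])                   # tiny horizontal-difference filter
fmap = conv2d(img, edge)                         # 6x5 feature map
pooled = max_pool2x2(fmap)                       # 3x2 pooled map
print(fmap.shape, pooled.shape)
```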

Image understanding with deep convolutional neural networks

Deep convolutional neural networks were applied to a data set of about a million images covering 1,000 different classes. Combined with efficient use of GPU computing resources, the ReLU activation function, and dropout regularization, this achieved excellent results in the 2012 ImageNet competition.

Distributed representation and language processing

Two advantages of distributed representations:

  1. Generalization to new combinations of the learned feature values.
  2. Composing layers of representation in a deep network brings the potential for an exponential gain.

Neural language models associate each word with semantic features; semantically similar words end up close to one another in the embedding space.
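As a toy illustration of "similar words are close in the semantic space", here are hand-made 3-dimensional vectors (assumed values, not learned embeddings) compared by cosine similarity:

```python
import numpy as np

# Hand-made toy "embeddings" for illustration only
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1 when vectors point the same way, 0 when orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))  # high: semantically close
print(cosine(emb["king"], emb["apple"]))  # low: semantically far
```

A real neural language model learns such vectors from data; the geometry shown here is the property the text describes.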

Recurrent Neural Networks

An unrolled RNN can be thought of as a very deep feedforward network in which all the layers share the same weights.
Although its goal is to learn long-term dependencies, experiments show that storing information for a long time is hard to achieve.

To solve this problem, the idea arose of augmenting the network with an explicit memory.
The LSTM was proposed: it uses special units called memory cells that act like accumulators with gated connections.
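The accumulator-with-gates idea can be sketched as a single LSTM memory-cell step; the weight shapes and random initialization are assumptions for the example, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W):
    """One step of an LSTM cell. The cell state c is a gated accumulator:
    the forget gate scales the old memory, the input gate admits new content."""
    z = np.concatenate([x, h])
    f = sigmoid(W["f"] @ z)          # forget gate: how much old memory to keep
    i = sigmoid(W["i"] @ z)          # input gate: how much new content to write
    o = sigmoid(W["o"] @ z)          # output gate: how much memory to expose
    g = np.tanh(W["g"] @ z)          # candidate memory content
    c = f * c + i * g                # accumulator with gated self-connection
    h = o * np.tanh(c)               # new hidden state
    return h, c

rng = np.random.default_rng(2)
n_x, n_h = 3, 4                      # assumed input / hidden sizes
W = {k: rng.normal(scale=0.1, size=(n_h, n_x + n_h)) for k in "fiog"}
h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):                   # run over a short random input sequence
    h, c = lstm_step(rng.normal(size=n_x), h, c, W)
print(h.shape, c.shape)
```

Because the forget gate can stay near 1, the cell state can carry information across many steps, which is what plain RNNs struggle to do.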

The future of deep learning

Unsupervised learning is a focus of future research, because unsupervised learning dominates how humans and animals learn.
In vision, the combination of deep learning and reinforcement learning is only in its infancy.
In the future, NLP is an important area where deep learning will have a large influence.
Ultimately, major progress in AI will come from systems that combine representation learning with complex reasoning.

Reference

[1] LeCun, Y., Bengio, Y. & Hinton, G., "Deep Learning", Nature, 2015.

Origin blog.csdn.net/qq_19672707/article/details/91358941