Check-in (limited-time event)

One, linear regression
The optimal solution of linear regression can be found either iteratively by gradient descent or directly by a matrix formula. Stochastic gradient descent effectively reduces the amount of computation, but at the cost of some accuracy. When the model and the loss function have a relatively simple form, the solution to the error minimization problem can be expressed directly by a formula. Such a solution is called an analytical solution. Linear regression with the squared error used in this section falls into this category. However, most deep learning models have no analytical solution; the value of the loss function can only be reduced as much as possible by using an optimization algorithm to update the model parameters over a finite number of iterations. Such a solution is called a numerical solution.
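To make the difference between the two kinds of solution concrete, here is a minimal sketch for a single feature. The data, learning rate, and step count are made-up values for illustration, and the numerical version uses plain full-batch gradient descent rather than the stochastic variant.

```python
# Minimal sketch: fit y = w*x + b under squared error in two ways.
# The data and hyperparameters are illustrative assumptions.

def analytic_fit(xs, ys):
    """Analytical solution: closed-form least squares for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

def gradient_descent_fit(xs, ys, lr=0.05, steps=2000):
    """Numerical solution: full-batch gradient descent on (1/2n) * sum of squared errors."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Gradients of (1/2n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1 without noise

w_exact, b_exact = analytic_fit(xs, ys)
w_gd, b_gd = gradient_descent_fit(xs, ys)
print("analytical:", w_exact, b_exact)
print("numerical: ", w_gd, b_gd)
```

The analytical version returns the answer in one pass, while the numerical version only approaches it as the number of iterations grows.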
Two, softmax and classification models
Softmax regression is mainly used for classification.
Squared loss for example $i$:
$\text{Loss} = \frac{1}{2} \lVert \hat{y}^{(i)} - y^{(i)} \rVert^2$
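As a rough sketch of how these pieces fit together, the snippet below turns output scores into class probabilities with softmax and then evaluates the squared loss above against a one-hot label. The scores and the label are hypothetical values chosen for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def squared_loss(y_hat, y):
    """Loss = ||y_hat - y||^2 / 2 for a single example."""
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / 2

logits = [2.0, 1.0, 0.1]  # hypothetical output-layer scores for 3 classes
y = [1.0, 0.0, 0.0]       # one-hot label: the true class is class 0

y_hat = softmax(logits)
loss = squared_loss(y_hat, y)
predicted_class = y_hat.index(max(y_hat))
print(y_hat, loss, predicted_class)
```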
Three, overfitting, underfitting, and their solutions
The core of the solutions to overfitting is a penalty mechanism: L2 regularization is commonly used, and L1 regularization is also used.
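The effect of an L2 penalty can be sketched with a one-weight model. This assumes the penalty takes the form (l2/2) * w^2 added to the squared error; the data, the penalty strength `l2`, and the learning rate are made up for illustration.

```python
def fit(xs, ys, l2=0.0, lr=0.05, steps=2000):
    """Gradient descent on (1/2n) * sum((w*x - y)^2) + (l2/2) * w^2."""
    n = len(xs)
    w = 0.0
    for _ in range(steps):
        grad_data = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        grad_penalty = l2 * w  # derivative of the penalty (l2/2) * w^2
        w -= lr * (grad_data + grad_penalty)
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]  # made-up data

w_plain = fit(xs, ys)       # no penalty
w_l2 = fit(xs, ys, l2=1.0)  # with L2 penalty
print(w_plain, w_l2)
```

The penalized weight comes out smaller in magnitude, which is the intended effect: large weights are punished. An L1 penalty would add l1 * |w| to the loss instead.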
Four, NLP, text preprocessing
Text is a kind of sequence data: an article can be viewed as a sequence of characters or words. Preprocessing generally consists of four steps:
1. Read the text.
2. Tokenize it (split it into words).
3. Build a dictionary that maps each word to a unique index.
4. Convert the text from a sequence of words to a sequence of indices, so that it can be fed to the model.
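The four steps above can be sketched in a few lines of standard-library Python. The function names, the `<unk>` token for unknown words, and the sample text are assumptions made for this illustration.

```python
import re
from collections import Counter

def read_lines(text):
    """Step 1: read the text, lowercase it, and replace non-letters with spaces."""
    return [re.sub(r"[^a-z]+", " ", line.lower()).strip()
            for line in text.splitlines() if line.strip()]

def tokenize(lines):
    """Step 2: split each line into word tokens."""
    return [line.split() for line in lines]

def build_vocab(tokens):
    """Step 3: map each word to a unique index; index 0 is reserved for unknown words."""
    counts = Counter(tok for line in tokens for tok in line)
    idx_to_token = ["<unk>"] + [tok for tok, _ in counts.most_common()]
    token_to_idx = {tok: i for i, tok in enumerate(idx_to_token)}
    return token_to_idx, idx_to_token

def to_indices(tokens, token_to_idx):
    """Step 4: convert word sequences into index sequences."""
    unk = token_to_idx["<unk>"]
    return [[token_to_idx.get(tok, unk) for tok in line] for line in tokens]

raw = "The time machine.\nThe time traveller spoke."  # sample text
lines = read_lines(raw)
tokens = tokenize(lines)
token_to_idx, idx_to_token = build_vocab(tokens)
indices = to_indices(tokens, token_to_idx)
print(tokens)
print(indices)
```

Words that were not seen when the dictionary was built fall back to the index of `<unk>`.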
Five, NLP, language model
A piece of natural language text can be viewed as a discrete time series. Given a sequence of words of length $T$, $w_1, w_2, \ldots, w_T$, the goal of a language model is to evaluate whether the sequence is reasonable, that is, to calculate the probability of the sequence:
$P(w_1, w_2, \ldots, w_T)$
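One standard way to make this probability computable is the chain rule of probability, which rewrites the joint probability as a product of conditional probabilities of each word given the words before it:

```latex
P(w_1, w_2, \ldots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \ldots, w_{t-1})
```

For a three-word sequence this reads $P(w_1, w_2, w_3) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1, w_2)$.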
Six, recurrent neural networks
The figure below shows how to implement a language model based on a recurrent neural network. Our aim is to predict the next character of a sequence based on the current input and the past inputs. A recurrent neural network introduces a hidden variable $H$, whose value at time step $t$ is denoted $H_t$. $H_t$ is calculated from $X_t$ and $H_{t-1}$, so $H_t$ can be regarded as recording the sequence information up to the current character, and $H_t$ is then used to predict the next character of the sequence.
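The hidden-state update can be sketched as follows. This assumes the common form $H_t = \tanh(X_t W_{xh} + H_{t-1} W_{hh} + b_h)$; the tanh activation, the weight names, and all numeric values are assumptions for illustration, and the output layer that turns $H_t$ into a prediction of the next character is left out.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """Compute H_t = tanh(X_t W_xh + H_{t-1} W_hh + b_h) with plain lists."""
    hidden = len(b_h)
    h_t = []
    for j in range(hidden):
        total = b_h[j]
        total += sum(x_t[i] * W_xh[i][j] for i in range(len(x_t)))
        total += sum(h_prev[k] * W_hh[k][j] for k in range(hidden))
        h_t.append(math.tanh(total))
    return h_t

def one_hot(index, size):
    """Encode a character index as a one-hot input vector X_t."""
    return [1.0 if i == index else 0.0 for i in range(size)]

# Hypothetical tiny setup: a vocabulary of 3 characters and 2 hidden units.
W_xh = [[0.1, -0.2], [0.0, 0.3], [0.2, 0.1]]  # shape (vocab_size, hidden)
W_hh = [[0.5, 0.0], [-0.1, 0.4]]              # shape (hidden, hidden)
b_h = [0.0, 0.0]

h = [0.0, 0.0]                # H_0: no information yet
for char_index in [0, 2, 1]:  # a made-up sequence of character indices
    h = rnn_step(one_hot(char_index, 3), h, W_xh, W_hh, b_h)
print(h)
```

Because each call receives the previous `h`, the final hidden state depends on every character seen so far, not just the last one.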
----------------
Disclaimer: This is an original article by CSDN blogger "E-va", licensed under the CC 4.0 BY-SA copyright agreement. When reproducing it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/weixin_43916772/article/details/104310447


Origin blog.csdn.net/weixin_43901214/article/details/104319319