In-depth understanding of LR: a collection of resources

In an interview today I was asked about LR's gradient descent algorithm and its regularization term, and I realized I didn't understand them well myself. So I looked up some relevant material, studied LR's gradient descent algorithm and the rationale behind the sigmoid function in some depth, and found some good resources along the way. I'm recording them here.
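
Since the interview question was about the gradient algorithm itself, here is a minimal sketch of batch gradient descent for plain (unregularized) LR. The function name and hyperparameters are my own illustration, not taken from any of the linked resources:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent for logistic regression.
    X: (n_samples, n_features) array, y: (n_samples,) array of {0, 1} labels."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X @ w)              # predicted probabilities
        grad = X.T @ (p - y) / len(y)   # gradient of the mean negative log-likelihood
        w -= lr * grad
    return w
```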

This paper derives the relationship between LR and the maximum entropy model:
http://www.win-vector.com/dfiles/LogisticRegressionMaxEnt.pdf

This article is a translation and interpretation of the paper, which helps when reading it:
https://blog.csdn.net/qq_32742009/article/details/81746955

The main conclusion: the maximum entropy solution for the binomial distribution is equivalent to the maximum likelihood estimate under the binary exponential (logistic) distribution.
Proof idea: assume x follows the binary exponential distribution, solve for the parameters of the binomial distribution, and finally derive that x obeys the binary exponential distribution; the whole derivation forms a closed loop.
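
As a condensed restatement of the paper's argument (my own summary, assuming the usual feature-expectation constraints), the exponential form is not an arbitrary assumption but what maximum entropy forces:

```latex
% Maximum entropy over p(y|x) subject to feature-expectation constraints:
\max_{p}\; -\sum_{y} p(y \mid x)\,\log p(y \mid x)
\quad \text{s.t.} \quad \mathbb{E}_{p}[f_j(x,y)] = \mathbb{E}_{\hat{p}}[f_j(x,y)]

% Lagrangian stationarity forces the exponential-family form:
p(y \mid x) \;\propto\; \exp\Big(\sum_j w_j f_j(x,y)\Big)

% For binary y with features f(x,y) = y\,x this is exactly the LR model:
p(y = 1 \mid x) \;=\; \frac{e^{w^\top x}}{1 + e^{w^\top x}} \;=\; \sigma(w^\top x)
```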

My remaining doubt: why should we assume that x follows the binary exponential distribution in the first place? (Note: the binary exponential distribution is the logistic distribution.) The algorithm normally assumes a binomial distribution, so why assume the binary exponential distribution directly?

Definition of a convex function:
https://blog.csdn.net/feilong_csdn/article/details/83476277
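
For quick reference, the standard definition (equivalent to what the linked article gives): f is convex iff every chord lies above the graph,

```latex
f\big(\lambda x + (1 - \lambda)\,y\big) \;\le\; \lambda f(x) + (1 - \lambda) f(y),
\qquad \forall\, x, y \in \operatorname{dom} f,\;\; \lambda \in [0, 1]
```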

Proof that the L1 regularization term in LR is non-differentiable:
https://blog.csdn.net/luoyexuge/article/details/79594554
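
The core of that proof is one-dimensional and worth stating (a standard fact, not copied from the post): the L1 term |w| has mismatched one-sided derivatives at zero,

```latex
% One-sided derivatives of |w| at 0 disagree, so |w| is not differentiable there:
\lim_{h \to 0^{+}} \frac{|h| - |0|}{h} = 1,
\qquad
\lim_{h \to 0^{-}} \frac{|h| - |0|}{h} = -1

% One therefore works with the subdifferential instead of the gradient:
\partial\,|w| \;=\;
\begin{cases}
\{\operatorname{sign}(w)\} & w \neq 0 \\
[-1,\, 1] & w = 0
\end{cases}
```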

Coordinate descent
https://blog.csdn.net/xiaocong1990/article/details/83039802
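
Coordinate descent is the classic workaround for that non-differentiability: each one-dimensional subproblem with an L1 term has an exact soft-thresholding solution. A minimal sketch for the lasso objective (my own illustration; the linked article may use different notation):

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding: the exact minimizer of 0.5*(w - z)^2 + t*|w|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def coordinate_descent_lasso(X, y, lam=0.1, n_sweeps=100):
    """Cyclic coordinate descent on 0.5*||Xw - y||^2 + lam*||w||_1.
    Each coordinate subproblem is solved exactly, so no step size is needed."""
    w = np.zeros(X.shape[1])
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            r_j = y - X @ w + X[:, j] * w[j]   # residual excluding coordinate j
            w[j] = soft_threshold(X[:, j] @ r_j, lam) / (X[:, j] @ X[:, j])
    return w
```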

There is also a solution using the proximal gradient descent method:
https://www.zhihu.com/question/38426074/answer/76683857
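
A minimal sketch of proximal gradient descent (ISTA) for L1-regularized LR, which ties the pieces above together: a gradient step on the smooth logistic loss followed by the prox of the L1 term, which is again soft-thresholding. The function names and hyperparameters are my own illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_logistic_l1(X, y, lam=0.01, lr=0.1, n_iters=500):
    """Proximal gradient descent (ISTA) on mean logistic loss + lam*||w||_1.
    Step 1: gradient step on the smooth loss; step 2: prox of lr*lam*||.||_1."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)   # gradient of the smooth part
        w = soft_threshold(w - lr * grad, lr * lam)  # elementwise prox step
    return w
```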

Source: www.cnblogs.com/x739400043/p/11414650.html