Disclaimer: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/wangfenghui132/article/details/77741325

In deep learning, the bias (threshold) term is not included in regularization. That got me thinking about logistic regression and SVM.

SVM and logistic regression are both used for classification, and from a machine learning angle (not from a statistical one) both are looking for a decision hyperplane. However, a hyperplane has many equivalent expressions: normal vectors of different norm give different expressions of the same hyperplane. So if you want the decision hyperplane in one final form, you have to fix the norm of the normal vector. SVM fixes it by setting the minimum absolute functional distance (the functional margin) to a specific value, conventionally 1. Through the method of Lagrange multipliers this constraint ends up in the Lagrangian function.

In logistic regression, if you want a single final expression for the decision hyperplane, you should likewise fix the norm of the normal vector, say to c. The constraint and the objective function can again be merged with a Lagrange multiplier: cost = loss + a (w*w - c*c), where a is greater than zero. See the derivation of the method of Lagrange multipliers for details. This is in fact the form of cost function we usually see; cost and loss here are only the names of two functions.

Now look at it statistically. Logistic regression assumes the samples follow a Bernoulli distribution, and the conjugate prior of the Bernoulli distribution is the Beta distribution. In the Beta distribution the weights w would sit in the base of a power, unlike a Gaussian distribution, where they sit in the exponent, so even after taking the negative logarithm we cannot derive the quadratic-norm regularization term a (w*w - c*c). By this argument, the L2-norm regularization term of logistic regression cannot be explained statistically. Then again, the L2-norm regularization term in linear regression, although it can be explained statistically through a prior distribution, is somewhat at odds with the uniqueness of the fitted curve.
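
The two points above (rescaling the normal vector does not move the hyperplane, and the constraint is folded into the cost as a (w*w - c*c)) can be sketched in a few lines of Python. This is only an illustrative sketch: the names score, decision and cost, the toy data, and the values of a, c and k are all assumptions made here, not something from the original post. The bias b is left out of the penalty, in line with the remark that the threshold term is not regularized.

```python
import math

def score(w, b, x):
    """Functional distance w.x + b of point x from the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def decision(w, b, x):
    """Class label (0 or 1) given by the side of the hyperplane x lies on."""
    return 1 if score(w, b, x) >= 0 else 0

def cost(w, b, data, a, c):
    """Mean logistic loss plus the penalty a * (w.w - c*c); b is not penalized."""
    loss = 0.0
    for x, y in data:  # y is 0 or 1
        z = score(w, b, x)
        # log(1 + exp(z)) - y*z, written in a numerically stable form
        loss += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
    penalty = a * (sum(wi * wi for wi in w) - c * c)
    return loss / len(data) + penalty

# Hypothetical toy data: (features, label)
data = [((1.0, 2.0), 1), ((-1.0, -1.5), 0), ((2.0, 0.5), 1), ((-2.0, 1.0), 0)]
w, b = (1.0, -0.5), 0.2

# Rescaling (w, b) by k > 0 gives another expression of the same hyperplane.
k = 3.0
w_scaled, b_scaled = tuple(k * wi for wi in w), k * b
same_side = all(decision(w, b, x) == decision(w_scaled, b_scaled, x) for x, _ in data)

# Changing c only shifts the cost by a constant that does not depend on w.
a = 0.1
shift = cost(w, b, data, a, 1.0) - cost(w, b, data, a, 2.0)
shift_scaled = cost(w_scaled, b_scaled, data, a, 1.0) - cost(w_scaled, b_scaled, data, a, 2.0)
```

Because the c*c part is a constant shift, it drops out when minimizing over w, which is why the familiar cost function shows only the a * w*w term.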
Take y = ax + b as a fitted curve. By default it is written in the fixed form ax - y + b = 0, where the coefficient of y in the normal vector is constrained to -1 and cannot be changed freely. The expression is already unique, so from the point of view of finding a unique solution there is no need to constrain w with a quadratic norm, and there would be no L2-norm regularization term. So what linear regression explains statistically cannot be explained that way in classification, while the decision-surface idea that explains the term in classification (logistic regression) gives no reasonable explanation of the L2-norm regularization term in linear regression.
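
The uniqueness argument for the fitted line can be checked with a small sketch. The helper names fit_line and sse and the toy data are assumptions made for this illustration. With the coefficient of y pinned at -1, rescaling (a, b) gives a different line, not another expression of the same one, so plain least squares already has a single minimizer without any constraint on the norm.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of y = a*x + b (coefficient of y fixed at -1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def sse(a, b, xs, ys):
    """Sum of squared residuals of the line a*x - y + b = 0."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

a, b = fit_line(xs, ys)
best = sse(a, b, xs, ys)

# Unlike the classification case, rescaling (a, b) changes the fit.
rescaled = sse(3.0 * a, 3.0 * b, xs, ys)

# Any small move away from (a, b) makes the error strictly larger.
nearby = [sse(a + da, b + db, xs, ys)
          for da, db in [(0.01, 0.0), (-0.01, 0.0), (0.0, 0.01), (0.0, -0.01)]]
```

Comparing rescaled and nearby against best shows the contrast with the classifier: there, scaling the parameters left every decision unchanged; here it does not.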
