Part Two: A Review of the Logistic Regression Algorithm

1. Connections and differences between logistic regression and linear regression

  • Connections
    Both logistic regression and linear regression are generalized linear models.
  • Differences
    The difference lies in the dependent variable: if it is continuous, the model is multiple linear regression; if it follows a binomial distribution, the model is logistic regression. The dependent variable in logistic regression can be binary or multi-class, but the binary case is more common and easier to interpret, so binary logistic regression is in fact the most commonly used.
    Linear regression is used to solve regression problems, while logistic regression is mainly used to solve classification problems, as the sketch below illustrates.
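A minimal sketch of this distinction, assuming scikit-learn and a made-up toy dataset: LinearRegression predicts a continuous value, while LogisticRegression predicts a class label (and a class probability).

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data: one feature, generated only to illustrate the two tasks.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(100, 1))

# Regression task: continuous dependent variable.
y_cont = 2.0 * X.ravel() + rng.normal(scale=1.0, size=100)
lin = LinearRegression().fit(X, y_cont)
print(lin.predict([[5.0]]))        # a continuous value, roughly 10

# Classification task: binary dependent variable.
y_bin = (X.ravel() + rng.normal(scale=1.0, size=100) > 5).astype(int)
log = LogisticRegression().fit(X, y_bin)
print(log.predict([[5.0]]))        # a class label, 0 or 1
print(log.predict_proba([[5.0]]))  # [P(y=0), P(y=1)]
```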

2. The principle of logistic regression

Logistic regression applies the sigmoid function to a linear combination of the input features: hθ(x) = σ(θᵀx) = 1 / (1 + e^(−θᵀx)). The sigmoid squashes any real value into (0, 1), so hθ(x) can be read as the probability that the sample belongs to the positive class; thresholding this probability (commonly at 0.5) yields the class prediction.
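A minimal NumPy sketch of the hypothesis function described above (the names are illustrative, not from the original post):

```python
import numpy as np

def sigmoid(z):
    """Map any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, X):
    """Hypothesis h_theta(x) = sigmoid(theta^T x) = P(y=1 | x; theta).
    X is (m, n) with a leading column of ones for the intercept."""
    return sigmoid(X @ theta)

# Example: two samples, intercept plus one feature.
X = np.array([[1.0, 2.0], [1.0, -1.0]])
theta = np.array([0.5, 1.0])
print(h(theta, X))  # probabilities of the positive class
```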

3. Derivation and optimization of the logistic regression loss function

Since the output hθ(x) can be interpreted as a probability, we can use maximum likelihood estimation (MLE) to write the objective function:

$$L(\theta) = p(\vec{y} \mid X; \theta) = \prod_{i=1}^{m} p(y_i \mid x_i; \theta) = \prod_{i=1}^{m} \big(h_\theta(x_i)\big)^{y_i} \big(1 - h_\theta(x_i)\big)^{1 - y_i}$$
Next, the MLE is solved in the usual way: take the logarithm and negate, then substitute in hθ(x) to obtain the LR cost function, i.e. the log loss:

$$J(\theta) = \operatorname{cost}\big(h_\theta(x_i), y_i\big) = -\frac{1}{m}\, l(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big( y_i \log h_\theta(x_i) + (1 - y_i) \log\big(1 - h_\theta(x_i)\big) \Big)$$
Substituting the sigmoid function for hθ(xi), and writing ln instead of log, the formula above simplifies further to:

$$\begin{aligned} J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\Big[ y_i \ln h_\theta(x_i) + (1-y_i)\ln\big(1-h_\theta(x_i)\big) \Big] \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left[ y_i \ln\frac{1}{1+e^{-\theta^T x_i}} + (1-y_i)\ln\frac{e^{-\theta^T x_i}}{1+e^{-\theta^T x_i}} \right] \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left[ \ln\frac{1}{1+e^{\theta^T x_i}} + y_i \ln\frac{1}{e^{-\theta^T x_i}} \right] \\ &= \frac{1}{m}\sum_{i=1}^{m}\Big[ -y_i\,\theta^T x_i + \ln\big(1+e^{\theta^T x_i}\big) \Big] \end{aligned}$$
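The cost function above translates directly into code. A minimal sketch, assuming NumPy and batch gradient descent; the gradient used below, ∇J(θ) = (1/m) Xᵀ(hθ(X) − y), is the standard result for this loss, not derived in the original post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(theta, X, y):
    """J(theta) = -(1/m) * sum(y*ln(h) + (1-y)*ln(1-h))."""
    m = len(y)
    h = sigmoid(X @ theta)
    eps = 1e-12  # guard against ln(0)
    return -(1.0 / m) * np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def gradient(theta, X, y):
    """Standard gradient of J: (1/m) * X^T (h - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta -= lr * gradient(theta, X, y)
    return theta

# Toy usage: fit a separator on 1-D data with an intercept column.
rng = np.random.RandomState(0)
x = rng.uniform(-3, 3, size=200)
y = (x > 0).astype(float)
X = np.column_stack([np.ones_like(x), x])
theta = gradient_descent(X, y)
print(theta, log_loss(theta, X, y))
```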

4. Regularization and model evaluation metrics

Regularization
Regularization constrains, adjusts, or shrinks the coefficient estimates toward zero. In other words, regularization reduces the complexity of the model and the instability of the learning process, and thereby avoids the risk of overfitting.
L1 norm: L1 regularization tends to produce a small number of non-zero features while the remaining features are 0 (L1 makes the parameter vector sparse). Thus L1 not only acts as a regularizer but can also serve as a feature-selection method.
L2 norm: L2 regularization works by weight decay, reducing the overall influence of each feature on the objective function to prevent overfitting. The advantage of L2 is that solving is stable and fast. The two penalties are contrasted in the sketch below.
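A minimal scikit-learn sketch contrasting the two penalties; the sparsity effect of L1 shows up directly in the fitted coefficients ('liblinear' is one solver that supports both penalties, and the toy dataset is an assumption for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data with many uninformative features.
X, y = make_classification(n_samples=500, n_features=20, n_informative=3,
                           random_state=0)

# L1 tends to drive most coefficients exactly to zero (feature selection).
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("L1 non-zero coefficients:", np.sum(l1.coef_ != 0))

# L2 shrinks all coefficients toward zero but rarely to exactly zero.
l2 = LogisticRegression(penalty="l2", solver="liblinear", C=0.1).fit(X, y)
print("L2 non-zero coefficients:", np.sum(l2.coef_ != 0))
```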
Model evaluation metrics
Accuracy: accuracy = (TP + TN) / (TP + FP + TN + FN), i.e. the number of correctly predicted positive and negative cases divided by the total number of samples.
Precision: precision = TP / (TP + FP), i.e. of the samples predicted positive, the proportion that are actually positive. Precision is easily confused with accuracy; it can be understood as how exact the positive predictions are.
Recall: recall = TP / (TP + FN), showing how many of the actually positive samples the classifier can identify.
F1 Score: F1 = 2PR / (P + R), where P and R are precision and recall. When both precision and recall need to be high, the F1 Score can be used as the measure.
ROC curve: in logistic regression, for example, positive and negative predictions are usually defined by a threshold: samples scoring above the threshold are classified as positive, those below it as negative. If we lower this threshold, more samples are identified as the positive class, raising the recognition rate of positives, but also causing more negatives to be misidentified as positive. To represent this trade-off visually, the ROC curve is introduced: its abscissa is the False Positive Rate (FPR) and its ordinate is the True Positive Rate (TPR).
AUC (Area Under Curve) is defined as the area under the ROC curve (the integral of the ROC), and typically lies between 0.5 and 1. The larger the AUC (area), the better the classifier's performance. All of these metrics are available in sklearn.metrics, as sketched below.
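A minimal sketch of the metrics on hand-written labels (the label and score values below are made up purely for illustration):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                   # actual labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                   # thresholded predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]   # predicted P(y=1)

print("accuracy :", accuracy_score(y_true, y_pred))   # (TP+TN)/total
print("precision:", precision_score(y_true, y_pred))  # TP/(TP+FP)
print("recall   :", recall_score(y_true, y_pred))     # TP/(TP+FN)
print("F1       :", f1_score(y_true, y_pred))         # 2PR/(P+R)
print("AUC      :", roc_auc_score(y_true, y_score))   # area under ROC
```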

5. Advantages and disadvantages of logistic regression

Advantages
1. The model form is simple and interpretability is very good. From the feature weights you can see the impact of different features on the final result: a feature whose weight is relatively large has a relatively large influence on the final result.
2. The model works well in practice. If the feature engineering is done well, the results are not bad at all.
3. Training is fast. The amount of computation during classification is related only to the number of features.
Disadvantages
1. Accuracy is not very high. Because the form is so simple (very similar to a linear model), it is difficult to fit the true distribution of the data.
2. It is difficult to handle imbalanced data. For example, with a highly imbalanced problem such as a positive-to-negative ratio of 10,000:1, predicting all samples as positive already makes the loss function fairly small; but as a classifier, the model's ability to distinguish positive from negative samples is then not very good.
3. Nonlinear data is troublesome. Without introducing other methods, logistic regression can only handle linearly separable data (one common workaround is sketched below).
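One standard workaround for the nonlinearity limitation, offered here as an illustration rather than as the original author's method, is to map the features into a higher-dimensional space first, e.g. with PolynomialFeatures, and then fit a linear boundary in the expanded space:

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Concentric circles: not linearly separable in the raw features.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)

plain = LogisticRegression().fit(X, y)
print("raw features accuracy :", plain.score(X, y))   # near chance level

# Adding x1^2, x2^2, x1*x2 makes the circular boundary linear.
poly = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression())
poly.fit(X, y)
print("poly features accuracy:", poly.score(X, y))    # close to 1.0
```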

6. Solutions to the sample imbalance problem

1) Collect more minority-class data
Machine learning uses the existing data to estimate the distribution of the whole population, so more data usually yields more information about the distribution and a better estimate of it. This means adding samples of the minority class, and it can also mean adding samples of the majority class.
2) Resampling (a sketch follows this list)
Sample the minority-class data to increase the number of minority-class samples, i.e. oversampling (over-sampling: the number of samples drawn exceeds the number of samples in that class). In other words, copies of some samples are added.
Sample the majority-class data to reduce the number of majority-class samples, i.e. undersampling (under-sampling: the number of samples drawn is less than the number of samples in that class). In other words, some samples are deleted.
3) Use a different classification algorithm
Different algorithms should be compared, because different algorithms suit different tasks and data. Decision trees often perform well on imbalanced data: they build the tree from splitting rules on the class variable, which can forcibly separate samples of different classes.
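A minimal resampling sketch for item 2, using sklearn.utils.resample on made-up data (illustrative only; dedicated libraries such as imbalanced-learn also exist for this):

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 2))
y = np.array([0] * 950 + [1] * 50)      # 19:1 class imbalance

X_min, X_maj = X[y == 1], X[y == 0]

# Oversampling: draw minority samples with replacement (copies are added).
X_min_over = resample(X_min, replace=True, n_samples=len(X_maj),
                      random_state=0)

# Undersampling: draw a subset of the majority class (samples are dropped).
X_maj_under = resample(X_maj, replace=False, n_samples=len(X_min),
                       random_state=0)

X_over = np.vstack([X_maj, X_min_over])    # balanced via oversampling
X_under = np.vstack([X_maj_under, X_min])  # balanced via undersampling
print(X_over.shape, X_under.shape)         # (1900, 2) (100, 2)
```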

7. sklearn parameters

The scikit-learn implementation is sklearn.linear_model.LogisticRegression. Its most commonly tuned parameters include penalty (the regularization type), C (the inverse of the regularization strength), solver (the optimization algorithm), class_weight (useful for imbalanced data), and max_iter (the cap on optimizer iterations).
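A sketch of these parameters in use; the parameter names are real scikit-learn arguments, but the values below are only example settings, not recommendations from the original post:

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(
    penalty="l2",             # regularization type: 'l1', 'l2', 'elasticnet', or None
    C=1.0,                    # inverse regularization strength (smaller = stronger)
    solver="lbfgs",           # optimizer: 'lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga'
    class_weight="balanced",  # reweight classes, useful for imbalanced data
    max_iter=1000,            # cap on optimizer iterations
    random_state=0,           # reproducibility for stochastic solvers
)
```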

Original source: www.cnblogs.com/robindong/p/11329118.html