Chapter 5. Logistic Regression

1. Why is logistic regression needed?

Logistic regression fits linear relationships very well, is fast to compute, and its output is not a hard 0/1 label but a class probability. It is also quite robust to noise. The sketch below illustrates how the linear output is turned into a probability.
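
As a minimal sketch (not from the original post), the sigmoid function maps the linear score $z = \theta^T x$ into $(0, 1)$, which is what lets logistic regression output a class probability:

```python
import numpy as np

def sigmoid(z):
    """Map a linear score z = theta^T x into (0, 1) as a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Large positive scores map near 1, large negative scores near 0,
# and z = 0 maps to exactly 0.5.
print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # ≈ [0.018, 0.5, 0.982]
```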

2. Logistic regression in sklearn

[figure: the logistic regression classes provided by sklearn]
Evaluation metrics:
[figure: the evaluation metrics for logistic regression]

2.1 LogisticRegression

Parameters:
[figure: the full parameter list of LogisticRegression]

  1. `penalty`: `"l1"` or `"l2"`, indicating which regularization is used; the default is `"l2"`. If `"l1"` is chosen, the `solver` parameter must be one that supports it, such as `"liblinear"`. L1 regularization can compress some coefficients all the way to 0, while L2 regularization only shrinks them as close to 0 as possible (see the small sketch after this list).
  2. `C`: must be a floating-point number greater than 0; the default is 1.0. The smaller `C` is, the stronger the regularization: the loss function is weighted less, so the penalty term dominates.
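
A quick sketch of the `penalty`/`solver` constraint just described (the variable names here are illustrative):

```python
from sklearn.linear_model import LogisticRegression

# penalty="l1" requires a solver that supports it, e.g. "liblinear"
# (newer sklearn versions also accept "saga"); the default "lbfgs"
# solver only supports l2 and raises an error with penalty="l1".
clf_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf_l2 = LogisticRegression(penalty="l2", C=0.5)  # default solver works for l2
```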

We use a **loss function** to measure the quality of a set of parameters, that is, whether the model built with those parameters performs well on the training set. If the model performs well on the training set, the patterns it has learned are consistent with the patterns in the training data, the loss incurred during fitting is small, the loss function takes a small value, and that set of parameters is good. Conversely, if the model performs poorly on the training set, the loss function is large, the model is under-trained, and that set of parameters is poor. In other words, when solving for the parameters, we pursue the minimum of the loss function so that the model fits the training data as well as possible, i.e. so that the prediction accuracy approaches 100%.

  • The loss function is:
    $$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y_i\log\big(h_\theta(x_i)\big) + (1-y_i)\log\big(1-h_\theta(x_i)\big)\Big]$$
    where $m$ is the number of samples, $y_i$ is the true label of sample $i$, and $h_\theta(x_i)$ is the predicted probability that sample $i$ belongs to class 1.
    Since we pursue the minimum of the loss function so that the model performs optimally on the training set, another problem may arise: **if the model performs well on the training set but poorly on the test set, the model is overfitting.** Although logistic regression and linear regression are inherently prone to underfitting, we still need techniques to control overfitting to help us adjust the model. In logistic regression, overfitting is controlled through regularization.
  • Regularization:
    $$J(\theta)_{L1} = C \cdot J(\theta) + \sum_{j=1}^{n}\lvert\theta_j\rvert \qquad\qquad J(\theta)_{L2} = C \cdot J(\theta) + \sqrt{\sum_{j=1}^{n}\theta_j^2}$$
    The first adds the L1 norm of the coefficient vector, i.e. the sum of the absolute values of the coefficients; the second adds the L2 norm, i.e. the square root of the sum of the squared coefficients. (Both objectives are sketched in code right after this list.)
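
A minimal numpy sketch of the two objectives above (the helper names `logistic_loss`, `j_l1`, and `j_l2` are my own, not sklearn API):

```python
import numpy as np

def logistic_loss(theta, X, y):
    """Cross-entropy loss J(theta) averaged over the m samples."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def j_l1(theta, X, y, C):
    # C * J(theta) plus the sum of absolute values of the coefficients
    return C * logistic_loss(theta, X, y) + np.sum(np.abs(theta))

def j_l2(theta, X, y, C):
    # C * J(theta) plus the square root of the sum of squared coefficients
    return C * logistic_loss(theta, X, y) + np.sqrt(np.sum(theta ** 2))
```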

Code:

```python
from sklearn.linear_model import LogisticRegression as lr
from sklearn.datasets import load_breast_cancer as lbc
import numpy as np
import matplotlib.pyplot as pt
from sklearn.model_selection import train_test_split as ttp
from sklearn.metrics import accuracy_score as acs

data = lbc()  # a Bunch object (dict-like)
x = data.data
y = data.target
print(x.shape)  # (569, 30)

lr1 = lr(penalty="l1", solver="liblinear", C=0.5, max_iter=1000)
lr2 = lr(penalty="l2", solver="liblinear", C=0.5, max_iter=1000)

lr1 = lr1.fit(x, y)
print(lr1.coef_)                     # the fitted coefficients
print((lr1.coef_ != 0).sum(axis=1))  # how many coefficients L1 left nonzero

lr2 = lr2.fit(x, y)
print(lr2.coef_)  # under L2 no coefficient is compressed to 0
```
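
As a quick follow-up (not in the original code), we can score both fitted models and see the class-probability output mentioned in section 1:

```python
print(lr1.score(x, y))           # mean training accuracy of the L1 model
print(lr2.score(x, y))           # mean training accuracy of the L2 model
print(lr2.predict_proba(x)[:5])  # class probabilities for the first 5 samples
```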

Which one works better?
[figures: accuracy as a function of C for the L1 and L2 models]
The abscissa is the value of C; from the figures, L2 performs better here.
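
The comparison in those figures can be reproduced with a sketch along these lines (the train/test split ratio, the C grid, and the plotting details are my assumptions, not recoverable from the original):

```python
xtrain, xtest, ytrain, ytest = ttp(x, y, test_size=0.3, random_state=420)
l1_test, l2_test = [], []
cs = np.linspace(0.05, 1, 19)

for c in cs:
    m1 = lr(penalty="l1", solver="liblinear", C=c, max_iter=1000).fit(xtrain, ytrain)
    m2 = lr(penalty="l2", solver="liblinear", C=c, max_iter=1000).fit(xtrain, ytrain)
    l1_test.append(acs(ytest, m1.predict(xtest)))  # test accuracy, L1
    l2_test.append(acs(ytest, m2.predict(xtest)))  # test accuracy, L2

pt.plot(cs, l1_test, label="L1 test")
pt.plot(cs, l2_test, label="L2 test")
pt.xlabel("C")
pt.ylabel("accuracy")
pt.legend(loc=4)
pt.show()
```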

Origin blog.csdn.net/qq_53982314/article/details/131260735