The sklearn implementation of logistic regression

Article Directory

1. Import the necessary modules
2. Generate data
3. Model building
4. Model training
5. Model prediction
6. View the coefficient w and intercept b of the logistic regression model
7. Draw the prediction curve
8. Calculate the evaluation index accuracy

Text content:

1. Import the necessary modules

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

2. Generate data

2.1 Define the data generation function
def create_data(data_num=100):
    np.random.seed(21)
    x1=np.random.normal(1,0.2,data_num)
    x2=np.random.normal(2,0.2,data_num)
    x=np.append(x1,x2)
    y=np.array([0]*data_num+[1]*data_num)
    return x,y
2.2 Generate data
X,y=create_data(1000)
X  # inspect the values of X
array([0.98960715, 0.97776079, 1.20835936, ..., 1.84049108, 2.14936146,
       1.90338769])
y  # inspect the values of y
array([0, 0, 0, ..., 1, 1, 1])
2.3 Divide training set and test set
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(
    X,y,test_size=0.3,random_state=16)
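As a quick sanity check on the split in 2.3, the sketch below repeats the same `train_test_split` call on placeholder data of the same size (2 x 1000 samples). The names `X_demo` and `y_demo` are stand-ins for illustration, not the article's variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same size as create_data(1000): 2000 samples
X_demo = np.arange(2000, dtype=float)
y_demo = np.array([0] * 1000 + [1] * 1000)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=16)

# test_size=0.3 holds out 30% of the samples for testing
print(len(X_tr), len(X_te))  # 1400 600
```

With 600 test samples, each misclassified point changes the test accuracy by 1/600.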
2.4 Draw a scatter plot of the training set data
plt.scatter(X_train,y_train,color='blue',s=20)
plt.show()

Training set scatter plot

2.5 Draw a scatter plot of the test set data
plt.scatter(X_test,y_test,color='g',s=20)
plt.show()

Scatter plot of test set data

3. Model building

from sklearn.linear_model import LogisticRegression
model=LogisticRegression()

4. Model training

  • Logistic regression model training: sklearn.linear_model.LogisticRegression.fit
  • Parameters used:
    -X: input features. If the input is an np.array, its shape must be (n_samples, n_features).
    -y: target labels.
X_train=X_train.reshape(-1,1)
model.fit(X=X_train,y=y_train)
LogisticRegression()  # output of running the two lines above
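The reshape step above exists because fit expects a two-dimensional X. The following minimal sketch, using made-up toy values rather than the article's data, shows the shape before and after `reshape(-1, 1)` and what happens if the one-dimensional array is passed directly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D feature (hypothetical values, not the article's dataset)
x = np.array([0.8, 0.9, 1.1, 1.2, 1.8, 1.9, 2.1, 2.2])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(x.shape)           # (8,)   one-dimensional
X_2d = x.reshape(-1, 1)  # -1 lets NumPy infer the number of rows
print(X_2d.shape)        # (8, 1) i.e. (n_samples, n_features)

clf = LogisticRegression().fit(X_2d, y)  # works with the 2-D array

# Passing the 1-D array is rejected with a ValueError
try:
    LogisticRegression().fit(x, y)
    raised = False
except ValueError:
    raised = True
print(raised)
```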

5. Model prediction

  • Make predictions on the test set
  • Logistic regression prediction: sklearn.linear_model.LogisticRegression.predict
  • Parameters used:
    -X: input features. If the input is an np.array, its shape must be (n_samples, n_features).
  • Returns:
    -C: predicted class labels.
X_test=X_test.reshape(-1,1)
y_test_pred=model.predict(X=X_test)  # the default threshold is 0.5
y_test_pred_proba=model.predict_proba(X=X_test)  # class probabilities, so a custom threshold (e.g. 0.6) can be applied
Apply a custom threshold to the predicted probabilities to obtain the binary class labels
def thes_func(x):
    thes=0.6
    return 1 if x>thes else 0
y_test_pred_thes=list(map(thes_func,y_test_pred_proba[:,1]))
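The element-wise `map` above can also be written as a single NumPy comparison. This sketch uses hypothetical probabilities (`proba_pos` stands in for `y_test_pred_proba[:,1]`) and checks that both approaches give the same labels.

```python
import numpy as np

# Hypothetical positive-class probabilities, standing in for y_test_pred_proba[:, 1]
proba_pos = np.array([0.10, 0.55, 0.60, 0.61, 0.95])
threshold = 0.6

# Element-wise version, equivalent to thes_func in the article
labels_map = list(map(lambda p: 1 if p > threshold else 0, proba_pos))

# Vectorised version: boolean mask converted to 0/1 integers
labels_vec = (proba_pos > threshold).astype(int)

print(labels_map)
print(labels_vec)
```

The comparison is strict, so a probability exactly equal to the threshold is assigned to class 0 in both versions.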

6. View the coefficient w and intercept b of the Logistic regression model

  • Regression coefficient: sklearn.linear_model.LogisticRegression.coef_
  • Intercept term: sklearn.linear_model.LogisticRegression.intercept_
w,b=model.coef_[0],model.intercept_
print('Weight={0}bias={1}'.format(w,b))
Weight=[9.53805539]bias=[-14.3705638]  # output of print
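To see what these two numbers mean, the sketch below plugs the printed w and b into the sigmoid by hand. With a single feature, the predicted probability is 0.5 where w*x + b = 0, so the decision boundary is x = -b/w. The function name `predict_proba_pos` is a helper introduced here for illustration.

```python
import numpy as np

# Values copied from the printed output above
w = 9.53805539
b = -14.3705638

def predict_proba_pos(x):
    """Probability of class 1 for feature value x: sigmoid(w*x + b)."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

# The probability is 0.5 where w*x + b = 0
boundary = -b / w
print(round(boundary, 3))  # 1.507

print(predict_proba_pos(1.0))  # close to 0: near the class-0 mean
print(predict_proba_pos(2.0))  # close to 1: near the class-1 mean
```

The boundary falls roughly midway between the two class means (1 and 2) used in create_data, as one would expect for two equally sized classes with the same spread.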

7. Draw the prediction curve

  • The scipy.special.expit function, also known as the logistic sigmoid function, is defined as: expit(x) = 1/(1+e^{-x})
  • Parameters:
    -x: the input of the sigmoid function, typically an np.array.
  • Returns:
    -out: the output of the sigmoid function, an np.array with the same shape as the input x.
from scipy.special import expit
X_train=X_train.reshape(-1)
X_test=X_test.reshape(-1)
sigmoid=expit(np.sort(X_test)*model.coef_[0]+model.intercept_)
plt.plot(np.sort(X_test),sigmoid,color='g')
plt.scatter(X_test,y_test,color='r',label='test dataset')
plt.legend()
plt.show()
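As a small check of the definition used for the curve, the sketch below compares `expit` with the formula 1/(1+e^{-z}) computed by hand on a few arbitrary inputs.

```python
import numpy as np
from scipy.special import expit

# Arbitrary test inputs
z = np.array([-2.0, 0.0, 2.0])

# Sigmoid written out explicitly
manual = 1.0 / (1.0 + np.exp(-z))

print(expit(0.0))                     # 0.5
print(np.allclose(expit(z), manual))  # True
print(expit(z).shape == z.shape)      # True
```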


8. Calculate the evaluation index Accuracy

  • Accuracy: sklearn.metrics.accuracy_score
  • Parameters used:
    -y_true: ground-truth labels.
    -y_pred: predicted labels.
  • Returns:
    -score: the fraction of correctly classified samples.
from sklearn.metrics import accuracy_score
acc=accuracy_score(y_true=y_test,y_pred=y_test_pred)
print('Accuracy:{}'.format(acc))
Accuracy:0.9916666666666667  # output of print
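Accuracy is simply the fraction of predictions that match the true labels. The sketch below, on made-up label arrays rather than the article's results, computes it by hand and compares it with `accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels for illustration (not the article's predictions)
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

# Fraction of positions where prediction and ground truth agree
manual_acc = np.mean(y_true == y_pred)
sk_acc = accuracy_score(y_true=y_true, y_pred=y_pred)

print(manual_acc, sk_acc)
```

The same call can be applied to `y_test_pred_thes` from section 5 to see how the 0.6 threshold changes the score.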

Origin blog.csdn.net/weixin_42961082/article/details/113805473