Hands-on classification prediction with logistic regression

1. Introduction to logistic regression

1.1 Introduction to logistic regression
Although logistic regression (LR for short) has the word "regression" in its name, it is actually a classification model and is widely used in many fields.
Even though deep learning is more popular nowadays, traditional methods like this are still widely applied because of their unique advantages.
For logistic regression, the two most prominent ones are the simplicity of the model and its strong interpretability.
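
To see what the model actually computes, here is a minimal sketch (not part of the demo later in this article): a linear combination of the features is passed through the logistic (sigmoid) function to obtain p(y=1|x), and the class is decided by thresholding that probability at 0.5. The intercept w0, the weights, and the sample x below are made-up numbers for illustration only.

import numpy as np

def sigmoid(z):
    # the logistic function maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# made-up parameters: intercept w0 and weights w1, w2 for two features
w0, w = -0.5, np.array([1.0, 2.0])
x = np.array([0.3, 0.8])                 # a single sample with two features

p = sigmoid(w0 + np.dot(w, x))           # p(y=1 | x)
print('p(y=1|x) =', p, '-> predicted class:', int(p >= 0.5))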
1.2 Advantages and disadvantages of the logistic regression model
Advantages: simple to implement and easy to understand; low computational cost, fast, and low storage requirements.
Disadvantages: prone to underfitting, so the classification accuracy may not be high (a small sketch illustrating this follows).
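
To make the underfitting point concrete, here is a small illustrative sketch (an assumed setup, not from the original demo) using sklearn's make_circles: the two classes form concentric rings, so no straight-line boundary can separate them, and the training accuracy of logistic regression stays close to chance level.

from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression

# two concentric rings: not linearly separable by construction
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0)

clf = LogisticRegression()
clf.fit(X, y)

# accuracy is close to 0.5 (chance) because the learned boundary is a straight line
print('training accuracy on make_circles:', clf.score(X, y))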
1.3 Applications of logistic regression
The logistic regression model is clear and has a solid probabilistic foundation.
The parameters it fits represent the influence of each feature on the result, which also makes it a great tool for understanding data. At the same time, because it is essentially a linear classifier, it cannot handle more complex data. For this reason, logistic regression is often used as the baseline (reference level) for first attempts at a task.
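
As a quick illustration of this baseline usage, a minimal sketch (an assumed setup, separate from the demo below) could look like the following: fit logistic regression on a synthetic binary classification task, report the test accuracy as the reference level, and read the fitted coefficients as each feature's influence on the result.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# synthetic binary classification data with 10 features
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)

# the test accuracy serves as the reference level for more complex models;
# the signs and magnitudes of coef_ show how each feature pushes the prediction
print('baseline accuracy:', accuracy_score(y_test, baseline.predict(X_test)))
print('feature weights  :', baseline.coef_)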

2. Practical steps

2.1 Basic process
2.1.1 Constructing dataset
2.1.2 Calling logistic regression model
2.1.3 Fitting constructed dataset with logistic regression model
2.1.4 Visualizing data points + decision boundary
2.1.5 Using the trained model to make predictions on new samples

2.2 Demo example:

# Basic numerical library, plotting libraries, and the logistic regression model
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression

# Demo: classification with LogisticRegression
# Construct the dataset
x_fearures = np.array([[-1, -2], [-2, -1], [-3, -2], [1, 3], [2, 1], [3, 2]])

y_label = np.array([0, 0, 0, 1, 1, 1])

# Instantiate the logistic regression model
lr_clf = LogisticRegression()
# Fit the logistic regression model to the constructed dataset
lr_clf = lr_clf.fit(x_fearures, y_label)  # the fitted linear part is y = w0 + w1*x1 + w2*x2
# Inspect the model's weights (w1, w2)
print('the weight of Logistic Regression:', lr_clf.coef_)
# Inspect the model's intercept w0
print('the intercept(w0) of Logistic Regression:', lr_clf.intercept_)
# Visualize the constructed data points
plt.figure()
plt.scatter(x_fearures[:, 0], x_fearures[:, 1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')
plt.show()
# Visualize the decision boundary
plt.figure()
plt.scatter(x_fearures[:, 0], x_fearures[:, 1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')
# build a regular grid over the current axis limits
nx, ny = 200, 100
x_min, x_max = plt.xlim()
y_min, y_max = plt.ylim()
x_grid, y_grid = np.meshgrid(np.linspace(x_min, x_max, nx), np.linspace(y_min, y_max, ny))
# predicted probability of class 1 at every grid point
z_proba = lr_clf.predict_proba(np.c_[x_grid.ravel(), y_grid.ravel()])
z_proba = z_proba[:, 1].reshape(x_grid.shape)
plt.contour(x_grid, y_grid, z_proba, [0.5], linewidths=2., colors='blue')
plt.show()
# Visualize the new samples to be predicted
plt.figure()
# new point 1
x_fearures_new1 = np.array([[0, -1]])
plt.scatter(x_fearures_new1[:, 0], x_fearures_new1[:, 1], s=50, cmap='viridis')
# plt.annotate(text='New point 1', xy=(0, -1), xytext=(-2, 0), color='blue',
#              arrowprops=dict(arrowstyle='-|>', connectionstyle='arc3', color='red'))

# new point 2
x_fearures_new2 = np.array([[1, 2]])
plt.scatter(x_fearures_new2[:, 0], x_fearures_new2[:, 1], s=50, cmap='viridis')
# plt.annotate(text='New point 2', xy=(1, 2), xytext=(-1.5, 2.5), color='red',
#              arrowprops=dict(arrowstyle='-|>', connectionstyle='arc3', color='red'))

# Training samples
plt.scatter(x_fearures[:, 0], x_fearures[:, 1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')

# Visualize the decision boundary
plt.contour(x_grid, y_grid, z_proba, [0.5], linewidths=2., colors='blue')

plt.show()

# Use the trained model to make predictions on the new samples
y_label_new1_predict = lr_clf.predict(x_fearures_new1)
y_label_new2_predict = lr_clf.predict(x_fearures_new2)

print('The New point 1 predict class:\n', y_label_new1_predict)
print('The New point 2 predict class:\n', y_label_new2_predict)

# Since logistic regression is a probabilistic model (p = p(y=1|x, theta), as introduced above), we can use predict_proba to predict the class probabilities
y_label_new1_predict_proba = lr_clf.predict_proba(x_fearures_new1)
y_label_new2_predict_proba = lr_clf.predict_proba(x_fearures_new2)

print('The New point 1 predict Probability of each class:\n', y_label_new1_predict_proba)
print('The New point 2 predict Probability of each class:\n', y_label_new2_predict_proba)
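
Since the fitted model is just a sigmoid applied to the linear function w0 + w1*x1 + w2*x2 (see the comment on the fit call above), the probabilities returned by predict_proba can be reproduced by hand. The short check below is an optional addition that reuses lr_clf, x_fearures_new1, and np from the demo above.

# reproduce predict_proba for New point 1 by hand
w0 = lr_clf.intercept_[0]
w1, w2 = lr_clf.coef_[0]

z = w0 + w1 * x_fearures_new1[0, 0] + w2 * x_fearures_new1[0, 1]
p_manual = 1.0 / (1.0 + np.exp(-z))   # sigmoid of the linear part = p(y=1|x)

print('manual  p(y=1|x) for New point 1:', p_manual)
print('sklearn p(y=1|x) for New point 1:', lr_clf.predict_proba(x_fearures_new1)[0, 1])
# the decision boundary drawn above is exactly the line w0 + w1*x1 + w2*x2 = 0,
# i.e. where p(y=1|x) = 0.5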

Source: juejin.im/post/7080076593942495268