Logistic regression: breast cancer classification

Dataset: the breast cancer (Wisconsin diagnostic) dataset from scikit-learn, https://scikit-learn.org/stable/datasets/

Features (30):
mean radius 569 non-null float64
mean texture 569 non-null float64
mean perimeter 569 non-null float64
mean area 569 non-null float64
mean smoothness 569 non-null float64
mean compactness 569 non-null float64
mean concavity 569 non-null float64
mean concave points 569 non-null float64
mean symmetry 569 non-null float64
mean fractal dimension 569 non-null float64
radius error 569 non-null float64
texture error 569 non-null float64
perimeter error 569 non-null float64
area error 569 non-null float64
smoothness error 569 non-null float64
compactness error 569 non-null float64
concavity error 569 non-null float64
concave points error 569 non-null float64
symmetry error 569 non-null float64
fractal dimension error 569 non-null float64
worst radius 569 non-null float64
worst texture 569 non-null float64
worst perimeter 569 non-null float64
worst area 569 non-null float64
worst smoothness 569 non-null float64
worst compactness 569 non-null float64
worst concavity 569 non-null float64
worst concave points 569 non-null float64
worst symmetry 569 non-null float64
worst fractal dimension 569 non-null float64
Label:
type 569 non-null int64
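The post reads a local `cancer.csv` but does not show where it comes from. The 30 feature names above match scikit-learn's bundled breast cancer data, so here is a minimal sketch of rebuilding such a file. It assumes `cancer.csv` holds those features followed by a label column named `type`, encoded the way scikit-learn encodes it (0 = malignant, 1 = benign). The column name and encoding are assumptions, not confirmed by the original.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load the bundled breast cancer data (569 samples, 30 numeric features)
bunch = load_breast_cancer()

# Assumption: cancer.csv = the 30 features followed by a label column "type"
df = pd.DataFrame(bunch.data, columns=bunch.feature_names)
df["type"] = bunch.target  # scikit-learn encoding: 0 = malignant, 1 = benign

# Write without the index so pd.read_csv(..., header=0) reads back 31 columns
df.to_csv("cancer.csv", index=False)
print(df.shape)
```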


First, import the libraries

import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib as mpl
import matplotlib.pyplot as plt
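# Use the SimHei font so Chinese characters render in plots (only needed for Chinese labels)
mpl.rcParams["font.family"] = "SimHei"
# Keep the minus sign (-) rendering correctly while a Chinese font is active
mpl.rcParams["axes.unicode_minus"] = False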

Second, data preprocessing

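# Load the dataset (the first row holds the column names)
data = pd.read_csv(r"cancer.csv", header=0)
# data.sample(30)
# data.info()
# Look for abnormal values (outliers) in the summary statistics
# data.describe()
# Check whether the data contain duplicate rows
# data.duplicated().any()
# If there are duplicates, they can be removed like this
# data.drop_duplicates(inplace=True)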
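The checks above are left commented out. This sketch runs them and reports counts. It uses `load_breast_cancer(as_frame=True)` as a stand-in for `cancer.csv`, where the label column is named `target` rather than `type`. The 1.5 × IQR outlier rule is an assumption swapped in for reading `describe()` by eye. Extreme values in medical measurements may be genuine, so flagged points are only counted here, not dropped.

```python
from sklearn.datasets import load_breast_cancer

# Stand-in for cancer.csv: same data as a DataFrame, label column "target"
data = load_breast_cancer(as_frame=True).frame

# Total missing values (info() above reports 569 non-null in every column)
n_missing = int(data.isnull().sum().sum())

# Number of exact duplicate rows
n_duplicates = int(data.duplicated().sum())

# Per-feature count of values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
features = data.drop(columns="target")
q1, q3 = features.quantile(0.25), features.quantile(0.75)
iqr = q3 - q1
outside = (features < q1 - 1.5 * iqr) | (features > q3 + 1.5 * iqr)
outliers_per_feature = outside.sum()

print("missing:", n_missing, "duplicates:", n_duplicates)
print(outliers_per_feature.sort_values(ascending=False).head())
```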


Third, train and evaluate the model
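For a binary problem, `LogisticRegression` scores each sample with a linear function z = w·x + b and maps it to a probability with the sigmoid σ(z) = 1 / (1 + e^(-z)). `predict` then returns class 1 whenever z > 0, which is the same as σ(z) > 0.5. This sketch checks that against a fitted model's `coef_`, `intercept_`, `predict_proba` and `predict`. It loads the data with `load_breast_cancer` instead of `cancer.csv`, assuming both hold the same values. It also sets `max_iter=10000` (an assumption) so the lbfgs solver has room to converge on unscaled features.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.25, random_state=0)
lr = LogisticRegression(max_iter=10000).fit(train_X, train_y)

# Linear score z = w.x + b for every test sample (same as lr.decision_function)
z = test_X @ lr.coef_.ravel() + lr.intercept_[0]
# Sigmoid turns the score into P(y = 1 | x), the probability of class 1
p = 1.0 / (1.0 + np.exp(-z))
# predict() picks class 1 exactly when z > 0, i.e. when p > 0.5
manual = (z > 0).astype(int)

print(np.allclose(p, lr.predict_proba(test_X)[:, 1]))
print((manual == lr.predict(test_X)).all())
```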

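# Split the loaded dataset into features X and label y (the last column)
X, y = data.iloc[:, :-1], data.iloc[:, -1]
# Use train_test_split to hold out 25% of the samples as the test set
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.25, random_state=0)
# print(len(train_y))
# print(len(test_y))
# Instantiate the logistic regression model; max_iter is raised because the
# default (100) often stops the lbfgs solver early on these unscaled features
lr = LogisticRegression(max_iter=10000)
# Train the model
lr.fit(train_X, train_y)
# Predict the classes of the test set
result = lr.predict(test_X)

# Evaluate the model
print(result)
print(test_y.values)
# Accuracy: fraction of test samples classified correctly
print("Accuracy: %.2f" % lr.score(test_X, test_y))
# Mean squared error: with 0/1 labels this equals the misclassification rate
print("Mean squared error: %.2f" % mean_squared_error(test_y, result))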
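Regression metrics such as MSE and R² say little about a classifier, and accuracy alone hides which kind of mistake the model makes. This sketch uses scikit-learn's `accuracy_score`, `confusion_matrix` and `classification_report` on the same split. It again substitutes `load_breast_cancer` for `cancer.csv` and assumes `max_iter=10000`. The last line shows why MSE adds nothing here: with 0/1 labels every wrong prediction contributes exactly 1, so MSE equals 1 - accuracy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.25, random_state=0)
# Assumption: max_iter raised so lbfgs converges on the unscaled features
lr = LogisticRegression(max_iter=10000).fit(train_X, train_y)
result = lr.predict(test_X)

acc = accuracy_score(test_y, result)
cm = confusion_matrix(test_y, result)  # rows = true class, columns = predicted class
mse = mean_squared_error(test_y, result)

print("Accuracy: %.3f" % acc)
print(cm)
# Per-class precision, recall and F1; label 0 = malignant, 1 = benign
print(classification_report(test_y, result, target_names=["malignant", "benign"]))

# With 0/1 labels each error adds (1 - 0)^2 = 1, so MSE is just the error rate
print("MSE: %.3f, 1 - accuracy: %.3f" % (mse, 1 - acc))
```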

Fourth, data visualization

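# Plot the predicted classes of the first 30 test samples
plt.plot(range(1, 31), result[0:30], "ro", ms=15, label="Predicted")
# Plot the true classes of the same samples
plt.plot(range(1, 31), test_y.values[0:30], "go", label="True")
plt.title("Logistic regression")
plt.xlabel("Sample index")
plt.ylabel("Class")
plt.legend()
plt.show()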



Source: blog.csdn.net/weixin_42295205/article/details/91635151