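We are given the iris dataset, iris.csv. Iris is a widely used dataset for classification experiments, introduced by Fisher (1936). Also called the iris flower dataset, it is a multivariate dataset containing 150 samples in 3 classes of 50 samples each, with 4 attributes per sample. The four attributes (sepal length, sepal width, petal length, and petal width) can be used to predict which of the three species (Setosa, Versicolour, Virginica) an iris flower belongs to.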
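The dataset's shape can be checked quickly before any modeling. The sketch below is only an illustration: it assumes scikit-learn's bundled copy of the data (`load_iris`) as a stand-in for iris.csv, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled iris data stands in for iris.csv.
iris = load_iris()
X, y = iris.data, iris.target

# Number of samples per class, indexed by the integer class label.
class_counts = np.bincount(y).tolist()

print("samples, attributes:", X.shape)
print("classes:", [str(name) for name in iris.target_names])
print("samples per class:", class_counts)
```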
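Requirements:
- Train a logistic regression model on the iris dataset, using 20% of the data as the test set and 80% as the training set.
- Standardize the data first, then train with polynomial features of degree 1 through 9; choose solver and multi_class yourself.
- For each polynomial degree from 1 to 9, print to the console the model's classification accuracy on the test set.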
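To get a feel for what degrees 1 through 9 mean for this task, the sketch below counts how many columns `PolynomialFeatures` produces from 4 input attributes at each degree. It assumes the default settings (bias column included, interaction terms kept), under which the count equals the binomial coefficient C(4 + d, d); the dummy input and the `feature_counts` name are illustrative only.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_features = 4  # sepal length, sepal width, petal length, petal width
X_dummy = np.zeros((1, n_features))  # values do not matter, only the shape

feature_counts = {}  # degree -> number of generated columns
for degree in range(1, 10):
    poly = PolynomialFeatures(degree=degree).fit(X_dummy)
    feature_counts[degree] = poly.n_output_features_
    # With the bias column included, the count is C(n_features + degree, degree).
    assert feature_counts[degree] == comb(n_features + degree, degree)
    print(degree, feature_counts[degree])
```

The column count grows quickly with the degree while the training set stays at 120 samples, so higher degrees give the model far more parameters than data points.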
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, PolynomialFeatures, StandardScaler

if __name__ == "__main__":
    path = 'D://Ml_Lab_Data/iris.csv'  # path to the data file
    data = pd.read_csv(path, header=None)
    # The first 4 columns are the features; the 5th column is the species label
    X = data.iloc[:, :4].to_numpy()
    Y = LabelEncoder().fit_transform(data.iloc[:, 4])
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)
    # Standardize the features, fitting the scaler on the training set only
    sc = StandardScaler()
    sc.fit(X_train)
    X_train_std = sc.transform(X_train)
    X_test_std = sc.transform(X_test)
    for i in range(1, 10):
        # multi_class is deprecated since scikit-learn 1.5; drop it there if it warns
        model = make_pipeline(PolynomialFeatures(degree=i),
                              LogisticRegression(solver='sag', multi_class='multinomial', max_iter=10000))
        # Train on the standardized features so training and test inputs share one scale
        model.fit(X_train_std, Y_train)
        acc = model.score(X_test_std, Y_test)
        print(f"degree={i}: test accuracy = {acc * 100:.2f}%")
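For readers without the local iris.csv, the same experiment can be sketched against scikit-learn's bundled copy of the dataset. This is a variant under stated assumptions, not the original solution: `load_iris` replaces the CSV file, the solver is swapped to `lbfgs` (which fits a multinomial model for multiclass targets without the `multi_class` argument), the scaler is placed inside the pipeline so it is fitted on the training split only, and the `results` dictionary is an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumption: the bundled dataset stands in for iris.csv.
X, y = load_iris(return_X_y=True)

# 80% training data, 20% test data, same seed as the script above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

results = {}  # illustrative name: polynomial degree -> test accuracy
for degree in range(1, 10):
    # Standardize first, then expand to polynomial features, then classify.
    model = make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=degree),
        LogisticRegression(solver="lbfgs", max_iter=1000),
    )
    model.fit(X_train, y_train)
    results[degree] = model.score(X_test, y_test)
    print(f"degree={degree}: test accuracy = {results[degree] * 100:.2f}%")
```

Keeping the scaler inside the pipeline means `fit` and `score` both receive the raw features, which removes the risk of training on one scale and evaluating on another. At high degrees the solver may emit convergence warnings; they do not stop the run.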