Iris: Logistic Regression

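We are given the Iris dataset, iris.csv. Iris is a widely used dataset for classification experiments, compiled by Fisher in 1936. Also known as the iris flower dataset, it is a multivariate analysis dataset. It contains 150 samples divided into 3 classes of 50 samples each, and every sample has 4 attributes. The 4 attributes (sepal length, sepal width, petal length, and petal width) can be used to predict which of the three species (Setosa, Versicolour, Virginica) an iris flower belongs to.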
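The counts in the description above can be checked quickly. The sketch below assumes the copy of Iris bundled with scikit-learn (`load_iris`) as a stand-in for iris.csv; the CSV file used in this post may differ in column order or label spelling.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for iris.csv.
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 samples, 4 attributes
print(iris.feature_names)  # sepal/petal length and width, in cm
print(iris.target_names)   # the three species names
print(np.bincount(y))      # [50 50 50]: 50 samples per class
```
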

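Requirements:

  1. Train a logistic regression model on the Iris dataset, using 20% of the data as the test set and 80% as the training set.
  2. Standardize the data first, then train with polynomial features of degree 1 through 9. Choose solver and multi_class yourself.
  3. For each polynomial degree from 1 to 9, print the model's classification accuracy on the test set to the console.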
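To get a feel for what requirement 2 asks of the model, it helps to see how quickly the feature count grows with the polynomial degree. The following is a minimal sketch: `X_demo` and `feature_counts` are illustrative names, and the dummy input only serves to let `PolynomialFeatures` report its output width for 4 input attributes.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One dummy sample with 4 attributes, matching the shape of an Iris row.
X_demo = np.zeros((1, 4))

feature_counts = {}
for degree in range(1, 10):
    poly = PolynomialFeatures(degree=degree)  # include_bias=True by default
    n_out = poly.fit_transform(X_demo).shape[1]
    # With the bias column included, n inputs at degree d give C(n + d, d) features.
    assert n_out == comb(4 + degree, degree)
    feature_counts[degree] = n_out
    print(degree, n_out)
```

At degree 9 the 4 original attributes expand to 715 features, while an 80% training split has only 120 samples, so the higher degrees are prone to overfitting and slow convergence.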

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, PolynomialFeatures, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

if __name__ == "__main__":
    path = 'D://Ml_Lab_Data/iris.csv'  # path to the data file
    data = pd.read_csv(path, header=None)

    # The first 4 columns are the features; the 5th column is the class label
    X = data.iloc[:, :4].values
    Y = LabelEncoder().fit_transform(data.iloc[:, 4])  # encode labels as 0, 1, 2

    # 80% training set, 20% test set
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

    # Standardize the features; the scaler is fitted on the training set only
    sc = StandardScaler()
    sc.fit(X_train)
    X_train_std = sc.transform(X_train)
    X_test_std = sc.transform(X_test)

    for i in range(1, 10):
        # Polynomial expansion of degree i, followed by logistic regression.
        # Note: multi_class is deprecated in scikit-learn 1.5+; drop the
        # argument there, since multinomial is the default behavior.
        model = make_pipeline(PolynomialFeatures(degree=i),
                              LogisticRegression(solver='sag', multi_class='multinomial', max_iter=10000))
        # Fit on the standardized training data so it matches the data scored below
        model.fit(X_train_std, Y_train)
        acc = model.score(X_test_std, Y_test)
        print((i, acc * 100))  # (degree, test accuracy in percent)
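One way to keep the scaling step and the model in sync is to put `StandardScaler` inside the pipeline, so the same transformation is applied at both fit and score time. The sketch below is a variant, not the original author's code: it assumes scikit-learn's bundled `load_iris` instead of iris.csv, leaves the solver at its default, and uses a hypothetical helper name, `accuracy_by_degree`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def accuracy_by_degree(degrees=range(1, 10), random_state=1):
    """Return {degree: test accuracy} for an 80/20 split of Iris."""
    # Assumption: the bundled dataset stands in for iris.csv.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state)

    results = {}
    for degree in degrees:
        # Standardize first, then expand to polynomial features, then classify.
        model = make_pipeline(
            StandardScaler(),
            PolynomialFeatures(degree=degree),
            LogisticRegression(max_iter=10000),
        )
        model.fit(X_train, y_train)  # the scaler is fitted on training data only
        results[degree] = model.score(X_test, y_test)
    return results


if __name__ == "__main__":
    for degree, acc in accuracy_by_degree().items():
        print(degree, f"{acc * 100:.1f}%")
```

Because the pipeline owns the scaler, there is no separate standardized copy of the data to keep track of, and raw `X_train` and `X_test` can be passed directly to `fit` and `score`.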

Reposted from blog.csdn.net/qq_38054219/article/details/89667943