Based on the iris data set, the perceptron is used to implement linear binary classification.


Brief description of the topic:

Using the iris data set (four features, three classes), select two features and two classes, implement linear binary classification with a perceptron (scikit-learn's Perceptron), visualize the classification results, and plot the relationship between the number of iterations and the accuracy. Note: the code must include a preliminary analysis of the data and explain why the two features and two classes were chosen.
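
For readers who want to see what a perceptron does under the hood, the sketch below implements the classic update rule on a tiny made-up data set. The function name train_perceptron, the toy points, and the learning rate are assumptions for illustration; the article itself relies on scikit-learn's Perceptron.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron rule; y must contain the labels -1 and +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            # A point is misclassified when y * (w.x + b) <= 0
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:  # a full pass with no mistakes: converged
            break
    return w, b

# Toy, linearly separable points (hypothetical values, not iris data)
X = np.array([[1.0, 1.0], [2.0, 1.5], [4.0, 4.5], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])
w, b = train_perceptron(X, y)
pred = np.where(X @ w + b > 0, 1, -1)
print("w =", w, "b =", b, "predictions =", pred)
```

The weights only change when a point is on the wrong side of the current line, which is why training stops making updates once the two classes are separated.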


iris data set:

The link is here: Introduction to iris data set
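
As a quick orientation, the snippet below loads the copy of the data set that ships with scikit-learn and prints its shape, feature names, and class names. It assumes scikit-learn is installed; the variable name iris is only illustrative.

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)          # number of samples, number of features
print(iris.feature_names)       # column index -> feature name
print(list(iris.target_names))  # label 0, 1, 2 -> species name
```

The column order of feature_names is what the feature indices used later in this article refer to.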


Code usage introduction:

  • Here sample1=0, sample2=1 means that, of the four iris features, I selected the first and the second, namely sepal length and sepal width.


  • The four features of iris are sepal length, sepal width, petal length, and petal width (indices 0 to 3), and the three classes are setosa, versicolor, and virginica (labels 0 to 2).


  • Next, run the following code to plot the three classes against the two selected features:

    draw_relation(df,sample1,sample2)
    


    In the resulting plot, class 0 and class 1 are linearly separable, and so are class 0 and class 2. Any one of these separable pairs can be used for classification; here I choose to distinguish class 0 from class 1.


  • Since we are distinguishing class 0 from class 1, I pass class1=0 and class2=1 to the trains() function to select those classes. Running it gives the following results:

trains(df,sample1,sample2,class1=0,class2=1,epoch=400)
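
One way to relate accuracy to the number of passes over the data, sketched here as an alternative to what trains() does internally, is to call scikit-learn's partial_fit once per pass and score the model after each call. The helper name accuracy_per_epoch and the choice of 50 passes are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

def accuracy_per_epoch(X, y, epochs=50):
    """Return the training accuracy after each single pass over (X, y)."""
    clf = Perceptron(shuffle=False)
    classes = np.unique(y)
    history = []
    for _ in range(epochs):
        # Each partial_fit call performs one pass over the data
        clf.partial_fit(X, y, classes=classes)
        history.append(clf.score(X, y))
    return history

iris = load_iris()
mask = iris.target < 2            # keep classes 0 and 1
X = iris.data[mask][:, [0, 1]]    # sepal length and sepal width
y = iris.target[mask]
history = accuracy_per_epoch(X, y)
print(history[0], history[-1])
```

Because a single model is updated in place, the list gives the accuracy of the same weights as they evolve, rather than the accuracy of separately trained models.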



Classify again:

  • To try other features, such as sample1=1, sample2=2 (sepal width and petal length), run draw_relation() again. The plot again shows that class 0 and class 1 are linearly separable, as are class 0 and class 2.

    • draw_relation(df,sample1,sample2)
      


  • From the above results, I chose class1=0, class2=2 to classify them, and got the following results:

    • trains(df,sample1,sample2,class1=0,class2=2,epoch=400)
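
The visual judgment of separability can also be checked numerically. The sketch below, which is not part of the original code, trains a perceptron with an intercept on each pair of classes for the chosen features and reports the training accuracy. An accuracy of 1.0 means a separating line was found; a lower value only means none was found within the iteration budget. The helper name pair_accuracy and the max_iter=1000 budget are assumptions.

```python
from itertools import combinations
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

def pair_accuracy(X, y, a, b, max_iter=1000):
    """Training accuracy of a perceptron fitted on classes a and b only."""
    mask = (y == a) | (y == b)
    clf = Perceptron(max_iter=max_iter, tol=None, random_state=0)
    clf.fit(X[mask], y[mask])
    return clf.score(X[mask], y[mask])

iris = load_iris()
X = iris.data[:, [1, 2]]  # sepal width and petal length (sample1=1, sample2=2)
results = {pair: pair_accuracy(X, iris.target, *pair)
           for pair in combinations(range(3), 2)}
for pair, acc in results.items():
    print(pair, round(acc, 3))
```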
      



The code is as follows:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split  # only needed for the commented-out split in trains()


# Markers and colors for classes 0, 1 and 2, shared by both plotting functions
CLASS_MARKERS = ['v', 'x', 'o']
CLASS_COLORS = ['r', 'b', 'g']


# Plot the three classes against two features to see which classes can be separated.
# Defaults to features 0 and 1.
def draw_relation(df, sample1=0, sample2=1):
    x = np.array(df.iloc[:, [sample1, sample2]])
    y = np.array(df.iloc[:, -1])
    plt.figure()
    plt.xlim(0, 8)
    plt.ylim(0, 8)
    plt.title(f"Distribution of the three classes over features {sample1} and {sample2}")
    plt.xlabel(df.columns[sample1])
    plt.ylabel(df.columns[sample2])
    for c in range(3):
        plt.scatter(x[y == c, 0], x[y == c, 1], label=str(c),
                    c=CLASS_COLORS[c], marker=CLASS_MARKERS[c])
    plt.legend()
    plt.show()


# Select two classes and train a perceptron on them
def trains(df, sample1=0, sample2=1, class1=0, class2=1, epoch=50):
    # load_iris stores 50 consecutive rows per class, so class k occupies rows 50*k to 50*k+49
    data1 = np.array(df.iloc[50*class1:50*class1+50, [sample1, sample2, -1]])
    data2 = np.array(df.iloc[50*class2:50*class2+50, [sample1, sample2, -1]])
    # Merge the data of the two selected classes
    data = np.concatenate((data1, data2), axis=0)
    x, y = data[:, :-1], data[:, -1]
    # Split the two selected classes into positive and negative examples
    positive = x[y == class1]
    negative = x[y == class2]
    # Both the training set and the test set use all of the data
    x_data_train, y_data_train = x, y
    x_data_test, y_data_test = x, y
    # x_data_train, x_data_test, y_data_train, y_data_test = train_test_split(x, y, test_size=0.4, random_state=0)
    # Store the accuracy obtained for each value of i
    x_data = []
    y_data = []
    plt.figure()
    for i in range(1, epoch, 5):
        # Train a Perceptron without an intercept, so the boundary passes through the origin.
        # i is passed as n_iter_no_change, the number of epochs without improvement
        # that scikit-learn waits before stopping; max_iter keeps its default value.
        clf = Perceptron(fit_intercept=False, n_iter_no_change=i, shuffle=False)
        clf.fit(x_data_train, y_data_train)
        # Training results
        print("Learned w:", clf.coef_)
        print("Learned b:", clf.intercept_)
        print("Iterations run:", clf.n_iter_)

        # Evaluate on the test set
        acc = clf.score(x_data_test, y_data_test)
        print("Test set accuracy:", acc)
        x_data.append(i)
        y_data.append(acc)

        # Scatter plot of the positive and negative examples,
        # with the same markers and colors as draw_relation()
        plt.cla()
        plt.title(f"Hyperplane and scatter plot for classes {class1} and {class2}")
        plt.xlabel(df.columns[sample1])
        plt.ylabel(df.columns[sample2])
        plt.xlim(0, 8)
        plt.ylim(0, 8)
        plt.scatter(positive[:, 0], positive[:, 1], label=str(class1),
                    c=CLASS_COLORS[class1], marker=CLASS_MARKERS[class1])
        plt.scatter(negative[:, 0], negative[:, 1], label=str(class2),
                    c=CLASS_COLORS[class2], marker=CLASS_MARKERS[class2])
        # Draw the hyperplane (a straight line in this example)
        line_x = np.arange(2, 8)
        line_y = line_x * (-clf.coef_[0][0] / clf.coef_[0][1]) - clf.intercept_ / clf.coef_[0][1]
        plt.plot(line_x, line_y, label='hyperplane')
        plt.legend()
        plt.pause(0.005)
    # Plot the accuracy against the iteration setting
    plt.figure()
    plt.plot(x_data, y_data, 'o-', color="g", alpha=0.8, linewidth=0.8)
    plt.title("Number of iterations vs. accuracy", fontsize=10)
    plt.xlabel('Number of iterations')
    plt.ylabel('Accuracy')
    plt.xticks(np.arange(0, epoch + 1, 50))
    plt.yticks(np.arange(0.5, 1.01, 0.05))
    plt.show()


if __name__ == '__main__':
    # Load the iris data set
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)  # iris.data has shape (150, 4); columns are named after iris.feature_names
    df['label'] = iris.target  # iris.target holds the class labels, shape (150,)
    # sample1=0, sample2=1 selects the first and second of the four iris features,
    # namely sepal length and sepal width
    sample1 = 0
    sample2 = 1
    # To analyze the data first, comment out the trains() call and use the distribution
    # of the three classes drawn by draw_relation() to decide which classes to select
    draw_relation(df, sample1, sample2)
    # Select the classes and train
    trains(df, sample1, sample2, class1=0, class2=1, epoch=400)
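
The line drawn in trains() comes from solving the decision rule w0*x1 + w1*x2 + b = 0 for x2. A small helper makes that step explicit; the name boundary_x2 and the example weights are hypothetical and are not values learned from iris.

```python
import numpy as np

def boundary_x2(w, b, x1):
    """Solve w[0]*x1 + w[1]*x2 + b = 0 for x2 (requires w[1] != 0)."""
    w = np.asarray(w, dtype=float)
    if w[1] == 0:
        raise ValueError("vertical boundary: w[1] is zero, so x2 cannot be solved for")
    return -(w[0] * np.asarray(x1, dtype=float) + b) / w[1]

# Hypothetical weights, not values learned from iris
x1 = np.array([2.0, 4.0, 6.0])
x2 = boundary_x2([1.0, -2.0], 0.5, x1)
print(x2)
```

With fit_intercept=False, as in trains(), b is zero and the line always passes through the origin.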


Origin blog.csdn.net/qq_60943902/article/details/127781653