Data Science Assignment 3_Iris Classification

This was the third assignment of the Data Science course I took last year. It was taught at the time by Xiao Ruoxiu, but I have heard that from this year on, the Computer Science and Internet of Things majors will take the course at the same level of difficulty, so this article may serve only as a record. A note for my juniors: when I took the course, Mr. Xiao did not take attendance. That was fine; after completing all four assignments I got a pretty good score, even though I did not take elective courses abroad.

Previous posts:

Data Science Assignment 1

Data Science Assignment 2_House Transaction Price Prediction


Table of contents

Previous posts:

1. Assignment description

2. Procedure

1. Import related libraries

2. Read data

3. Draw Violinplot

4. Draw pointplot

5. Draw Andrews curves

6. Visualization of Linear Regression

7. Show the correlations between features with a heat map

8. Machine Learning

3. Visualization results

4. Source code

5. Reflections


1. Assignment description

This assignment provides the iris dataset, which contains 150 records; its fields were explained in class. The goal is to predict the iris species accurately from four features: petal width, petal length, sepal width, and sepal length. The assignment mainly tests students' understanding and application of classification algorithms.

Specific requirements:

(1) Choose a reasonable strategy for decomposing the three-class problem; implement two of the following classifiers: logistic regression, k-NN, SVM, and decision tree; choose their hyperparameters sensibly; and select appropriate evaluation metrics to analyze classifier performance.

(2) Implement an ensemble classifier, and select appropriate evaluation metrics to analyze its performance.
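Requirement (1) asks for a way to break the three-class problem into binary problems. Below is a minimal sketch of the two standard strategies in scikit-learn, one-vs-rest and one-vs-one. It assumes scikit-learn's bundled copy of the iris data (`load_iris`) in place of the course's `iris.csv`, and uses logistic regression only as an example base classifier.

```python
# Sketch: two ways to split the 3-class iris problem into binary problems.
# Assumption: scikit-learn's bundled iris data stands in for the post's iris.csv.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=101)

# One-vs-rest: one "this class vs. all others" classifier per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(train_X, train_y)
# One-vs-one: one classifier per pair of classes; the prediction is decided by voting.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(train_X, train_y)

print("OvR:", len(ovr.estimators_), "binary models, accuracy", ovr.score(test_X, test_y))
print("OvO:", len(ovo.estimators_), "binary models, accuracy", ovo.score(test_X, test_y))
```

With K classes, one-vs-rest trains K binary models and one-vs-one trains K(K-1)/2, so for iris both give 3. Note that `svm.SVC` already uses a one-vs-one scheme internally, and recent scikit-learn versions fit `LogisticRegression` with the default lbfgs solver as a single multinomial (softmax) model, so the classifiers in section 8 handle three classes without an explicit wrapper.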

2. Procedure

1. Import related libraries

import numpy as np
import pandas as pd
from pandas import plotting

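import matplotlib.pyplot as plt
# The 'seaborn' style was renamed 'seaborn-v0_8' in Matplotlib 3.6 (the old name was later removed)
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')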

import seaborn as sns
sns.set_style("whitegrid")

from sklearn.linear_model import LogisticRegression 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn import metrics 
from sklearn.tree import DecisionTreeClassifier

 2. Read data

iris = pd.read_csv('iris.csv')
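The course's `iris.csv` is not included in this post. If you don't have it, the sketch below is one way to build a file with the column names the code expects (`sepal length (cm)` through `petal width (cm)`, plus `targetname`) from scikit-learn's built-in copy of the dataset. The layout is an assumption inferred from the code; the original file may contain extra columns.

```python
# Sketch (assumption): recreate an iris.csv with the column names used in this post.
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
iris = pd.DataFrame(data.data, columns=data.feature_names)
# Map the integer targets 0/1/2 to species names; 'targetname' matches the post's column.
iris["targetname"] = data.target_names[data.target]
iris.to_csv("iris.csv", index=False)

print(iris.shape)
print(iris["targetname"].value_counts())
```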

3. Draw Violinplot

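# Set the color theme used by the plots below
antV = ['#1890FF', '#2FC25B', '#FACC14', '#223273', '#8543E0', '#13C2C2', '#3436c7', '#F04864']

f, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True)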
sns.despine(left=True)

sns.violinplot(x='targetname', y='sepal length (cm)', data=iris, palette=antV, ax=axes[0, 0])
sns.violinplot(x='targetname', y='sepal width (cm)', data=iris, palette=antV, ax=axes[0, 1])
sns.violinplot(x='targetname', y='petal length (cm)', data=iris, palette=antV, ax=axes[1, 0])
sns.violinplot(x='targetname', y='petal width (cm)', data=iris, palette=antV, ax=axes[1, 1])

plt.show()

4. Draw pointplot

f, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True)
sns.despine(left=True)

sns.pointplot(x='targetname', y='sepal length (cm)', data=iris, color=antV[0], ax=axes[0, 0])
sns.pointplot(x='targetname', y='sepal width (cm)', data=iris, color=antV[0], ax=axes[0, 1])
sns.pointplot(x='targetname', y='petal length (cm)', data=iris, color=antV[0], ax=axes[1, 0])
sns.pointplot(x='targetname', y='petal width (cm)', data=iris, color=antV[0], ax=axes[1, 1])


plt.show()
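Each point in a seaborn pointplot is the group mean, drawn by default with a bootstrapped 95% confidence interval as the error bar. A minimal sketch that prints those per-species means, plus standard deviations, directly; it assumes the data is rebuilt from scikit-learn's `load_iris` with the same column names as `iris.csv`.

```python
# Sketch: the per-species numbers behind the point plots.
# Assumption: the data is rebuilt from load_iris with the post's column names.
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
iris = pd.DataFrame(data.data, columns=data.feature_names)
iris["targetname"] = data.target_names[data.target]  # 0/1/2 -> species name

# Mean (the dot in each pointplot) and standard deviation of every feature.
summary = iris.groupby("targetname").agg(["mean", "std"])
print(summary.round(2))
```

The petal means separate the three species far more clearly than the sepal means, which matches what the violin and point plots show.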

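5. Draw Andrews curves

Andrews curves turn each multivariate observation into a curve by using its feature values as the coefficients of a Fourier series. Observations that are close in feature space produce curves that stay close together, so coloring the curves by species makes clusters and outliers in multivariate data easy to spot.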

plt.subplots(figsize = (10,8))
plotting.andrews_curves(iris, 'targetname', colormap='cool')

plt.show()
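The definition behind the plot, for an observation x = (x1, x2, x3, x4), is f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t, evaluated for t between -π and π. Because these terms are orthogonal on [-π, π], the integral of (f_x(t) - f_y(t))² over that interval equals π times the squared Euclidean distance between x and y, which is why flowers of the same species give curves that bunch together. The sketch below implements this textbook formula with NumPy; it is not pandas' internal code, which may differ in details.

```python
# Sketch of the textbook Andrews-curve formula (not pandas' implementation):
# f_x(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
import numpy as np

def andrews_curve(x, t):
    """Evaluate the Andrews curve of one observation x at the points t."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, x[0] / np.sqrt(2.0))
    for i, coef in enumerate(x[1:]):
        k = i // 2 + 1                       # harmonic number: 1, 1, 2, 2, 3, 3, ...
        trig = np.sin if i % 2 == 0 else np.cos
        result += coef * trig(k * t)
    return result

t = np.linspace(-np.pi, np.pi, 200)
# One iris sample: sepal length, sepal width, petal length, petal width.
curve = andrews_curve([5.1, 3.5, 1.4, 0.2], t)
print(curve.shape)
```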


6. Visualization of Linear Regression

g = sns.lmplot(data=iris, x='sepal width (cm)', y='sepal length (cm)', palette=antV, hue='targetname')

g = sns.lmplot(data=iris, x='petal width (cm)', y='petal length (cm)', palette=antV, hue='targetname')
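With `hue='targetname'`, `sns.lmplot` fits a separate ordinary least-squares line for each species. The sketch below reproduces the slopes and intercepts of the petal plot numerically with `np.polyfit(..., deg=1)`; as before, it assumes the data comes from scikit-learn's `load_iris` rather than `iris.csv`.

```python
# Sketch: the regression lines of the petal lmplot, one OLS fit per species.
# Assumption: load_iris stands in for iris.csv.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
iris = pd.DataFrame(data.data, columns=data.feature_names)
iris["targetname"] = data.target_names[data.target]

fits = {}
for name, group in iris.groupby("targetname"):
    # deg=1 -> straight line; returns [slope, intercept]
    slope, intercept = np.polyfit(group["petal width (cm)"], group["petal length (cm)"], deg=1)
    fits[name] = (slope, intercept)
    print(f"{name:>10}: petal length = {slope:.2f} * petal width + {intercept:.2f}")
```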

7. Show the correlations between features with a heat map


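plt.figure(figsize=(12, 8))
# numeric_only=True skips the text column 'targetname' (required on pandas >= 2.0)
fig = sns.heatmap(iris.corr(numeric_only=True), annot=True, cmap='GnBu', linewidths=1, linecolor='k',
                  square=True, mask=False, vmin=-1, vmax=1, cbar_kws={"orientation": "vertical"}, cbar=True)
plt.show()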
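`DataFrame.corr()` defaults to the Pearson coefficient r = Σ(a_i - mean(a))(b_i - mean(b)) / sqrt(Σ(a_i - mean(a))² · Σ(b_i - mean(b))²). The sketch below computes one cell of the heat map by hand and checks it against pandas; the data is assumed to come from `load_iris`, which has only the four numeric feature columns.

```python
# Sketch: one cell of the correlation heat map computed by hand.
# Assumption: load_iris stands in for iris.csv (features only, no text column).
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
features = pd.DataFrame(data.data, columns=data.feature_names)

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

r = pearson(features["petal length (cm)"], features["petal width (cm)"])
corr = features.corr()  # method="pearson" is the default
print(f"by hand: {r:.4f}  pandas: {corr.loc['petal length (cm)', 'petal width (cm)']:.4f}")
```

A coefficient this close to 1 means petal length and petal width carry largely overlapping information.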

8. Machine Learning

X = iris[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']]
y = iris['targetname']


encoder = LabelEncoder()
y = encoder.fit_transform(y)
#print(y)

train_X, test_X, train_y, test_y = train_test_split(X, y, test_size = 0.3, random_state = 101)
#print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

# Support Vector Machine
model = svm.SVC()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the SVM is: {0}'.format(metrics.accuracy_score(test_y, prediction)))

# Logistic Regression
model = LogisticRegression()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the Logistic Regression is: {0}'.format(metrics.accuracy_score(test_y, prediction)))

# Decision Tree
model=DecisionTreeClassifier()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the Decision Tree is: {0}'.format(metrics.accuracy_score(test_y, prediction)))

# K-Nearest Neighbours
model=KNeighborsClassifier(n_neighbors=3)
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the KNN is: {0}'.format(metrics.accuracy_score(test_y, prediction)))
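The models above use default hyperparameters and a hand-picked `n_neighbors=3`, while requirement (1) asks for hyperparameters to be chosen sensibly. One common approach, sketched below, is a grid search with 5-fold cross-validation on the training set only, keeping the test set for a single final check. Assumptions: `load_iris` stands in for `iris.csv`, `stratify=y` is added so each species is equally represented in the test set, and the grids are illustrative rather than recommended values. The SVM is wrapped in a pipeline with `StandardScaler` because RBF kernels are sensitive to feature scale.

```python
# Sketch: choosing hyperparameters by cross-validated grid search.
# Assumptions: load_iris stands in for iris.csv; the grids are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=101, stratify=y)

# k-NN: try k = 1..15, scored by 5-fold cross-validation on the training set only.
knn_search = GridSearchCV(KNeighborsClassifier(),
                          {"n_neighbors": list(range(1, 16))}, cv=5)
knn_search.fit(train_X, train_y)

# RBF-kernel SVM: standardize features, then search C and gamma together.
svm_search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.1, 1]},
    cv=5,
)
svm_search.fit(train_X, train_y)

# The held-out test set is used once, after tuning, for the final estimate.
for name, search in [("k-NN", knn_search), ("SVM", svm_search)]:
    print(name, search.best_params_,
          f"CV accuracy={search.best_score_:.3f}",
          f"test accuracy={search.score(test_X, test_y):.3f}")
```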

The accuracy of the four methods:

The accuracy of the SVM is: 0.9777777777777777

The accuracy of the Logistic Regression is: 0.9777777777777777

The accuracy of the Decision Tree is: 0.9555555555555556

The accuracy of the KNN is: 1.0
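With `test_size=0.3`, the test set has 150 × 0.3 = 45 flowers, so accuracy moves in steps of 1/45 ≈ 0.022: 45/45 = 1.0, 44/45 ≈ 0.978, 43/45 ≈ 0.956. The gap between the four models is therefore one or two misclassified flowers on a single split. A sketch of a more stable comparison follows: 10-fold stratified cross-validation for each model, plus a confusion matrix and per-class precision, recall and F1 for one hold-out split. It assumes `load_iris` in place of `iris.csv`; `max_iter=1000` and the `random_state` values are added here for convergence and reproducibility.

```python
# Sketch: a more stable comparison than one 45-sample test set.
# Assumptions: load_iris stands in for iris.csv; max_iter and random_state added here.
from sklearn import metrics, svm
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
models = {
    "SVM": svm.SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}

# 10-fold stratified cross-validation: every flower is tested exactly once,
# and each fold keeps the 1:1:1 species ratio.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = {}
for name, model in models.items():
    cv_scores[name] = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name:>19}: {cv_scores[name].mean():.3f} +/- {cv_scores[name].std():.3f}")

# Per-class view for one model on a single stratified hold-out split.
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=101, stratify=y)
pred = models["SVM"].fit(train_X, train_y).predict(test_X)
print(metrics.confusion_matrix(test_y, pred))
print(metrics.classification_report(test_y, pred, digits=3))
```

Because the three classes are balanced, accuracy and macro-averaged F1 tell a similar story here; the confusion matrix shows which species are being confused (typically versicolor and virginica).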

 3. Visualization results

 

4. Source code

import numpy as np
import pandas as pd
from pandas import plotting

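# %matplotlib inline  (uncomment in a Jupyter notebook)
import matplotlib.pyplot as plt
# The 'seaborn' style was renamed 'seaborn-v0_8' in Matplotlib 3.6 (the old name was later removed)
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')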

import seaborn as sns
sns.set_style("whitegrid")

from sklearn.linear_model import LogisticRegression 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn import metrics 
from sklearn.tree import DecisionTreeClassifier

iris = pd.read_csv('iris.csv')
#iris.info()


# Set the color theme
antV = ['#1890FF', '#2FC25B', '#FACC14', '#223273', '#8543E0', '#13C2C2', '#3436c7', '#F04864']

# Draw violin plots
f, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True)
sns.despine(left=True)

sns.violinplot(x='targetname', y='sepal length (cm)', data=iris, palette=antV, ax=axes[0, 0])
sns.violinplot(x='targetname', y='sepal width (cm)', data=iris, palette=antV, ax=axes[0, 1])
sns.violinplot(x='targetname', y='petal length (cm)', data=iris, palette=antV, ax=axes[1, 0])
sns.violinplot(x='targetname', y='petal width (cm)', data=iris, palette=antV, ax=axes[1, 1])

plt.show()

f, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True)
sns.despine(left=True)

sns.pointplot(x='targetname', y='sepal length (cm)', data=iris, color=antV[0], ax=axes[0, 0])
sns.pointplot(x='targetname', y='sepal width (cm)', data=iris, color=antV[0], ax=axes[0, 1])
sns.pointplot(x='targetname', y='petal length (cm)', data=iris, color=antV[0], ax=axes[1, 0])
sns.pointplot(x='targetname', y='petal width (cm)', data=iris, color=antV[0], ax=axes[1, 1])


plt.show()

#g = sns.pairplot(data=iris, palette=antV, hue= 'targetname')

plt.subplots(figsize = (10,8))
plotting.andrews_curves(iris, 'targetname', colormap='cool')

plt.show()

g = sns.lmplot(data=iris, x='sepal width (cm)', y='sepal length (cm)', palette=antV, hue='targetname')

g = sns.lmplot(data=iris, x='petal width (cm)', y='petal length (cm)', palette=antV, hue='targetname')

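# Start a new figure so the heat map is not drawn over the axes of the last lmplot
plt.figure(figsize=(12, 8))

# numeric_only=True skips the text column 'targetname' (required on pandas >= 2.0)
fig = sns.heatmap(iris.corr(numeric_only=True), annot=True, cmap='GnBu', linewidths=1, linecolor='k',
                  square=True, mask=False, vmin=-1, vmax=1, cbar_kws={"orientation": "vertical"}, cbar=True)
plt.show()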


X = iris[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']]
y = iris['targetname']


encoder = LabelEncoder()
y = encoder.fit_transform(y)
#print(y)

train_X, test_X, train_y, test_y = train_test_split(X, y, test_size = 0.3, random_state = 101)
#print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

# Support Vector Machine
model = svm.SVC()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the SVM is: {0}'.format(metrics.accuracy_score(test_y, prediction)))

# Logistic Regression
model = LogisticRegression()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the Logistic Regression is: {0}'.format(metrics.accuracy_score(test_y, prediction)))

# Decision Tree
model=DecisionTreeClassifier()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the Decision Tree is: {0}'.format(metrics.accuracy_score(test_y, prediction)))

# K-Nearest Neighbours
model=KNeighborsClassifier(n_neighbors=3)
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the KNN is: {0}'.format(metrics.accuracy_score(test_y, prediction)))
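Requirement (2) asks for an ensemble classifier. One option, sketched below, is a soft-voting `VotingClassifier` over the four models used above, evaluated with accuracy, macro-averaged F1, and 5-fold cross-validation. It assumes `load_iris` in place of `iris.csv` and adds `stratify=y` to the split; `probability=True` is needed on `SVC` so that it can contribute class probabilities to the vote.

```python
# Sketch: a soft-voting ensemble of the four classifiers from section 8.
# Assumptions: load_iris stands in for iris.csv; random_state/max_iter added here.
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=101, stratify=y)

ensemble = VotingClassifier(
    estimators=[
        ("svm", svm.SVC(probability=True, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
    ],
    voting="soft",  # average the predicted class probabilities of all members
)
ensemble.fit(train_X, train_y)
pred = ensemble.predict(test_X)

print("test accuracy:", accuracy_score(test_y, pred))
print("test macro-F1:", f1_score(test_y, pred, average="macro"))
# Cross-validation refits the whole ensemble on each of 5 folds.
cv_acc = cross_val_score(ensemble, X, y, cv=5)
print("5-fold CV accuracy:", cv_acc.mean())
```

Switching to `voting="hard"` (majority vote on predicted labels) removes the need for `probability=True`, at the cost of ignoring how confident each model is.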

5. Reflections

Working through the iris case gave me a preliminary understanding of machine learning and a feel for the appeal of the subject.


Original post: blog.csdn.net/weixin_48144018/article/details/124872333