Data Science Assignment 3: Iris Classification

This is Assignment 3 from the Data Science elective I took last year, taught by Prof. Xiao Ruoxiu. I hear that starting with the cohort after ours, the CS majors and the IoT/Information Security majors are taught at the same level of difficulty, so this post will probably just be a record for myself rather than a help to junior students. Back then Prof. Xiao didn't take attendance, which was great, and finishing the four assignments was enough for a decent grade, though elective grades hardly matter anyway if you're not applying abroad.

Previous posts:

Data Science Assignment 1

Data Science Assignment 2: House Price Prediction


Contents

Previous posts:

I. Assignment Description

II. Assignment Process

1. Import the required libraries

2. Read the data

3. Draw violin plots

4. Draw point plots

5. Andrews curves

6. Visualize linear regression

7. Correlation heatmap between features

8. Machine learning

III. Visualization Results

IV. Full Source Code

V. Reflections


I. Assignment Description

This assignment provides the iris dataset (150 records; the fields were explained in class). The goal is to accurately predict the iris species from four features: petal width, petal length, sepal width, and sepal length. It mainly tests students' understanding and application of classification algorithms.

Specific requirements:

(1) Choose a reasonable way to decompose the three-class problem; implement two classifiers from among logistic regression, k-NN, SVM, and decision tree; set hyperparameters sensibly; and analyze classifier performance with appropriate evaluation metrics.
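Requirement (1)'s "decomposition" refers to reducing the three-class problem to binary ones, e.g. one-vs-rest or one-vs-one. A minimal one-vs-rest sketch, using scikit-learn's built-in iris loader in place of the course's iris.csv (an assumption, since the CSV isn't bundled here):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import accuracy_score

# Built-in iris data stands in for the course's iris.csv
X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=101)

# One-vs-rest: fits one binary logistic-regression model per class,
# then predicts the class whose binary model is most confident
ovr = OneVsRestClassifier(LogisticRegression(max_iter=200))
ovr.fit(train_X, train_y)
print('OvR logistic regression accuracy:', accuracy_score(test_y, ovr.predict(test_X)))
```

(For one-vs-one decomposition, `sklearn.multiclass.OneVsOneClassifier` is used the same way.)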

(2) Implement an ensemble classifier, and analyze its performance with appropriate evaluation metrics.
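For requirement (2), one common choice of ensemble is a random forest: many decision trees trained on bootstrap samples, with a majority vote on the final label. A sketch under the same assumption (scikit-learn's built-in iris data rather than iris.csv):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=101)

# 100 bagged decision trees; the majority vote gives the prediction
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(train_X, train_y)
print('Random forest accuracy:', accuracy_score(test_y, rf.predict(test_X)))
```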

II. Assignment Process

1. Import the required libraries

import numpy as np
import pandas as pd
from pandas import plotting

import matplotlib.pyplot as plt
plt.style.use('seaborn')  # on matplotlib >= 3.6 this style is named 'seaborn-v0_8'

import seaborn as sns
sns.set_style("whitegrid")

from sklearn.linear_model import LogisticRegression 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn import metrics 
from sklearn.tree import DecisionTreeClassifier

2. Read the data

iris = pd.read_csv('iris.csv')

3. Draw violin plots

# colour palette for the plots (also defined in the full source below;
# repeated here so this snippet runs on its own)
antV = ['#1890FF', '#2FC25B', '#FACC14', '#223273', '#8543E0', '#13C2C2', '#3436c7', '#F04864']

f, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True)
sns.despine(left=True)

sns.violinplot(x='targetname', y='sepal length (cm)', data=iris, palette=antV, ax=axes[0, 0])
sns.violinplot(x='targetname', y='sepal width (cm)', data=iris, palette=antV, ax=axes[0, 1])
sns.violinplot(x='targetname', y='petal length (cm)', data=iris, palette=antV, ax=axes[1, 0])
sns.violinplot(x='targetname', y='petal width (cm)', data=iris, palette=antV, ax=axes[1, 1])

plt.show()

4. Draw point plots

f, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True)
sns.despine(left=True)

sns.pointplot(x='targetname', y='sepal length (cm)', data=iris, color=antV[0], ax=axes[0, 0])
sns.pointplot(x='targetname', y='sepal width (cm)', data=iris, color=antV[0], ax=axes[0, 1])
sns.pointplot(x='targetname', y='petal length (cm)', data=iris, color=antV[0], ax=axes[1, 0])
sns.pointplot(x='targetname', y='petal width (cm)', data=iris, color=antV[0], ax=axes[1, 1])


plt.show()

5. Use Andrews curves to turn each multivariate observation into a curve whose coefficients form a Fourier series; this is useful for detecting outliers, e.g. in time-series data.

plt.subplots(figsize = (10,8))
plotting.andrews_curves(iris, 'targetname', colormap='cool')

plt.show()


6. Visualize linear regression

g = sns.lmplot(data=iris, x='sepal width (cm)', y='sepal length (cm)', palette=antV, hue='targetname')

g = sns.lmplot(data=iris, x='petal width (cm)', y='petal length (cm)', palette=antV, hue='targetname')

7. Use a heatmap to examine correlations between features


fig = sns.heatmap(iris.corr(numeric_only=True),  # numeric_only skips the non-numeric 'targetname' column (needed on pandas >= 2.0)
                  annot=True, cmap='GnBu', linewidths=1, linecolor='k',
                  square=True, mask=False, vmin=-1, vmax=1,
                  cbar_kws={"orientation": "vertical"}, cbar=True)

8. Machine learning

X = iris[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']]
y = iris['targetname']


encoder = LabelEncoder()
y = encoder.fit_transform(y)
#print(y)

train_X, test_X, train_y, test_y = train_test_split(X, y, test_size = 0.3, random_state = 101)
#print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

# Support Vector Machine
model = svm.SVC()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the SVM is: {0}'.format(metrics.accuracy_score(prediction,test_y)))

# Logistic Regression
model = LogisticRegression()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the Logistic Regression is: {0}'.format(metrics.accuracy_score(prediction,test_y)))

# Decision Tree
model=DecisionTreeClassifier()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the Decision Tree is: {0}'.format(metrics.accuracy_score(prediction,test_y)))

# K-Nearest Neighbours
model=KNeighborsClassifier(n_neighbors=3)
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the KNN is: {0}'.format(metrics.accuracy_score(prediction,test_y)))
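The `n_neighbors=3` above is set by hand; the assignment also asks for hyperparameters to be chosen sensibly. One way is cross-validated grid search, sketched here with scikit-learn's built-in iris data (an assumption, since iris.csv isn't bundled):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=101)

# 5-fold cross-validation over k = 1..15, on the training split only
search = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': list(range(1, 16))}, cv=5)
search.fit(train_X, train_y)
print('best k:', search.best_params_['n_neighbors'])
print('mean CV accuracy:', search.best_score_)
```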

Accuracy of the four methods:

The accuracy of the SVM is: 0.9777777777777777

The accuracy of the Logistic Regression is: 0.9777777777777777

The accuracy of the Decision Tree is: 0.9555555555555556

The accuracy of the KNN is: 1.0
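Accuracy alone says little on a 45-sample test set; the assignment's "reasonable evaluation metrics" can be broadened with a confusion matrix and per-class precision/recall/F1. A sketch with scikit-learn's built-in iris data (so the exact numbers may differ slightly from the CSV run above):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=101)

knn = KNeighborsClassifier(n_neighbors=3).fit(train_X, train_y)
pred = knn.predict(test_X)
print(confusion_matrix(test_y, pred))       # rows: true class, columns: predicted class
print(classification_report(test_y, pred))  # precision, recall, F1 per class
```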

III. Visualization Results

(Figures omitted here: the violin plots, point plots, Andrews curves, regression plots, and correlation heatmap produced by the code above.)

IV. Full Source Code

import numpy as np
import pandas as pd
from pandas import plotting

# %matplotlib inline  (Jupyter magic; kept as a comment in script form)
import matplotlib.pyplot as plt
plt.style.use('seaborn')  # on matplotlib >= 3.6 this style is named 'seaborn-v0_8'

import seaborn as sns
sns.set_style("whitegrid")

from sklearn.linear_model import LogisticRegression 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn import metrics 
from sklearn.tree import DecisionTreeClassifier

iris = pd.read_csv('iris.csv')
#iris.info()


# colour palette for the plots
antV = ['#1890FF', '#2FC25B', '#FACC14', '#223273', '#8543E0', '#13C2C2', '#3436c7', '#F04864']

# 绘制  Violinplot
f, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True)
sns.despine(left=True)

sns.violinplot(x='targetname', y='sepal length (cm)', data=iris, palette=antV, ax=axes[0, 0])
sns.violinplot(x='targetname', y='sepal width (cm)', data=iris, palette=antV, ax=axes[0, 1])
sns.violinplot(x='targetname', y='petal length (cm)', data=iris, palette=antV, ax=axes[1, 0])
sns.violinplot(x='targetname', y='petal width (cm)', data=iris, palette=antV, ax=axes[1, 1])

plt.show()

f, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True)
sns.despine(left=True)

sns.pointplot(x='targetname', y='sepal length (cm)', data=iris, color=antV[0], ax=axes[0, 0])
sns.pointplot(x='targetname', y='sepal width (cm)', data=iris, color=antV[0], ax=axes[0, 1])
sns.pointplot(x='targetname', y='petal length (cm)', data=iris, color=antV[0], ax=axes[1, 0])
sns.pointplot(x='targetname', y='petal width (cm)', data=iris, color=antV[0], ax=axes[1, 1])


plt.show()

#g = sns.pairplot(data=iris, palette=antV, hue= 'targetname')

plt.subplots(figsize = (10,8))
plotting.andrews_curves(iris, 'targetname', colormap='cool')

plt.show()

g = sns.lmplot(data=iris, x='sepal width (cm)', y='sepal length (cm)', palette=antV, hue='targetname')

g = sns.lmplot(data=iris, x='petal width (cm)', y='petal length (cm)', palette=antV, hue='targetname')

fig=plt.gcf()
fig.set_size_inches(12, 8)


fig = sns.heatmap(iris.corr(numeric_only=True),  # numeric_only skips the non-numeric 'targetname' column (needed on pandas >= 2.0)
                  annot=True, cmap='GnBu', linewidths=1, linecolor='k',
                  square=True, mask=False, vmin=-1, vmax=1,
                  cbar_kws={"orientation": "vertical"}, cbar=True)


X = iris[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']]
y = iris['targetname']


encoder = LabelEncoder()
y = encoder.fit_transform(y)
#print(y)

train_X, test_X, train_y, test_y = train_test_split(X, y, test_size = 0.3, random_state = 101)
#print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

# Support Vector Machine
model = svm.SVC()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the SVM is: {0}'.format(metrics.accuracy_score(prediction,test_y)))

# Logistic Regression
model = LogisticRegression()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the Logistic Regression is: {0}'.format(metrics.accuracy_score(prediction,test_y)))

# Decision Tree
model=DecisionTreeClassifier()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the Decision Tree is: {0}'.format(metrics.accuracy_score(prediction,test_y)))

# K-Nearest Neighbours
model=KNeighborsClassifier(n_neighbors=3)
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the KNN is: {0}'.format(metrics.accuracy_score(prediction,test_y)))

V. Reflections

Working through the iris example gave me a first taste of machine learning and a feel for the appeal of the field.


Reprinted from blog.csdn.net/weixin_48144018/article/details/124872333