12. Eight Commonly Used Machine Learning Classification Algorithms in Practice: Predicting the Grade from Six Factors

1. Requirements Analysis

Using the previously trained models, test the standard sample cards of different grades.
There are 48 test samples, each described by six indicators: the number of pills, total pilling area, maximum pilling area, average pilling area, contrast, and optical volume. These indicators are used to determine the overall grade of the fabric.
The data set (fiber.csv) looks like this. (I collected the data set through my own tests; since it is personal data, I will not share it publicly. Thanks for understanding.) Note on the CSV format: there is no space at the end, but there is a space after the 1. Pay attention to this!


N,S,Max_s,Aver_s,C,V,Grade
27,111542.5,38299.5,4131.2,31.91,3559537.61,1(space)

Variable meanings:
N: number of pills
S: total pilling area
Max_s: maximum pilling area
Aver_s: average pilling area
C: contrast
V: optical volume
Grade: final grade
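To make the column layout concrete, here is a minimal sketch that parses a header plus one row in the fiber.csv format using only the standard csv module. The SAMPLE string is a hypothetical two-line excerpt built from the header and first row shown above; the strip() calls are a defensive step against stray spaces such as the one after the final 1.

```python
import csv
import io

# Hypothetical two-line excerpt in the fiber.csv layout shown above;
# the trailing space after the final "1" mimics the note about the format.
SAMPLE = "N,S,Max_s,Aver_s,C,V,Grade\n27,111542.5,38299.5,4131.2,31.91,3559537.61,1 \n"

def parse_rows(text):
    """Yield dicts mapping column name -> number, stripping stray whitespace."""
    reader = csv.DictReader(io.StringIO(text))
    for raw in reader:
        row = {key.strip(): value.strip() for key, value in raw.items()}
        # The six features are real-valued; Grade is an integer label.
        parsed = {k: float(v) for k, v in row.items() if k != "Grade"}
        parsed["Grade"] = int(row["Grade"])
        yield parsed

rows = list(parse_rows(SAMPLE))
print(rows[0]["Grade"], rows[0]["N"])  # 1 27.0
```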

2. Trying several methods to predict the grade

1. Import the packages

Install scikit-learn and the related packages with: pip install scikit-learn

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
 
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
 
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
 
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

2. Read and display the data set

fiber = pd.read_csv("./fiber.csv")
fiber.head(15)


print(fiber)
"""
     N         S    Max_s    Aver_s      C           V  Grade
0   27  111542.5  38299.5   4131.20  31.91  3559537.61      1
1   27  110579.5  31220.0   3186.63  31.28  2690869.73      1
......
47   9   33853.0   6329.0   3761.44  41.17  1393863.42      4
"""

3. Split the data set

The last column, Grade, is the target (dependent variable); the other six factors are the features (independent variables).

X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']

Split the data set into two parts, a training set and a test set.
Fixing random_state ensures that the training set and the test set are the same on every run.

X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)
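A quick way to see what random_state does: two calls with the same seed produce exactly the same split. This sketch uses a hypothetical 48 x 6 array and made-up grades in place of the private fiber data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 48-row fiber data: 6 feature columns, grades 1-4.
X = np.arange(48 * 6, dtype=float).reshape(48, 6)
y = np.arange(48) % 4 + 1

# Two calls with the same random_state shuffle the rows identically.
first = train_test_split(X, y, random_state=0)
second = train_test_split(X, y, random_state=0)
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))

print(same_split)        # True
print(first[1].shape)    # (12, 6): test_size defaults to 25% of 48 rows
```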

Check the shapes.
There are 36 training samples and 12 test samples, 48 in total (by default train_test_split holds out 25% of the data for testing).

print(X_train.shape) #(36, 6)
print(y_train.shape) #(36,)
print(X_test.shape) #(12, 6)
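The default split is not stratified, so with only 12 test samples one grade can end up under-represented. Passing stratify keeps the grade proportions equal in both sets. This is an optional variation, not what the original code does; the sketch assumes, purely for illustration, 12 samples per grade, since the real grade distribution of fiber.csv is not given here.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 12 samples for each of grades 1-4 (48 in total).
X = np.arange(48 * 6, dtype=float).reshape(48, 6)
y = np.repeat([1, 2, 3, 4], 12)

# stratify=y keeps the grade proportions the same in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)  # (36, 6) (12, 6)
print(Counter(y_test.tolist()))     # each grade appears 3 times
```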

4. Fitting different algorithms

①K-nearest neighbors algorithm, KNeighborsClassifier()

n_neighbors: the number of nearest neighbors to consult.
Here each sample is classified by a majority vote among its 4 nearest training samples.

knn = KNeighborsClassifier(n_neighbors=4)
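To show what the majority vote means, here is a from-scratch sketch of k-nearest neighbors with Euclidean distance (the distance KNeighborsClassifier uses by default). It is an illustration, not scikit-learn's implementation: the function name knn_predict and the toy 2-feature data are hypothetical, and its tie-breaking rule may differ from scikit-learn's.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=4):
    """Predict a label for `query` by majority vote of its k nearest training rows."""
    # Euclidean distance from the query to every training row.
    distances = [
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # most_common keeps first-seen order for equal counts, so a tie goes to the
    # label whose first occurrence is nearest (scikit-learn may differ here).
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-feature toy data, not the fiber measurements.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [1, 1, 1, 2, 2, 2]
print(knn_predict(train_X, train_y, (1.1, 1.0)))  # 1
print(knn_predict(train_X, train_y, (5.0, 5.1)))  # 2
```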

Fit the model on the training set

knn.fit(X_train,y_train)

Predict on the test set X_test to get the predictions y_pred

y_pred = knn.predict(X_test)

Compare the predictions y_pred with the true labels y_test; the mean of the element-wise matches is the accuracy

accuracy = np.mean(y_pred==y_test)
print(accuracy)
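Why np.mean gives the accuracy: comparing the two arrays yields a boolean array, and the mean of booleans is the fraction of True values. A small sketch with hypothetical predictions for 12 test samples:

```python
import numpy as np

# Hypothetical predictions and true grades for 12 test samples.
y_pred = np.array([1, 2, 2, 3, 4, 1, 3, 3, 2, 4, 1, 2])
y_test = np.array([1, 2, 3, 3, 4, 1, 3, 2, 2, 4, 1, 2])

matches = y_pred == y_test       # element-wise comparison -> boolean array
accuracy = np.mean(matches)      # True counts as 1, False as 0
print(matches.sum(), accuracy)   # 10 correct out of 12 -> 0.8333...
```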

Alternatively, use the model's score method, which returns the mean accuracy on the given test data

score = knn.score(X_test,y_test)
print(score)
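The imports at the top also bring in confusion_matrix, classification_report, and accuracy_score, which the code never uses. A sketch of how they could break the result down per grade, using hypothetical labels rather than real fiber results; accuracy_score gives the same value as np.mean(y_pred == y_test) and as the classifier's score method.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true grades and predictions (not real fiber results).
y_test = [1, 2, 3, 3, 4, 1, 3, 2, 2, 4, 1, 2]
y_pred = [1, 2, 2, 3, 4, 1, 3, 3, 2, 4, 1, 2]

acc = accuracy_score(y_test, y_pred)   # fraction of correct predictions
cm = confusion_matrix(y_test, y_pred)  # rows: true grade, columns: predicted grade
print(acc)
print(cm)
print(classification_report(y_test, y_pred))  # precision/recall/F1 per grade
```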

Pick an arbitrary sample to test the model:
16,18312.5,6614.5,2842.31,25.23,1147430.19,2
Its true grade is 2.

test = np.array([[16,18312.5,6614.5,2842.31,25.23,1147430.19]])
prediction = knn.predict(test)
print(prediction)
"""
[2]
"""

This sample was taken from the training data, which must never be done in a real evaluation; it is only a quick sanity check.
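The test sample above is passed as a bare NumPy array, while the model was fitted on a DataFrame with named columns; scikit-learn 1.0 and later then warn that X does not have valid feature names. A minimal sketch, with random hypothetical training data in place of fiber.csv, that wraps the sample in a DataFrame with the same column names:

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

columns = ["N", "S", "Max_s", "Aver_s", "C", "V"]
# Hypothetical training rows (random numbers in place of the private fiber data).
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.uniform(1, 100, size=(20, 6)), columns=columns)
y_train = np.arange(20) % 4 + 1

knn = KNeighborsClassifier(n_neighbors=4).fit(X_train, y_train)

# Same column names as in training, so no feature-name warning is raised.
test = pd.DataFrame([[16, 18312.5, 6614.5, 2842.31, 25.23, 1147430.19]], columns=columns)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    prediction = knn.predict(test)

print(prediction)
```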

Complete code for the K-nearest neighbors algorithm

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score


fiber = pd.read_csv("./fiber.csv")
# Split into independent variables (features) and the dependent variable (Grade)
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=4)
knn.fit(X_train,y_train)
y_pred = knn.predict(X_test)  # model predictions
accuracy = np.mean(y_pred==y_test)  # accuracy
score = knn.score(X_test,y_test)  # score (mean accuracy)
print(accuracy)
print(score)

# Test
test = np.array([[16,18312.5,6614.5,2842.31,25.23,1147430.19]])  # an arbitrarily picked sample
prediction = knn.predict(test)  # feed in the sample and predict
print(prediction)
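The six features differ by orders of magnitude (V is in the millions, C in the tens), so KNN's Euclidean distances are dominated by the largest columns. Standardizing the features first is a common remedy. This is not what the original code does, and whether it improves accuracy on the fiber data is not tested here; the sketch below compares the two setups on hypothetical synthetic data where one column carries the class signal and another is large-scale noise.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column 0 carries the class signal on a small scale,
# column 1 is noise on a huge scale (roughly like V next to C in fiber.csv).
rng = np.random.default_rng(0)
y = np.repeat([1, 2], 50)
signal = y + rng.normal(0, 0.2, size=100)
noise = rng.normal(0, 1e6, size=100)
X = np.column_stack([signal, noise])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# Raw features: Euclidean distances are dominated by the large-scale column.
raw_knn = KNeighborsClassifier(n_neighbors=4).fit(X_train, y_train)
# Standardize each column (mean 0, std 1) before measuring distances.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=4))
scaled_knn.fit(X_train, y_train)

print("raw:", raw_knn.score(X_test, y_test))
print("scaled:", scaled_knn.score(X_test, y_test))
```

Because the scaler lives inside the pipeline, it is fitted only on the training set and then applied unchanged to the test set.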

②Logistic regression algorithm, LogisticRegression()

Instantiate a logistic regression object

lr = LogisticRegression()

Fit the model on the training set

lr.fit(X_train,y_train)  # fit the model

Predict on the test set X_test to get the predictions y_pred

y_pred = lr.predict(X_test)  # model predictions
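How a multi-class logistic regression turns features into a grade: in recent scikit-learn versions, LogisticRegression with its default lbfgs solver fits a multinomial (softmax) model when there are more than two classes. Each grade k gets a linear score w_k . x + b_k, softmax converts the scores into probabilities, and the most probable grade is predicted. The weights below are hypothetical, not fitted to the fiber data, and the helper names are illustrative.

```python
import math

def softmax(scores):
    """Turn per-class linear scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_grade(weights, biases, x, grades=(1, 2, 3, 4)):
    """Score each grade with w_k . x + b_k, then pick the most probable grade."""
    scores = [
        sum(w * xi for w, xi in zip(w_k, x)) + b_k
        for w_k, b_k in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(grades)), key=lambda k: probs[k])
    return grades[best], probs

# Hypothetical weights for 2 standardized features (not fitted to fiber data).
weights = [(2.0, 0.0), (0.5, 0.5), (-0.5, 1.0), (-2.0, 0.0)]
biases = [0.0, 0.0, 0.0, 0.0]
grade, probs = predict_grade(weights, biases, (1.0, 0.2))
print(grade, [round(p, 3) for p in probs])
```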

Compare the predictions y_pred with the true labels y_test; the mean of the element-wise matches is the accuracy

accuracy = np.mean(y_pred==y_test)
print(accuracy)

Alternatively, use the model's score method, which returns the mean accuracy on the given test data

score = lr.score(X_test,y_test)
print(score)

Pick an arbitrary sample to test the model:
20,44882.5,10563,5623.88,27.15,3053651.65,1
Its true grade is 1.

test = np.array([[20,44882.5,10563,5623.88,27.15,3053651.65]])  # an arbitrarily picked sample; its true grade is 1
prediction = lr.predict(test)  # feed in the sample and predict
print(prediction)
"""
[1]
"""

This sample was taken from the training data, which must never be done in a real evaluation; it is only a quick sanity check.

Complete code for logistic regression

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression


fiber = pd.read_csv("./fiber.csv")
# Split into independent variables (features) and the dependent variable (Grade)
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

lr = LogisticRegression()
lr.fit(X_train,y_train)  # fit the model
y_pred = lr.predict(X_test)  # model predictions
accuracy = np.mean(y_pred==y_test)  # accuracy
score = lr.score(X_test,y_test)  # score (mean accuracy)
print(accuracy)
print(score)

test = np.array([[20,44882.5,10563,5623.88,27.15,3053651.65]])  # an arbitrarily picked sample
prediction = lr.predict(test)  # feed in the sample and predict
print(prediction)

③Linear support vector machine, LinearSVC()

Instantiate a linear SVM object

lsvc = LinearSVC()
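What LinearSVC optimizes: its defaults are loss='squared_hinge' and a one-vs-rest scheme for multi-class problems. Each grade gets its own binary classifier with target +1 for that grade and -1 otherwise; a sample on the correct side with margin at least 1 costs nothing, anything closer or on the wrong side is penalized quadratically, and the grade whose classifier scores highest wins. The decision values below are hypothetical.

```python
def squared_hinge(t, f):
    """Squared hinge loss for a binary target t in {-1, +1} and decision value f."""
    return max(0.0, 1.0 - t * f) ** 2

def ovr_predict(decision_values, classes=(1, 2, 3, 4)):
    """One-vs-rest: pick the class whose binary classifier scores highest."""
    best = max(range(len(classes)), key=lambda k: decision_values[k])
    return classes[best]

print(squared_hinge(+1, 2.0))   # 0.0: correct side, outside the margin
print(squared_hinge(+1, 0.5))   # 0.25: correct side, inside the margin
print(squared_hinge(-1, 0.5))   # 2.25: wrong side
# Hypothetical decision values from four one-vs-rest classifiers.
print(ovr_predict([-1.2, 0.3, -0.4, -2.0]))  # 2
```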

Fit the model on the training set

lsvc.fit(X_train,y_train)  # fit the model

Predict on the test set X_test to get the predictions y_pred

y_pred = lsvc.predict(X_test)  # model predictions

Compare the predictions y_pred with the true labels y_test; the mean of the element-wise matches is the accuracy

accuracy = np.mean(y_pred==y_test)
print(accuracy)

Alternatively, use the model's score method, which returns the mean accuracy on the given test data

score = lsvc.score(X_test,y_test)
print(score)

Pick an arbitrary sample to test the model:
20,55997.5,17644.5,2799.88,8.58,480178.56,2
Its true grade is 2.

test = np.array([[20,55997.5,17644.5,2799.88,8.58,480178.56]])  # an arbitrarily picked sample
prediction = lsvc.predict(test)  # feed in the sample and predict
print(prediction)
"""
[2]
"""

This sample was taken from the training data, which must never be done in a real evaluation; it is only a quick sanity check.

Complete code for the linear support vector machine

from sklearn.svm import LinearSVC
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# Split into independent variables (features) and the dependent variable (Grade)
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

lsvc = LinearSVC()
lsvc.fit(X_train,y_train)  # fit the model
y_pred = lsvc.predict(X_test)  # model predictions
accuracy = np.mean(y_pred==y_test)  # accuracy
score = lsvc.score(X_test,y_test)  # score (mean accuracy)
print(accuracy)
print(score)

test = np.array([[20,55997.5,17644.5,2799.88,8.58,480178.56]])  # an arbitrarily picked sample
prediction = lsvc.predict(test)  # feed in the sample and predict
print(prediction)

④Support vector machine, SVC()

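Instantiate the SVM object (the complete code below passes gamma='auto' instead of the default, so the two versions may give different results)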

svc = SVC()
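SVC uses an RBF kernel by default, K(x, z) = exp(-gamma * ||x - z||^2), and gamma='auto' (used in the complete code) means gamma = 1 / n_features. The sketch below evaluates that kernel on two rows taken from this post's own examples. Because S and V are in the tens of thousands to millions, the squared distance is enormous and the kernel underflows to 0, so distinct samples look completely dissimilar to the model. Standardizing the features first is a common remedy; the original code does not do this.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel, the default SVC kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

n_features = 6
gamma_auto = 1.0 / n_features  # what gamma='auto' means

# Two unscaled rows taken from the examples in this post.
row_a = (16, 18312.5, 6614.5, 2842.31, 25.23, 1147430.19)
row_b = (20, 55997.5, 17644.5, 2799.88, 8.58, 480178.56)

print(rbf_kernel(row_a, row_a, gamma_auto))  # 1.0: identical rows
print(rbf_kernel(row_a, row_b, gamma_auto))  # 0.0: underflows on unscaled data
```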

Fit the model on the training set

svc.fit(X_train,y_train)  # fit the model

Predict on the test set X_test to get the predictions y_pred

y_pred = svc.predict(X_test)  # model predictions

Compare the predictions y_pred with the true labels y_test; the mean of the element-wise matches is the accuracy

accuracy = np.mean(y_pred==y_test)
print(accuracy)

Alternatively, use the model's score method, which returns the mean accuracy on the given test data

score = svc.score(X_test,y_test)
print(score)

Pick an arbitrary sample to test the model:
23,97215.5,22795.5,2613.09,29.72,1786141.62,1
Its true grade is 1.

test = np.array([[23,97215.5,22795.5,2613.09,29.72,1786141.62]])  # an arbitrarily picked sample
prediction = svc.predict(test)  # feed in the sample and predict
print(prediction)
"""
[1]
"""

This sample was taken from the training data, which must never be done in a real evaluation; it is only a quick sanity check.

Complete code for the support vector machine

from sklearn.svm import SVC
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# Split into independent variables (features) and the dependent variable (Grade)
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

svc = SVC(gamma='auto')
svc.fit(X_train,y_train)  # fit the model
y_pred = svc.predict(X_test)  # model predictions
accuracy = np.mean(y_pred==y_test)  # accuracy
score = svc.score(X_test,y_test)  # score (mean accuracy)
print(accuracy)
print(score)

test = np.array([[23,97215.5,22795.5,2613.09,29.72,1786141.62]])  # an arbitrarily picked sample
prediction = svc.predict(test)  # feed in the sample and predict
print(prediction)

⑤Decision tree, DecisionTreeClassifier()

You may have noticed that the steps for the first four methods are almost identical; only the instantiated model object differs. So I won't repeat the individual steps here.
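What the decision tree does at each node: with the default criterion='gini', DecisionTreeClassifier tries thresholds on each feature and keeps the split whose children have the lowest weighted Gini impurity. Because each split looks at one feature against a threshold, trees are insensitive to the very different feature scales in this data. A small sketch of the impurity computation (helper names are illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Weighted Gini impurity of a candidate split, used to compare splits."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(gini([1, 1, 1, 1]))              # 0.0: a pure node
print(gini([1, 1, 2, 2]))              # 0.5
print(split_impurity([1, 1], [2, 2]))  # 0.0: a perfect split
print(split_impurity([1, 2], [1, 2]))  # 0.5: a useless split
```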

Pick an arbitrary sample to test the model:
11,99498,5369,9045.27,28.47,3827588.56,4
Its true grade is 4.

Complete code for the decision tree

from sklearn.tree import DecisionTreeClassifier
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# Split into independent variables (features) and the dependent variable (Grade)
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

dtc = DecisionTreeClassifier()
dtc.fit(X_train,y_train)  # fit the model
y_pred = dtc.predict(X_test)  # model predictions
accuracy = np.mean(y_pred==y_test)  # accuracy
score = dtc.score(X_test,y_test)  # score (mean accuracy)
print(accuracy)
print(score)

test = np.array([[11,99498,5369,9045.27,28.47,3827588.56]])  # an arbitrarily picked sample
prediction = dtc.predict(test)  # feed in the sample and predict
print(prediction)

⑥Gaussian Naive Bayes, GaussianNB()

Pick an arbitrary sample to test the model:
14,160712,3208,3681.25,36.31,1871275.09,3
Its true grade is 3.

Complete code for Gaussian Naive Bayes

from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# Split into independent variables (features) and the dependent variable (Grade)
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

gnb = GaussianNB()
gnb.fit(X_train,y_train)  # fit the model
y_pred = gnb.predict(X_test)  # model predictions
accuracy = np.mean(y_pred==y_test)  # accuracy
score = gnb.score(X_test,y_test)  # score (mean accuracy)
print(accuracy)
print(score)

test = np.array([[14,160712,3208,3681.25,36.31,1871275.09]])  # an arbitrarily picked sample
prediction = gnb.predict(test)  # feed in the sample and predict
print(prediction)
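The idea behind GaussianNB: within each grade, every feature is modeled as an independent normal distribution with its own mean and variance, and the predicted grade maximizes log P(grade) plus the sum of the per-feature log densities. A sketch with hypothetical per-class statistics for two features (not estimated from the fiber data; helper names are illustrative):

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def gnb_log_score(x, prior, means, variances):
    """log P(class) + sum_i log p(x_i | class), assuming independent features."""
    score = math.log(prior)
    for xi, m, v in zip(x, means, variances):
        score += math.log(gaussian_pdf(xi, m, v))
    return score

# Hypothetical per-class statistics for two features.
classes = {
    1: {"prior": 0.5, "means": (0.0, 0.0), "variances": (1.0, 1.0)},
    2: {"prior": 0.5, "means": (3.0, 3.0), "variances": (1.0, 1.0)},
}
x = (0.4, -0.2)
scores = {c: gnb_log_score(x, **p) for c, p in classes.items()}
print(max(scores, key=scores.get))  # 1: x lies much closer to class 1's means
```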

⑦Bernoulli Naive Bayes, BernoulliNB()

Pick an arbitrary sample to test the model:
18,57541.5,10455,2843.36,30.68,1570013.02,2
Its true grade is 2.

Complete code for Bernoulli Naive Bayes

from sklearn.naive_bayes import BernoulliNB
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# Split into independent variables (features) and the dependent variable (Grade)
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

bnb = BernoulliNB()
bnb.fit(X_train,y_train)  # fit the model
y_pred = bnb.predict(X_test)  # model predictions
accuracy = np.mean(y_pred==y_test)  # accuracy
score = bnb.score(X_test,y_test)  # score (mean accuracy)
print(accuracy)
print(score)

test = np.array([[18,57541.5,10455,2843.36,30.68,1570013.02]])  # an arbitrarily picked sample
prediction = bnb.predict(test)  # feed in the sample and predict
print(prediction)
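A caveat worth checking: BernoulliNB models binary features and, with its default binarize=0.0, maps every value greater than 0 to 1. Judging from the rows shown in this post, all six fiber features are positive, so every row would become all ones and the model could only predict the same grade for every input (the one favored by the class counts). The sketch reproduces this on hypothetical strictly positive data in which grade 2 is the majority.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical strictly positive features, like pill counts and areas.
rng = np.random.default_rng(0)
X = rng.uniform(1, 1e6, size=(30, 6))
y = np.array([2] * 12 + [1] * 6 + [3] * 6 + [4] * 6)  # grade 2 is the majority

bnb = BernoulliNB().fit(X, y)       # binarize=0.0 by default
binarized = (X > 0.0).astype(int)   # what the default threshold does to X
print(np.unique(binarized))         # [1]: every value becomes 1
print(np.unique(bnb.predict(X)))    # [2]: always the majority grade
```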

⑧Multinomial Naive Bayes, MultinomialNB()

Pick an arbitrary sample to test the model:
9,64794,5560,10682.94,38.99,3748367.45,4
Its true grade is 4.

Complete code for Multinomial Naive Bayes

from sklearn.naive_bayes import MultinomialNB
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# Split into independent variables (features) and the dependent variable (Grade)
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

mnb = MultinomialNB()
mnb.fit(X_train,y_train)  # fit the model
y_pred = mnb.predict(X_test)  # model predictions
accuracy = np.mean(y_pred==y_test)  # accuracy
score = mnb.score(X_test,y_test)  # score (mean accuracy)
print(accuracy)
print(score)

test = np.array([[9,64794,5560,10682.94,38.99,3748367.45]])  # an arbitrarily picked sample
prediction = mnb.predict(test)  # feed in the sample and predict
print(prediction)
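MultinomialNB treats features as count-like quantities and therefore accepts only non-negative values. It runs on the raw fiber features, but it cannot be combined with StandardScaler, whose output contains negative numbers. A sketch on hypothetical data showing that scikit-learn rejects such input with a ValueError:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(1, 100, size=(20, 6))  # hypothetical non-negative features
y = np.arange(20) % 4 + 1

MultinomialNB().fit(X, y)  # fine: all values are >= 0

X_scaled = StandardScaler().fit_transform(X)  # standardized data contains negatives
try:
    MultinomialNB().fit(X_scaled, y)
    raised = False
except ValueError as err:  # negative input is rejected
    raised = True
    print("ValueError:", err)
print(raised)
```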

Finally, after tuning and optimizing the parameters, I chose the decision tree to predict the grade of these samples.
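With only 48 samples, the accuracy on a single 12-sample test set can change noticeably depending on which rows land in it. One way to compare candidates more robustly is k-fold cross-validation, which averages accuracy over several splits. This is a suggested approach, not necessarily how the model above was tuned; the sketch uses synthetic data from make_classification in place of fiber.csv and fixes random_state on the tree so its results are reproducible.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 48 samples, 6 features, 4 classes (not the fiber data).
X, y = make_classification(
    n_samples=48, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

models = {
    "knn": KNeighborsClassifier(n_neighbors=4),
    "gaussian_nb": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)  # one accuracy per fold
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```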

5. Saving and loading the model

Here we use the decision tree as an example.

After training, save the model with joblib.dump(dtc, './dtc.model'), where dtc is the fitted model object and './dtc.model' is the file name and path of the saved model.

Load the model back with dtc_yy = joblib.load('./dtc.model').

Complete code

from sklearn.tree import DecisionTreeClassifier
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import joblib

fiber = pd.read_csv("./fiber.csv")
# Split into independent variables (features) and the dependent variable (Grade)
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

dtc = DecisionTreeClassifier()
dtc.fit(X_train,y_train)  # fit the model
joblib.dump(dtc, './dtc.model')  # save the model
y_pred = dtc.predict(X_test)  # model predictions
accuracy = np.mean(y_pred==y_test)  # accuracy
score = dtc.score(X_test,y_test)  # score (mean accuracy)
print(accuracy)
print(score)


dtc_yy = joblib.load('./dtc.model')  # load the saved model
test = np.array([[11,99498,5369,9045.27,28.47,3827588.56]])  # an arbitrarily picked sample
prediction = dtc_yy.predict(test)  # feed in the sample and predict
print(prediction)
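A quick round-trip check that a saved model behaves exactly like the original. This sketch trains a tree on hypothetical random data, saves it to a temporary directory under the same file name as above, reloads it, and compares predictions. Since joblib files are pickles, load them with the same scikit-learn version that saved them, and only load files from sources you trust.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data standing in for fiber.csv.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 6))
y = np.arange(40) % 4 + 1

dtc = DecisionTreeClassifier(random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dtc.model")  # same file name as in the post
    joblib.dump(dtc, path)                 # serialize the fitted tree
    dtc_yy = joblib.load(path)             # load it back

# The reloaded model should make exactly the same predictions.
same = np.array_equal(dtc.predict(X), dtc_yy.predict(X))
print(same)  # True
```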



Source: blog.csdn.net/qq_41264055/article/details/130446029