Mathematical Modeling: Classification Models

Logistic Regression

import pandas as pd
import numpy as np
data = pd.read_excel('fruit_data.xlsx', index_col="ID")
data.head()
    mass  width  height  color_score fruit_name
ID
1    192    8.4     7.3         0.55      apple
2    180    8.0     6.8         0.59      apple
3    176    7.4     7.2         0.60      apple
4    178    7.1     7.8         0.92      apple
5    172    7.4     7.0         0.89      apple
# Keep only the labeled rows; .copy() lets us add a column without pandas' SettingWithCopyWarning
train_data = data.dropna().copy()
# Binary target: 1 for apple, 0 for everything else
train_data['category'] = train_data['fruit_name'].apply(lambda x: 1 if x == 'apple' else 0)
train_data.head()
    mass  width  height  color_score fruit_name  category
ID
1    192    8.4     7.3         0.55      apple         1
2    180    8.0     6.8         0.59      apple         1
3    176    7.4     7.2         0.60      apple         1
4    178    7.1     7.8         0.92      apple         1
5    172    7.4     7.0         0.89      apple         1
test_data = data.loc[data['fruit_name'].isnull()]  # unlabeled rows to predict
test_data
    mass  width  height  color_score fruit_name
ID
39   158    7.1     7.6         0.72        NaN
40   190    7.5     7.9         0.77        NaN
41   189    7.6     7.7         0.77        NaN
42   160    7.9     6.9         0.65        NaN

Method 1: sklearn.linear_model.LogisticRegression

from sklearn.linear_model import LogisticRegression
X = train_data.iloc[:, :-2]   # features: mass, width, height, color_score
y = train_data['category']    # binary target (1 = apple)
LR = LogisticRegression()
LR.fit(X, y)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
print(LR.intercept_)
[4.54213181]
print(LR.coef_)
[[-0.01125145  0.97166531 -1.314372    0.20036824]]
test = test_data.iloc[:, :-1]  # drop the empty fruit_name column
# Predictions for the unlabeled rows
print(LR.predict(test))
print(LR.predict_proba(test))  # predicted probabilities for class 0 and class 1
[0 0 0 1]
[[0.54530945 0.45469055]
 [0.63120971 0.36879029]
 [0.54143416 0.45856584]
 [0.18555923 0.81444077]]
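
The probabilities above are fully determined by the fitted intercept_ and coef_. As a sanity check, here is a minimal sketch that recomputes P(class = 1) for the test rows by hand with the logistic (sigmoid) function; the helper name manual_proba is ours, not part of sklearn.

def manual_proba(model, features):
    """Recompute P(class = 1) from the fitted intercept_ and coef_."""
    z = model.intercept_ + features @ model.coef_.ravel()  # linear predictor
    return 1 / (1 + np.exp(-z))                            # sigmoid

# Should agree with the second column of LR.predict_proba(test) above
print(manual_proba(LR, test.to_numpy()))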
# Accuracy on the training data
LR.score(X, y)
0.7105263157894737

Method 2: statsmodels (results consistent with SPSS)

import statsmodels.api as sm
X1 = sm.add_constant(X)  # statsmodels does not add an intercept term automatically
lr = sm.Logit(y, X1)
result = lr.fit()
result.summary()
Optimization terminated successfully.
         Current function value: 0.449106
         Iterations 7
                           Logit Regression Results
==============================================================================
Dep. Variable:               category   No. Observations:                   38
Model:                          Logit   Df Residuals:                       33
Method:                           MLE   Df Model:                            4
Date:                Tue, 12 May 2020   Pseudo R-squ.:                  0.3521
Time:                        12:30:14   Log-Likelihood:                -17.066
converged:                       True   LL-Null:                       -26.340
Covariance Type:            nonrobust   LLR p-value:                 0.0009644
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const          -7.2016     14.503     -0.497      0.620     -35.627      21.224
mass           -0.0238      0.024     -0.982      0.326      -0.071       0.024
width           4.3068      1.844      2.335      0.020       0.692       7.922
height         -3.7497      1.641     -2.286      0.022      -6.965      -0.534
color_score     9.8913      5.746      1.722      0.085      -1.370      21.152
===============================================================================
result.predict(sm.add_constant(test))
ID
39    0.147665
40    0.194533
41    0.446099
42    0.972809
dtype: float64
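
statsmodels returns fitted probabilities rather than class labels. Below is a minimal sketch of turning them into hard predictions; the 0.5 cutoff is our choice, not something the library dictates.

# Threshold the predicted probabilities at 0.5 to get class labels
probs = result.predict(sm.add_constant(test))
labels = (probs >= 0.5).astype(int)
print(labels.values)

With this cutoff the labels agree with the sklearn predictions above, even though the two fits report noticeably different probabilities for the individual rows.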

Linear Discriminant Analysis

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis()
X_r = lda.fit(X, y)  # fit() returns the fitted estimator itself
X_r.coef_
array([[-0.03206332,  4.57480239, -2.87678633, 10.50469726]])
X_r.score(X, y)
0.7631578947368421
X_r.predict(test)
array([0, 0, 0, 1], dtype=int64)
X_r.predict(X)
array([1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1], dtype=int64)
# X_r.predict_proba(X)
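
Both accuracies above (about 0.71 for logistic regression, 0.76 for LDA) are computed on the training data itself. A minimal sketch of a fairer comparison using cross-validation follows; the 5-fold split is an arbitrary choice.

from sklearn.model_selection import cross_val_score

# 5-fold cross-validated accuracy for both classifiers on the labeled data
for name, model in [('LogisticRegression', LogisticRegression()),
                    ('LDA', LinearDiscriminantAnalysis())]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())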

Multi-class Classification

data2 = pd.read_excel('mul_fruit.xlsx')
data2.head()
   ID  mass  width  height  color_score fruit_name  kind
0   1   192    8.4     7.3         0.55      apple   1.0
1   2   180    8.0     6.8         0.59      apple   1.0
2   3   176    7.4     7.2         0.60      apple   1.0
3   4   178    7.1     7.8         0.92      apple   1.0
4   5   172    7.4     7.0         0.89      apple   1.0
train_data2 = data2.dropna()
test2 = data2.loc[data2['fruit_name'].isnull()].iloc[:, 1:5]  # unlabeled rows, feature columns only
target_names = train_data2['fruit_name'].unique()
X = train_data2.iloc[:, [1, 2, 3, 4]]   # mass, width, height, color_score
y = train_data2['kind']
lda2 = LinearDiscriminantAnalysis(n_components=2)  # keep two discriminant axes for plotting
X_r2 = lda2.fit(X, y)                   # fitted multi-class model
X_r2.score(X, y)
0.8305084745762712
X_r2.predict(test2)
array([3., 3., 3., 1., 2., 4., 1., 3.])
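
The predictions come back as numeric kind codes. Assuming, as the plotting loop below also does, that codes 1 through 4 follow the order in which the names appear in target_names, a quick sketch can map them back to fruit names (that mapping is an assumption of this sketch, not something read from the file):

# Map the numeric kind codes back to fruit names,
# assuming code i corresponds to target_names[i - 1]
pred_codes = X_r2.predict(test2)
pred_names = [target_names[int(code) - 1] for code in pred_codes]
print(pred_names)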
import matplotlib.pyplot as plt
X_rr = lda2.transform(X)  # project the training samples onto the two discriminant axes
plt.figure()
colors = ['navy', 'turquoise', 'darkorange', 'blue']
lw = 2

for color, i, target_name in zip(colors, [1, 2, 3, 4], target_names):
    plt.scatter(X_rr[y == i, 0], X_rr[y == i, 1], color=color, alpha=.8, lw=lw,
                label=target_name)
    
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('LDA of FRUITS dataset')
Text(0.5, 1.0, 'LDA of FRUITS dataset')

[Figure: LDA of FRUITS dataset — scatter plot of the training samples on the two discriminant axes, one color per fruit class]

np.set_printoptions(suppress=True)  # suppress scientific notation when printing arrays
# X_r2.predict_proba(X)
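
To see how much of the between-class variance the two plotted axes actually capture, the fitted LDA exposes explained_variance_ratio_; a minimal check (output not shown):

# Fraction of between-class variance explained by each discriminant axis
print(lda2.explained_variance_ratio_)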

Reposted from blog.csdn.net/ddjhpxs/article/details/106074489