Process
1. Import, clean, and split the data
2. Build the models with sklearn
3. Make predictions (here we use logistic regression and a decision tree)
Preliminary preparation
Data
Import packages
```
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Use the SimHei font so that Chinese labels render in matplotlib plots
plt.rcParams['font.sans-serif'] = ['SimHei']
```
Here we mainly use the sklearn package, along with pandas to make data manipulation easier.
Data manipulation
Prepare the sample data and the result data
The last column, target, holds our results, so we need to separate it from the body-measurement features.
```
# Drop the last column to get the sample data (features)
features = heart_df.drop(columns=["target"])
# Keep the labels as the result data
target = heart_df["target"]
```
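The code above never shows how heart_df is created, so here is a minimal sketch of the separation step on a stand-in DataFrame. The column names age and chol and all the values are made up for illustration; the only thing taken from the article is that the label column is called target.

```python
import pandas as pd

# Hypothetical stand-in for heart_df: made-up rows, not the real dataset.
# In practice heart_df would be loaded from the data file instead.
heart_df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "chol": [233, 250, 204, 236],
    "target": [1, 1, 0, 0],
})

features = heart_df.drop(columns=["target"])  # every column except the label
target = heart_df["target"]                   # the label column only

print(features.shape)  # (4, 2)
print(target.shape)    # (4,)
```

Note that drop returns a new DataFrame, so heart_df itself still contains the target column afterwards.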
Split the training set
We split the data 3:1, that is, 75% for training and 25% for testing. sklearn provides a dedicated function, train_test_split, for this.
```
# Split into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(features, target, test_size=0.25)
```
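Without a fixed seed, train_test_split shuffles differently on every run, so the scores change each time. The sketch below is one possible variation, not what the article's code does: random_state and stratify are real parameters of train_test_split, but using them here is our assumption, and the data is made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 samples, 3 features, balanced binary labels.
X = np.arange(300).reshape(100, 3)
y = np.array([0, 1] * 50)

# random_state makes the split repeatable;
# stratify=y keeps the class ratio similar in both parts.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(len(X_train), len(X_test))  # 75 25
```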
Training the models
Logistic regression
```
def test_logistic(*data):
    X_train, X_test, Y_train, Y_test = data
    clf = LogisticRegression()  # logistic regression
    clf.fit(X_train, Y_train)  # fit on the training set (iterative optimization)
    print("Training accuracy: {:.4f}".format(clf.score(X_train, Y_train)))
    print("Test accuracy: {:.4f}".format(clf.score(X_test, Y_test)))
```
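When the features have very different scales, LogisticRegression with its default settings may need more iterations to converge. Here is a hedged sketch of one common remedy: put a StandardScaler in front of the model and raise max_iter. Both are our additions rather than part of the article's code, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data: two features on very different scales, linearly separable label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * [1.0, 100.0]
y = (X[:, 0] + X[:, 1] / 100.0 > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scaling and the higher max_iter are assumptions added for this sketch.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, Y_train)  # learn from the training set only

train_score = clf.score(X_train, Y_train)
test_score = clf.score(X_test, Y_test)
print("Training accuracy: {:.4f}".format(train_score))
print("Test accuracy: {:.4f}".format(test_score))
```

Because the scaler sits inside the pipeline, it is fitted on the training set only and then reused as-is on the test set.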
Decision tree
```
def test_decision_tree(*data):
    X_train, X_test, Y_train, Y_test = data
    # Limit the depth to 3 and choose splits by entropy (information gain)
    clf = DecisionTreeClassifier(max_depth=3, criterion="entropy")
    clf.fit(X_train, Y_train)
    print("Decision tree training accuracy: {:.4f}".format(clf.score(X_train, Y_train)))
    print("Decision tree test accuracy: {:.4f}".format(clf.score(X_test, Y_test)))
    # decision_tree_pre = clf.predict(X_test)
    # print("decision_tree:", decision_tree_pre)
    # print("true label:", Y_test)
    return clf
```
Result
That completes our prediction. Let's take a look at the results.
The fitted decision tree itself can also be output.
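One way to output a fitted tree is as plain-text rules with sklearn's export_text. This is only a sketch: the article does not say which output method it used, and the data and the feature names f0, f1, f2 are made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up data: the label depends only on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = (X[:, 0] > 0).astype(int)

clf = DecisionTreeClassifier(max_depth=3, criterion="entropy", random_state=0)
clf.fit(X, y)

# Render the learned splits as indented text rules.
rules = export_text(clf, feature_names=["f0", "f1", "f2"])
print(rules)
```

For a graphical view, sklearn.tree.plot_tree draws the same structure with matplotlib.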
Summary
To summarize: we first separate the labels from the features, then use train_test_split to split the data into a training set and a test set, then call fit on the classifier (clf) to learn from the training set, and finally call score to evaluate the result.
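The whole workflow can be sketched in one short script. The data here is synthetic and only stands in for the real features and labels, and the fixed random_state values are our assumption to keep the run repeatable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up data standing in for the real features and labels.
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 4))
target = (features[:, 0] - features[:, 1] > 0).astype(int)

# 1. Split into training and test sets.
X_train, X_test, Y_train, Y_test = train_test_split(
    features, target, test_size=0.25, random_state=1
)

# 2. Fit each model on the training set, then 3. score it on both sets.
models = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(
        max_depth=3, criterion="entropy", random_state=1
    ),
}
scores = {}
for name, clf in models.items():
    clf.fit(X_train, Y_train)
    scores[name] = (clf.score(X_train, Y_train), clf.score(X_test, Y_test))
    print("{}: train {:.4f}, test {:.4f}".format(name, *scores[name]))
```

Comparing the training and test scores side by side is a quick check for overfitting: a model that scores much higher on the training set than on the test set has likely memorized the training data.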