Python machine learning with sklearn: predicting heart disease


Process

1. Import, clean and split the data

2. Build the models with sklearn

3. Make predictions (here we use logistic regression and a decision tree)

Preliminary preparation

Data

Heart disease data download
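The code below works on a DataFrame named `heart_df`, but the post never shows it being loaded. A minimal sketch, assuming the download is a CSV file: `heart.csv` and `load_heart_data` are hypothetical names, while the `target` label column is the one the later code relies on.

```python
import pandas as pd


def load_heart_data(path="heart.csv"):
    """Load the heart-disease CSV into a DataFrame.

    "heart.csv" is a hypothetical filename; point it at wherever the
    downloaded file was saved.
    """
    heart_df = pd.read_csv(path)
    # The rest of the tutorial expects the label in a column named "target"
    if "target" not in heart_df.columns:
        raise ValueError("expected a 'target' column in the data")
    return heart_df
```

Usage: `heart_df = load_heart_data("heart.csv")`, then `heart_df.head()` to check that the columns loaded as expected.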

Import packages

```
import pandas as pd
import matplotlib.pyplot as plt
# Use the SimHei font so matplotlib can render Chinese labels in plots
plt.rcParams['font.sans-serif'] = ['SimHei']
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
```

Here we mainly use the sklearn package, plus pandas to load and manipulate the data and matplotlib for plotting.

Data manipulation

Separate the features from the labels

The last column, `target`, holds the outcome we want to predict. We need to separate it from the body-measurement columns, which serve as the features.

```
# Drop the last column ("target") to get the sample (feature) data
features = heart_df.drop(columns=["target"])
# Keep the labels (the result data)
target = heart_df["target"]
```
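The same separation can be wrapped in a small helper and checked on a toy table. `split_features_target` is a hypothetical name and the toy DataFrame below is made up, not the real dataset. Looking at the class balance is worthwhile because it tells you what accuracy a trivial "always predict the majority class" model would get.

```python
import pandas as pd


def split_features_target(df, label="target"):
    """Return (features, target): every column except `label`, and `label` itself."""
    features = df.drop(columns=[label])  # new DataFrame; df itself is unchanged
    target = df[label]                   # a pandas Series
    return features, target


# Toy stand-in for heart_df (made-up values, not the real dataset)
toy = pd.DataFrame({"age": [63, 37, 41, 56],
                    "chol": [233, 250, 204, 236],
                    "target": [1, 1, 0, 0]})
features, target = split_features_target(toy)
print(features.columns.tolist())          # ['age', 'chol']
# Share of each class in the labels
print(target.value_counts(normalize=True))
```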

Split the training and test sets

We split the data 3:1, that is, 75% of the samples are used for training and 25% are held out for testing. sklearn provides the `train_test_split` function for exactly this.

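```
# Split into training and test sets (75% / 25%); the split is random
# on each run unless random_state is fixed
X_train, X_test, Y_train, Y_test = train_test_split(features, target, test_size=0.25)
```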
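Because the split above is reshuffled on every run, the printed scores will change from run to run. A sketch of a reproducible variant: `random_state` and `stratify` are additions not in the original, and `make_classification` generates synthetic stand-in data (13 features) so the block runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart data: 100 samples, 13 features
X, y = make_classification(n_samples=100, n_features=13, random_state=0)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, y,
    test_size=0.25,   # 3:1 split, as in the tutorial
    random_state=42,  # fix the shuffle so results are reproducible
    stratify=y,       # keep the class ratio the same in both parts
)
print(len(X_train), len(X_test))  # 75 25
```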

Training the models

Logistic regression

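```
def test_logistic(*data):
    X_train, X_test, Y_train, Y_test = data
    # Logistic regression; max_iter is raised from the default 100 so the
    # default lbfgs solver has room to converge on unscaled features
    clf = LogisticRegression(max_iter=1000)
    # Fit on the training set only; the test set is reserved for evaluation
    clf.fit(X_train, Y_train)
    print("Training-set accuracy: {:.4f}".format(clf.score(X_train, Y_train)))
    print("Test-set accuracy: {:.4f}".format(clf.score(X_test, Y_test)))
    return clf
```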
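sklearn's `LogisticRegression` uses the lbfgs solver by default, which is sensitive to features on very different scales. A variant not used in the original post is to standardize the features first inside a pipeline; this sketch runs on synthetic stand-in data from `make_classification`, so its scores say nothing about the heart data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; replace with the heart features/target
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardize each feature (mean/variance learned from the training set only),
# then fit logistic regression with the default lbfgs solver
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, Y_train)
print("Training-set accuracy: {:.4f}".format(model.score(X_train, Y_train)))
print("Test-set accuracy: {:.4f}".format(model.score(X_test, Y_test)))
```

The pipeline is fitted and scored exactly like a bare estimator, so it could replace `clf` inside `test_logistic` without other changes.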

Decision tree

```
def test_decision_tree(*data):
    X_train, X_test, Y_train, Y_test = data
    # Shallow tree (at most 3 levels) that chooses splits by information gain (entropy)
    clf = DecisionTreeClassifier(max_depth=3, criterion="entropy")
    clf.fit(X_train, Y_train)
    print("Decision tree training-set accuracy: {:.4f}".format(clf.score(X_train, Y_train)))
    print("Decision tree test-set accuracy: {:.4f}".format(clf.score(X_test, Y_test)))
    # decision_tree_pre = clf.predict(X_test)
    # print("decision_tree:", decision_tree_pre)
    # print("true label:", Y_test)
    return clf
```
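The tree is limited to `max_depth=3` without comment. One way to see why a depth limit matters is to compare training and test accuracy at several depths: an unlimited tree can memorize the training set, which often costs test accuracy. A sketch on synthetic stand-in data; the depths tried are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; swap in the real features/target
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for depth in (1, 3, None):  # None = keep splitting until the leaves are pure
    clf = DecisionTreeClassifier(max_depth=depth, criterion="entropy", random_state=0)
    clf.fit(X_train, Y_train)
    scores[depth] = (clf.score(X_train, Y_train), clf.score(X_test, Y_test))
    print(f"max_depth={depth}: train={scores[depth][0]:.4f}, test={scores[depth][1]:.4f}")
```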

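Results

That completes the modeling. Calling `test_logistic(X_train, X_test, Y_train, Y_test)` and `test_decision_tree(X_train, X_test, Y_train, Y_test)` prints each model's training and test accuracy. Let's take a look at the results: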

[Screenshot: training and test accuracy printed by the two models]
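For sklearn classifiers, the number returned by `score` is the mean accuracy: the fraction of samples whose predicted label matches the true label. This sketch, on synthetic stand-in data, checks that `score` agrees with `accuracy_score` and with a hand count, in the spirit of the commented-out `predict` lines in `test_decision_tree`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, criterion="entropy", random_state=0)
clf.fit(X_train, Y_train)

pred = clf.predict(X_test)        # predicted labels for the test set
manual = (pred == Y_test).mean()  # fraction of correct predictions
print("score():", clf.score(X_test, Y_test))
print("accuracy_score():", accuracy_score(Y_test, pred))
print("manual:", manual)
```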

The fitted decision tree itself can also be output:

[Figure: decision tree output]
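The post does not show how the tree image was produced. One way in sklearn is `sklearn.tree.plot_tree`, which draws with matplotlib (already imported above), or `export_text` for a plain-text version. This sketch uses synthetic stand-in data, and the names `feature_0` … `feature_12` and the output file `decision_tree.png` are hypothetical; with the real data you would pass `list(features.columns)` instead.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to plot interactively
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

X, y = make_classification(n_samples=300, n_features=13, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

clf = DecisionTreeClassifier(max_depth=3, criterion="entropy", random_state=0)
clf.fit(X, y)

# Text version of the tree: one line per split or leaf
rules = export_text(clf, feature_names=feature_names)
print(rules)

# Graphical version, drawn with matplotlib and saved to a file
fig, ax = plt.subplots(figsize=(14, 6))
plot_tree(clf, feature_names=feature_names, class_names=["0", "1"], filled=True, ax=ax)
fig.savefig("decision_tree.png", dpi=150)
```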

Summary

To recap: we first separated the label column (`target`) from the feature columns, then used `train_test_split` to divide the data into training and test sets. Next we created each model (`clf`), trained it by calling its `fit` method on the training set, and finally used its `score` method to measure accuracy on both the training and test sets.

Source: blog.csdn.net/weixin_52521533/article/details/123802186