Python machine learning: tumor prediction (decision tree)

Tumor prediction (decision tree)

Experiment content

This experiment uses a decision tree to predict whether a tumor is malignant or benign, based on the Wisconsin breast cancer data set.

Experimental requirements

1. Load the Wisconsin breast cancer data set that ships with sklearn and explore the data.
2. Split the data set into training and test sets.
3. Configure the decision tree model.
4. Train the decision tree model.
5. Predict with the model.
6. Evaluate the model.
7. Tune the parameters. Based on the evaluation results, adjust the model's parameters to improve its performance on the test set.
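
For steps 5 and 6, accuracy is not the only useful number. In tumor prediction a missed malignant case usually matters more than a false alarm, so it can help to look at the confusion matrix and at recall on the malignant class as well. The following is a minimal sketch, not part of the original experiment. The variable names, the stratified split, and the fixed `random_state=0` are assumptions made here for reproducibility. In this data set, target 0 means malignant and target 1 means benign.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

breast = load_breast_cancer()

# Assumption: a stratified split with a fixed seed, so results are repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    breast.data, breast.target, test_size=0.2, stratify=breast.target, random_state=0
)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes, ordered [malignant, benign].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)

# Per-class precision, recall, and F1 score.
print(classification_report(y_test, y_pred, target_names=breast.target_names))

# Recall on the malignant class: the share of malignant tumors the model catches.
malignant_recall = recall_score(y_test, y_pred, pos_label=0)
print("Malignant recall:", malignant_recall)
```

The first row of the confusion matrix holds the malignant test samples, so the top-right cell counts malignant tumors that were predicted as benign.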

Experimental code

# Import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import tree  # decision tree models
from sklearn import metrics
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split


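# 1. Load the Wisconsin breast cancer data set that ships with sklearn and explore the data.
breast=load_breast_cancer()
print(breast.keys())
print(breast.feature_names)
data=pd.DataFrame(breast.data)
print(data.head())
target=pd.DataFrame(breast.target)
print(target.head())
data=breast['data']
target=breast['target']
feature_names=breast['feature_names']
# The raw arrays are hard to read; combining them into a DataFrame makes the data easier to inspect
df=pd.DataFrame(data,columns=feature_names)
print("First 5 rows of the feature data:")
print(df.head())
print("Basic information about the data set:")
df.info()  # info() prints its summary itself and returns None, so it is not wrapped in print()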

# 2. Split the data set (80% for training, 20% for testing).
train_X,test_X,train_y,test_y=train_test_split(data,target,test_size=0.2)
print(train_X.shape,train_y.shape)

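# 3. Configure the decision tree model.
model=tree.DecisionTreeClassifier()  # create a decision tree with default parameters

# 4. Train the decision tree model.
model.fit(train_X,train_y)

# 5. Predict with the model.
pre_y=model.predict(test_X)

# 6. Evaluate the model.
print("Accuracy:",metrics.accuracy_score(test_y,pre_y))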

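# 7. Tune the parameters. Based on the evaluation results, adjust the model's parameters to improve its performance.
'''
criterion
Defaults to the Gini index ('gini').
This parameter selects the function used to measure the quality of a split. scikit-learn accepts 'gini' and 'entropy' (recent versions also accept 'log_loss', which is equivalent to 'entropy'); gain ratio is not among the options.
Next, we switch this parameter to information gain ('entropy').
'''
model2=tree.DecisionTreeClassifier(criterion = 'entropy')
model2.fit(train_X,train_y)
pre_y=model2.predict(test_X)
print("Accuracy with criterion set to information gain (entropy):",metrics.accuracy_score(test_y,pre_y))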

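'''
max_depth
Defaults to None, which places no limit on the depth.
This parameter sets the maximum depth of the tree. When the samples have many features, a suitable maximum depth can help prevent overfitting.
Next, we adjust max_depth to try to improve the model.
'''
model3=tree.DecisionTreeClassifier(max_depth=2)
model3.fit(train_X,train_y)
pre_y=model3.predict(test_X)
print("Accuracy with max_depth set to 2:",metrics.accuracy_score(test_y,pre_y))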
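
The listing above changes one parameter at a time and compares accuracy on a single test split. Because `train_test_split` is called without a seed, those numbers also vary from run to run. One possible alternative, shown here as a sketch rather than as the original author's method, is to search over `criterion` and `max_depth` together with cross-validation on the training set, and to use the test set only once at the end. The candidate values in `param_grid`, the 5-fold setting, and `random_state=0` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Assumption: a stratified split with a fixed seed, so results are repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Candidate values are illustrative assumptions, not recommendations.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 3, 4, 5, None],
}

# Every combination is scored by 5-fold cross-validation on the training set.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Mean cross-validated accuracy:", search.best_score_)

# The test set is used once, only to score the selected model.
print("Test accuracy:", search.score(X_test, y_test))
```

Selecting parameters by cross-validation avoids choosing them on the test split, which would make the reported test accuracy look better than it really is.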

Origin blog.csdn.net/weixin_48434899/article/details/124146419