This post works through the loan example from Li Hang's book Statistical Learning Methods.
Notes:
- I use PyCharm.
- Python version: 3.7.
- Graphviz is a standalone program. Even if the graphviz package is installed in PyCharm, you still have to download Graphviz from its official website and add it to the PATH environment variable. After that you may also need to restart the computer.
- If a library is missing, install it.
- The dataset below is my own, typed in by hand.
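Graphviz lives outside Python, so a quick way to confirm the PATH setup worked is to ask Python where the dot program is. This is a minimal sketch of my own that uses only the standard library (shutil.which and subprocess.run). It assumes the Graphviz executable is named dot, and the helper names find_dot and dot_version are hypothetical.

```python
import shutil
import subprocess


def find_dot():
    """Return the full path of Graphviz's dot executable, or None if it is not on PATH."""
    return shutil.which("dot")


def dot_version():
    """Return the banner printed by `dot -V`, or None if Graphviz is not available."""
    dot_path = find_dot()
    if dot_path is None:
        return None
    # dot -V writes its version banner to stderr rather than stdout
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    return result.stderr.strip() or result.stdout.strip()


if __name__ == "__main__":
    path = find_dot()
    if path is None:
        print("dot not found: add Graphviz's bin directory to PATH, then restart PyCharm")
    else:
        print(path, "->", dot_version())
```

If it still reports "dot not found" after installing Graphviz, the bin directory is probably not on PATH yet, or PyCharm was started before PATH was changed.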
Loan application sample data
ID | Age | Has a job | Owns a house | Credit rating | Class (loan granted) |
---|---|---|---|---|---|
1 | youth | no | no | fair | no |
2 | youth | no | no | good | no |
3 | youth | yes | no | good | yes |
4 | youth | yes | yes | fair | yes |
5 | youth | no | no | fair | no |
6 | middle-aged | no | no | fair | no |
7 | middle-aged | no | no | good | no |
8 | middle-aged | yes | yes | good | yes |
9 | middle-aged | no | yes | very good | yes |
10 | middle-aged | no | yes | very good | yes |
11 | elderly | no | yes | very good | yes |
12 | elderly | no | yes | good | yes |
13 | elderly | yes | no | good | yes |
14 | elderly | yes | no | very good | yes |
15 | elderly | no | no | fair | no |
Encoding of the dataset
Feature | Encoding |
---|---|
Age | youth: 1, middle-aged: 2, elderly: 3 |
Has a job | yes: 1, no: 0 |
Owns a house | yes: 1, no: 0 |
Credit rating | fair: 1, good: 2, very good: 3 |
Class | yes: 1, no: 0 |
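Typing the numeric rows by hand is easy to get wrong, so one option is to keep the table as text and encode it with lookup dictionaries that follow the encoding table above. This is a sketch under that assumption. The names AGE, YES_NO, CREDIT, ROWS and encode are my own, not part of the original code.

```python
# Lookup tables that mirror the encoding table
AGE = {"youth": 1, "middle-aged": 2, "elderly": 3}
YES_NO = {"no": 0, "yes": 1}
CREDIT = {"fair": 1, "good": 2, "very good": 3}

# The 15 rows of the loan table as text:
# (age, has a job, owns a house, credit rating, class)
ROWS = [
    ("youth", "no", "no", "fair", "no"),
    ("youth", "no", "no", "good", "no"),
    ("youth", "yes", "no", "good", "yes"),
    ("youth", "yes", "yes", "fair", "yes"),
    ("youth", "no", "no", "fair", "no"),
    ("middle-aged", "no", "no", "fair", "no"),
    ("middle-aged", "no", "no", "good", "no"),
    ("middle-aged", "yes", "yes", "good", "yes"),
    ("middle-aged", "no", "yes", "very good", "yes"),
    ("middle-aged", "no", "yes", "very good", "yes"),
    ("elderly", "no", "yes", "very good", "yes"),
    ("elderly", "no", "yes", "good", "yes"),
    ("elderly", "yes", "no", "good", "yes"),
    ("elderly", "yes", "no", "very good", "yes"),
    ("elderly", "no", "no", "fair", "no"),
]


def encode(row):
    """Turn one text row into [age, job, house, credit, class] integers."""
    age, job, house, credit, label = row
    return [AGE[age], YES_NO[job], YES_NO[house], CREDIT[credit], YES_NO[label]]


dataset = [encode(row) for row in ROWS]

if __name__ == "__main__":
    for row in dataset:
        print(row)
```

A typo in a text value raises a KeyError instead of silently producing a wrong number.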
# Each row: age, has a job, owns a house, credit rating, class
dataset=[
[1,0,0,1,0],
[1,0,0,2,0],
[1,1,0,2,1],
[1,1,1,1,1],
[1,0,0,1,0],
[2,0,0,1,0],
[2,0,0,2,0],
[2,1,1,2,1],
[2,0,1,3,1],
[2,0,1,3,1],
[3,0,1,3,1],
[3,0,1,2,1],
[3,1,0,2,1],
[3,1,0,3,1],
[3,0,0,1,0]]
X = [x[0:4] for x in dataset]  # feature values: the first four columns
print(X)
Y = [y[-1] for y in dataset]  # labels Y: the last column
print(Y)
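Li Hang's book chooses splits by information gain (the ID3 algorithm). DecisionTreeClassifier, by contrast, uses Gini impurity unless you pass criterion="entropy". To see the book's quantities on this data, here is a small standard-library sketch that computes the empirical entropy H(D) and the gain g(D, A) = H(D) - H(D|A) for each feature. The function names entropy and info_gain are my own.

```python
import math
from collections import Counter

# Encoded loan data: age, has a job, owns a house, credit rating, class
dataset = [
    [1, 0, 0, 1, 0], [1, 0, 0, 2, 0], [1, 1, 0, 2, 1], [1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0], [2, 0, 0, 1, 0], [2, 0, 0, 2, 0], [2, 1, 1, 2, 1],
    [2, 0, 1, 3, 1], [2, 0, 1, 3, 1], [3, 0, 1, 3, 1], [3, 0, 1, 2, 1],
    [3, 1, 0, 2, 1], [3, 1, 0, 3, 1], [3, 0, 0, 1, 0],
]
FEATURES = ["age", "has a job", "owns a house", "credit rating"]


def entropy(labels):
    """Empirical entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, feature):
    """g(D, A) = H(D) - H(D|A), where A is the column index `feature`."""
    n = len(rows)
    conditional = 0.0
    for value in set(r[feature] for r in rows):
        subset = [r[-1] for r in rows if r[feature] == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy([r[-1] for r in rows]) - conditional


if __name__ == "__main__":
    print(f"H(D) = {entropy([r[-1] for r in dataset]):.3f}")  # H(D) = 0.971
    for i, name in enumerate(FEATURES):
        print(f"g(D, {name}) = {info_gain(dataset, i):.3f}")
    # g(D, age) = 0.083
    # g(D, has a job) = 0.324
    # g(D, owns a house) = 0.420
    # g(D, credit rating) = 0.363
```

"Owns a house" has the largest gain, so ID3 would put it at the root.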
Next, fit a decision tree with scikit-learn's DecisionTreeClassifier, then visualize it with Graphviz.
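from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
# Each row: age, has a job, owns a house, credit rating, class
dataset=[
[1,0,0,1,0],
[1,0,0,2,0],
[1,1,0,2,1],
[1,1,1,1,1],
[1,0,0,1,0],
[2,0,0,1,0],
[2,0,0,2,0],
[2,1,1,2,1],
[2,0,1,3,1],
[2,0,1,3,1],
[3,0,1,3,1],
[3,0,1,2,1],
[3,1,0,2,1],
[3,1,0,3,1],
[3,0,0,1,0]
]
# Feature names (Chinese): age, has a job, owns a house, credit rating
feature =['年龄','有工作','有自己的房子','信贷情况']
# Class names (Chinese): do not lend, lend
classname =['不借','借']
X = [x[0:4] for x in dataset]  # feature values: the first four columns
print(X)
Y = [y[-1] for y in dataset]  # labels: the last column
print(Y)
# Fit the tree (scikit-learn's default split criterion is Gini impurity)
tree_clf = DecisionTreeClassifier(max_depth=4)
tree_clf.fit(X, Y)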
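Since the classifier above uses Gini impurity, you can predict its first split before drawing anything. The sketch below is a simplified re-implementation of an exhaustive threshold search for the root node only: it tries every feature and every midpoint between consecutive distinct values, and keeps the split with the lowest weighted child Gini impurity. It is not scikit-learn's actual code, and the names gini and best_threshold_split are my own.

```python
from collections import Counter

# Encoded loan data: age, has a job, owns a house, credit rating, class
dataset = [
    [1, 0, 0, 1, 0], [1, 0, 0, 2, 0], [1, 1, 0, 2, 1], [1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0], [2, 0, 0, 1, 0], [2, 0, 0, 2, 0], [2, 1, 1, 2, 1],
    [2, 0, 1, 3, 1], [2, 0, 1, 3, 1], [3, 0, 1, 3, 1], [3, 0, 1, 2, 1],
    [3, 1, 0, 2, 1], [3, 1, 0, 3, 1], [3, 0, 0, 1, 0],
]
FEATURES = ["age", "has a job", "owns a house", "credit rating"]


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold_split(rows):
    """Return (weighted child Gini, feature index, threshold) of the best 'x <= t' split."""
    n = len(rows)
    best = None
    for f in range(len(rows[0]) - 1):
        values = sorted(set(r[f] for r in rows))
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2  # midpoint between consecutive distinct values
            left = [r[-1] for r in rows if r[f] <= t]
            right = [r[-1] for r in rows if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


if __name__ == "__main__":
    score, f, t = best_threshold_split(dataset)
    print(f"root split: {FEATURES[f]} <= {t} (weighted Gini {score:.3f})")
    # root split: owns a house <= 0.5 (weighted Gini 0.267)
```

So the top node of the rendered tree should test the "owns a house" feature against 0.5.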
The code above only builds the tree; it does not draw it. To visualize the tree, add the following code:
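export_graphviz(
tree_clf,
out_file="loan.dot",  # DOT file written to the current directory
feature_names=feature,  # labels for the split conditions
class_names=classname,  # labels for the predicted classes
rounded=True,  # rounded node boxes
filled=True,  # color nodes by majority class
)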
Running the code generates a loan.dot file in the current directory.
Then open PyCharm's local Terminal in that directory and run the following command:
dot -Tpng loan.dot -o loan.png
This generates loan.png.
[Figure: my project directory]
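If you would rather not switch to the terminal, the same dot command can be run from the end of the Python script. This is a standard-library sketch. The helper names dot_command and render are hypothetical, and it assumes dot is on PATH.

```python
import shutil
import subprocess


def dot_command(dot_file, image_file, fmt="png"):
    """Build the command line `dot -T<fmt> <dot_file> -o <image_file>`."""
    return ["dot", f"-T{fmt}", dot_file, "-o", image_file]


def render(dot_file, image_file, fmt="png"):
    """Run Graphviz on dot_file; return False if dot is not on PATH."""
    if shutil.which("dot") is None:
        return False
    # check=True raises CalledProcessError if dot reports an error
    subprocess.run(dot_command(dot_file, image_file, fmt), check=True)
    return True


if __name__ == "__main__":
    print(" ".join(dot_command("loan.dot", "loan.png")))
    # dot -Tpng loan.dot -o loan.png
```

Calling render("loan.dot", "loan.png") right after export_graphviz produces the same image as the terminal command.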
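However, you will find that the Chinese labels in the image come out garbled. The DOT file tells Graphviz to use the helvetica font, which has no Chinese characters.
To fix this, add the following code. It copies loan.dot to Tree_utf8.dot with the font changed to Microsoft YaHei: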
import re
# Open loan.dot and set fontname to a font that supports Chinese
with open("./loan.dot", "r", encoding="utf-8") as f:
    dot_data = f.read()
# Write the modified DOT source to Tree_utf8.dot
with open("./Tree_utf8.dot", "w", encoding="utf-8") as f:
    f.write(re.sub(r'fontname=helvetica', 'fontname="Microsoft YaHei"', dot_data))
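The regular expression above only matches fontname=helvetica without quotes. Depending on your scikit-learn version, the attribute may be written as fontname="helvetica" instead, in which case nothing gets replaced. Below is a slightly more tolerant sketch that accepts both forms. The names set_dot_font and fix_dot_file are my own. Microsoft YaHei ships with Windows; on other systems you would substitute a Chinese font you have installed. Recent scikit-learn releases also accept a fontname argument in export_graphviz, which avoids this post-processing, so check whether your installed version supports it.

```python
import re

# Matches fontname=helvetica as well as fontname="helvetica"
FONT_PATTERN = re.compile(r'fontname="?helvetica"?')


def set_dot_font(dot_text, font="Microsoft YaHei"):
    """Replace the helvetica font in DOT source text with `font`."""
    # A function replacement avoids re treating backslashes in `font` specially
    return FONT_PATTERN.sub(lambda m: f'fontname="{font}"', dot_text)


def fix_dot_file(src="loan.dot", dst="Tree_utf8.dot", font="Microsoft YaHei"):
    """Read src, swap the font, and write the result to dst (both UTF-8)."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(set_dot_font(text, font))


if __name__ == "__main__":
    sample = 'node [shape=box, fontname="helvetica"] ;\nedge [fontname=helvetica] ;'
    print(set_dot_font(sample))
    # node [shape=box, fontname="Microsoft YaHei"] ;
    # edge [fontname="Microsoft YaHei"] ;
```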
Then render the new file with dot -Tpng Tree_utf8.dot -o loan.png and take a look at the result:
[Figure: rendered decision tree]
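You can also check what the rendered tree says against the data by hand. The sketch below assumes the tree splits on "owns a house" first and "has a job" second. That is what a Gini search over this data should give, and it is the same structure as the book's worked example. predict is a hypothetical hand-written stand-in for the fitted classifier, not something exported by scikit-learn.

```python
# Encoded loan data: age, has a job, owns a house, credit rating, class
dataset = [
    [1, 0, 0, 1, 0], [1, 0, 0, 2, 0], [1, 1, 0, 2, 1], [1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0], [2, 0, 0, 1, 0], [2, 0, 0, 2, 0], [2, 1, 1, 2, 1],
    [2, 0, 1, 3, 1], [2, 0, 1, 3, 1], [3, 0, 1, 3, 1], [3, 0, 1, 2, 1],
    [3, 1, 0, 2, 1], [3, 1, 0, 3, 1], [3, 0, 0, 1, 0],
]


def predict(age, has_job, owns_house, credit):
    """Hypothetical two-level tree: lend if the applicant owns a house,
    otherwise lend only if they have a job. Age and credit are unused."""
    if owns_house > 0.5:
        return 1
    return 1 if has_job > 0.5 else 0


if __name__ == "__main__":
    wrong = [i + 1 for i, row in enumerate(dataset) if predict(*row[:4]) != row[4]]
    print("rows the rule gets wrong:", wrong)
    # rows the rule gets wrong: []
```

An empty list means two questions are enough to classify every applicant in the table, which is why the tree stays well below max_depth=4.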
The complete code is as follows:
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
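# Each row: age, has a job, owns a house, credit rating, class
dataset=[
[1,0,0,1,0],
[1,0,0,2,0],
[1,1,0,2,1],
[1,1,1,1,1],
[1,0,0,1,0],
[2,0,0,1,0],
[2,0,0,2,0],
[2,1,1,2,1],
[2,0,1,3,1],
[2,0,1,3,1],
[3,0,1,3,1],
[3,0,1,2,1],
[3,1,0,2,1],
[3,1,0,3,1],
[3,0,0,1,0]
]
# Feature names (Chinese): age, has a job, owns a house, credit rating
feature =['年龄','有工作','有自己的房子','信贷情况']
# Class names (Chinese): do not lend, lend
classname =['不借','借']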
X = [x[0:4] for x in dataset]
print(X)
Y = [y[-1] for y in dataset]
print(Y)
tree_clf = DecisionTreeClassifier(max_depth=4)
tree_clf.fit(X, Y)
export_graphviz(
tree_clf,
out_file=("loan.dot"),
feature_names=feature,
class_names=classname,
rounded=True,
filled=True,
)
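import re
# Open loan.dot and set fontname to a font that supports Chinese
with open("./loan.dot", "r", encoding="utf-8") as f:
    dot_data = f.read()
# Write the modified DOT source to Tree_utf8.dot
with open("./Tree_utf8.dot", "w", encoding="utf-8") as f:
    f.write(re.sub(r'fontname=helvetica', 'fontname="Microsoft YaHei"', dot_data))
'''
Run in the terminal:
dot -Tpng Tree_utf8.dot -o loan.png
to generate the image
'''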