Implementing a Decision Tree in Python (Series), Part 1: Start with the Simplest Algorithm

From the Titanic to the tree: a journey of a thousand miles begins with a line of code.

1 Motivation

Motivation 1: I have benefited a great deal from what experts have shared online, and I have always wanted to give a little back.
Motivation 2: I have long wanted to write a proper decision tree (I once wrote ID3 in SAS, but never felt it was finished to a good standard).

2 Contents of this series

This series of articles is divided into roughly four parts:

Part 1: An introductory example from Kaggle, which builds a decision tree model on the Titanic survival/mortality data. I think it is a good example, and I agree with some of the ideas it presents. Its biggest benefit, of course, is that it provides sample data and published results, so the output of my own code can be checked against them.

Part 2: The basic concepts of decision trees, and the process by which I implemented one. Interestingly, I have always preferred functional programming, but while writing the decision tree I suddenly found that an object-oriented style is quite convenient.

Part 3: The implementation uses about 20 functions; each is described in turn.

Part 4: How to apply the self-built decision tree, with a comparison of results. (After the decision tree, the series will continue with the depth of the tree.)

3 Starting from the Titanic (Kaggle)

Introduction to Decision Trees (Titanic dataset)
Decision Tree Modeling of the Titanic Data
First, here are the decision tree results presented by Kaggle (hereafter K). Overall, the results of a decision tree model are relatively easy to understand; I will go into the specifics next time.

The background of the Titanic disaster is easier to grasp through the film, so watch it if you want a picture of the event. In short, the task is to build a model that predicts whether a passenger survived (survived or not).

Variable notes
sibsp: The dataset defines family relations in this way…
Sibling = brother, sister, stepbrother, stepsister
Spouse = husband, wife (mistresses and fiancés were ignored)
parch: The dataset defines family relations in this way…
Parent = mother, father
Child = daughter, son, stepdaughter, stepson
Some children travelled only with a nanny, therefore parch=0 for them.
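
The definitions above can be made concrete with a small sketch. It assumes, as in the Kaggle data dictionary, that sibsp counts siblings and spouses aboard and parch counts parents and children aboard. The helper `count_relations` and the example travelling parties are hypothetical and invented purely for illustration; they are not part of the dataset or of the decision tree code in this series.

```python
# Relation labels taken from the variable notes above.
SIBSP_RELATIONS = {"brother", "sister", "stepbrother", "stepsister",
                   "husband", "wife"}
PARCH_RELATIONS = {"mother", "father",
                   "daughter", "son", "stepdaughter", "stepson"}


def count_relations(companions):
    """Return (sibsp, parch) for one passenger.

    `companions` lists how each person travelling with the passenger
    is related to them (hypothetical input format).
    """
    sibsp = 0
    parch = 0
    for relation in companions:
        if relation in SIBSP_RELATIONS:
            sibsp += 1
        elif relation in PARCH_RELATIONS:
            parch += 1
        # Anyone else (nanny, fiancé, friend) is not counted.
    return sibsp, parch


# Hypothetical travelling parties, not real passengers.
examples = {
    "man with wife and two children": ["wife", "son", "daughter"],
    "child with nanny only": ["nanny"],
    "girl with two siblings and mother": ["brother", "sister", "mother"],
}

for label, companions in examples.items():
    sibsp, parch = count_relations(companions)
    print(f"{label}: sibsp={sibsp}, parch={parch}")
```

The second example shows the special case noted above: a child travelling only with a nanny ends up with parch=0, because a nanny is neither a parent nor a child.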

Origin blog.csdn.net/yukai08008/article/details/104637469