Machine Learning - Decision Tree Algorithm with Code Implementation (Based on R)

A classification tree (decision tree) is a very commonly used classification method. Its core task is to assign each data point to the most likely category.

It is a supervised learning method. In supervised learning we are given a set of samples, each with a set of attributes and a class label that is determined in advance; from these samples we learn a classifier that can assign the correct class to new, unseen objects.

 

Understanding Decision Trees

The concept of entropy is essential to understanding decision trees.

A decision tree's judgments are not 100% correct; it simply makes the best judgment it can under uncertainty.

Entropy is what we use to quantify that uncertainty.

Case study: identifying recommenders among shared-bike users

Analysis: determine what kind of users are more likely to be recommenders of the shared-bike service. In other words, find the relationship between being a recommender and the other variables.

 

Step 1

Measure the entropy of the node population.

For the binary outcome of whether a user is a recommender: when the proportion of recommenders is 0 or close to 1, the entropy is 0; when the proportion is close to 50%, the entropy approaches 1.

The analyst needs features that distinguish recommenders from the rest. A decision tree reduces the entropy of the node population as much as possible by splitting nodes repeatedly.
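The entropy behavior described above can be sketched with a small helper function (a minimal illustration, not part of the original post):

```r
# Binary entropy of a node where a proportion p of users are recommenders:
# H(p) = -p*log2(p) - (1-p)*log2(1-p)
binary_entropy <- function(p) {
  if (p == 0 || p == 1) return(0)  # a pure node has zero entropy
  -p * log2(p) - (1 - p) * log2(1 - p)
}

binary_entropy(0.5)   # 50/50 split: maximal uncertainty, entropy = 1
binary_entropy(0.99)  # nearly pure node: entropy close to 0
```

A split is useful exactly when it moves the child nodes away from the 50% mixture toward purer proportions.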

 

Step 2

Split the nodes.

Different ways of splitting yield different gain values; the algorithm selects the split with the maximum gain, which is the best split.

For details, see the section on information gain below.

 

Step 3

Stop splitting under certain conditions.

Note: too many branch nodes over-complicate the tree without helping decision-making, so splitting needs to stop at an appropriate point.
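In rpart, stopping is controlled through `rpart.control`. The parameter names below come from that function's documentation; the specific values are only illustrative:

```r
library(rpart)

# Stopping/complexity controls for rpart (illustrative values)
ctrl <- rpart.control(
  minsplit = 20,  # do not attempt to split nodes with fewer than 20 observations
  cp = 0.01,      # a split must improve the fit by at least this complexity factor
  maxdepth = 5    # cap the depth of the tree
)
# The control object is then passed to rpart(), e.g.:
# fit <- rpart(推荐者 ~ ., data = bike.data, control = ctrl)
```

Raising `cp` or `minsplit` (or lowering `maxdepth`) makes the tree stop splitting earlier and yields a simpler model.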

 

 

Information Gain (IG)

IG expresses how much the decision tree reduces the entropy of the classification data as a whole.

IG is obtained by taking the parent node's entropy and subtracting the weighted entropy of the child nodes; the result is the reduction in entropy achieved by a split.

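The parent-minus-weighted-children computation can be written out directly (a self-contained sketch; the example labels are made up):

```r
# Information gain of a binary split: parent entropy minus the
# size-weighted entropy of the two child nodes.
info_gain <- function(parent_labels, left_labels, right_labels) {
  h <- function(labels) {            # binary entropy of a logical label vector
    p <- mean(labels)
    if (p == 0 || p == 1) return(0)
    -p * log2(p) - (1 - p) * log2(1 - p)
  }
  n <- length(parent_labels)
  w_left  <- length(left_labels) / n
  w_right <- length(right_labels) / n
  h(parent_labels) - (w_left * h(left_labels) + w_right * h(right_labels))
}

# A split that perfectly separates the classes recovers all of the
# parent's entropy:
parent <- c(TRUE, TRUE, FALSE, FALSE)
info_gain(parent, c(TRUE, TRUE), c(FALSE, FALSE))  # gain = 1
```

A split that leaves both children as mixed as the parent has a gain of 0, which is why the algorithm prefers the split with the maximum gain.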

 

R language


> bike.data <- read.csv("Shared Bike Sample Data - ML.csv")
> library(rpart)
> library(rpart.plot)
> bike.data$推荐者 <- bike.data$分数 >= 9        # 推荐者 = recommender, 分数 = score
> rtree_fit <- rpart(推荐者 ~ 城区 + 年龄 + 组别, data = bike.data)  # 城区/年龄/组别 = district/age/group
> rpart.plot(rtree_fit)
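For readers without the original CSV, the same workflow can be reproduced end to end on synthetic data (the column names below are hypothetical English stand-ins for the Chinese ones above):

```r
library(rpart)

# Toy data: recommenders skew younger by construction
set.seed(1)
df <- data.frame(
  age   = sample(18:60, 200, replace = TRUE),
  group = factor(sample(c("A", "B"), 200, replace = TRUE))
)
df$recommender <- factor(df$age < 35)

# Fit a classification tree and predict the class for each row
fit  <- rpart(recommender ~ age + group, data = df, method = "class")
pred <- predict(fit, type = "class")
mean(pred == df$recommender)  # training accuracy on the toy data
```

Because the toy rule is a single clean age threshold, the tree's first split recovers it and the training accuracy is close to 1; on real data, accuracy should be checked on held-out data instead.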

 

 


Origin www.cnblogs.com/Grayling/p/10987517.html