Machine learning models | KNN | Decision Tree

 

Classification model

  • K-nearest neighbor
  • Logistic regression
  • Decision Tree

K Nearest Neighbor (KNN)

The simplest, most elementary classifier records all of the training data together with their categories; a test object is classified only when its attributes exactly match those of some training object.
K-nearest neighbor (KNN) is a basic classification method that classifies by measuring the distance between feature vectors. The idea: if most of the k samples most similar to a given sample in feature space (i.e. its nearest neighbors) belong to a certain category, then the sample also belongs to that category; K is typically an integer no larger than 20.
In the KNN algorithm, the selected neighbors are all objects that have already been correctly classified (labeled training samples).

[Figure: a green circle (the test point) surrounded by red triangles and blue squares]

To which class should the green circle (the test data) be assigned: the red triangles or the blue squares?
  If K = 3 (the 3 points nearest to the green circle), red triangles make up 2/3 of the neighbors, so the green circle is assigned to the red triangle class;
  if K = 5 (the 5 points nearest to the green circle), blue squares make up 3/5 of the neighbors, so the green circle is assigned to the blue square class.

The result of the KNN algorithm therefore depends to a large extent on the choice of K.

Distance calculation in KNN
KNN uses the distance between objects as the measure of dissimilarity between them, which avoids the problem of matching objects attribute by attribute. The distance used here is generally the Euclidean distance or the Manhattan distance:

Euclidean distance: $d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}$

Manhattan distance: $d(x, y) = \sum_{k=1}^{n} |x_k - y_k|$

    (the Manhattan distance is usually selected here)
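As a quick illustration (a minimal sketch added here, not part of the original post), the two distances can be computed with numpy for a pair of feature vectors:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # sqrt(9 + 4 + 0) = 3.6055...
manhattan = np.sum(np.abs(a - b))           # 3 + 2 + 0 = 5
print(euclidean, manhattan)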

KNN algorithm
With the training data and their labels known, we input the test data, compare the features of the test data with the corresponding features in the training set, and find the K training samples most similar to the test data; the category of the test data is then the category that appears most often among those K samples. The algorithm is described as follows:

  • Calculate the distance between the test data and each training sample;
  • Sort by increasing distance;
  • Select the K points with the smallest distance;
  • Count the frequency of each category among these K points;
  • Return the most frequent category among the K points as the predicted class of the test data.

Code implementation:

### 0. Import dependencies
import numpy as np
import pandas as pd

# the iris dataset ships with sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split   # split the data into training and test sets
from sklearn.metrics import accuracy_score             # compute the classification accuracy of the predictions

### 1. The iris dataset in sklearn and a first look at it

 Attachment:

print(iris.data.shape)    # the 4 features of each sample: 150 rows, 4 columns -> (150, 4)
print(iris.data[:5])      # the first 5 rows of sample features
print(iris.target.shape)  # the target category (target attribute) of each sample, 150 entries
print(iris.target)        # the target attribute of every sample; besides the 4 features, each iris sample carries a target attribute (a 5th column, also called the target or label)
===> 
(150, 4)
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
(150,)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
iris = load_iris()
df = pd.DataFrame(data = iris.data, columns = iris.feature_names)
df['class'] = iris.target
df['class'] = df['class'].map({0: iris.target_names[0], 1: iris.target_names[1], 2: iris.target_names[2]})
df.head(10)

[Output: the first 10 rows of the DataFrame]

df.describe()

[Output: summary statistics of the four features from df.describe()]

x = iris.data
y = iris.target.reshape(-1,1)
print(x.shape, y.shape)
===>
(150, 4) (150, 1)
# split into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=35, stratify=y)

print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
==>
(105, 4) (105, 1)
(45, 4) (45, 1)

### 2. The core algorithm

# define the distance functions
def l1_distance(a, b):
    return np.sum(np.abs(a - b), axis=1)          # Manhattan (first-order, L1) distance
def l2_distance(a, b):
    return np.sqrt(np.sum((a - b) ** 2, axis=1))  # Euclidean (second-order, L2) distance

# classifier implementation
class kNN(object):
    # define the initialization method; __init__ is the class constructor
    def __init__(self, n_neighbors=1, dist_func=l1_distance):
        self.n_neighbors = n_neighbors
        self.dist_func = dist_func

    # method that trains the model
    def fit(self, x, y):
        self.x_train = x
        self.y_train = y

    # method that makes model predictions
    def predict(self, x):
        # initialize the array of predicted classes
        y_pred = np.zeros((x.shape[0], 1), dtype=self.y_train.dtype)

        # iterate over the input data points, taking each point x_test and its index i
        for i, x_test in enumerate(x):
            # compute the distance between x_test and all of the training data
            distances = self.dist_func(self.x_train, x_test)

            # sort by distance from near to far and take the index values
            nn_index = np.argsort(distances)

            # select the nearest k points and store their corresponding categories
            nn_y = self.y_train[nn_index[:self.n_neighbors]].ravel()

            # count how often each category appears and assign the most frequent one to y_pred[i]
            y_pred[i] = np.argmax(np.bincount(nn_y))

        return y_pred

### 3. Test

# create a knn instance
knn = kNN(n_neighbors=3)
# train the model
knn.fit(x_train, y_train)
# pass in the test data and make predictions
y_pred = knn.predict(x_test)
print("classification accuracy: {:.5f}%".format(accuracy_score(y_test, y_pred) * 100))
==>
classification accuracy: 93.33333%

Do the L1 and L2 distances have a noticeable impact on the accuracy? Let's test:

# create a knn instance
knn = kNN()
# train the model
knn.fit(x_train, y_train)

# list for saving the results
result_list = []

# make predictions with different parameter choices
for p in [1, 2]:
    knn.dist_func = l1_distance if p == 1 else l2_distance

    # consider different values of k, in steps of 2
    for k in range(1, 10, 2):
        knn.n_neighbors = k
        # pass in the test data and make predictions
        y_pred = knn.predict(x_test)
        # compute the prediction accuracy
        accuracy = accuracy_score(y_test, y_pred)
        result_list.append([k, 'l1_distance' if p == 1 else 'l2_distance', accuracy])
df = pd.DataFrame(result_list, columns=['k', 'distance function', 'prediction accuracy'])
df

[Output: table of prediction accuracy for each k and each distance function]

From the table we can conclude:

k = 5, l2_distance, accuracy 0.977778

With K = 5 and the l2_distance, the accuracy is the highest.
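For comparison (a sketch added here, not part of the original post), the same split can be classified with scikit-learn's built-in KNeighborsClassifier; the parameter p selects the metric (p=1 Manhattan, p=2 Euclidean), and the exact score may differ slightly from the hand-written version:

from sklearn.neighbors import KNeighborsClassifier

sk_knn = KNeighborsClassifier(n_neighbors=5, p=2)   # k = 5 with the Euclidean (L2) metric
sk_knn.fit(x_train, y_train.ravel())                # ravel() flattens the (105, 1) label column to 1-D
y_pred_sk = sk_knn.predict(x_test)
print("sklearn accuracy: {:.5f}%".format(accuracy_score(y_test, y_pred_sk) * 100))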

Decision Tree

A decision tree is a simple, efficient, and highly interpretable model that is widely used in data analysis. In essence it is a top-down tree made up of multiple decision nodes.

[Figure: the structure of a decision tree, built top-down from decision nodes]

Decision Tree Example: 

Predict whether Xiao Ming will go out to play today

[Figure: an example decision tree for deciding whether to go out to play]

Decision trees and if-then rules

  • A decision tree can be seen as a set of if-then rules

  • Each path from the root node of the decision tree to a leaf node constructs one rule: the internal nodes on the path correspond to the rule's conditions (Condition), and the leaf node corresponds to the rule's conclusion
  • The set of if-then rules of a decision tree has an important property: it is mutually exclusive and complete. That is, every instance is covered by exactly one rule (one path); see the small sketch after this list
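As a small sketch of this correspondence (an illustrative example added here, with hypothetical features), a tree that branches on weather and wind maps directly onto nested if-then statements, where each root-to-leaf path is one rule:

def will_go_out(weather, windy):
    # each if below corresponds to one internal node of a decision tree; each return is a leaf
    if weather == 'sunny':
        if windy:
            return False   # rule: IF weather is sunny AND it is windy THEN do not go out
        return True        # rule: IF weather is sunny AND it is not windy THEN go out
    return False           # rule: IF weather is not sunny THEN do not go out

print(will_go_out('sunny', False))   # True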

What is the Condition of a decision tree?

  • The process of determining the Condition is the process of feature selection;

The goal of the decision tree:

  • The essence of decision tree learning is to induce from the training data set a set of if-then rules that classify the training data
  • There may be many decision trees that do not contradict the training set, or there may be none at all; so we need to select a small tree that contradicts the training data as little as possible
  • From another point of view, we can regard a decision tree as a conditional probability model; our goal is to assign each instance to the class with the larger conditional probability
  • Choosing the best decision tree from all possible trees is an NP-complete problem, so we usually use a heuristic algorithm and obtain a sub-optimal decision tree
  • The algorithm typically proceeds recursively as follows: select an optimal feature and split the training data according to that feature, so that each subset of the data gets the best possible classification

Feature selection:

  • Feature selection decides which feature is used to divide the feature space


Random variables:
  The essence of a random variable is a function that maps the sample space to the real numbers, converting an event into a numerical value

$X: \Omega \to \mathbb{R}$

Different elements of the sample space (i.e. different outcomes) produce different values of the random variable, at random. We can say that the random variable is the numerical "value" of an outcome.
In real life, experimental results are often described in words, such as "heads" and "tails". To a mathematician such descriptions are too cumbersome, so numbers are used to represent them.

Entropy
  Entropy is a measure of the uncertainty of a random variable; the greater the uncertainty of the variable, the greater its entropy

Let X be a discrete random variable taking finitely many values, with probability distribution:

$P(X = x_i) = p_i, \quad i = 1, 2, \ldots, n$

The entropy of the random variable X is defined as:

$H(X) = -\sum_{i=1}^{n} p_i \log p_i$

Typically the logarithm in the formula above is taken with base 2 or base e (the natural logarithm); the unit of entropy is then called the bit or the nat, respectively.

When the random variable takes only two values, for example 0 and 1, the distribution of X is:

$P(X = 1) = p, \quad P(X = 0) = 1 - p, \quad 0 \le p \le 1$

Entropy: $H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$

  In this case the entropy H(p) varies with the probability p along the curve shown below (in bits):

[Figure: the binary entropy H(p) as a function of p, reaching its maximum of 1 bit at p = 0.5]
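The curve can be reproduced numerically; here is a minimal sketch (added for illustration) that evaluates H(p) on a few probabilities:

import numpy as np

def binary_entropy(p):
    # H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0 by convention
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
    print(p, round(binary_entropy(p), 3))
# the entropy peaks at 1 bit when p = 0.5 and drops to 0 when the outcome is certain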

Entropy example

Classifying three balls

[Figure: one red ball and two black balls]

  • Obviously we can put the red ball in one group by itself and the black balls in another group;
  • But what does this look like from the standpoint of entropy?

Entropy of the initial state: E(three balls) = -1/3 * log(1/3) - 2/3 * log(2/3) = 0.918

① The first split puts one red ball and one black ball in one group and the remaining black ball in a group by itself:

  • In the group containing the red ball and a black ball, red and black each occur with probability 1/2.
  • In the other group the black ball occurs 100% of the time and the probability of a red ball is 0.


E(red and black | black) = E(red and black) + E(black) = -1/2 * log(1/2) - 1/2 * log(1/2) - 1 * log(1) = 1; as we can see, the entropy actually increased after this split instead of decreasing

② The second split puts the red ball in a group by itself and the remaining two black balls in another group:

The probability of a black ball in the red-ball group is 0, and the probability of a red ball in the black-ball group is 0; this split is already "pure", i.e. after the split the random variable in each subset has become deterministic;


  E(red | black) = E(red) + E(black) = -1 * log(1) - 1 * log(1) = 0
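These numbers can be verified directly; a small sketch (added for illustration) that computes the entropy of the initial state and the sum of the group entropies for each split, using log base 2:

import numpy as np

def entropy(probs):
    # H = -sum(p * log2(p)), ignoring zero-probability outcomes
    return -sum(p * np.log2(p) for p in probs if p > 0)

print(entropy([1/3, 2/3]))                    # initial state: one red, two black -> 0.918
print(entropy([1/2, 1/2]) + entropy([1.0]))   # split ①: {red, black} + {black} -> 1.0
print(entropy([1.0]) + entropy([1.0]))        # split ②: {red} + {black, black} -> 0.0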

The goal of the decision tree

  • Our ultimate goal in using a decision tree model is to predict, with the decision tree classification model, which category a given set of data belongs to; this is a process of going from uncertainty to certainty;
  • Ideally, for every set of data the uncertainty is removed and the corresponding category can be found by following the branches of the decision tree
  • So we choose as the splitting node the feature along which the information entropy of the data decreases fastest, so that the decision tree becomes deterministic as quickly as possible

Conditional entropy

The conditional entropy H(Y|X) represents the uncertainty of the random variable Y given that the random variable X is known:

$H(Y \mid X) = \sum_{i=1}^{n} p_i \, H(Y \mid X = x_i)$, where $p_i = P(X = x_i), \; i = 1, 2, \ldots, n$

  • The entropy H(D) represents the uncertainty in classifying the data set D.
  • The conditional entropy H(D|A) represents the uncertainty in classifying the data set D given the feature A
  • When the entropy and the conditional entropy are computed from probabilities estimated from the data, they are called the empirical entropy and the empirical conditional entropy, respectively

Information gain

The information gain g(D, A) of a feature A with respect to a training data set D is defined as the difference between the empirical entropy H(D) of the set D and the empirical conditional entropy H(D|A) of D given the feature A, i.e.

       $g(D, A) = H(D) - H(D \mid A)$

  • Decision tree learning uses information gain as the criterion for feature selection
  • The empirical entropy H(D) represents the uncertainty in classifying the data set D, and the empirical conditional entropy H(D|A) represents the uncertainty in classifying D once the feature A is given. Their difference, the information gain, therefore represents the degree to which the feature A reduces the uncertainty in classifying the data set D (see the sketch after this list)
  • For a given data set D, the information gain depends on the feature; different features often have different information gains
  • A feature with large information gain has stronger classification ability
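To make the definition concrete, here is a minimal sketch (a toy example added here, not taken from the original post) that computes g(D, A) = H(D) - H(D|A) for a small set of binary labels split by one binary feature:

import numpy as np

def entropy(labels):
    # empirical entropy H(D) of an array of class labels
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(labels, feature):
    # g(D, A) = H(D) - H(D|A); H(D|A) is the weighted sum of subset entropies
    labels, feature = np.asarray(labels), np.asarray(feature)
    h_d_a = 0.0
    for v in np.unique(feature):
        subset = labels[feature == v]
        h_d_a += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - h_d_a

# toy data: labels (play / not play) and one feature (sunny or not), both hypothetical
y = [1, 1, 1, 0, 0, 0, 1, 0]
a = [1, 1, 1, 1, 0, 0, 0, 0]
print(information_gain(y, a))   # about 0.189 for this toy data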

Decision Tree algorithms

The training process of a decision tree (ID3) is: find the feature with the greatest information gain and split by that feature, then within each subset find the feature with the greatest information gain and split again, and repeat until a model that meets the requirements is obtained;
C4.5 is an algorithm that improves on ID3 by selecting features with the information gain ratio;
Classification and regression trees (CART) consist of three parts: feature selection, tree generation, and pruning; they can be used for both classification and regression;
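As a practical illustration (a sketch added here, not part of the original post), a decision tree can be trained on the same iris data with scikit-learn; criterion='entropy' makes the splits use information gain, the measure behind ID3/C4.5, while the default 'gini' criterion is the impurity used by CART. export_text prints the learned if-then rules, one root-to-leaf path per rule:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=35, stratify=iris.target)

tree = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)
tree.fit(x_train, y_train)
print("accuracy: {:.5f}%".format(accuracy_score(y_test, tree.predict(x_test)) * 100))

# print the learned if-then rules
print(export_text(tree, feature_names=iris.feature_names))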

 


Origin www.cnblogs.com/shengyang17/p/11441929.html