Classification models
- K-nearest neighbor
- Logistic regression
- Decision Tree
K Nearest Neighbor (KNN)
The simplest, most elementary classifier records the category of every training example; when the attributes of a test object exactly match those of a training object, the test object is assigned that category.
K-nearest neighbor (KNN) is a basic classification method that classifies by measuring the distance between feature vectors. The idea: if most of the k samples most similar to a given sample (i.e., its nearest neighbors in feature space) belong to a certain category, the sample is also assigned to that category. K is usually an integer no greater than 20.
In the KNN algorithm, the selected neighbors are all objects that have already been correctly classified.
Which class should the green circle (the test data) be assigned to: red triangle or blue square?
If K = 3 (the 3 points nearest the green circle), red triangles make up 2/3 of them, so the green circle is assigned to the red-triangle class.
If K = 5 (the 5 points nearest the green circle), blue squares make up 3/5 of them, so the green circle is assigned to the blue-square class.
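The vote in the two cases can be sketched in a few lines. The neighbor list below is hypothetical: it is simply ordered by distance so that it matches the description (2 of the 3 nearest are red triangles, 3 of the 5 nearest are blue squares); `majority_vote` is an illustrative helper, not part of any library.

```python
from collections import Counter

def majority_vote(sorted_labels, k):
    """Return the most common label among the k nearest neighbors.

    sorted_labels: neighbor labels already sorted by increasing distance.
    On a tie, Counter.most_common keeps first-encountered order, i.e. the nearer label wins.
    """
    return Counter(sorted_labels[:k]).most_common(1)[0][0]

# Hypothetical labels of the green circle's neighbors, nearest first
neighbors = ["red triangle", "red triangle", "blue square", "blue square", "blue square"]

print(majority_vote(neighbors, 3))  # red triangle (2 of 3)
print(majority_vote(neighbors, 5))  # blue square (3 of 5)
```

The same five neighbors produce different answers, which is exactly why the choice of K matters.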
The result of the KNN algorithm therefore depends heavily on the choice of K.
Distance calculation in KNN
KNN uses the distance between objects as the measure of their dissimilarity, which avoids having to match objects exactly. The distance used is generally the Euclidean distance or the Manhattan distance:
Euclidean distance: $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
Manhattan distance: $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
(The Manhattan distance is usually chosen here.)
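The two distances can be checked on a pair of points. This is a minimal sketch with NumPy; the points `a` and `b` are arbitrary illustrative values.

```python
import numpy as np

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # Sum of absolute coordinate differences
    return np.sum(np.abs(a - b))

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
print(euclidean(a, b))  # 5.0  (sqrt(3^2 + 4^2))
print(manhattan(a, b))  # 7.0  (3 + 4)
```

The Manhattan distance is never smaller than the Euclidean distance, and it weights a large difference in a single feature less heavily, since differences are not squared.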
KNN algorithm
Given a training set whose data and labels are known, take the test data, compare its features with the corresponding features of the training set, and find the K training samples most similar to it; the test data is assigned the category that occurs most often among those K samples. The algorithm is:
- Compute the distance between the test data and each training sample;
- Sort the samples by increasing distance;
- Select the K points with the smallest distances;
- Count how often each category occurs among these K points;
- Return the most frequent category among the K points as the predicted class of the test data.
Code implementation:
```python
### 0. Import dependencies
import numpy as np
import pandas as pd
# Load the iris dataset directly from sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split  # split the data into training and test sets
from sklearn.metrics import accuracy_score            # compute the classification accuracy of predictions
```
The iris dataset in sklearn: introduction
Appendix:
```python
iris = load_iris()
print(iris.data.shape)    # the four features of each sample: 150 rows, 4 columns
print(iris.data[:5])      # features of the first 5 samples
print(iris.target.shape)  # the class of each sample (the target attribute): 150 values
print(iris.target)        # targets of all samples; besides its features, each iris also
                          # carries a class label (the 5th column, called the target or label)
```
===>
```
(150, 4)
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
(150,)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
```
```python
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['class'] = iris.target
# Replace the numeric class codes with the species names
df['class'] = df['class'].map({0: iris.target_names[0],
                               1: iris.target_names[1],
                               2: iris.target_names[2]})
df.head(10)
```
```python
df.describe()
```
```python
x = iris.data
y = iris.target.reshape(-1, 1)
print(x.shape, y.shape)
```
===>
```
(150, 4) (150, 1)
```
```python
# Split into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=35, stratify=y)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
```
===>
```
(105, 4) (105, 1)
(45, 4) (45, 1)
```
The core algorithm
```python
# Distance functions: a holds the training points (one per row), b is a single point
def l1_distance(a, b):
    # Manhattan (first-order) distance: sum of absolute differences
    return np.sum(np.abs(a - b), axis=1)

def l2_distance(a, b):
    # Euclidean (second-order) distance
    return np.sqrt(np.sum((a - b) ** 2, axis=1))

# kNN classifier
class kNN(object):
    # __init__ is the class constructor
    def __init__(self, n_neighbors=1, dist_func=l1_distance):
        self.n_neighbors = n_neighbors
        self.dist_func = dist_func

    # Training: kNN simply memorizes the training data
    def fit(self, x, y):
        self.x_train = x
        self.y_train = y

    # Prediction
    def predict(self, x):
        # Initialize the array of predicted classes
        y_pred = np.zeros((x.shape[0], 1), dtype=self.y_train.dtype)
        # Iterate over the input points: i is the index, x_test the point itself
        for i, x_test in enumerate(x):
            # Distance from x_test to every training point
            distances = self.dist_func(self.x_train, x_test)
            # Indices of the training points sorted from nearest to farthest
            nn_index = np.argsort(distances)
            # Classes of the k nearest points
            nn_y = self.y_train[nn_index[:self.n_neighbors]].ravel()
            # Most frequent class among them (ties go to the smallest class index)
            y_pred[i] = np.argmax(np.bincount(nn_y))
        return y_pred
```
```python
### 3. Test
# Create a kNN instance
knn = kNN(n_neighbors=3)
# Train the model
knn.fit(x_train, y_train)
# Pass in the test data and predict
y_pred = knn.predict(x_test)
print("Classification accuracy: {:.5f}%".format(accuracy_score(y_test, y_pred) * 100))
```
===>
```
Classification accuracy: 93.33333%
```
Does the choice between L1 and L2 distance have a large effect on accuracy?
```python
# Create a kNN instance and train it
knn = kNN()
knn.fit(x_train, y_train)
# Store the results
result_list = []
# Try both distance functions
for p in [1, 2]:
    knn.dist_func = l1_distance if p == 1 else l2_distance
    # Try odd values of k from 1 to 9 (step 2)
    for k in range(1, 10, 2):
        knn.n_neighbors = k
        # Predict on the test data
        y_pred = knn.predict(x_test)
        # Prediction accuracy
        accuracy = accuracy_score(y_test, y_pred)
        result_list.append([k, 'l1_distance' if p == 1 else 'l2_distance', accuracy])
df = pd.DataFrame(result_list, columns=['k', 'distance function', 'accuracy'])
df
```
The best row of the results table is:

| k | distance function | accuracy |
|---|---|---|
| 5 | l2_distance | 0.977778 |

With k = 5 and the l2_distance function, the accuracy is highest.
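As a cross-check, the same experiment can be sketched with sklearn's `KNeighborsClassifier`, where `p=1` selects the Manhattan distance and `p=2` the Euclidean distance (via its default Minkowski metric). This swaps in a library implementation rather than the hand-written `kNN` class, uses a 1-D `y`, and reuses the split parameters above; its accuracies may differ slightly from the table because ties between equally distant neighbors or equally frequent classes can be broken differently.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=35, stratify=iris.target)

results = {}
for p, name in [(1, "l1_distance"), (2, "l2_distance")]:
    for k in range(1, 10, 2):
        # Minkowski distance with p=1 (Manhattan) or p=2 (Euclidean)
        clf = KNeighborsClassifier(n_neighbors=k, p=p)
        clf.fit(x_train, y_train)
        results[(k, name)] = accuracy_score(y_test, clf.predict(x_test))

for (k, name), acc in results.items():
    print(k, name, round(acc, 6))
```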
Decision Tree
A decision tree is a simple, efficient, and highly interpretable model that is widely used in data analysis. In essence, it is a tree of decision nodes that is traversed from top to bottom.
Decision Tree Example:
Predicting whether Xiao Ming will go out to play today
Decision trees and if-then rules
- A decision tree can be viewed as a set of if-then rules.
- Each path from the root node to a leaf node forms one rule: the internal nodes on the path correspond to the rule's conditions, and the leaf node corresponds to the rule's conclusion.
- The set of if-then rules derived from a decision tree has an important property: the rules are mutually exclusive and exhaustive. That is, every instance is covered by a rule (a path), and by exactly one rule.
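Because the tree from the Xiao Ming example is not reproduced here, the sketch below uses two hypothetical features (`weather` and `homework_done`) to show a tiny tree written as if-then rules. Each `return` is one root-to-leaf path, and enumerating all inputs shows that every instance fires exactly one rule.

```python
def classify(weather, homework_done):
    """Hypothetical two-feature tree; returns (decision, rule_id)."""
    if weather == "rainy":
        return "stay home", 1          # rule 1: rainy -> stay home
    elif homework_done:
        return "go out", 2             # rule 2: not rainy and homework done -> go out
    else:
        return "stay home", 3          # rule 3: not rainy, homework not done -> stay home

# Every combination of feature values is covered by exactly one rule
for weather in ["sunny", "rainy"]:
    for homework_done in [True, False]:
        print(weather, homework_done, classify(weather, homework_done))
```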
What are the conditions in a decision tree?
- Choosing the conditions is the process of feature selection.
Goal of the decision tree:
- The essence of decision tree learning is to induce, from the training data set, a set of if-then rules that classify the training data.
- There may be many decision trees that do not contradict the training set, or there may be none at all, so we need to choose a tree that contradicts the training data as little as possible.
- From another point of view, a decision tree can be regarded as a conditional probability model; the goal is to assign each instance to the class with the greatest conditional probability.
- Selecting the optimal decision tree from all possible trees is an NP-complete problem, so heuristic algorithms are usually used to learn the tree, yielding a suboptimal solution.
- Such algorithms usually proceed recursively: select the optimal feature and split the training data according to it, so that each resulting subset is classified as well as possible, then repeat on each subset.
Feature selection:
- Feature selection decides which feature is used to partition the feature space.
Random variables:
A random variable is essentially a function that maps the sample space to the real numbers, turning events into numerical values.
Different elements of the sample space (i.e., different outcomes) occur at random, so the value of the random variable is random as well. We can say that a random variable assigns a numeric "value" to each outcome.
In real life, experimental results are described in words, such as "heads" and "tails". To a mathematician such verbal descriptions are too cumbersome, so numbers are used to represent them.
Entropy
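Entropy is a measure of the uncertainty of a random variable: the greater the uncertainty of the variable, the greater its entropy.
Let X be a discrete random variable taking finitely many values, with probability distribution:
$$P(X = x_i) = p_i, \quad i = 1, 2, \dots, n$$
The entropy of the random variable X is defined as:
$$H(X) = -\sum_{i=1}^{n} p_i \log p_i$$
Usually the logarithm in this formula is taken to base 2 or base e (natural logarithm); entropy is then measured in bits or nats, respectively. By convention, 0 log 0 = 0.
When the random variable takes only two values, e.g., 0 and 1, the distribution of X is:
$$P(X = 1) = p, \quad P(X = 0) = 1 - p, \quad 0 \le p \le 1$$
Its entropy is:
$$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$
In this case, the entropy H(p) varies with the probability p as shown in the figure below (in bits): H(p) is 0 when p = 0 or p = 1 (no uncertainty) and reaches its maximum of 1 bit at p = 0.5.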
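A minimal sketch of the entropy formula in plain Python, with the two-valued case as a special case. Terms with probability 0 are skipped, which implements the 0 log 0 = 0 convention; `entropy` and `binary_entropy` are illustrative names.

```python
import math

def entropy(probs, base=2):
    """H(X) = -sum(p_i * log p_i); terms with p_i = 0 contribute 0."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)

def binary_entropy(p):
    """Entropy in bits of a variable with P(X=1) = p, P(X=0) = 1 - p."""
    return entropy([p, 1 - p])

for p in [0.0, 0.1, 0.5, 0.9, 1.0]:
    print(p, round(binary_entropy(p), 4))
```

The printed values trace the curve described above: 0 at both ends, symmetric around p = 0.5, and 1.0 bit at the peak.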
Entropy example
Splitting three balls (one red, two black) into groups
- Obviously, we can put the red ball in one group and the black balls in another;
- What does this look like from the standpoint of entropy?
Initial entropy (log base 2): E(three balls) = -1/3 * log(1/3) - 2/3 * log(2/3) = 0.918
① The first split puts the red ball and one black ball in one group, and the other black ball in a group by itself:
- In the red-and-black group, the red ball and the black ball each occur with probability 1/2.
- In the other group, the black ball occurs with probability 100% and the red ball with probability 0.
E(red and black | black) = E(red and black) + E(black) = -1/2 * log(1/2) - 1/2 * log(1/2) - 1 * log(1) = 1. Summed this way, the entropy after the split is even higher than before. (If each group is weighted by its share of the balls, as in the conditional entropy defined below, the result is 2/3 * 1 + 1/3 * 0 ≈ 0.667: a reduction, but only of about 0.25.)
② The second split puts the red ball in a group by itself and the two black balls in the other group:
In the red group the probability of a black ball is 0, and in the black group the probability of a red ball is 0. This split is already "pure": within each subset, the random variable has become certain.
E (red | black) = E (red) + E (black) = - 1 * log (1) - 1 * log (1) = 0
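The two splits can be compared numerically. This sketch uses the size-weighted entropy of the groups (the empirical conditional entropy introduced in the next subsection) rather than the plain sum used above; under either measure, split ② is the better one. Function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Empirical entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

def weighted_split_entropy(groups):
    """Entropy of each group weighted by the group's share of all items."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * entropy(g) for g in groups)

balls = ["red", "black", "black"]
split_1 = [["red", "black"], ["black"]]   # split ①
split_2 = [["red"], ["black", "black"]]   # split ②

print(round(entropy(balls), 3))                   # 0.918
print(round(weighted_split_entropy(split_1), 3))  # 0.667
print(round(weighted_split_entropy(split_2), 3))  # 0.0
```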
Goal of the decision tree
- The ultimate purpose of a decision tree model is prediction: for a given record, it tells us which category the record belongs to. This is a process of going from uncertainty to certainty.
- Ideally, after the final classification, every record can be assigned its category with no remaining uncertainty by following the branches of the decision tree.
- We therefore choose the feature that makes the information entropy decrease fastest as the splitting node, so that the decision tree reaches certainty as quickly as possible.
Conditional entropy
The conditional entropy H(Y | X) represents the uncertainty of random variable Y given that random variable X is known:
$$H(Y \mid X) = \sum_{i=1}^{n} p_i \, H(Y \mid X = x_i)$$
where $p_i = P(X = x_i)$, $i = 1, 2, \dots, n$.
- The entropy H(D) represents the uncertainty in classifying the data set D.
- The conditional entropy H(D | A) represents the uncertainty in classifying the data set D given feature A.
- When the probabilities in entropy and conditional entropy are estimated from data, the resulting quantities are called the empirical entropy and the empirical conditional entropy, respectively.
Information gain
The information gain G(D, A) of feature A with respect to the training data set D is defined as the difference between the empirical entropy H(D) of D and the empirical conditional entropy H(D | A) of D given A, i.e.,
$$G(D, A) = H(D) - H(D \mid A)$$
- Decision tree learning uses information gain as a feature selection criterion.
- The empirical entropy H(D) represents the uncertainty in classifying the data set D, and the empirical conditional entropy H(D | A) represents that uncertainty once feature A is given. Their difference, the information gain, therefore measures how much feature A reduces the uncertainty in classifying D.
- For a given data set D, the information gain depends on the feature: different features usually have different information gains.
- Features with larger information gain have stronger classifying power.
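A minimal sketch of empirical entropy, empirical conditional entropy, and information gain on a small hypothetical table (whether someone goes out, given `weather` and `homework`). The data are invented for illustration; the feature with the larger gain would be chosen for the split.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Empirical entropy H(D), base 2."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature_values, labels):
    """Empirical H(D|A) = sum_i |D_i|/|D| * H(D_i), grouping rows by the value of A."""
    groups = defaultdict(list)
    for v, y in zip(feature_values, labels):
        groups[v].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(feature_values, labels):
    """G(D, A) = H(D) - H(D|A)."""
    return entropy(labels) - conditional_entropy(feature_values, labels)

# Hypothetical toy data: (weather, homework) -> go out or stay
X = [("sunny", "yes"), ("sunny", "no"), ("rainy", "yes"),
     ("rainy", "no"), ("sunny", "yes"), ("cloudy", "yes")]
y = ["go", "stay", "stay", "stay", "go", "go"]
feature_names = ["weather", "homework"]

gains = {name: information_gain([row[j] for row in X], y)
         for j, name in enumerate(feature_names)}
print({k: round(v, 3) for k, v in gains.items()})  # {'weather': 0.541, 'homework': 0.459}
print(max(gains, key=gains.get))                    # weather
```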
Decision Tree algorithm
Training a decision tree with ID3 works as follows: find the feature with the largest information gain and split the data on it; then, within each resulting subset, find the feature with the largest information gain and split again; repeat until the model meets the requirements.
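The recursion can be sketched in about 40 lines. This is a simplified ID3 under stated assumptions: categorical features only, the tree stored as nested dicts, and it stops only when a node is pure or no features remain (a full ID3 also stops when the best gain falls below a threshold, which is omitted here). The toy data are the same hypothetical table as in the information-gain example.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, j):
    """Information gain of splitting on feature column j."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[j]].append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def id3(rows, labels, features):
    """Grow a tree as nested dicts {feature_index: {value: subtree_or_label}}."""
    if len(set(labels)) == 1:                  # pure node -> leaf
        return labels[0]
    if not features:                           # no features left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda j: info_gain(rows, labels, j))
    remaining = [f for f in features if f != best]
    tree = {best: {}}
    for value in sorted(set(row[best] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx], remaining)
    return tree

def predict(tree, row):
    """Follow branches to a leaf; unseen feature values raise KeyError in this sketch."""
    while isinstance(tree, dict):
        j, branches = next(iter(tree.items()))
        tree = branches[row[j]]
    return tree

# Hypothetical toy data: columns are (weather, homework)
X = [("sunny", "yes"), ("sunny", "no"), ("rainy", "yes"),
     ("rainy", "no"), ("sunny", "yes"), ("cloudy", "yes")]
y = ["go", "stay", "stay", "stay", "go", "go"]

tree = id3(X, y, features=[0, 1])
print(tree)
# {0: {'cloudy': 'go', 'rainy': 'stay', 'sunny': {1: {'no': 'stay', 'yes': 'go'}}}}
print(predict(tree, ("sunny", "no")))  # stay
```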
C4.5 improves on ID3 by selecting features with the information gain ratio instead of the information gain.
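The gain ratio divides the information gain by the entropy of the feature's own value distribution, $G_R(D, A) = G(D, A) / H_A(D)$, which penalizes features with many distinct values. The sketch below adds a hypothetical `row_id` feature (unique per row) to the toy table: it has the largest information gain, yet the gain ratio ranks it below `homework`. This illustrates only the ratio itself, not C4.5's full feature-selection procedure.

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    n = len(values)
    return sum(-c / n * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature_values, labels):
    groups = defaultdict(list)
    for v, y in zip(feature_values, labels):
        groups[v].append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def gain_ratio(feature_values, labels):
    # Split information H_A(D): entropy of the feature's own values
    split_info = entropy(feature_values)
    return information_gain(feature_values, labels) / split_info if split_info > 0 else 0.0

y = ["go", "stay", "stay", "stay", "go", "go"]
features = {
    "weather":  ["sunny", "sunny", "rainy", "rainy", "sunny", "cloudy"],
    "homework": ["yes", "no", "yes", "no", "yes", "yes"],
    "row_id":   [1, 2, 3, 4, 5, 6],   # hypothetical ID-like feature
}
for name, values in features.items():
    print(name, round(information_gain(values, y), 3), round(gain_ratio(values, y), 3))
# weather 0.541 0.371
# homework 0.459 0.5
# row_id 1.0 0.387
```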
Classification and regression trees (CART) consist of three parts, namely feature selection, tree generation, and pruning, and can be used for both classification and regression.
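In practice, sklearn's `DecisionTreeClassifier` (documented by sklearn as an optimized version of CART, so it always builds binary splits) can be applied to the same iris split used for KNN. The sketch below compares `criterion="entropy"` (an information-based impurity) with the default `"gini"`, limits growth with `max_depth=3` as a simple stand-in for pruning, and prints the learned tree as if-then rules with `export_text`. The depth and `random_state` values are arbitrary choices, and no particular accuracy is claimed.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=35, stratify=iris.target)

for criterion in ["entropy", "gini"]:
    # max_depth caps tree growth (a simple alternative to post-pruning)
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=3, random_state=0)
    tree.fit(x_train, y_train)
    acc = accuracy_score(y_test, tree.predict(x_test))
    print(criterion, round(acc, 4))

# The last fitted tree (gini) written out as nested if-then rules
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```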