IoT, ROS and AI {Session 3}

Content 

Table of contents

k-Nearest Neighbor (k-NN)

K-means

Decision Tree

ID3

Bayes' Theorem

Bayes Classifier

Maximum a Posteriori (MAP) Estimation

Maximum Likelihood

Naive Bayesian Network

Expectation Maximization


Machine learning is divided into the following learning methods:

  • Supervised learning
    • Uses a set of data to learn a model
    • A label (class) is assigned to each training example
    • Examples: classification and regression problems
    • Algorithms: Bayesian Networks, Artificial Neural Networks
  • Unsupervised learning
    • Uses a set of data to learn a model
    • No label (class) is assigned to the training data
    • The learning algorithm must discover patterns or structures in the training data
    • Examples: clustering, dimensionality reduction problems
    • Algorithms: K-means, SOM
  • Semi-supervised learning
    • The training data consists of a mixture of labeled and unlabeled data
    • The model must discover the structure or patterns in the data
    • Examples: classification and regression problems
    • Algorithm: Hidden Markov Model

Common algorithms:

  • Regression algorithms
    • Ordinary Least Squares Regression (OLSR)
    • Linear regression
    • Logistic regression
    • Stepwise regression
    • Multivariate Adaptive Regression Splines (MARS)
  • Instance-based algorithms
    • k-Nearest Neighbors (k-NN)
    • Self-Organizing Map (SOM)
    • Locally Weighted Learning (LWL)
  • Regularization algorithms
    • Ridge regression
    • Least Absolute Shrinkage and Selection Operator (LASSO)
    • Elastic net
  • Decision tree algorithms
    • Classification and Regression Trees (CART)
    • Iterative Dichotomiser 3 (ID3)
    • C4.5 and C5.0
    • Chi-Square Automatic Interaction Detection (CHAID)
  • Bayesian algorithms
    • Naive Bayes
    • Bayesian network
  • Clustering algorithms
    • K-means
    • Expectation Maximization (EM)
  • Artificial neural network algorithms
    • Perceptron
    • Radial basis function network
    • Hopfield network
  • Deep learning algorithms
    • Deep Boltzmann Machine (DBM)
    • Deep Belief Networks (DBN)
    • Convolutional Neural Networks (CNNs)
  • Dimensionality reduction algorithms
    • Principal Component Analysis (PCA)
    • Multidimensional Scaling (MDS)
    • Partial Least Squares Regression (PLSR)
  • Ensemble algorithms
    • Boosting
    • AdaBoost
    • Bootstrapped Aggregation (Bagging)
  • Genetic algorithms

k-Nearest Neighbor (k-NN)

 

All instances or objects (which we can abstract as points here) correspond to points in an n-dimensional space R^{n}. The distance between any two instances (points) is defined by the standard Euclidean distance. Each instance x can be represented by a feature vector <a_1(x), a_2(x), a_3(x), ..., a_n(x)>, where a_r(x) represents the value of the r-th attribute (column) of instance x.

How to represent the distance between each two instances?

Given two instances x_i and x_j, the distance between them is d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left( a_r(x_i) - a_r(x_j) \right)^2}. That is, for every attribute, the corresponding entries of the two feature vectors are subtracted, squared, and then summed.
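
For example, a minimal Python sketch of this distance computation (the example feature vectors are made up purely for illustration):

```python
import math

def euclidean_distance(x_i, x_j):
    """Standard Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a_i - a_j) ** 2 for a_i, a_j in zip(x_i, x_j)))

# Two hypothetical instances with three attributes each:
print(euclidean_distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0)))  # sqrt(9 + 16 + 0) = 5.0
```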

k-NN involves both a training algorithm and a classification algorithm (not detailed here).

Here is an example to help us understand k-NN. Suppose we have two camps, a and b, and a new point c. We want to assign c to one of the two camps a or b (in other words, we aggregate the label values of c's k "neighbors" as the predicted value of the new sample). The question is how to do this.

First of all, we need to determine the value of k, which determines how many neighbors of c we look at. How do we determine k?

The following explanation is adapted from https://www.zhihu.com/question/40456656/answer/2269875330 (source: Zhihu).

The k value is a hyperparameter of the k-NN algorithm; its meaning is the number of "neighbor" label values taken as reference. There is a counter-intuitive phenomenon: when k is small, the model complexity (capacity) is high, the training error is reduced, but the generalization ability is weakened; when k is large, the model complexity is low, the training error increases, and the generalization ability improves to a certain extent.

The reason is that when k is small (e.g. k = 1), only training samples in a small neighborhood are used for prediction, so the model's fitting ability is relatively strong: the decision simply follows the result of the nearest training sample (neighbor). However, when the training set contains noisy samples, the model is easily affected by them (in the overfitting case, the decision boundary is drawn wherever the noisy samples happen to lie). This increases the variance of what is learned, which makes overfitting easy. In that case, "listening to more neighbors", i.e. consulting more training samples, can minimize the impact of the noise. When k is too large, the situation is reversed and the model easily underfits.

For the selection of k, grid search with cross-validation is usually used to pick an appropriate value.

In general: we want to choose an odd value k for a problem with two classes, and k cannot be a multiple of the number of classes.
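
As a minimal sketch of this selection procedure, assuming scikit-learn is available and using its built-in iris data purely for illustration, grid search with cross-validation over candidate k values might look like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate k values: odd, and not multiples of the number of classes (3 here).
param_grid = {"n_neighbors": [1, 5, 7, 11, 13]}

search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print("best k:", search.best_params_["n_neighbors"])
print("cross-validated accuracy:", round(search.best_score_, 3))
```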

Disadvantages of k-NN: it always has to search for the nearest neighbors around the query point, which is computationally expensive, especially when the data set is huge.

Here are some examples of K-NN:

So, the method is very straightforward. The first step is getting the training data (i.e. the first table, with X as the input and Y as the output). In other words, we first need a set of training data: x1, x2 and Y in Table 1 appear together, so they are the components of a single training example.

Then, when you classify a new input, e.g. X1 = 3, X2 = 6, you just need to calculate the Euclidean distance between the new input point and every training data point (hence the second table; the calculation appears in its third column), which gives the Euclidean distance d between the new input point and each training example.

Then you rank the Euclidean distances and pick the k samples with the smallest distance (for example, the 3 training data points closest to the new input point). Finally, check the corresponding outputs of those training data points: if most of the outputs are Yes, then the predicted result is Yes.

Following these steps, we obtain the classification result for the new input point.
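
To make these steps concrete, here is a minimal from-scratch sketch of the procedure; the training table values below are invented for illustration and are not the original table:

```python
import math
from collections import Counter

# Hypothetical training data: (x1, x2) -> Y
training_data = [
    ((7, 7), "No"),
    ((7, 4), "No"),
    ((3, 4), "Yes"),
    ((1, 4), "Yes"),
]

def knn_classify(new_point, data, k=3):
    # 1. Euclidean distance from the new point to every training point.
    distances = [
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(new_point, x))), label)
        for x, label in data
    ]
    # 2. Rank the distances and keep the k nearest neighbors.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # 3. Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((3, 6), training_data, k=3))  # "Yes" for this made-up table
```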

K-means

K-means clustering

K-means is a simple unsupervised learning algorithm that partitions a given data set into a certain number of clusters.

For k clusters, we define one center (centroid) for each, for a total of k centers.

learning process:

  1. Place each centroid c_j at an arbitrary point.
  2. For each data point x_i, find the centroid c_j closest to it by the squared Euclidean distance \left\| x_i - c_j \right\|^2, and assign x_i to that centroid's cluster; this assignment is written x_i^{(j)}.
  3. Recompute each centroid c_j as the mean of the data points currently assigned to it.
  4. Repeat steps 2 and 3 until the centroids c_j no longer change (a code sketch of this loop is given below).

Like this example. 
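
To make the procedure concrete, here is a minimal from-scratch sketch of the K-means loop described above (2-D points and k = 2 are assumptions for illustration):

```python
import random

def kmeans(points, k, max_iter=100):
    # 1. Place the centroids at arbitrary points (here: k random data points).
    centroids = random.sample(points, k)
    for _ in range(max_iter):
        # 2. Assign every point x_i to the closest centroid c_j (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])))
            clusters[j].append(x)
        # 3. Recompute each centroid as the mean of its assigned points.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        # 4. Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
print(kmeans(points, k=2))
```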

Decision Tree

Decision trees classify instances by sorting them from the root node down to some leaf node (top-down). Different leaf nodes correspond to different classes, which is how the classification is realized.

Each node in the tree is a test for a particular attribute of an instance, and each branch descending from that node is a possible value for that particular attribute.

For example:

Suppose we need to find the classification of a cherry, starting from the root node. We follow the branches that match the cherry's characteristics: it is red, small in size, and sweet in taste. Following these tests, we arrive at the leaf node for cherry.

Abstracting the above example, we can see that the top node tests an attribute i of the instance. One child node is already a category (a leaf), while the other child node is a further test on the attribute, which splits again into two categories. So the tree in the picture actually has three categories: small, medium and large.
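
As a hedged illustration of the abstract tree just described, here is a minimal sketch in code; the attribute name and the thresholds are assumptions, not values from the original figure:

```python
def classify(size: float) -> str:
    # Root node: a test on one attribute of the instance (hypothetical threshold).
    if size < 3.0:
        return "small"      # this child is already a category (a leaf)
    # Other child: a further test on the attribute, splitting into two more categories.
    if size < 7.0:
        return "medium"
    return "large"

print(classify(1.5), classify(5.0), classify(9.0))  # small medium large
```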

Note: if the attributes are n-dimensional, then the decision tree creates hyperplanes to separate the different classes.

one example: 

ID3 algorithm

ID3 builds a tree in six steps:

  1. Use the training data to split the nodes
  2. Split on the optimal attribute (like Outlook in the example above), as sketched below
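
A minimal sketch of how the optimal attribute can be chosen using information gain (entropy reduction); the toy weather-style records are assumptions for illustration, not the original lecture data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(records, attribute, target="Play"):
    """Entropy reduction obtained by splitting the records on one attribute."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder

# Hypothetical training records.
records = [
    {"Outlook": "Sunny",    "Windy": "False", "Play": "No"},
    {"Outlook": "Sunny",    "Windy": "True",  "Play": "No"},
    {"Outlook": "Overcast", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rain",     "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rain",     "Windy": "True",  "Play": "No"},
]

# ID3 splits on the attribute with the highest information gain.
for attr in ("Outlook", "Windy"):
    print(attr, round(information_gain(records, attr), 3))
```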

Bayes' Theorem

Bayes Classifier

Maximum a Posteriori (MAP) Estimation

Maximum Likelihood

Naive Bayesian Network

Expectation Maximization
 


Origin: blog.csdn.net/Harvery_/article/details/126008984