anomaly detection

This article on anomaly detection mainly covers the following topics:
  • Anomaly Detection Definition and Application Areas
  • Common Anomaly Detection Algorithms
  • Gaussian distribution (normal distribution)
  • Anomaly Detection Algorithms
  • Evaluating anomaly detection algorithms
  • Anomaly Detection vs Supervised Learning
  • How to design and select features
  • Multivariate Gaussian Distribution
  • Application of Multivariate Gaussian Distribution in Anomaly Detection
1. Definition and applications of anomaly detection
First, what is anomaly detection? Anomaly detection is the task of finding objects that differ from the majority of objects — in other words, finding outliers. It is sometimes also called deviation detection. Anomalous objects are relatively rare. Here are some common applications of anomaly detection:
  • Fraud detection: detecting abnormal behavior, for example to determine whether someone else's credit card is being used fraudulently.
  • Intrusion detection: detecting intrusions into computer systems.
  • Medical field: detecting whether a person's health indicators are abnormal.
2. Common anomaly detection algorithms
There are many anomaly detection algorithms, but this post will only cover model-based techniques in detail. The main families of anomaly detection methods are:
  • Model-based techniques: Many anomaly detection techniques first build a model of the data, and anomalies are those objects that do not fit perfectly with the model. For example, a model of a data distribution can be created by estimating the parameters of a probability distribution. An object is considered an anomaly if it does not obey the distribution.
  • Proximity-based techniques: Proximity measures can often be defined between objects, with anomalous objects being those that are far from most other objects. Distance-based outliers can be detected visually when the data can be presented in a 2D or 3D scatter plot.
  • Density-based techniques: Density estimates for objects can be relatively straightforward to compute, especially when there is a measure of proximity between objects. Objects in low-density regions are relatively far from their neighbors and may be seen as anomalies.
3. Gaussian distribution (normal distribution)
The Gaussian distribution is also called the normal distribution. If a random variable $x$ follows a Gaussian distribution, we write $x \sim N(\mu, \sigma^2)$, where $\mu$ is the mean (mathematical expectation) and $\sigma^2$ is the variance. Its probability density function is:

$$p(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
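As a quick sanity check, the density above can be evaluated directly. A minimal sketch (the sample values below are made up for illustration):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density p(x; mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# The standard normal N(0, 1) evaluated at its mean is 1/sqrt(2*pi) ≈ 0.3989.
print(gaussian_pdf(0.0, mu=0.0, sigma2=1.0))
```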
Its graph looks like a bell, so it is also called the bell curve:

[Figure: the Gaussian density, a bell-shaped curve centered at $\mu$.]

The mean $\mu$ of a normal distribution determines the center of the curve, and the standard deviation $\sigma$ determines its width. If $\mu = 0$ and $\sigma = 1$, it is called the standard normal distribution:

[Figure: the standard normal density $N(0, 1)$.]

Let's look at a few more Gaussian density plots, paying attention to how $\mu$ and $\sigma$ affect the curve:

[Figures: Gaussian densities for several combinations of $\mu$ and $\sigma$.]
Parameter estimation: given a data set whose samples are known to follow a normal distribution, i.e. $x^{(i)} \sim N(\mu, \sigma^2)$, how do we find the parameters $\mu$ and $\sigma^2$? This is parameter estimation. The estimates are:

$$\mu=\frac{1}{m}\sum_{i=1}^{m}x^{(i)},\qquad \sigma^2=\frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^2$$

Specifically, for each feature $j$ of an $n$-dimensional sample, the formulas are:

$$\mu_j=\frac{1}{m}\sum_{i=1}^{m}x_j^{(i)},\qquad \sigma_j^2=\frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)}-\mu_j\right)^2$$
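The per-feature estimates can be computed in one vectorized step. A sketch, assuming the training set is an m × n NumPy array:

```python
import numpy as np

def estimate_gaussian(X):
    """Maximum-likelihood estimates of mu_j and sigma_j^2 for each feature."""
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).mean(axis=0)  # note: 1/m, not the unbiased 1/(m-1)
    return mu, sigma2

# Tiny made-up data set: m = 3 samples, n = 2 features.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
mu, sigma2 = estimate_gaussian(X)
print(mu)      # per-feature means: 2.0 and 20.0
print(sigma2)  # per-feature variances with 1/m normalization
```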
4. Anomaly Detection Algorithms
For a given training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, where each sample $x \in \mathbb{R}^n$ is an $n$-dimensional vector, we can build a probability model that estimates the density of each sample by treating the features as independent Gaussians:

$$p(x)=\prod_{j=1}^{n}p(x_j;\mu_j,\sigma_j^2)=\prod_{j=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right)$$
Combined with the parameter estimation described in Section 3, we obtain the following anomaly detection algorithm:

  • Choose features $x_j$ that might be indicative of anomalous examples.
  • Fit the parameters $\mu_1,\dots,\mu_n$ and $\sigma_1^2,\dots,\sigma_n^2$ on the training set.
  • Given a new example $x$, compute $p(x)$; flag it as an anomaly if $p(x) < \varepsilon$.
Writing this, you may have a question: how is the value of $\varepsilon$ determined? In fact, $\varepsilon$ is an empirical value. The method Ng gives is to select the $\varepsilon$ that maximizes an evaluation metric (such as the F-measure) on the validation set.
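Putting the density model and the threshold search together, here is a minimal sketch of the whole algorithm. The data below is synthetic, and the grid scan over candidate thresholds is one simple way to implement the F1-based selection described above:

```python
import numpy as np

def estimate_gaussian(X):
    """Per-feature ML estimates of mu and sigma^2 (Section 3)."""
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).mean(axis=0)
    return mu, sigma2

def density(X, mu, sigma2):
    """p(x) as the product of independent univariate Gaussian densities."""
    p = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return p.prod(axis=1)

def select_epsilon(p_val, y_val):
    """Scan thresholds and keep the one maximizing F1 on the validation set.

    y_val uses 1 for anomalies, 0 for normal examples.
    """
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_val.min(), p_val.max(), 1000):
        pred = p_val < eps
        tp = np.sum(pred & (y_val == 1))
        fp = np.sum(pred & (y_val == 0))
        fn = np.sum(~pred & (y_val == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Synthetic demo: a normal 2D cluster plus one obvious outlier in validation.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 2))
mu, sigma2 = estimate_gaussian(X_train)

X_val = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)), [[6.0, 6.0]]])
y_val = np.array([0] * 20 + [1])
p_val = density(X_val, mu, sigma2)

eps, f1 = select_epsilon(p_val, y_val)
print(p_val[-1] < eps)  # True: the injected outlier falls below the chosen threshold
```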
The above is the anomaly detection algorithm, and it can detect outliers well. Next, let's look at an example of using it to identify outliers (example from Ng's machine learning class):

[Figure: a 2D training set with two test samples $x_{test}^{(1)}$ and $x_{test}^{(2)}$, and the fitted density $p(x)$.]

In the figure above, the probability densities of the two test samples $x_{test}^{(1)}$ and $x_{test}^{(2)}$ are computed as 0.0426 and 0.0021, respectively. Comparing each with the given threshold $\varepsilon$, the sample with the lower density, $x_{test}^{(2)}$, is identified as an outlier.
 
5. Evaluating anomaly detection algorithms
As mentioned earlier, in anomaly detection we also need to split the data into a training set, a cross-validation set, and a test set. An example of such a split:

[Figure: example split of the data into training, cross-validation, and test sets.]

After splitting the data, we still need metrics to evaluate the algorithm, so precision, recall, and the F-measure come in handy again. These metrics were discussed in detail in a previous post on evaluating algorithms on imbalanced data ( http://blog.csdn.net/u012328159/article/details/51282428 ); refer to it if anything is unclear.
 
6. Anomaly Detection vs Supervised Learning
The anomaly detection algorithm has now basically been introduced. You may wonder how it differs from the supervised learning algorithms mentioned earlier (linear regression, logistic regression, neural networks, etc.), since it looks like it could be treated as a classification problem judged by a label. Let's compare the two kinds of algorithms, their characteristics, and their typical application scenarios to build an intuitive understanding.
First, a comparison of anomaly detection and supervised learning:

[Table: comparison of anomaly detection (very few positive/anomalous examples, many negative examples) and supervised learning (large numbers of both positive and negative examples).]

To sum up: in anomaly detection, outliers are few and far between, so it is difficult for a supervised learning algorithm to learn anything useful from the anomalous samples.
Next, their respective application scenarios:

[Table: typical applications of anomaly detection vs. supervised learning.]
7. How to design and select features
One of the factors that affects how well a learning algorithm performs is its features: which features you feed into the algorithm will certainly affect the results. The following describes how to design and select feature variables for anomaly detection algorithms.
The anomaly detection algorithm models data as Gaussian. If a feature does not follow a Gaussian distribution, the algorithm can still run, but it tends to work better after the feature is transformed to look more Gaussian. When a feature's distribution deviates noticeably from Gaussian, we usually transform it, typically with a logarithmic transformation such as log(x). For example:
[Figure: histogram of a skewed feature before and after a log transform, looking much more Gaussian afterwards.]
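A sketch of the transform step on a synthetic right-skewed feature, using sample skewness as a rough "how Gaussian is this?" check (log of a lognormal variable is exactly normal, so the skewness should drop toward 0):

```python
import numpy as np

def skewness(x):
    """Sample skewness: E[(x - mu)^3] / sigma^3; 0 for a symmetric distribution."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    return ((x - mu) ** 3).mean() / sigma ** 3

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # heavily right-skewed feature

x_log = np.log(x)  # log-transform; use np.log1p(x) if the feature can be 0

print(skewness(x))      # large positive skew before the transform
print(skewness(x_log))  # close to 0 after: roughly Gaussian
```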
Sometimes we also need to design new feature variables for the application at hand to help the anomaly detection algorithm better detect outliers. Consider an example of monitoring computers in a network to see how to design more sensible features:

[Figure: combining existing measurements into a new feature, e.g. a ratio of two existing features.]

From the figure above, we can see that the newly designed feature reflects abnormal behavior of the network better than the original features do.
 
8. Multivariate Gaussian distribution
The multivariate Gaussian distribution has both advantages and limitations. Let's look at its advantage first: it can capture anomalous samples that the per-feature model above cannot. Let's take an example:
(1) Our original data is shown in the figure below (the left panel shows the two-dimensional data; the right panel models the two features as separate univariate Gaussians).

[Figure: 2D scatter plot of the data and the two per-feature Gaussian fits.]

(2) Suppose there is an anomalous point (the green sample) as shown on the left of the figure below; in the separately modeled view on the right, it looks like:

[Figure: the green point lies outside the 2D data cloud, but each of its individual feature values falls in a high-density region.]

As the figure shows, the green anomalous sample cannot be detected by the separately modeled (per-feature) approach, because each of its feature values looks normal on its own. Therefore we need to model the features jointly with a multivariate Gaussian distribution.
The probability density function of the multivariate Gaussian distribution is:

$$p(x;\mu,\Sigma)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^{\mathsf T}\Sigma^{-1}(x-\mu)\right)$$

where $\mu \in \mathbb{R}^n$ is the mean vector, $\Sigma$ is the $n \times n$ covariance matrix, and $|\Sigma|$ is the determinant of $\Sigma$.
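The density formula above, sketched directly in NumPy (scipy.stats.multivariate_normal computes the same thing; the identity-covariance case below is easy to verify by hand, since the exponent is 0 at the mean):

```python
import numpy as np

def multivariate_gaussian(X, mu, Sigma):
    """p(x; mu, Sigma) evaluated for each row of X."""
    n = mu.shape[0]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    exponent = -0.5 * np.sum(diff @ inv * diff, axis=1)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(exponent) / norm

mu = np.zeros(2)
Sigma = np.eye(2)
# At the mean with identity covariance, p = 1 / (2*pi) ≈ 0.1592.
print(multivariate_gaussian(np.array([[0.0, 0.0]]), mu, Sigma))
```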
Let's look at some examples that illustrate the effect of the mean vector $\mu$ and the covariance matrix $\Sigma$ on the probability density function.
First, the effect of $\Sigma$:
[Figures 1–3: densities for diagonal $\Sigma$ with different diagonal entries.]
From the three figures above, we can see that when the off-diagonal entries of $\Sigma$ are all 0, the sizes of the diagonal entries control the shape and spread of the density's contours (its top-down view). Observe for yourself which shapes correspond to which values.
Next, an example where the diagonal is kept fixed and the off-diagonal entries are changed:
 
[Figures 4 and 5: densities for $\Sigma$ with nonzero off-diagonal entries; the contours tilt.]
It can be seen that the off-diagonal entries control the tilt of the contours (i.e., the correlation between the features).
 
Finally, the effect of $\mu$ on the probability density function:

[Figure: densities with the same $\Sigma$ but different $\mu$; the shape is unchanged while the center moves.]

From the figure above, we can see that $\mu$ controls the position (center) of the distribution.
 
9. Application of the multivariate Gaussian distribution in anomaly detection
After introducing the multivariate Gaussian distribution, here is the anomaly detection algorithm derived from it:

  • Fit the model by estimating $\mu=\frac{1}{m}\sum_{i=1}^{m}x^{(i)}$ and $\Sigma=\frac{1}{m}\sum_{i=1}^{m}(x^{(i)}-\mu)(x^{(i)}-\mu)^{\mathsf T}$ on the training set.
  • Given a new example $x$, compute $p(x;\mu,\Sigma)$ and flag it as an anomaly if $p(x;\mu,\Sigma) < \varepsilon$.
Let's look at the relationship between the multivariate Gaussian model and a set of independent univariate Gaussian models:

In fact, when the covariance matrix $\Sigma$ of the multivariate Gaussian model is a diagonal matrix whose diagonal entries are the variances of the individual univariate Gaussian models, the two are equivalent: the joint density factorizes into the product $\prod_{j=1}^{n} p(x_j;\mu_j,\sigma_j^2)$.
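This equivalence is easy to verify numerically: for a diagonal $\Sigma$, the multivariate density should match the product of the per-feature univariate densities. A sketch (the test point and parameters are arbitrary):

```python
import numpy as np

def univariate_product(x, mu, sigma2):
    """Product of independent univariate Gaussian densities, one per feature."""
    p = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return p.prod()

def multivariate_gaussian(x, mu, Sigma):
    """Multivariate Gaussian density at a single point x."""
    n = mu.shape[0]
    diff = x - mu
    exponent = -0.5 * diff @ np.linalg.inv(Sigma) @ diff
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(exponent) / norm

mu = np.array([1.0, -2.0])
sigma2 = np.array([0.5, 3.0])   # per-feature variances
x = np.array([0.7, -1.1])       # arbitrary test point

p_indep = univariate_product(x, mu, sigma2)
p_multi = multivariate_gaussian(x, mu, np.diag(sigma2))
print(np.isclose(p_indep, p_multi))  # True: the two models agree for diagonal Sigma
```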
Finally, let's compare the multivariate Gaussian model with multiple univariate Gaussian models:

[Table: the univariate (independent-features) model is computationally cheaper and works even when $m$ is small, but correlations between features must be captured by manually designing new features; the multivariate model captures correlations automatically but is more expensive and requires $m > n$ so that $\Sigma$ is invertible.]
The above are some knowledge points about anomaly detection.
