Andrew Ng machine learning (XV) - Anomaly Detection

Motivating problem

Model
Given a training set, how do we detect whether a new input x is anomalous?

First, build a model p(x) from the training set. Given a new data point x, the point is identified as anomalous when p(x) falls below a threshold ε, which happens when the point lies far from the center of the data; otherwise it is identified as normal.
Examples of anomaly detection

Fraud detection is one of the most common applications of anomaly detection. Each user i is represented by a feature vector, with features such as the number of logins, the number of clicks on a certain page, the number of posts, and so on. A model is built from these features, and fraud is then identified based on a threshold value. Similarly, anomaly detection can be applied to identify defective products.

Gaussian distribution

Gaussian distribution
Gaussian distribution parameter estimation
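
For reference, the density of a univariate Gaussian and the usual maximum likelihood estimates of its parameters can be written as follows. The notation is an assumption chosen to match the rest of these notes: m is the number of training samples and x^{(i)} is the i-th sample.

```latex
p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}
    \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad
\sigma^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2
```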

Anomaly detection algorithm based on the Gaussian distribution

Construction of the model

Suppose each feature of the sample data follows its own Gaussian distribution; the model p(x) is then the product of these per-feature densities. Strictly speaking, multiplying the probabilities in this way relies on the statistical assumption that the features are independent, but in practice, if the sample size is large enough, independence is not that important.
Anomaly Detection

First, select the features that may be needed. Next, fit the parameters, i.e. the mean and variance of the distribution of each feature; these can be written as vectors. Then build the model as the product of the distributions of all the features. Finally, given a new sample point x, compute p(x) from the model and check whether it is smaller than the threshold value ε.
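
The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names (`fit_gaussian`, `density`, `is_anomaly`), the toy data, and the value of `epsilon` are made up for the example and do not come from the course material.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the per-feature mean and variance from X of shape (m, n)."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # divides by m, i.e. the maximum likelihood estimate
    return mu, sigma2

def density(X, mu, sigma2):
    """p(x) for each row of X: the product of the per-feature Gaussian densities."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    per_feature = coef * np.exp(-((X - mu) ** 2) / (2.0 * sigma2))
    return per_feature.prod(axis=1)

def is_anomaly(X, mu, sigma2, epsilon):
    """Flag rows whose density falls below the threshold epsilon."""
    return density(X, mu, sigma2) < epsilon

# Toy training data (assumed): two features centered at 5 and 10.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(1000, 2))
mu, sigma2 = fit_gaussian(X_train)

# One point at the center of the data and one far away from it.
X_new = np.array([[5.0, 10.0], [12.0, 25.0]])
flags = is_anomaly(X_new, mu, sigma2, epsilon=1e-4)
print(flags)
```

With many features the product of densities can underflow, so a real implementation would probably sum log-densities instead; the direct product is kept here to mirror the description.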
Outlier identification example

The data in the figure has two features, and the parameters are fit for each of them. The value of p(x) is represented as the height of the three-dimensional surface in the figure.

Development and evaluation of an anomaly detection system

When choosing features, for example when deciding whether to add a new feature, a single numerical evaluation metric becomes very important. You can run the algorithm with and without the feature, compute the metric in each case, and use that number to decide whether the feature improves the algorithm.
Numerical evaluation of the learning algorithm
Splitting the data

Suppose there are 10,000 normal samples and 20 anomalous samples, and the algorithm is evaluated in the manner described above. The feature vectors of the training set are used to compute the parameters and build the model. The samples can be divided among the sets in different proportions, but the same data must not serve as both the validation set and the test set.
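
One possible split is sketched below. The proportions are an assumption made for illustration (60% / 20% / 20% of the normal samples, with the anomalies divided evenly between validation and test, and a training set made of normal samples only); the notes themselves do not fix them. The random arrays are stand-ins for real data.

```python
import numpy as np

rng = np.random.default_rng(0)
X_normal = rng.normal(size=(10_000, 2))          # stand-in for 10,000 normal samples
X_anomalous = rng.normal(loc=6.0, size=(20, 2))  # stand-in for 20 anomalous samples

# Training set: normal samples only (assumed), used to fit the parameters.
X_train = X_normal[:6000]

# Validation set: 2,000 normal samples plus 10 anomalies, with labels y.
X_val = np.vstack([X_normal[6000:8000], X_anomalous[:10]])
y_val = np.concatenate([np.zeros(2000, dtype=int), np.ones(10, dtype=int)])

# Test set: the remaining 2,000 normal samples plus the other 10 anomalies.
X_test = np.vstack([X_normal[8000:], X_anomalous[10:]])
y_test = np.concatenate([np.zeros(2000, dtype=int), np.ones(10, dtype=int)])
```

The validation and test sets share no rows, which is the one constraint the text insists on.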
Algorithm evaluation

First, build the model: fit a Gaussian distribution to each feature, then multiply the distributions together to obtain p(x). Because the samples are actually labeled, that is, each one comes with a label y, the labels can help us judge the quality of the model. After the model is built, the algorithm is evaluated on the validation set: each validation sample x is fed into the model and its label is predicted from the threshold, with p(x) above the threshold predicted as a normal point and p(x) below the threshold predicted as an outlier. The predictions are then compared with the actual labels to compute an evaluation metric such as precision, recall, or F-score.

To choose the threshold ε of the model, try different values of ε and select the one that gives the highest F-score.
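
A minimal sketch of this search is shown below. It assumes that `p_val` holds the model's density for each validation sample and that `y_val` holds the labels (1 for anomaly, 0 for normal); the function names, the grid of 1,000 candidate values, and the toy numbers are illustrative choices, not part of the original notes.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F-score computed from true positives, false positives and false negatives."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_threshold(p_val, y_val, n_steps=1000):
    """Try evenly spaced epsilon values and keep the one with the best F-score."""
    best_eps, best_f1 = None, -1.0
    for eps in np.linspace(p_val.min(), p_val.max(), n_steps):
        y_pred = (p_val < eps).astype(int)  # below the threshold means anomaly
        f1 = f1_score(y_val, y_pred)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Toy validation densities and labels (assumed values).
p_val = np.array([0.30, 0.25, 0.40, 0.001, 0.35, 0.002, 0.28])
y_val = np.array([0, 0, 0, 1, 0, 1, 0])
eps, f1 = select_threshold(p_val, y_val)
print(eps, f1)
```

On this toy data any ε between the largest anomalous density and the smallest normal density separates the two groups perfectly, and the search returns the first such value it encounters.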

Now that we have labeled data, why not apply linear regression, logistic regression, or other supervised methods to identify the outliers?

Anomaly detection vs. supervised learning

Anomaly Detection and supervised learning

Anomaly detection is suitable when there is a very small number of positive samples (y = 1) and a very large number of negative samples (y = 0). With so few positive samples, it is not possible to find the causes of all anomalies from them: a supervised learner cannot learn everything it needs, and new kinds of anomalies may appear in the future that have never been observed and therefore cannot be modeled. In contrast, anomaly detection models the large number of negative samples, so that any deviation from the model can be identified as anomalous, whatever its cause. The spam classification example discussed earlier under supervised learning works because there is a great deal of spam, so the common characteristics of spam can be learned, which makes it practical to model with a learning algorithm.

Thus, when the number of positive samples, i.e. outliers, is very small, the negative samples can be modeled with an anomaly detection method, and data points that deviate from normal are considered outliers. When the number of positive samples is very large, a supervised learning algorithm can learn effectively, so a supervised learning algorithm can be chosen to identify the anomalous points.

Choosing features for the anomaly detection algorithm

Transforming feature distributions

When performing anomaly detection, we assume that the data follows a Gaussian distribution, estimate the parameters from the training set, build the model by multiplying the per-feature distributions, and then evaluate it on the validation set. In fact, many features do not follow a Gaussian distribution. In that case we can transform them so that they are closer to Gaussian (with enough samples the algorithm can still work without the transformation, but the model generally performs better with it). There are many possible transformations, such as taking the logarithm or the square root of a feature; by adjusting the exponent, the distribution of the data can be brought closer to a Gaussian.
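
As a rough illustration, the sketch below applies a few candidate transformations to a skewed toy feature and compares their sample skewness, which is close to 0 for symmetric data such as a Gaussian. Using skewness as the yardstick, the specific transformations, and the log-normal toy data are all assumptions made for the example; in practice a histogram is often inspected by eye.

```python
import numpy as np

def skewness(x):
    """Sample skewness: roughly 0 for symmetric (e.g. Gaussian) data."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

# Strongly right-skewed toy feature (assumed): log-normal samples.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

# Candidate transformations; the exponent 0.2 is an arbitrary example.
candidates = {
    "raw": x,
    "log(x + 1)": np.log(x + 1.0),
    "sqrt(x)": np.sqrt(x),
    "x ** 0.2": x ** 0.2,
    "log(x)": np.log(x),
}
skews = {name: skewness(values) for name, values in candidates.items()}

for name, s in skews.items():
    print(f"{name:>10}: skewness = {s:.3f}")
```

For this particular toy feature the plain logarithm is the natural choice, because the logarithm of a log-normal variable is Gaussian by construction; real features rarely work out that cleanly.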
Anomaly detection error analysis

We want the model to give a large value of p(x) for normal samples and a small value for anomalous samples. One way to get there is to build an initial model first and then analyze its errors: when the model performs poorly, work out the likely reason and select appropriate features based on it. A common problem is that, with a single feature, p(x) is large for both normal and anomalous points; in that case, new features can be added so that the anomaly can be detected.
Feature selection example
Depending on the problem, we can construct our own features.

Multivariate Gaussian distribution

An extension of anomaly detection
Anomalies that the original algorithm fails to recognize
The green point in the upper left corner of the figure is an anomaly: normally, when the CPU load is low, memory usage should also be low, but this point is different. When CPU load and memory usage are considered separately, as in the two plots on the right, the point does not stand out as anomalous: many points have a CPU load lower than this point, and many points have a memory usage higher than it. The anomaly detection algorithm described so far therefore cannot identify this outlier. This is because it effectively divides the space along the magenta circles in the left plot, treating points closer to the inside of the circles as more normal and points farther from it as less normal. This ignores the relationship between different features.

To address this shortcoming, there is an improved anomaly detection algorithm based on the multivariate Gaussian distribution.
The multivariate Gaussian distribution
The multivariate Gaussian model does not treat each feature separately as its own Gaussian distribution; instead, all features are combined into a single distribution, whose parameter Σ is the covariance matrix of the sample. As the parameters change, the distribution changes as shown in the following examples:
When the variances of both features change together
When the variance of only one feature changes
When the two features are highly correlated
The size of the off-diagonal elements of the covariance matrix reflects the correlation between the two features: the larger the value, the stronger the correlation, and the sample distribution looks as shown in the figure. Similarly, when the off-diagonal elements are negative, the two features are negatively correlated, and the sample distribution is as follows:
When the features are negatively correlated
When the mean changes, the peak of the distribution moves; that is, changing the mean shifts the center of the whole distribution:
Multivariate Gaussian distribution with a changed mean

Multivariate Gaussian distribution anomaly detection

Multivariate Gaussian distribution parameter estimation
In the multivariate Gaussian distribution, the parameters to be estimated are the mean vector μ and the covariance matrix Σ.
Multivariate Gaussian distribution model
Once the parameters are determined, the model p(x) can be built from the multivariate Gaussian density formula. Given a new sample x, it is identified as an anomaly when p(x) is smaller than the threshold value ε.
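
The following sketch estimates μ and Σ from toy data and evaluates the multivariate Gaussian density for two new points. The function names and the correlated toy data are assumptions made for illustration. The second test point is unremarkable in each feature taken separately but goes against the correlation between them, which is the kind of anomaly the multivariate model is meant to catch.

```python
import numpy as np

def fit_multivariate_gaussian(X):
    """Estimate the mean vector and covariance matrix from X of shape (m, n)."""
    m = X.shape[0]
    mu = X.mean(axis=0)
    centered = X - mu
    Sigma = centered.T @ centered / m  # maximum likelihood estimate (divides by m)
    return mu, Sigma

def multivariate_density(X, mu, Sigma):
    """Multivariate Gaussian density p(x) for each row of X."""
    n = mu.shape[0]
    centered = X - mu
    # Solve Sigma * z = centered^T instead of forming the inverse explicitly.
    solved = np.linalg.solve(Sigma, centered.T).T
    mahalanobis_sq = np.sum(centered * solved, axis=1)
    _, logdet = np.linalg.slogdet(Sigma)
    log_p = -0.5 * (n * np.log(2.0 * np.pi) + logdet + mahalanobis_sq)
    return np.exp(log_p)

# Toy training data (assumed): two strongly positively correlated features.
rng = np.random.default_rng(0)
cov_true = np.array([[1.0, 0.9], [0.9, 1.0]])
X_train = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=2000)
mu, Sigma = fit_multivariate_gaussian(X_train)

# First point follows the correlation, second point contradicts it.
X_new = np.array([[1.5, 1.5], [-1.5, 1.5]])
p = multivariate_density(X_new, mu, Sigma)
print(p)
```

The density of the second point comes out many orders of magnitude smaller than that of the first, so a suitable ε flags it even though neither coordinate is extreme on its own.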
Comparison with the univariate Gaussian model
The univariate Gaussian model is in fact a special case of the multivariate Gaussian distribution in which the features of the sample are independent of each other, i.e. the covariance matrix is diagonal.
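
This equivalence can be checked numerically: with a diagonal covariance matrix whose diagonal holds the per-feature variances, the multivariate density equals the product of the univariate densities. The parameter values below are arbitrary numbers chosen for the check.

```python
import numpy as np

# Arbitrary parameters and test point (assumed values).
mu = np.array([1.0, -2.0, 0.5])
sigma2 = np.array([0.5, 2.0, 1.5])
x = np.array([0.3, -1.0, 2.0])

# Product of independent univariate Gaussian densities.
p_univariate = np.prod(
    np.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
)

# Multivariate Gaussian density with a diagonal covariance matrix.
Sigma = np.diag(sigma2)
diff = x - mu
n = x.size
p_multivariate = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / np.sqrt(
    (2.0 * np.pi) ** n * np.linalg.det(Sigma)
)

print(p_univariate, p_multivariate)
```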

The original Gaussian model compared with the multivariate Gaussian model
With the original Gaussian model, anomalies that show up only in the relationship between features can still be captured if you manually create features that encode that relationship. If you do not build such features yourself, the multivariate Gaussian model is the better fit, because it captures the relationships between features automatically. The original model also works when the training set is small, whereas the multivariate Gaussian model needs a large amount of training data: the number of training samples m must be much greater than the number of features n, with m > 10n generally preferred; otherwise the covariance matrix may be singular. A further advantage of the original model is that it is cheap to compute, while the computational cost of the multivariate Gaussian model grows with the number of features.

If the covariance matrix turns out to be singular when using the multivariate Gaussian distribution, the problem usually lies in one of two places: either the amount of data is too small, so that m does not sufficiently exceed the number of features, or the features are redundant, that is, there is a linear relationship between some of them.
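
Both causes can be checked before fitting the model. The helper below is a hypothetical diagnostic written for this note, not a library function: it compares m with n and computes the rank of the centered data matrix, which equals the rank of the estimated covariance matrix.

```python
import numpy as np

def covariance_diagnostics(X):
    """Report whether the covariance matrix estimated from X would be singular."""
    m, n = X.shape
    centered = X - X.mean(axis=0)
    # Sigma = centered.T @ centered / m has the same rank as the centered data.
    rank = int(np.linalg.matrix_rank(centered))
    return {
        "m": m,
        "n": n,
        "rank": rank,
        "singular": rank < n,
        "too_few_samples": m <= n,
    }

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)

# Redundant features: the third column is a linear combination of the first two.
X_redundant = np.column_stack([a, b, 2.0 * a + 3.0 * b])
# Too little data: 3 samples for 5 features.
X_few = rng.normal(size=(3, 5))
# Enough samples and no redundancy.
X_ok = rng.normal(size=(200, 3))

for name, X in [("redundant", X_redundant), ("few", X_few), ("ok", X_ok)]:
    print(name, covariance_diagnostics(X))
```

When the rank is deficient, dropping the redundant feature or collecting more data are the two remedies that follow from the causes listed above.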


Origin blog.csdn.net/linjpg/article/details/104331948