Machine Learning (7) - Anomaly Detection

Notes taken from Andrew Ng's Stanford "Machine Learning" video lectures. Topics not covered in depth here can be found in Li Hang's "Statistical Learning Methods"; this is only an outline.

1 Anomaly Detection

Model the data with a probability distribution function \(p(x)\); for a new sample, check the value of \(p(x_{test})\)

Examples:

  • Fraud detection: identify users with abnormal behavior
  • Industrial / manufacturing monitoring
  • Monitoring computers in a data center

1.1 Gaussian (Normal) Distribution \(x \sim N(\mu, \sigma^2)\)

\(\mu\): the mean, controls the center of the bell curve

\(\sigma^2\): the variance, controls the width of the bell curve

\(p(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi}\sigma}\exp(-\frac{(x-\mu)^2}{2\sigma^2})\)

Parameter Estimation

\(\mu=\frac{1}{m}\sum_{i=1}^mx^{(i)}\)

\(\sigma^2=\frac{1}{m}\sum_{i=1}^m(x^{(i)}-\mu)^2\)

Density Estimation

\(p(x)=p(x_1;\mu_1,\sigma_1^2)p(x_2;\mu_2,\sigma_2^2)p(x_3;\mu_3,\sigma_3^2)\cdots p(x_n;\mu_n,\sigma_n^2)=\prod_{j=1}^np(x_j;\mu_j,\sigma_j^2)\)
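As a concrete illustration, here is a minimal NumPy sketch of the parameter estimation and density estimation above (the array layout and function names are assumptions, not from the notes):

```python
import numpy as np

def estimate_gaussian(X):
    """Per-feature mean and variance; X has shape (m, n)."""
    mu = X.mean(axis=0)                    # mu_j = 1/m * sum_i x_j^(i)
    sigma2 = ((X - mu) ** 2).mean(axis=0)  # sigma_j^2 = 1/m * sum_i (x_j^(i) - mu_j)^2
    return mu, sigma2

def density(X, mu, sigma2):
    """p(x) as the product of the per-feature univariate Gaussian pdfs."""
    p = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return p.prod(axis=1)
```

With `mu, sigma2 = estimate_gaussian(X_train)`, a new sample is flagged as anomalous when `density(x, mu, sigma2) < epsilon`.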

1.2 Anomaly Detection Algorithm

  1. Select features <see Section 1.4>
  2. Estimate the parameters \(\mu_j, \sigma_j^2\) (or \(\mu, \Sigma\)) <see Section 1.5>
  3. For a new sample, compute the density estimate; if it is smaller than \(\epsilon\), flag the sample as an anomaly

Development and Evaluation

Training set: 60% of the normal (non-anomalous) samples; used to estimate the feature means and variances and build the \(p(x)\) function

Cross-validation set: 20% of the normal samples + 50% of the anomalous samples; used to select \(\epsilon\) by choosing the value with the best \(F_1\) score (see the sketch after the metrics list)

Test set: the remaining 20% of the normal samples + the remaining 50% of the anomalous samples

Metrics:

  • TP, FN, FP, TN
  • Precision / recall
  • \(F_1\) score
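Below is a sketch of selecting \(\epsilon\) on the cross-validation set by maximizing \(F_1\); the candidate grid of thresholds and the function name are assumptions:

```python
import numpy as np

def select_epsilon(p_cv, y_cv):
    """Pick the threshold with the best F1 score on the cross-validation set.

    p_cv: densities p(x) for the CV samples, shape (m,)
    y_cv: labels, 1 = anomaly, 0 = normal
    """
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        pred = (p_cv < eps).astype(int)         # predict anomaly when p(x) < eps
        tp = np.sum((pred == 1) & (y_cv == 1))
        fp = np.sum((pred == 1) & (y_cv == 0))
        fn = np.sum((pred == 0) & (y_cv == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1
```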

1.3 Anomaly Detection vs. Supervised Learning

| Anomaly detection | Supervised learning |
| --- | --- |
| Very small number of positive examples (anomalies, \(y=1\)) and a large number of negative examples (\(y=0\)) | Large numbers of both positive and negative examples |
| Many different types of anomalies; it is hard for an algorithm to learn what anomalies look like from so few positive examples | Enough positive examples for the algorithm to learn what positive examples look like |
| Future anomalies may look very different from any anomaly seen so far | Future positive examples are likely to be similar to those in the training set |

1.4 Selecting Features

Non-Gaussian features: apply a logarithm, square root, or similar transformation so that the feature's distribution looks closer to Gaussian (a sketch follows below)

Anomalous samples that look similar to normal ones: examine the errors to identify what went wrong, and create a new feature that separates them
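A quick sketch of such transforms in NumPy (the skewed toy feature and the small offset that keeps the logarithm defined at zero are assumptions):

```python
import numpy as np

x = np.random.exponential(scale=2.0, size=1000)  # a right-skewed toy feature

x_log = np.log(x + 1e-3)   # log transform; offset avoids log(0)
x_sqrt = np.sqrt(x)        # square-root transform
x_pow = x ** 0.3           # fractional power, another common choice
```

Plotting a histogram of each transformed version is the usual way to judge which one looks most Gaussian.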

1.5 Multivariate Gaussian Distribution

Instead of modeling each \(p(x_i)\) separately, model \(p(x)\) in one shot; the parameter \(\mu\) is an \(n\)-dimensional vector and \(\Sigma\) is an \(n \times n\) covariance matrix

\(p(x;\mu,\Sigma)=\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp(-\frac{(x-\mu)^T\Sigma^{-1}(x-\mu)}{2})\)

Parameter Estimation

\(\mu=\frac{1}{m}\sum_{i=1}^mx^{(i)}\)

\(\Sigma=\frac{1}{m}\sum_{i=1}^m(x^{(i)}-\mu)(x^{(i)}-\mu)^T\)

Note that the earlier per-feature Gaussian model is a special case of the multivariate Gaussian distribution, namely the case where all off-diagonal elements of \(\Sigma\) are zero
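A minimal sketch of the multivariate version, assuming the same (m, n) data layout as before (scipy.stats.multivariate_normal would give the same densities):

```python
import numpy as np

def estimate_multivariate(X):
    """mu: (n,) mean vector; Sigma: (n, n) covariance matrix; X: (m, n)."""
    m = X.shape[0]
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = diff.T @ diff / m   # 1/m * sum_i (x^(i) - mu)(x^(i) - mu)^T
    return mu, Sigma

def multivariate_density(X, mu, Sigma):
    """p(x; mu, Sigma) evaluated for each row of X."""
    n = mu.shape[0]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
    quad = np.einsum('ij,jk,ik->i', diff, inv, diff)  # (x - mu)^T Sigma^{-1} (x - mu)
    return norm * np.exp(-quad / 2)
```

Because \(\Sigma\) has off-diagonal entries, this model assigns low density to samples that break the correlations between features, even when each coordinate is individually typical.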

Advantages: automatically captures correlations between features → with the original model you would have to create new features by hand

Disadvantages: computationally more expensive; requires \(m > n\), otherwise the covariance matrix is not invertible (a rule of thumb is \(m \ge 10n\))
