[Machine Learning] 10 Anomaly Detection

1 Problem Motivation

  • Identifying fraud
  • Density estimation: $\text{if }\ p(x)=\begin{cases} <\varepsilon, & \text{anomaly}\\ \ge\varepsilon, & \text{normal} \end{cases}$
  • Anomaly detection examples
    (1) Fraud detection:
       $x^{(i)}$ = features of user $i$'s activities
       Model $p(x)$ from data
       Identify unusual users by checking which have $p(x)<\varepsilon$
    (2) Manufacturing
    (3) Monitoring computers in a data center
       $x^{(i)}$ = features of machine $i$

2 Gaussian (Normal) Distribution

  • $x\sim N(\mu,\sigma^2)$
  • Probability density function: $p(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
  • Mean (determines the center): $\mu=\frac{1}{m}\sum_{i=1}^m x^{(i)}$
  • Variance (determines the width): $\sigma^2=\frac{1}{m}\sum_{i=1}^m\left(x^{(i)}-\mu\right)^2$ (both estimators are sketched in NumPy below)
  • Example Gaussian distributions (figure)
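
A minimal NumPy sketch of the two estimators and the density formula above (the function names `estimate_gaussian` and `gaussian_pdf`, and the synthetic data, are illustrative, not from the course):

```python
import numpy as np

def estimate_gaussian(x):
    """Estimate mu and sigma^2 for a single feature (1-D array)."""
    mu = x.mean()
    sigma2 = ((x - mu) ** 2).mean()   # 1/m normalization, matching the formula above
    return mu, sigma2

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# toy usage on synthetic data (illustrative only)
samples = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=1000)
mu, sigma2 = estimate_gaussian(samples)
print(mu, sigma2, gaussian_pdf(5.0, mu, sigma2))
```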

3 Algorithm

  1. Choose features $x_i$ that you think might be indicative of anomalous examples
  2. Fit parameters $\mu_1,\dots,\mu_n,\sigma_1^2,\dots,\sigma_n^2$:
    $$\begin{aligned} \mu_j&=\frac{1}{m}\sum_{i=1}^m x^{(i)}_j\\ \sigma^2_j&=\frac{1}{m}\sum_{i=1}^m\left(x^{(i)}_j-\mu_j\right)^2 \end{aligned}$$
  3. Given a new example $x$, compute $p(x)$:
    $$p(x)=\prod_{j=1}^n p(x_j;\mu_j,\sigma_j^2)=\prod_{j=1}^n \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right)$$
  4. Flag an anomaly if $p(x)<\varepsilon$ (steps 2–4 are sketched in NumPy below)
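
Steps 2–4 translate directly into a few lines of NumPy. The sketch below assumes the training examples are stacked as rows of an $m \times n$ matrix `X`; all function names are illustrative:

```python
import numpy as np

def fit_parameters(X):
    """Step 2: per-feature mu_j and sigma_j^2 from an m x n training matrix X."""
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).mean(axis=0)
    return mu, sigma2

def density(X, mu, sigma2):
    """Step 3: p(x) as a product of univariate Gaussian densities over the n features."""
    probs = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return probs.prod(axis=1)

def is_anomaly(X, mu, sigma2, epsilon):
    """Step 4: flag examples with p(x) < epsilon."""
    return density(X, mu, sigma2) < epsilon
```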

4 Developing and Evaluating an Anomaly Detection System

  1. Using the training set, estimate the mean and variance of each feature and build the $p(x)$ function
  2. On the cross-validation set, try different values of $\varepsilon$ as the threshold, predict whether each example is anomalous, and choose $\varepsilon$ based on the F1 score or the precision/recall trade-off (see the sketch below)
  3. After $\varepsilon$ is chosen, make predictions on the test set and report the system's F1 score, or its precision and recall
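
Step 2 (choosing $\varepsilon$ by F1 on the cross-validation set) might look like the sketch below, where `p_val` holds $p(x)$ for the cross-validation examples and `y_val` their labels (1 = anomaly); both names and the 1000-point threshold grid are assumptions, not from the course:

```python
import numpy as np

def select_threshold(p_val, y_val):
    """Scan candidate epsilons and keep the one with the best F1 score."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_val.min(), p_val.max(), 1000):
        pred = p_val < eps                      # predicted anomalies
        tp = np.sum((pred == 1) & (y_val == 1))
        fp = np.sum((pred == 1) & (y_val == 0))
        fn = np.sum((pred == 0) & (y_val == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1
```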

5 Anomaly Detection vs. Supervised Learning

| Anomaly detection | Supervised learning |
| --- | --- |
| Very small number of positive examples ($y=1$) | Larger number of positive and negative examples |
| Many different "types" of anomalies; hard for any algorithm to learn from the positive examples what the anomalies look like, and future anomalies may look nothing like any of the anomalous examples seen so far | Enough positive examples for the algorithm to get a sense of what positive examples are like; future positive examples are likely to be similar to those in the training set |
| Fraud detection; manufacturing; monitoring machines in a data center | Email spam classification; weather prediction; cancer classification |

6 Choosing What Features to Use

6.1 Non-Gaussian Features

  • Transform the data so it looks more Gaussian (both transforms are shown in the sketch below):
    (1) Take a logarithm: $x \leftarrow \log(x+c)$, where $c$ is a non-negative constant
    (2) Take a power: $x \leftarrow x^c$, where $c$ is a fraction between 0 and 1
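
Both transforms are one-liners in NumPy. The constants $c=1$ and $c=0.5$ below are just example choices; in practice one would pick them by inspecting a histogram of the transformed feature:

```python
import numpy as np

# an illustrative, heavily skewed non-negative feature
x = np.random.default_rng(0).exponential(scale=2.0, size=1000)

x_log  = np.log(x + 1.0)   # transform (1): log(x + c) with c = 1
x_root = x ** 0.5          # transform (2): x^c with c between 0 and 1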

6.2 Error Analysis for Anomaly Detection

  • Want $p(x)$ to be large for normal examples $x$,
        and $p(x)$ small for anomalous examples $x$
  • Most common problem:
        $p(x)$ is comparable (say, both large) for normal and anomalous examples
  • In that case, examine the misclassified anomaly and look for a new feature that would have distinguished it from the normal examples

6.3 Monitoring computers in a data center

  • Choose features that might take on unusually large or small values in the event of an anomaly
  • New, more informative features can often be obtained by combining related features, e.g. the ratio of CPU load to network traffic (see the sketch below)
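
A toy sketch of such a combined feature, using synthetic data (the variable names and values are made up for illustration):

```python
import numpy as np

# illustrative raw features for 500 machines
rng = np.random.default_rng(0)
cpu_load        = rng.uniform(0.1, 1.0, size=500)
network_traffic = rng.uniform(0.1, 1.0, size=500)

# combined feature: unusually large when a machine is busy (e.g. stuck in a
# loop) while serving little traffic
cpu_per_traffic = cpu_load / network_traffic
```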

7 Multivariate Gaussian Distribution

  • It captures cases where you expect two different features to be positively or negatively correlated (a NumPy sketch follows the formulas below)

7.1 Algorithm

  • $$\begin{aligned} \mu&=\frac{1}{m}\sum_{i=1}^m x^{(i)}\\ \Sigma&=\frac{1}{m}\sum_{i=1}^m\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^T\\ p(x;\mu,\Sigma)&=\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right) \end{aligned}$$
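
A NumPy sketch of these three formulas (an equivalent, and typically more numerically robust, density is also available via `scipy.stats.multivariate_normal`; the function names below are illustrative):

```python
import numpy as np

def fit_multivariate_gaussian(X):
    """Estimate mu (n-vector) and Sigma (n x n covariance) from an m x n matrix X."""
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = diff.T @ diff / X.shape[0]
    return mu, Sigma

def multivariate_density(X, mu, Sigma):
    """Evaluate p(x; mu, Sigma) for each row of X."""
    n = mu.shape[0]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    quad = np.sum(diff @ inv * diff, axis=1)   # (x - mu)^T Sigma^{-1} (x - mu) per row
    return np.exp(-0.5 * quad) / norm
```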

7.2 Comparison of the Original Gaussian Model and the Multivariate Gaussian Model

| Original (univariate) Gaussian model | Multivariate Gaussian model |
| --- | --- |
| Manually create features to capture anomalies where $x_1, x_2$ take unusual combinations of values | Automatically captures correlations between features |
| Computationally cheaper (equivalently, scales better to large $n$) | Computationally more expensive |
| OK even if $m$ is small | Must have $m > n$, or else $\Sigma$ is non-invertible |

8 References

Andrew Ng (吴恩达), Machine Learning, Coursera.
Huang Haiguang (黄海广), Machine Learning Notes (机器学习笔记).
