VC-dimension

Definition of VC-dimension

In machine learning, we would like a quantitative measure of a learner's maximum expressive power. That measure is the VC-dimension.

Intuitive Definition

Think of multi-dimensional data as points in space, and the classifier (the learner) as a surface.
Given a number of points m, players A and B play an adversarial game:

  • A: chooses the positions of the m points
  • B: picks any 0 to m of those points as one class (in other words, assigns a 0/1 label to each of the m points)
  • A: produces a parameter setting under which the learner classifies all m points correctly

B tries to make this as hard for A as possible.
If A can always succeed, the learner can handle every labeling of some arrangement of m points, and the VC-dimension is $\ge m$;
otherwise the VC-dimension is $< m$.

Logical Definition

For a given $m$, if
$$\exists x_1,\cdots,x_m,\ \forall l_1,\cdots,l_m,\ \exists \theta:\ f(x_i;\theta)=l_i,\quad i=1,\cdots,m$$
then $\mathrm{VC}(f) \ge m$;
otherwise $\mathrm{VC}(f) < m$.
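To make the game concrete, here is a small brute-force check, a sketch added for illustration (the threshold class and the sampled thresholds are my own choices, not from the original definition). For 1-D threshold classifiers $f(x;\theta) = \mathbb{1}[x \ge \theta]$, any single point can be shattered, but no pair $x_1 < x_2$ can: the labeling $l_1 = 1, l_2 = 0$ is unachievable, so the VC-dimension is 1.

```python
def shatters(points, classifiers):
    """True iff every 0/1 labeling of `points` is realized by some classifier."""
    labelings = {tuple(f(x) for x in points) for f in classifiers}
    return len(labelings) == 2 ** len(points)

# 1-D threshold classifiers f(x; theta) = 1 if x >= theta else 0;
# thresholds are sampled finely enough to cover all distinct behaviors
# on the test points below.
thresholds = [t / 10 for t in range(-10, 21)]
classifiers = [lambda x, t=t: int(x >= t) for t in thresholds]

print(shatters([0.5], classifiers))       # True:  VC-dimension >= 1
print(shatters([0.3, 0.7], classifiers))  # False: labeling (1, 0) is never produced
```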

A More General Mathematical Definition

  • A set system $(X, \mathcal{H})$ consists of a set $X$ and a class $\mathcal{H}$ of subsets of $X$, i.e. $\mathcal{H} \subseteq P(X)$
    ($X$ is an instance space, $\mathcal{H}$ is a class of classifiers)

  • A set system $(X, \mathcal{H})$ shatters a set $A \subseteq X$ iff $\forall A' \subseteq A,\ \exists h \in \mathcal{H},\ A' = A \cap h$

  • The VC-dimension of $\mathcal{H}$ is $\mathrm{VC}(\mathcal{H}) = \max_{A \text{ is shattered by } \mathcal{H}} |A|$
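A standard worked example, added here for concreteness: let $X = \mathbb{R}$ and let $\mathcal{H}$ be the class of closed intervals $[a, b]$. Any two points $x_1 < x_2$ are shattered, since each of the four subsets $\emptyset, \{x_1\}, \{x_2\}, \{x_1, x_2\}$ equals $A \cap [a,b]$ for some interval. But no three points $x_1 < x_2 < x_3$ can be shattered: the subset $A' = \{x_1, x_3\}$ would require an interval containing $x_1$ and $x_3$ but not $x_2$, which is impossible. Hence $\mathrm{VC}(\mathcal{H}) = 2$.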

Applications of VC-dimension

[Figure: Train Error and Test Error versus model complexity]
With the amount of data fixed, as the number of model parameters grows the expressive power increases and the Train Error keeps shrinking, but the VC term in the generalization bound grows, so the Test Error first decreases and then increases. The increasing phase is what we usually call overfitting.
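To make the "VC term" concrete, one common (Vapnik-style) form of the bound, not taken from the original post and with constants that vary between texts, is: with probability at least $1-\delta$,
$$\mathrm{TestErr}(h) \le \mathrm{TrainErr}(h) + \sqrt{\frac{d\left(\ln\frac{2n}{d}+1\right)+\ln\frac{4}{\delta}}{n}}$$
where $d = \mathrm{VC}(\mathcal{H})$ and $n$ is the training set size. The square-root term grows with $d$, which is why adding parameters eventually hurts Test Error even as Train Error keeps falling.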

Main Theorems

Definition

For a set system $(X, \mathcal{H})$, the shatter function $\pi_{\mathcal{H}}(n)$ is the maximum number of subsets of any set $A$ of size $n$ that can be expressed as $A \cap h$ for some $h \in \mathcal{H}$, i.e.
$$\pi_{\mathcal{H}}(n) = \max_{|A|=n} \big|\{A \cap h \mid h \in \mathcal{H}\}\big|$$

Lemma (Sauer)

For a set system $(X, \mathcal{H})$ whose VC-dimension equals $d$,
$$\pi_{\mathcal{H}}(n) \begin{cases} = 2^n, & n \le d \\ \le \dbinom{n}{\le d}, & n > d \end{cases}$$
where
$$\dbinom{n}{\le d} = \dbinom{n}{0} + \dbinom{n}{1} + \cdots + \dbinom{n}{d} \le n^d + 1$$
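As a quick sanity check, a minimal sketch assuming the interval class from the worked example above (whose VC-dimension is $d = 2$): brute-force counting of $\pi_{\mathcal{H}}(n)$ for points on a line matches the Sauer bound $\dbinom{n}{\le 2}$ exactly.

```python
from math import comb

def pi_intervals(n):
    """pi_H(n) for closed intervals: count the distinct subsets of n
    increasing points on the line that an interval can cut out."""
    subsets = {frozenset()}  # an interval left of all points picks nothing
    for i in range(n):
        for j in range(i, n):
            subsets.add(frozenset(range(i, j + 1)))  # a contiguous run of points
    return len(subsets)

def sauer_bound(n, d):
    """binom(n, <= d) = C(n,0) + C(n,1) + ... + C(n,d)."""
    return sum(comb(n, k) for k in range(d + 1))

for n in range(1, 8):
    assert pi_intervals(n) == sauer_bound(n, 2)  # equality for this class
print("Sauer bound matched for n = 1..7")
```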

The Key Theorem

For sufficiently large $n$, namely $n \ge \dfrac{4}{\epsilon}$ and $n \ge \dfrac{1}{\epsilon}\Big(\log_2 \pi_{\mathcal{H}}(2n) + \log_2 \dfrac{2}{\delta}\Big)$:
given a training set $T$ with $|T| = n$,
$$\mathrm{Prob}\Big[\exists h,\ \mathrm{TrueErr}(h) \ge \epsilon,\ \mathrm{TrainErr}(h) = 0\Big] < \delta$$
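A numerical illustration, a sketch under the assumption that $\pi_{\mathcal{H}}(2n)$ is replaced by its Sauer upper bound $(2n)^d + 1$ (the function name and the example values below are hypothetical, not from the post): compute the smallest $n$ satisfying both conditions of the theorem.

```python
from math import log2

def sample_size(d, eps, delta):
    """Smallest n meeting both conditions of the key theorem, with
    pi_H(2n) replaced by its Sauer upper bound (2n)**d + 1."""
    n = 1
    while True:
        cond1 = n >= 4 / eps
        cond2 = n >= (log2((2 * n) ** d + 1) + log2(2 / delta)) / eps
        if cond1 and cond2:
            return n
        n += 1

# e.g. a class of VC-dimension 3, epsilon = 0.1, delta = 0.05
print(sample_size(d=3, eps=0.1, delta=0.05))  # on the order of a few hundred samples
```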
