mH(N) breaks at k (good H) and N large enough (good D) => Eout ≈ Ein
A picks a g with small Ein (good A) => Ein ≈ 0
So we have truly learned something and met the goal => Eout ≈ 0
VC Dimension
The VC dimension dvc is the maximum non-break point: the largest N for which mH(N) = 2^N, i.e., the most inputs that H can shatter. Equivalently, dvc = 'minimum break point k' − 1. The VC dimension reflects a property of the hypothesis set on the data: the richest classification behavior it can achieve. Both dvc and the minimum break point mark the boundary where "a ray of hope" appears. If N ≥ 2 and dvc ≥ 2, then mH(N) ≤ N^dvc.
As before (where we wanted the minimum break point to be finite), we want dvc to be finite.
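As a concrete check, here is a minimal brute-force sketch, assuming the simple "positive rays" hypothesis set h(x) = sign(x − a) from earlier lectures (the helper name is mine): it counts the dichotomies realizable on N points and compares with 2^N to locate dvc.

```python
# Brute-force mH(N) for positive rays h(x) = sign(x - a): one representative
# threshold per region (below all points, between neighbors, above all points)
# is enough to enumerate every realizable dichotomy.
def dichotomies_positive_rays(points):
    points = sorted(points)
    thresholds = ([points[0] - 1]
                  + [(a + b) / 2 for a, b in zip(points, points[1:])]
                  + [points[-1] + 1])
    patterns = {tuple(1 if x > t else -1 for x in points) for t in thresholds}
    return len(patterns)

for N in range(1, 5):
    m = dichotomies_positive_rays(list(range(N)))
    print(N, m, 2 ** N, m == 2 ** N)   # shattered only for N = 1, so dvc = 1
```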
Fun Time
If there is a set of N inputs that cannot be shattered by H, based only on this information, what can we conclude about dvc(H)?
1 dvc(H) > N
2 dvc(H) = N
3 dvc(H) < N
4 no conclusion can be made ✓
Explanation: This question puzzled me at first, until I went through the material again. The VC dimension is the most inputs that H can shatter, so the fact that one particular set of N inputs cannot be shattered says nothing about other sets of N inputs. An example: the 1-D perceptron has break point 3, while the 2-D perceptron has break point 4. In 2-D, three points lying on one line (degenerating from 2-D to 1-D) admit only 6 (< 2^3 = 8) dichotomies, yet the break point of the 2-D perceptron is still 4, because three points not on one line admit all 8 dichotomies. That is why the minimum break point of the 2-D perceptron is larger. It also hints that raising the dimension increases the degrees of freedom.
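The collinear case can be checked numerically. Below is a quick randomized sketch (the helper count_dichotomies and the sample points are my own, not from the lecture): it samples random weights for a 2-D perceptron sign(w0 + w1·x1 + w2·x2) and counts the distinct labelings realized on three collinear versus three non-collinear points.

```python
import numpy as np

# Randomized count of the dichotomies a 2-D perceptron realizes on a point
# set; with enough trials every realizable sign pattern gets hit.
def count_dichotomies(points, trials=100000, seed=0):
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((len(points), 1)), np.asarray(points, float)])  # x0 = 1
    W = rng.standard_normal((trials, 3))
    signs = np.sign(W @ X.T)            # trials x 3: one labeling per weight draw
    return len(np.unique(signs, axis=0))

print(count_dichotomies([(0, 0), (1, 1), (2, 2)]))  # collinear: 6 < 2^3
print(count_dichotomies([(0, 0), (1, 0), (0, 1)]))  # non-collinear: 8 = 2^3
```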
What statement below shows that dvc ≥ d + 1?
1 There are some d + 1 inputs we can shatter. ✓
2 We can shatter any set of d + 1 inputs.
3 There are some d + 2 inputs we cannot shatter.
4 We cannot shatter any set of d + 2 inputs.
Explanation: if there exist d + 1 points that can be shattered, then dvc ≥ d + 1.
Construct a special X of d + 1 points such that X is invertible; then for any labeling y there is a corresponding hypothesis w (namely w = X⁻¹y, so that sign(Xw) = sign(y) = y).
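A minimal numerical sketch of this construction (assuming d = 3 for concreteness; the rows of X are the lecture's special points with bias coordinate x0 = 1):

```python
import itertools
import numpy as np

# dvc >= d+1 construction: d+1 special inputs whose (d+1)x(d+1) matrix X,
# with x0 = 1 in the first column, is invertible; then for any labeling y,
# w = X^{-1} y realizes y exactly.
d = 3
X = np.eye(d + 1)
X[:, 0] = 1.0                   # rows: (1,0,...,0), (1,1,0,...), (1,0,1,...), ...
assert np.linalg.det(X) != 0    # X is invertible (unit lower triangular)

for y in itertools.product([-1.0, 1.0], repeat=d + 1):
    y = np.array(y)
    w = np.linalg.solve(X, y)   # X w = y  =>  sign(X w) = y
    assert np.array_equal(np.sign(X @ w), y)

print("all 2^(d+1) = %d labelings realized: d+1 points shattered" % 2 ** (d + 1))
```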
Extra Fun Time
What statement below shows that dvc ≤ d + 1?
1 There are some d + 1 inputs we can shatter.
2 We can shatter any set of d + 1 inputs.
3 There are some d + 2 inputs we cannot shatter.
4 We cannot shatter any set of d + 2 inputs. ✓
Based on the proof above, what is the dvc of 1126-D perceptrons?
1 1024
2 1126
3 1127 ✓
4 6211
Explanation: for d-D perceptrons, dvc = d + 1, so dvc = 1126 + 1 = 1127.
Physical Intuition of VC Dimension
dvc(H) measures the powerfulness of H: roughly, dvc ≈ #free parameters. In other words, dvc represents degrees of freedom (related to the number of free variables). I first met degrees of freedom in statistics: if the mean μ is fixed, a dataset of size N has only N − 1 degrees of freedom, because knowing μ and N − 1 of the points determines the N-th. Likewise, dvc(H) represents the degrees of freedom H has with respect to the data.
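The lecture's examples match this intuition: positive rays have one free parameter (the threshold a) and dvc = 1; positive intervals have two free parameters (the two endpoints) and dvc = 2; d-D perceptrons have d + 1 free parameters (w0, w1, ..., wd) and dvc = d + 1. The rule of thumb is usually, but not always, accurate.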
M and dvc
So choose the right hypothesis set (the right M or dvc): too small and A has little chance of finding a g with small Ein; too large and generalization suffers.
Fun Time
Origin-crossing hyperplanes are essentially perceptrons with w0 fixed at 0. Make a guess about the dvc of origin-crossing hyperplanes in R^d.
1 1
2 d ✓
3 d + 1
4 ∞
Fixing w0 = 0 removes one free parameter, leaving d of them (w1, ..., wd), so by the free-parameter intuition dvc = d.
Setting the VC bound equal to a confidence level δ:

$$\mathbb{P}_{\mathcal{D}}\Big[\text{BAD: } |E_{in}(g) - E_{out}(g)| > \epsilon\Big] \le 4(2N)^{d_{vc}} \exp\left(-\frac{1}{8}\epsilon^2 N\right) = \delta$$

So with probability ≥ 1 − δ we get GOOD: |Ein(g) − Eout(g)| ≤ ε. Solving for ε gives

$$\epsilon = \sqrt{\frac{8}{N} \ln\left(\frac{4(2N)^{d_{vc}}}{\delta}\right)}$$

and therefore

$$E_{in}(g) - \sqrt{\frac{8}{N} \ln\frac{4(2N)^{d_{vc}}}{\delta}} \le E_{out}(g) \le E_{in}(g) + \sqrt{\frac{8}{N} \ln\frac{4(2N)^{d_{vc}}}{\delta}}$$

where $\Omega(N, \mathcal{H}, \delta) = \sqrt{\frac{8}{N} \ln\frac{4(2N)^{d_{vc}}}{\delta}}$ is the penalty for model complexity.
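As a minimal sketch, the error bar above transcribes directly into Python (the helper name penalty is mine):

```python
import math

# Error bar epsilon = sqrt((8/N) * ln(4 * (2N)^dvc / delta)), i.e. the model
# complexity penalty Omega(N, H, delta) from the bound above.
def penalty(N, d_vc, delta):
    return math.sqrt(8 / N * math.log(4 * (2 * N) ** d_vc / delta))

# With N = 10,000, dvc = 3, delta = 0.1: Eout <= Ein + ~0.16 with prob >= 90%
print(penalty(10000, 3, 0.1))
```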
Sample Complexity
Recall the bound P_D[BAD: |Ein(g) − Eout(g)| > ε] ≤ 4(2N)^{dvc} exp(−(1/8)ε²N). Given ε = 0.1, δ = 0.1, dvc = 3:

| N | bound |
| --- | --- |
| 100 | 2.82 × 10^7 |
| 1,000 | 9.17 × 10^9 |
| 10,000 | 1.19 × 10^8 |
| 100,000 | 1.65 × 10^{-38} |
| 29,300 | 9.99 × 10^{-2} |

Sample complexity: N ≈ 10,000 · dvc in theory.
The script below reproduces these numbers and searches for the smallest N whose bound drops below δ:

```python
import math

# Compute the VC bound: 4 * (2N)^d_vc * exp(-epsilon^2 * N / 8)
def count_vc_bound(N, epsilon=0.1, d_vc=3):
    vc_bound = 4 * (2 * N) ** d_vc * math.exp(-0.125 * epsilon ** 2 * N)
    return vc_bound

if __name__ == '__main__':
    print('%e' % count_vc_bound(100))
    print('%e' % count_vc_bound(1000))
    print('%e' % count_vc_bound(10000))
    print('%e' % count_vc_bound(100000))
    print('%e' % count_vc_bound(29300))
    print(count_vc_bound(0))
    print(count_vc_bound(1))

    # Search for the smallest N whose bound drops below delta
    delta = 0.1
    n = 1
    while count_vc_bound(n) > delta:
        n = n + 1
    print(n)
    print('%e' % count_vc_bound(29299))
    print('%e' % count_vc_bound(29300))
    print('%e' % count_vc_bound(29301))
```
Looseness of VC Bound
N ≈ 10,000 · dvc in theory, but N ≈ 10 · dvc in practice.
Even though the bound is loose, we can still use the philosophy behind the VC bound to improve the whole machine learning process.
Fun Time
Consider the VC bound below. How can we decrease the probability of getting BAD data?
1 decrease model complexity dvc
2 increase data size N a lot
3 increase generalization error tolerance ε
4 all of the above ✓
All three work: a smaller dvc shrinks the polynomial factor (2N)^{dvc}, while a larger ε or a sufficiently large N shrinks the exponential factor exp(−(1/8)ε²N), which dominates.