Machine Learning Foundations Notes (6): Theory of Generalization



Lecture 6: Theory of Generalization


Restriction of Break Point


The Four Break Points


N=3, k=2 Break Point


$$\begin{aligned} & m_{\mathcal{H}}(N) \\ \leq & \text{ maximum possible } m_{\mathcal{H}}(N) \text{ given break point } k \\ \leq & \mathrm{poly}(N) \end{aligned}$$

Fun Time

When the minimum break point $k = 1$, what is the maximum possible $m_{\mathcal{H}}(N)$ when $N = 3$?
1. 1  $\checkmark$              2. 2          3. 3           4. 4


Explanation
Since $k = 1$, not even a single point can be shattered, so every point admits only one dichotomy and $m_{\mathcal{H}}(N) = 1$.

Bounding Function: Basic Cases


Bounding Function

bounding function $B(N, k)$:
  maximum possible $m_{\mathcal{H}}(N)$ when break point $= k$
$B(N, k) \leq \mathrm{poly}(N)$

In other words, $B(N, k)$ is an upper bound on $m_{\mathcal{H}}(N)$.


Table of Bounding Function


Fun Time

For the 2D perceptron, which of the following claims is true?
1. minimum break point $k = 2$
2. $m_{\mathcal{H}}(4) = 15$
3. $m_{\mathcal{H}}(N) < B(N, k)$ when $N = k =$ minimum break point   $\checkmark$
4. $m_{\mathcal{H}}(N) > B(N, k)$ when $N = k =$ minimum break point


Explanation
minimum break point $k = 4$
$m_{\mathcal{H}}(4) = 14$
$B(N, k)$ is an upper bound on $m_{\mathcal{H}}(N)$.
If you do not remember the 2D perceptron, review the Effective Number of Hypotheses section in Lecture 5: Training versus Testing.

Bounding Function: Inductive Cases


$B(4, 3) = 11 = 2\alpha + \beta$
Instance Estimating Part


$$\begin{aligned} B(N, k) &= 2\alpha + \beta \\ \alpha + \beta & \leq B(N-1, k) \\ \alpha & \leq B(N-1, k-1) \\ \Rightarrow B(N, k) & \leq B(N-1, k) + B(N-1, k-1) \end{aligned}$$
$$B(N, k) \leq \sum_{i=0}^{k-1} \binom{N}{i}$$
The Upper Bound of Bounding Function

The $\leq$ above is in fact an equality $=$:

$$B(N, k) = B(N-1, k) + B(N-1, k-1)$$
$$B(N, k) = \sum_{i=0}^{k-1} \binom{N}{i} = C_N^0 + C_N^1 + \cdots + C_N^{k-1}$$
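The recurrence and the binomial-sum closed form can be checked directly against each other. This is a minimal sketch; it assumes the standard boundary cases $B(N, 1) = 1$ and $B(N, k) = 2^N$ for $N < k$, which follow from the definition:

```python
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def bounding(N, k):
    """B(N, k): maximum possible m_H(N) when the break point is k."""
    if k == 1:
        return 1          # not even one point can be shattered
    if N < k:
        return 2 ** N     # no restriction yet: all dichotomies possible
    # inductive case: B(N, k) = B(N-1, k) + B(N-1, k-1)
    return bounding(N - 1, k) + bounding(N - 1, k - 1)

def binomial_sum(N, k):
    """Closed form: sum_{i=0}^{k-1} C(N, i)."""
    return sum(comb(N, i) for i in range(k))

# the <= in the derivation is in fact an equality
for N in range(1, 10):
    for k in range(1, 6):
        assert bounding(N, k) == binomial_sum(N, k)

print(bounding(4, 3))  # → 11, matching B(4,3) = 2*alpha + beta above
```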

The Three Break Points

The 2D perceptron has break point 4, so $m_{\mathcal{H}}(N) \leq B(N, 4) = \frac{1}{6} N^{3} + \frac{5}{6} N + 1 = O(N^3)$
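As a quick sanity check on the cubic formula (a sketch; it uses the fact that $N^3 + 5N = N(N-1)(N+1) + 6N$ is always divisible by 6, so the closed form is exact in integers):

```python
from math import comb

def B_N_4(N):
    """B(N, 4) = sum_{i=0}^{3} C(N, i): the 2D-perceptron bound."""
    return sum(comb(N, i) for i in range(4))

# closed form from the text: (1/6)N^3 + (5/6)N + 1
for N in range(1, 50):
    assert B_N_4(N) == (N**3 + 5 * N) // 6 + 1

print(B_N_4(4), B_N_4(5))  # → 15 26
```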


Fun Time

For 1D perceptrons (positive and negative rays), we know that $m_{\mathcal{H}}(N) = 2N$. Let $k$ be the minimum break point. Which of the following is not true?
1. $k = 3$
2. for some integers $N > 0$, $m_{\mathcal{H}}(N) = \sum_{i=0}^{k-1} \binom{N}{i}$
3. for all integers $N > 0$, $m_{\mathcal{H}}(N) = \sum_{i=0}^{k-1} \binom{N}{i}$   $\checkmark$
4. for all integers $N > 2$, $m_{\mathcal{H}}(N) < \sum_{i=0}^{k-1} \binom{N}{i}$


Explanation
minimum break point $k = 3$
$B(N, k) = \sum_{i=0}^{k-1} \binom{N}{i}$
$B(N, k)$ is an upper bound on $m_{\mathcal{H}}(N)$: here, when $N \geq k$, $m_{\mathcal{H}}(N) < B(N, k)$; when $N < k$, $m_{\mathcal{H}}(N) = B(N, k)$.
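The claim can be verified numerically for positive and negative rays, where $m_{\mathcal{H}}(N) = 2N$ and $k = 3$ (a minimal sketch):

```python
from math import comb

K = 3  # minimum break point for positive and negative rays

def growth(N):
    """m_H(N) = 2N for positive and negative rays."""
    return 2 * N

def bound(N, k=K):
    """B(N, k) = sum_{i=0}^{k-1} C(N, i)."""
    return sum(comb(N, i) for i in range(k))

for N in range(1, 20):
    if N < K:
        assert growth(N) == bound(N)  # bound attained below the break point
    else:
        assert growth(N) < bound(N)   # strict once N >= k
```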


Extension: recall the Fun Time in the Effective Number of Hypotheses section of Lecture 5: Training versus Testing: find the effective number of dichotomies for 5 points with the 2D perceptron ($k = 4$, $N = 5$, $m_{\mathcal{H}}(N) \leq \frac{1}{6} N^{3} + \frac{5}{6} N + 1$); since $N > k$, equality is not attained.
The correct answer is $22 < \frac{125}{6} + \frac{25}{6} + 1 = 26$, which verifies the bound. Revisiting that question is quite fun.


A Pictorial Proof

Step 1: Replace E_out by E_in'

The finite $E_{\text{in}}'$ (evaluated on a second sample) replaces the infinite $E_{\text{out}}$, but I had not fully worked out the origin of this inequality and of the factor $\frac{1}{2}$.
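A hedged sketch of where the factor comes from: this is the standard symmetrization ("ghost sample") argument, with the constants as in Lin's slides:

$$\mathbb{P}\left[\exists h \in \mathcal{H} \text{ s.t. } |E_{\text{in}}(h)-E_{\text{out}}(h)| > \epsilon\right] \leq 2\, \mathbb{P}\left[\exists h \in \mathcal{H} \text{ s.t. } |E_{\text{in}}(h)-E_{\text{in}}'(h)| > \frac{\epsilon}{2}\right]$$

The intuition: draw a second, independent "ghost" sample of size $N$ and let $E_{\text{in}}'$ be the error on it. If some $h$ has $|E_{\text{in}}(h)-E_{\text{out}}(h)| > \epsilon$, then for large enough $N$, $E_{\text{in}}'(h)$ falls within $\frac{\epsilon}{2}$ of $E_{\text{out}}(h)$ with probability at least $\frac{1}{2}$, in which case $|E_{\text{in}}(h)-E_{\text{in}}'(h)| > \frac{\epsilon}{2}$. So the right-hand event occurs at least half as often as the left-hand one, which is exactly where the factor $\frac{1}{2}$ (equivalently, the leading $2$) comes from.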

Step 2: Decompose H by Kind

The upper bound is expressed in terms of $m_{\mathcal{H}}(2N)$, since only the dichotomies on the combined $2N$ points matter.

Step 3: Use Hoeffding without Replacement

Use Hoeffding without replacement; the result is similar, with $\nu = E_{\text{in}}$ and $\mu = \frac{E_{\text{in}} + E_{\text{in}}'}{2}$.

Vapnik-Chervonenkis (VC) bound

$$\begin{aligned} & \mathbb{P}\left[\exists h \in \mathcal{H} \text{ s.t. } |E_{\text{in}}(h)-E_{\text{out}}(h)| > \epsilon\right] \\ & \leq 4 m_{\mathcal{H}}(2N) \exp\left(-\frac{1}{8} \epsilon^{2} N\right) \end{aligned}$$
   $m_{\mathcal{H}}(N)$ can replace $M$ with a few changes

Fun Time

For positive rays, $m_{\mathcal{H}}(N) = N + 1$. Plug it into the VC bound for $\epsilon = 0.1$ and $N = 10000$. What is the VC bound on BAD events?
$$\mathbb{P}\left[\exists h \in \mathcal{H} \text{ s.t. } |E_{\text{in}}(h)-E_{\text{out}}(h)| > \epsilon\right] \leq 4 m_{\mathcal{H}}(2N) \exp\left(-\frac{1}{8} \epsilon^{2} N\right)$$
1. $2.77 \times 10^{-87}$
2. $5.54 \times 10^{-83}$
3. $2.98 \times 10^{-1}$   $\checkmark$
4. $2.29 \times 10^{-2}$


Explanation
Substitute into the formula, with $m_{\mathcal{H}}(2N) = 2N + 1$ for positive rays:
$4(2N+1)\exp\left(-\frac{1}{8}\epsilon^{2}N\right) \approx 0.2981471603789822$
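The number above can be reproduced directly (a minimal sketch; for positive rays, $m_{\mathcal{H}}(2N) = 2N + 1$):

```python
from math import exp

def vc_bound(N, epsilon, growth_2N):
    """VC bound on the probability of a BAD event:
    4 * m_H(2N) * exp(-(1/8) * epsilon^2 * N)."""
    return 4 * growth_2N * exp(-epsilon**2 * N / 8)

N, eps = 10000, 0.1
b = vc_bound(N, eps, growth_2N=2 * N + 1)
print(b)  # ≈ 0.298, i.e. option 3
```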

Summary

This lecture covered the bounding function $B(N, k)$ and the meaning and derivation of the VC bound.


Lecture Summary


If $m_{\mathcal{H}}(N)$ has a break point and $N$ is large enough, then $E_{\text{out}} \approx E_{\text{in}}$.


Restriction of Break Point
  break point ‘breaks’ consequent points

Bounding Function: Basic Cases
   $B(N, k)$ bounds $m_{\mathcal{H}}(N)$ with break point $k$

Bounding Function: Inductive Cases
   $B(N, k)$ is $\mathrm{poly}(N)$

A Pictorial Proof
   $m_{\mathcal{H}}(N)$ can replace $M$ with a few changes

References

"Machine Learning Foundations" (机器学习基石), Hsuan-Tien Lin (林轩田)


Reposted from blog.csdn.net/the_harder_to_love/article/details/89446631