Week 7 Study Notes

Main Study Work This Week

1. CS229

[Course link]

Lecture 19: Differential Dynamic Programming

Main topics
  • Debugging reinforcement learning algorithms
    Suppose we want to build a reinforcement learning algorithm to fly a helicopter:

  • Build a simulator of the helicopter

  • Choose a reward function

  • Run a reinforcement learning algorithm in simulation to obtain a policy
    What if the resulting policy performs poorly?

  • Diagnostics (see the sketch after this list)

    • If the learned policy performs well in the simulator but poorly in reality, the simulator is the problem
    • If the value of the human pilot's policy exceeds that of the learned policy, the RL algorithm is the problem
    • If the value of the learned policy exceeds the human's but real-world performance is still poor, the reward function is the problem
      The corresponding fixes:
      1. Improve the simulator
      2. Improve R
      3. Improve the RL algorithm
  • Linear-Quadratic Regulation (LQR) control

  • Differential dynamic programming (DDP)

  • Kalman filter and Linear-Quadratic Gaussian (LQG) control
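
As a concrete illustration of the diagnostic procedure, here is a minimal Python sketch. All inputs are hypothetical stand-ins: in practice the "works" flags would come from actually flying the policy, and the value estimates from rolling out each policy and summing rewards.

```python
def diagnose(works_in_sim: bool, works_in_real: bool,
             v_learned: float, v_human: float) -> str:
    """Map the three checks above to the component that needs fixing."""
    if works_in_sim and not works_in_real:
        return "simulator problem: improve the simulator"
    if v_human > v_learned:
        return "RL problem: improve the RL algorithm"
    if v_learned >= v_human and not works_in_real:
        return "reward problem: improve R"
    return "no problem detected by these checks"

# Hypothetical numbers: the policy is poor even in simulation and the
# human pilot's value is higher -> points at the RL algorithm.
print(diagnose(works_in_sim=False, works_in_real=False,
               v_learned=8.0, v_human=10.0))
```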

Lecture 20: Policy Search

Main topics
  • POMDPs
  • Policy search
  • Pegasus: fix the pseudo-random number sequences fed to a stochastic simulator, so that each policy's estimated value becomes deterministic (see the toy sketch below)
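
Below is a toy numpy sketch of the Pegasus idea; the dynamics, reward, and policy class are invented placeholders. The point is only that freezing the simulator's random numbers turns the estimated value into a deterministic function of the policy, which ordinary search can then optimize.

```python
import numpy as np

rng = np.random.default_rng(0)
SCENARIOS = rng.standard_normal((10, 50))  # 10 rollouts x 50 steps of frozen noise

def value(gain: float) -> float:
    """Average return of the toy policy u = -gain * s on the frozen scenarios."""
    total = 0.0
    for noise in SCENARIOS:
        s = 1.0
        for eps in noise:
            u = -gain * s                  # toy linear policy
            s = 0.9 * s + u + 0.1 * eps    # toy stochastic dynamics, noise reused
            total -= s * s                 # toy reward: stay near s = 0
    return total / len(SCENARIOS)

# With the randomness frozen, value() is deterministic, so plain search works.
best_gain = max(np.linspace(0.0, 1.0, 21), key=value)
print(best_gain, value(best_gain))
```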

2. Problem Set #1

1.
( a )
$$J(\theta)=\dfrac{1}{2}(X\theta-y)^T(X\theta-y)$$
$$\nabla=\dfrac{\partial{J}}{\partial{\theta}}=X^T(X\theta-y)$$
$$H=\dfrac{\partial^2{J}}{\partial{\theta}^2}=X^TX$$
( b ) For any initial value $\theta_0$,
$$\theta_1=\theta_0-H^{-1}\nabla=(X^TX)^{-1}X^Ty$$
so Newton's method converges in a single iteration.
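
This is easy to verify numerically; a quick numpy check on random data (all names illustrative):

```python
import numpy as np

# One Newton step from an arbitrary theta_0 lands exactly on the
# least-squares solution.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))
y = rng.standard_normal(20)

theta0 = rng.standard_normal(3)
grad = X.T @ (X @ theta0 - y)            # gradient from ( a )
H = X.T @ X                              # Hessian from ( a )
theta1 = theta0 - np.linalg.solve(H, grad)

assert np.allclose(theta1, np.linalg.lstsq(X, y, rcond=None)[0])
```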
2. Already completed in an earlier week's experiments.
The code is here.
3.
( a )
$$J(\Theta)=\dfrac{1}{2}\left\| X\Theta-Y\right\|_F^2$$
( b )
$$\Theta=(X^TX)^{-1}X^TY$$
( c )
For each $j$ individually, linear regression gives
$$\theta_j=(X^TX)^{-1}X^Ty_j$$
Stacking these columns, in matrix form this is
$$\Theta=(X^TX)^{-1}X^TY$$
i.e., regressing on each output separately gives the same result as the joint linear regression.
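
A quick numpy check of this equivalence on random data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 4))
Y = rng.standard_normal((30, 5))

# Joint solution Theta = (X^T X)^{-1} X^T Y ...
joint = np.linalg.solve(X.T @ X, X.T @ Y)
# ... equals the p per-column regressions theta_j = (X^T X)^{-1} X^T y_j.
separate = np.column_stack(
    [np.linalg.solve(X.T @ X, X.T @ Y[:, j]) for j in range(Y.shape[1])]
)
assert np.allclose(joint, separate)
```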
4.
( a )
$$\begin{aligned} l(\varphi)&=\log\prod_{i=1}^{m}\bigl[(1-\phi_y)p(x^{(i)}|y^{(i)}=0)\bigr]^{(1-y^{(i)})}\bigl[\phi_{y}p(x^{(i)}|y^{(i)}=1)\bigr]^{y^{(i)}} \\ &=\sum_{i=1}^{m}(1-y^{(i)})\Bigl( \log(1-\phi_y)+\sum_{j=1}^{n}\bigl( x_j^{(i)}\log\phi_{j|y=0}+(1-x_j^{(i)})\log(1-\phi_{j|y=0}) \bigr) \Bigr)\\ &\quad+y^{(i)}\Bigl(\log\phi_y+\sum_{j=1}^{n}\bigl(x_j^{(i)}\log\phi_{j|y=1}+(1-x_j^{(i)})\log(1-\phi_{j|y=1})\bigr)\Bigr) \end{aligned}$$
( b )
$$\dfrac{\partial{l(\phi)}}{\partial{\phi_{j|y=0}}}=\sum_{i=1}^{m}\Bigl(\dfrac{(1-y^{(i)})x_j^{(i)}}{\phi_{j|y=0}}-\dfrac{(1-y^{(i)})(1-x_j^{(i)})}{1-\phi_{j|y=0}}\Bigr)$$
$$\dfrac{\partial{l(\phi)}}{\partial{\phi_{j|y=1}}}=\sum_{i=1}^{m}\Bigl(\dfrac{y^{(i)}x_j^{(i)}}{\phi_{j|y=1}}-\dfrac{y^{(i)}(1-x_j^{(i)})}{1-\phi_{j|y=1}}\Bigr)$$
$$\dfrac{\partial{l(\phi)}}{\partial{\phi_y}}=\sum_{i=1}^{m}\Bigl(\dfrac{y^{(i)}}{\phi_y}-\dfrac{1-y^{(i)}}{1-\phi_y}\Bigr)$$
Setting all of these partial derivatives to zero gives
$$\phi_{j|y=0}=\dfrac{\sum_{i=1}^{m}1\{ y^{(i)}=0\}x^{(i)}_j}{\sum_{i=1}^{m}1\{ y^{(i)}=0\}} \qquad \phi_{j|y=1}=\dfrac{\sum_{i=1}^{m}1\{ y^{(i)}=1\}x^{(i)}_j}{\sum_{i=1}^{m}1\{ y^{(i)}=1\}} \qquad \phi_y=\dfrac{\sum_{i=1}^{m}y^{(i)}}{m}$$
( c )
$$\begin{aligned} p(y=1|x)\geq p(y=0|x)& \Longleftrightarrow p(x|y=1)p(y=1)\geq p(x|y=0)p(y=0)\\ &\Longleftrightarrow p(x|y=1)\phi_y\geq p(x|y=0)(1-\phi_y)\\ &\Longleftrightarrow \prod_{j=1}^{n}p(x_j|y=1)\phi_y\geq \prod_{j=1}^{n}p(x_j|y=0)(1-\phi_y)\\ &\Longleftrightarrow \prod_{j=1}^{n}\phi_{j|y=1}^{x_j}(1-\phi_{j|y=1})^{1-x_j}\phi_y\geq\prod_{j=1}^{n}\phi_{j|y=0}^{x_j}(1-\phi_{j|y=0})^{1-x_j}(1-\phi_y)\\ &\Longleftrightarrow \sum_{j=1}^{n}\bigl(x_j\log\phi_{j|y=1}+(1-x_j)\log(1-\phi_{j|y=1})\bigr)+\log\dfrac{\phi_y}{1-\phi_y}\geq\sum_{j=1}^{n}\bigl(x_j\log\phi_{j|y=0}+(1-x_j)\log(1-\phi_{j|y=0})\bigr)\\ &\Longleftrightarrow \sum_{j=1}^{n}x_j\log\dfrac{\phi_{j|y=1}}{\phi_{j|y=0}}+(1-x_j)\log\dfrac{1-\phi_{j|y=1}}{1-\phi_{j|y=0}}+\log\dfrac{\phi_y}{1-\phi_y}\geq 0\\ & \Longleftrightarrow \sum_{j=1}^{n}x_j\log\dfrac{\phi_{j|y=1}(1-\phi_{j|y=0})}{\phi_{j|y=0}(1-\phi_{j|y=1})}+\sum_{j=1}^{n}\log\dfrac{1-\phi_{j|y=1}}{1-\phi_{j|y=0}}+\log\dfrac{\phi_y}{1-\phi_y}\geq 0 \end{aligned}$$
The decision rule is therefore linear in $x$: Naive Bayes yields a linear classifier.
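
Putting ( b ) and ( c ) together, here is a small numpy sketch on synthetic binary data: estimate the parameters in closed form, then classify with the linear rule just derived. (No Laplace smoothing, so this assumes no empirical count is 0 or 1, which holds for this random data with overwhelming probability.)

```python
import numpy as np

rng = np.random.default_rng(3)
X = (rng.random((100, 6)) < 0.4).astype(float)   # m x n binary features
y = (rng.random(100) < 0.5).astype(float)        # 0/1 labels

# Maximum-likelihood estimates from ( b ).
phi_y = y.mean()
phi_j_y1 = X[y == 1].mean(axis=0)   # estimate of P(x_j = 1 | y = 1)
phi_j_y0 = X[y == 0].mean(axis=0)   # estimate of P(x_j = 1 | y = 0)

# Linear decision rule from ( c ): predict y = 1 iff theta^T x + theta_0 >= 0.
theta = np.log(phi_j_y1 * (1 - phi_j_y0) / (phi_j_y0 * (1 - phi_j_y1)))
theta0 = np.log((1 - phi_j_y1) / (1 - phi_j_y0)).sum() + np.log(phi_y / (1 - phi_y))
pred = (X @ theta + theta0 >= 0).astype(float)
print("training accuracy:", (pred == y).mean())
```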
5.
( a )
$$p(y;\phi)=(1-\phi)^{y-1}\phi=\exp\Bigl\{y\log(1-\phi)-\log\dfrac{1-\phi}{\phi}\Bigr\}$$
so
$$b(y)=1 \qquad \eta=\log(1-\phi) \qquad T(y)=y \qquad a(\eta)=\eta-\log(1-e^{\eta})$$
( b )
$$g(\eta)=\mathrm{E}[y;\phi]=\dfrac{1}{\phi}=\dfrac{1}{1-e^{\eta}} \qquad h_{\theta}(x)=\dfrac{1}{1-e^{\theta^Tx}}$$
( c )
The log-likelihood of a single example is
$$l(\theta)=\log p(y|x;\theta)=\log\bigl(e^{\theta^Tx(y-1)}(1-e^{\theta^Tx})\bigr)=\theta^Tx(y-1)+\log(1-e^{\theta^Tx})$$
$$\dfrac{\partial{l(\theta)}}{\partial{\theta}}=\Bigl(y-\dfrac{1}{1-e^{\theta^Tx}}\Bigr)x$$
so the stochastic gradient ascent update rule is
$$\theta_{k+1}=\theta_{k}+\alpha\Bigl(y-\dfrac{1}{1-e^{\theta_k^Tx}}\Bigr)x$$
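
As a sketch of this update in motion, here is stochastic gradient ascent on synthetic geometric data. Everything here (the data-generating $\theta$, the learning rate, and the clamp keeping $\eta=\theta^Tx$ negative so that $\phi\in(0,1)$) is an illustrative choice, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 500, 3
X = np.abs(rng.standard_normal((m, n)))    # nonnegative features
theta_true = np.array([-0.5, -0.3, -0.2])  # keeps eta = theta^T x < 0
phi = 1 - np.exp(X @ theta_true)           # success probability in (0, 1)
y = rng.geometric(phi)                     # geometric responses, y >= 1

theta = -0.1 * np.ones(n)                  # initialize with eta < 0
alpha = 1e-3
for _ in range(30):                        # epochs of stochastic gradient ascent
    for i in rng.permutation(m):
        eta = min(X[i] @ theta, -1e-6)     # clamp so the model stays valid
        theta += alpha * (y[i] - 1.0 / (1 - np.exp(eta))) * X[i]
print(theta)                               # should drift toward theta_true
```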

3. Problem Set #2

1.
( a )
$$J(\theta)=\dfrac{1}{2}(X\theta-y)^T(X\theta-y)+\dfrac{\lambda}{2}\theta^T\theta$$
$$\dfrac{\partial{J}}{\partial{\theta}}=X^T(X\theta-y)+\lambda\theta$$
$J$ is convex, so setting the gradient to zero gives the minimizer
$$\theta=(X^TX+\lambda I)^{-1}X^Ty$$
( b )
$$\theta^Tx=y^TX(\lambda I+X^TX)^{-1}x=y^T(\lambda I+XX^T)^{-1}Xx$$
Substituting $\phi(x^{(i)})$ for each $x^{(i)}$ in $X$, we never need to compute any $\phi(x^{(i)})$ explicitly; it suffices to compute the inner products between the $\phi(x^{(i)})$, i.e., kernel values.
Moreover, expanding in powers of $1/\lambda$ (valid for $\|BA\|<\lambda$; the resulting identity holds wherever both inverses exist),
$$(\lambda I + BA)^{-1}B=\dfrac{1}{\lambda}\Bigl(I-\dfrac{BA}{\lambda}+\dfrac{(BA)^2}{\lambda^2}-\cdots\Bigr)B=B\,\dfrac{1}{\lambda}\Bigl(I-\dfrac{AB}{\lambda}+\dfrac{(AB)^2}{\lambda^2}-\cdots\Bigr)=B(\lambda I+AB)^{-1}$$
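
A numpy check of the identity in ( b ) on random data: the prediction computed from the explicit $\theta$ matches the form that touches the inputs only through inner products.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((15, 4))      # rows are training inputs x^(i)
y = rng.standard_normal(15)
x = rng.standard_normal(4)
lam = 0.5

theta = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)
primal = theta @ x                    # theta^T x
dual = y @ np.linalg.solve(lam * np.eye(15) + X @ X.T, X @ x)
assert np.allclose(primal, dual)      # only inner products appear in `dual`
```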
2.
( a )
Suppose some optimal solution had $\xi_j<0$ for some $j$. Since
$$y^{(j)}(w^Tx^{(j)}+b)\geq1-\xi_j>1$$
holding all other variables fixed and setting $\xi_j=0$ keeps every constraint satisfied while strictly decreasing the objective (the term $\frac{C}{2}\xi_j^2>0$ is removed), so the current solution is not optimal, a contradiction. Hence $\xi_j\geq0$ holds for every $j$ at any optimum, and adding explicit $\xi_j\geq0$ constraints does not change the solution.
( b )
$$L(w,b,\xi,\alpha)=\dfrac{1}{2}\left\| w\right \|^2+\dfrac{C}{2}\sum_{i=1}^{m}\xi_i^{2}-\sum_{i=1}^{m}\alpha_i(y^{(i)}(w^Tx^{(i)}+b)-1+\xi_i ),\qquad \alpha_i\geq0,\ i=1,2,\dots,m$$
( c )
$$\nabla_wL=w-\sum_{i=1}^{m}\alpha_iy^{(i)}x^{(i)}=0\qquad \dfrac{\partial L}{\partial b}=-\sum_{i=1}^{m}\alpha_iy^{(i)}=0\qquad \nabla_\xi L=C\xi-\alpha=0$$
( d )
From ( c ),
$$w=\sum_{i=1}^{m}\alpha_iy^{(i)}x^{(i)},\qquad \sum_{i=1}^{m}\alpha_iy^{(i)}=0,\qquad C\xi_i=\alpha_i$$
Substituting these back into the Lagrangian (the $b$ term drops because $\sum_i\alpha_iy^{(i)}=0$),
$$\begin{aligned} L(w,b,\xi,\alpha)&=\dfrac{1}{2}\left\| w\right \|^2+\dfrac{C}{2}\sum_{i=1}^{m}\xi_i^{2}-\sum_{i=1}^{m}\alpha_i(y^{(i)}(w^Tx^{(i)}+b)-1+\xi_i)\\ &=\dfrac{1}{2}\Bigl(\sum_{i=1}^{m}\alpha_iy^{(i)}x^{(i)}\Bigr)^T\Bigl(\sum_{i=1}^{m}\alpha_iy^{(i)}x^{(i)}\Bigr)+\dfrac{1}{2C}\sum_{i=1}^{m}\alpha_i^2-\sum_{i=1}^{m}\alpha_iy^{(i)}\Bigl(\sum_{j=1}^{m}\alpha_jy^{(j)}x^{(j)}\Bigr)^Tx^{(i)}+\sum_{i=1}^{m}\alpha_i-\dfrac{1}{C}\sum_{i=1}^m\alpha_i^2\\ &=-\dfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy^{(i)}y^{(j)}(x^{(i)})^Tx^{(j)}-\dfrac{1}{2C}\sum_{i=1}^{m}\alpha_{i}^2+\sum_{i=1}^{m}\alpha_i \end{aligned}$$
The dual problem is therefore
$$\max_{\alpha}\ W(\alpha)=-\dfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy^{(i)}y^{(j)}(x^{(i)})^Tx^{(j)}-\dfrac{1}{2C}\sum_{i=1}^{m}\alpha_{i}^2+\sum_{i=1}^{m}\alpha_i$$
$$\mathrm{s.t.}\ \sum_{i=1}^{m}\alpha_iy^{(i)}=0,\qquad \alpha_i\geq0,\ i=1,2,\dots,m$$
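
As a sanity check of this dual, here is a toy projected-gradient ascent solver in numpy on synthetic data; the alternating clip/re-center projection is only approximate, so this is a sketch rather than a real SVM trainer:

```python
import numpy as np

rng = np.random.default_rng(6)
m, C = 40, 1.0
X = np.vstack([rng.standard_normal((20, 2)) + 2,
               rng.standard_normal((20, 2)) - 2])
y = np.hstack([np.ones(20), -np.ones(20)])
G = (y[:, None] * y[None, :]) * (X @ X.T)   # G_ij = y_i y_j <x_i, x_j>

alpha = np.zeros(m)
lr = 1e-3
for _ in range(5000):
    grad = 1.0 - G @ alpha - alpha / C      # gradient of the dual objective
    alpha += lr * grad                      # ascent step
    alpha -= y * (alpha @ y) / m            # re-center onto sum_i alpha_i y_i = 0
    alpha = np.clip(alpha, 0.0, None)       # then enforce alpha_i >= 0
w = (alpha * y) @ X                         # recover w = sum_i alpha_i y_i x_i
print(w)
```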
3.
( a )
For every $i$, set $\alpha_i=1$, and set $b=0$. Then for every $k$,
$$\left|f(x^{(k)})-y^{(k)} \right|=\left| \sum_{i=1,i\neq k}^{m}y^{(i)}e^{-\frac{\left\| x^{(i)} - x^{(k)} \right\|}{\tau^2}} \right| \leq\sum_{i=1,i\neq k}^{m} e^{-\frac{\left\| x^{(i)} - x^{(k)} \right\|}{\tau^2}} \leq(m-1)e^{-\frac{\epsilon}{\tau^2}}$$
Hence it suffices to take
$$\tau\lt \sqrt{\dfrac{\epsilon}{\log(m-1)}}$$
to get
$$\left|f(x^{(k)})-y^{(k)} \right| \leq(m-1)e^{-\frac{\epsilon}{\tau^2}}\lt 1$$
and since $y^{(k)}\in\{-1,+1\}$, every training example is then classified correctly.
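
A numerical check of ( a ) with numpy on random data, following this note's convention of an un-squared distance in the exponent:

```python
import numpy as np

rng = np.random.default_rng(7)
m = 25
X = rng.standard_normal((m, 3))
y = rng.choice([-1.0, 1.0], size=m)

D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise distances
eps = D[D > 0].min()                                        # closest pair distance
tau = 0.9 * np.sqrt(eps / np.log(m - 1))                    # just under the bound

f = (np.exp(-D / tau**2) * y).sum(axis=1)                   # f(x^(k)) for every k
assert np.all(np.sign(f) == np.sign(y))                     # zero training error
```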
( b ) The problem statement appears to contain an error; presumably it asks whether zero training error can be achieved when the slack terms are removed (i.e., a hard-margin SVM).
Yes. Part ( a ) exhibits a solution achieving zero training error, so the hard-margin constraints are feasible; the SVM objective is convex, so the optimization will find a feasible optimum, and every feasible point classifies the training set perfectly.
( c )
Not necessarily. With the slack penalty, decreasing C biases the optimization toward a larger margin at the cost of some training error, so the returned solution may misclassify training points.

Reposted from blog.csdn.net/luo3300612/article/details/82732902