Learning Algorithm Summary

Objective

\hat{w} = \arg\min_{w} \sum_{i=1}^{n} L(w, z_i), \quad \text{s.t. } \|w\|_1 \le s

\hat{w} = \arg\min_{w} \sum_{i=1}^{n} L(w, z_i) + \lambda \|w\|_1
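As a concrete reading of the penalized form, the sketch below evaluates the objective for squared loss on a design matrix `X` with targets `y`; the squared loss and the names `X`, `y`, `lam` are illustrative assumptions, since the summary leaves the per-sample loss L abstract.

```python
import numpy as np

def l1_objective(w, X, y, lam):
    """Penalized objective: sum_i L(w, z_i) + lambda * ||w||_1.
    Squared loss is assumed here purely for illustration."""
    residuals = X @ w - y
    data_loss = 0.5 * np.sum(residuals ** 2)    # sum of per-sample losses
    return data_loss + lam * np.sum(np.abs(w))  # + lambda * ||w||_1
```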

Gradient Descent

W^{(t+1)} = W^{(t)} - \eta^{(t)} G^{(t)} = W^{(t)} - \eta^{(t)} \nabla_W L(W^{(t)}, Z)
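A minimal sketch of one full-batch step, assuming a hypothetical `grad_fn(w, Z)` that returns the gradient of the total loss over the dataset `Z`:

```python
def gradient_descent_step(w, grad_fn, Z, eta):
    """One full-batch step: W(t+1) = W(t) - eta(t) * G(t),
    where G(t) is the gradient of the loss over all of Z."""
    return w - eta * grad_fn(w, Z)
```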

Stochastic Gradient Descent

W^{(t+1)} = W^{(t)} - \eta^{(t)} G_j^{(t)} = W^{(t)} - \eta^{(t)} \nabla_W L(W^{(t)}, Z_j)
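The stochastic variant applies the same update with the gradient evaluated on a single sampled example Z_j; `grad_fn` is the same hypothetical per-example gradient as above.

```python
import numpy as np

def sgd_step(w, grad_fn, Z, eta, rng=None):
    """One SGD step: W(t+1) = W(t) - eta(t) * G_j(t), with j sampled uniformly."""
    if rng is None:
        rng = np.random.default_rng()
    j = rng.integers(len(Z))
    return w - eta * grad_fn(w, Z[j])
```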

Momentum

m_t = \mu m_{t-1} + G^{(t)}

W^{(t+1)} = W^{(t)} - \eta^{(t)} m_t
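A sketch of the two momentum equations as one step; `m` carries the running velocity between calls.

```python
def momentum_step(w, m, grad, mu, eta):
    """Classical momentum: m_t = mu*m_{t-1} + G(t); W(t+1) = W(t) - eta(t)*m_t."""
    m = mu * m + grad
    w = w - eta * m
    return w, m
```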

Nesterov

m_t = \mu m_{t-1} + G^{(t)}

W^{(t+1)} = W^{(t)} - \eta \mu m_{t-1} - \eta G^{(t)}
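The expanded Nesterov step combines the previous velocity with the current gradient; in the common implementation the gradient is taken at the lookahead point W(t) - eta*mu*m_{t-1}, a convention the formula above leaves implicit and which is assumed in this sketch.

```python
def nesterov_step(w, m_prev, grad_fn, mu, eta):
    """Nesterov step: G(t) is evaluated at the lookahead point (assumed convention),
    then W(t+1) = W(t) - eta*mu*m_{t-1} - eta*G(t) and m_t = mu*m_{t-1} + G(t)."""
    g = grad_fn(w - eta * mu * m_prev)   # gradient at the lookahead point
    m = mu * m_prev + g
    w = w - eta * mu * m_prev - eta * g
    return w, m
```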

Adagrad

n_t = n_{t-1} + (G^{(t)})^2

W^{(t+1)} = W^{(t)} - \frac{\eta}{\sqrt{n_t + \epsilon}} G^{(t)}
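A per-coordinate sketch of the Adagrad accumulator and step:

```python
import numpy as np

def adagrad_step(w, n, grad, eta, eps=1e-8):
    """Adagrad: n_t = n_{t-1} + G(t)^2; the step is scaled by 1/sqrt(n_t + eps) per coordinate."""
    n = n + grad ** 2
    w = w - eta / np.sqrt(n + eps) * grad
    return w, n
```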

Adadelta

n_t = \nu n_{t-1} + (1 - \nu)(G^{(t)})^2

W^{(t+1)} = W^{(t)} - \frac{\eta}{\sqrt{n_t + \epsilon}} G^{(t)}
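The same step with the exponentially decayed accumulator shown above (the RMSProp-style form of the update); a sketch:

```python
import numpy as np

def adadelta_step(w, n, grad, eta, nu=0.9, eps=1e-8):
    """Decayed accumulator: n_t = nu*n_{t-1} + (1-nu)*G(t)^2, then the Adagrad-style step."""
    n = nu * n + (1.0 - nu) * grad ** 2
    w = w - eta / np.sqrt(n + eps) * grad
    return w, n
```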

With L1 Regularization

W^{(t+1)} = W^{(t)} - \eta^{(t)} G^{(t)} - \eta^{(t)} \lambda\, \mathrm{sgn}(W^{(t)})
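A sketch of this naive subgradient step for the L1 term; it rarely drives weights exactly to zero, which is what motivates the truncation schemes below.

```python
import numpy as np

def sgd_l1_step(w, grad, eta, lam):
    """W(t+1) = W(t) - eta(t)*G(t) - eta(t)*lambda*sgn(W(t))."""
    return w - eta * grad - eta * lam * np.sign(w)
```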

Simple Truncation

T_0(v, \theta) = \begin{cases} 0 & \text{if } |v| \le \theta \\ v & \text{otherwise} \end{cases}

W^{(t+1)} = T_0(W^{(t)} - \eta^{(t)} G^{(t)}, \theta)
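A sketch of T0 and the truncated update: coordinates whose magnitude falls at or below theta after the gradient step are set to zero.

```python
import numpy as np

def truncate0(v, theta):
    """T0(v, theta): zero out coordinates with |v| <= theta, keep the rest."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= theta, 0.0, v)

def simple_truncated_step(w, grad, eta, theta):
    """W(t+1) = T0(W(t) - eta(t)*G(t), theta)."""
    return truncate0(w - eta * grad, theta)
```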

Truncated Gradient

T_1(v, \alpha, \theta) = \begin{cases} \max(0, v - \alpha) & \text{if } v \in [0, \theta] \\ \min(0, v + \alpha) & \text{if } v \in [-\theta, 0] \\ v & \text{otherwise} \end{cases}

W^{(t+1)} = T_1(W^{(t)} - \eta^{(t)} G^{(t)}, \eta^{(t)} \lambda^{(t)}, \theta)
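A sketch of T1 and the truncated-gradient update: small coordinates are shrunk toward zero by alpha = eta(t)*lambda(t) instead of being cut off outright.

```python
import numpy as np

def truncate1(v, alpha, theta):
    """T1(v, alpha, theta): shrink coordinates in [-theta, theta] toward zero by alpha."""
    v = np.asarray(v, dtype=float)
    out = v.copy()
    pos = (v >= 0) & (v <= theta)
    neg = (v <= 0) & (v >= -theta)
    out[pos] = np.maximum(0.0, v[pos] - alpha)
    out[neg] = np.minimum(0.0, v[neg] + alpha)
    return out

def truncated_gradient_step(w, grad, eta, lam, theta):
    """W(t+1) = T1(W(t) - eta(t)*G(t), eta(t)*lambda(t), theta)."""
    return truncate1(w - eta * grad, eta * lam, theta)
```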

Adam

m_t = \mu m_{t-1} + (1 - \mu) G^{(t)}

n_t = \nu n_{t-1} + (1 - \nu)(G^{(t)})^2

\hat{m}_t = \frac{m_t}{1 - \mu^t}

\hat{n}_t = \frac{n_t}{1 - \nu^t}

W^{(t+1)} = W^{(t)} - \frac{\hat{m}_t}{\sqrt{\hat{n}_t} + \epsilon}\, \eta
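A sketch of the full Adam step, with `t` counting iterations from 1 so the bias-correction terms are well defined; the default decay rates are the usual choices, not values fixed by the summary.

```python
import numpy as np

def adam_step(w, m, n, grad, t, eta, mu=0.9, nu=0.999, eps=1e-8):
    """Adam: decayed moment estimates, bias correction, then the scaled step."""
    m = mu * m + (1 - mu) * grad        # m_t
    n = nu * n + (1 - nu) * grad ** 2   # n_t
    m_hat = m / (1 - mu ** t)           # bias-corrected first moment
    n_hat = n / (1 - nu ** t)           # bias-corrected second moment
    w = w - eta * m_hat / (np.sqrt(n_hat) + eps)
    return w, m, n
```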

FOBOS

W^{(t+0.5)} = W^{(t)} - \eta^{(t)} G^{(t)}

W^{(t+1)} = \arg\min_{W} \left\{ \frac{1}{2} \|W - W^{(t+0.5)}\|_2^2 + \eta^{(t+0.5)} \Psi(W) \right\}
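When Psi(W) = lambda*||W||_1 (an assumption; the summary keeps Psi generic), the FOBOS argmin has a closed form: soft-thresholding of the intermediate point. A sketch:

```python
import numpy as np

def fobos_l1_step(w, grad, eta, eta_half, lam):
    """FOBOS with Psi = lambda*||.||_1: gradient half-step, then soft-thresholding."""
    w_half = w - eta * grad                 # W(t+0.5)
    thresh = eta_half * lam                 # eta(t+0.5) * lambda
    return np.sign(w_half) * np.maximum(np.abs(w_half) - thresh, 0.0)
```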

RDA

W^{(t+1)} = \arg\min_{W} \left\{ \frac{1}{t} \sum_{r=1}^{t} G^{(r)} \cdot W + \Psi(W) + \frac{\beta^{(t)}}{t} h(W) \right\}
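For the common L1-RDA instantiation, assuming Psi(W) = lambda*||W||_1, h(W) = 0.5*||W||_2^2, and beta(t) = gamma*sqrt(t) (all assumptions, since the summary leaves these terms generic), the argmin separates per coordinate:

```python
import numpy as np

def rda_l1_step(g_bar, t, lam, gamma):
    """L1-RDA closed form: g_bar is the dual average (1/t) * sum_r G(r).
    Coordinates with |g_bar| <= lambda stay exactly at zero."""
    g_bar = np.asarray(g_bar, dtype=float)
    w = np.zeros_like(g_bar)
    active = np.abs(g_bar) > lam
    w[active] = -(np.sqrt(t) / gamma) * (g_bar[active] - lam * np.sign(g_bar[active]))
    return w
```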

FTRL

W^{(t+1)} = \arg\min_{W} \left\{ G^{(1:t)} \cdot W + \lambda_1 \|W\|_1 + \frac{\lambda_2}{2} \|W\|_2^2 + \frac{1}{2} \sum_{s=1}^{t} \sigma^{(s)} \|W - W^{(s)}\|_2^2 \right\}
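The FTRL argmin also separates per coordinate. Writing z = G(1:t) - sum_s sigma(s)*W(s) and sigma_sum = sum_s sigma(s) (names introduced here only for illustration), a sketch of the closed-form solve:

```python
import numpy as np

def ftrl_solve(z, sigma_sum, lam1, lam2):
    """Per-coordinate FTRL solution: zero where |z| <= lambda_1, shrunken elsewhere."""
    z = np.asarray(z, dtype=float)
    w = np.zeros_like(z)
    active = np.abs(z) > lam1
    w[active] = -(z[active] - lam1 * np.sign(z[active])) / (lam2 + sigma_sum)
    return w
```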


Reposted from blog.csdn.net/gaofeipaopaotang/article/details/81147064