Practical Machine Learning Notes (8): An Introduction to SGD, MLP, CNN, and RNN

1. Mini-batch stochastic gradient descent (SGD)

Train by mini-batch SGD

  • w: the model parameters, b: the batch size, $\eta_t$: the learning rate at time t
  • randomly initialize $w_1$
  • repeat t = 1, 2, ... until convergence
    • randomly sample $I_t \subseteq \{1, ..., n\}$ with $|I_t| = b$
    • update $w_{t+1} = w_t - \eta_t \nabla_{w_t} \ell(x_{I_t}, y_{I_t}, w_t)$

SGD is sensitive to the hyper-parameters b (the batch size) and $\eta_t$ (the learning rate).
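
A minimal sketch of the update rule above, fitting a linear model on synthetic data; the data and the choices b = 32, η = 0.1 are illustrative assumptions, not from the notes:

```python
import torch

# synthetic linear-regression data (illustrative)
n, p = 1000, 5
X = torch.randn(n, p)
true_w = torch.randn(p)
y = X @ true_w + 0.01 * torch.randn(n)

w = torch.zeros(p, requires_grad=True)  # initialize w_1
b, eta = 32, 0.1                        # batch size and learning rate (assumed values)

for t in range(200):                    # repeat until "converged"
    idx = torch.randint(0, n, (b,))     # randomly sample I_t with |I_t| = b
    loss = ((X[idx] @ w - y[idx]) ** 2).mean()
    loss.backward()                     # gradient of the mini-batch loss w.r.t. w_t
    with torch.no_grad():
        w -= eta * w.grad               # w_{t+1} = w_t - eta_t * gradient
        w.grad.zero_()
```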

2. Linear Methods -> Multilayer Perceptron (MLP)

Common terms used with MLPs
(1) a dense (fully connected, or linear) layer has parameters $W \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^{m}$;
it computes the output $y = Wx + b \in \mathbb{R}^{m}$
(2) linear regression: a dense layer with 1 output
(3) softmax regression: a dense layer with m outputs + softmax
(4) an activation is an element-wise non-linear function, e.g.
$\mathrm{sigmoid}(x) = \frac{1}{1+\exp(-x)}$
$\mathrm{relu}(x) = \max(x, 0)$
(5) stack multiple hidden layers (dense + activation) to get deeper models (see the sketch after this list)
(6) hyper-parameters: the number of hidden layers and the number of outputs of each hidden layer
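
A minimal sketch of such a stack in PyTorch; the layer sizes (784, 256, 10) are illustrative assumptions:

```python
import torch
from torch import nn

# two dense (linear) layers with a ReLU activation in between
mlp = nn.Sequential(
    nn.Linear(784, 256),  # dense layer: W in R^{256x784}, b in R^{256}
    nn.ReLU(),            # element-wise non-linear activation
    nn.Linear(256, 10),   # output layer: 10 outputs (softmax is applied inside the loss)
)

x = torch.randn(32, 784)  # a mini-batch of 32 flattened inputs
logits = mlp(x)           # shape (32, 10)
```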

3. Dense Layer --> Convolution layer (CNN)

The problem with dense layers

(1) learning ImageNet (300×300 images with 1k classes) with an MLP that has a single hidden layer of 10k outputs (the parameter count is far too large):

  1. it leads to about 1 billion learnable parameters, which is too big! (a quick check appears after this list)
  2. fully connected: each output is a weighted sum over all inputs

(2) recognizing objects in images (the recognition problem involves locality and translation):

  1. Translation invariance: similar output no matter where the object is
  2. Locality: pixels are more related to their nearby neighbors

(3) build this prior knowledge into the model structure (add prior information to the model to reduce the number of parameters):

  1. achieve the same model capacity with fewer parameters
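
A quick check of the parameter count claimed above, assuming a greyscale 300×300 input flattened to a vector (with RGB input it would be roughly 3× larger):

```python
# parameters of the single hidden dense layer alone
inputs = 300 * 300          # 90,000 input pixels
hidden = 10_000             # 10k hidden outputs
weights = inputs * hidden   # 900,000,000 weights
biases = hidden
print(weights + biases)     # ~0.9 billion learnable parameters
```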

Convolution layer

(1) locality: an output is computed from a k*k input window (the receptive field)

(2) translation invariance: every output uses the same k*k weights (the kernel)

(3) the parameters of a convolution layer do not depend on the input/output sizes

(4) a kernel may learn to identify one pattern

"""Convolution with single input and output channels"""
# both input 'X' and weight 'K' are matrix
h,w = K.shape # the size of kernel:height and width
Y = torch.zeros((X.shape[0] - h + 1,X.shape[1] -w +1)) # the convolution result

for i in range(Y.shape[0]):
    for j in range(Y.shape[1]):
        Y[i,j] = (X[i:i+h,j:j+w]*K).sum()
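
For reference, a sketch of the same single-channel computation with PyTorch's built-in conv2d, assuming X and K are the 2-D tensors from the snippet above:

```python
import torch.nn.functional as F

# conv2d expects (batch, channels, height, width), so add two leading dimensions
Y_builtin = F.conv2d(X.unsqueeze(0).unsqueeze(0), K.unsqueeze(0).unsqueeze(0))
# Y_builtin[0, 0] equals Y from the hand-written loop
# (conv2d, like the loop, computes a cross-correlation)
```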

Pooling Layer

(1) convolution is sensitive to location

  • a pixel shift in the input results in a pixel shift in the output

  • a pooling layer computes the mean/max over k*k windows

# h, w: pooling window height and width
# mode: 'max' or 'avg'
Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))  # the pooling result
for i in range(Y.shape[0]):
    for j in range(Y.shape[1]):
        if mode == 'max':
            Y[i, j] = X[i:i + h, j:j + w].max()   # max pooling
        elif mode == 'avg':
            Y[i, j] = X[i:i + h, j:j + w].mean()  # average pooling
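
In practice both layers come built in; a minimal usage sketch (the window size 2 and the input shape are illustrative assumptions):

```python
import torch
from torch import nn

pool = nn.MaxPool2d(kernel_size=2)  # max over 2x2 windows (stride defaults to the window size)
x = torch.randn(1, 1, 4, 4)         # (batch, channel, height, width)
print(pool(x).shape)                # torch.Size([1, 1, 2, 2])
```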

Convolutional Neural Network (CNN): see the references

  1. A neural network uses a stack of convolution layers to extract features

    • an activation is applied after each convolution layer
    • pooling is used to reduce location sensitivity
  2. modern CNNs are deep neural networks with various hyper-parameters and layer connections (a minimal sketch follows this list)
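
A minimal sketch of such a stack, in the style of LeNet; all layer sizes here are illustrative assumptions, not from the notes:

```python
import torch
from torch import nn

# convolution -> activation -> pooling, repeated, followed by dense layers
cnn = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 10),
)

x = torch.randn(32, 1, 28, 28)  # a mini-batch of 28x28 single-channel images
print(cnn(x).shape)             # torch.Size([32, 10])
```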

4. Dense layer --> Recurrent network (RNN)

The problem with dense layers

  1. language model: predict the next word

    • hello --> world
    • Hello world --> !
  2. naively using an MLP doesn't handle sequence information well

    • the input/output don’t have the same length

RNN and Gated RNN: see the references

Reference: Dive into Deep Learning (37): Recurrent Neural Networks (动手学深度学习(三十七)——循环神经网络)

  1. simple RNN (a sketch of one step follows this list):
    $h_t = \phi(W_{hh} h_{t-1} + W_{hx} x_t + b_h)$
  2. Gated RNN (LSTM and GRU): finer control of the information flow
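
A minimal sketch of one simple-RNN step following the formula above; the sizes and the tanh activation are illustrative assumptions:

```python
import torch

# illustrative sizes: input dimension 8, hidden dimension 16, batch of 4
W_hh = torch.randn(16, 16) * 0.01
W_hx = torch.randn(16, 8) * 0.01
b_h = torch.zeros(16)

def rnn_step(h_prev, x_t):
    """One step of a simple RNN: h_t = phi(W_hh h_{t-1} + W_hx x_t + b_h)."""
    return torch.tanh(h_prev @ W_hh.T + x_t @ W_hx.T + b_h)

h = torch.zeros(4, 16)             # initial hidden state
for x_t in torch.randn(5, 4, 8):   # a sequence of 5 time steps
    h = rnn_step(h, x_t)           # the hidden state carries the sequence information
```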

5. Summary

  1. MLP: stack dense layers with non-linear activations
  2. CNN: stack convolution, activation, and pooling layers to efficiently extract spatial information
  3. RNN: stack recurrent layers to pass temporal information through the hidden state
