Lecture 02: Theories of Deep Learning

STATS385

  • Perceptron (single-layer neural network); Multilayer Neural Network / Multilayer Perceptron (MLP)
  • Backpropagation
  • Convolutional Neural Networks: LeNet, AlexNet, ReLU, max pooling, dropout (from CS231n)
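The perceptron listed above can be sketched in a few lines. This is a minimal illustration of the classic Rosenblatt update rule on a toy linearly separable dataset (the data and learning rate here are illustrative choices, not from the lecture):

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron: labels in {-1, +1}, linear threshold unit."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified -> move boundary
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy linearly separable data: label = sign of x0 + x1
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
preds = np.sign(X @ w + b)
```

A single such unit can only draw one hyperplane; stacking them into a multilayer perceptron, trained by backpropagation, is what removes that limitation.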
  • Regularization has an analogue in the human visual system: extracting image features under a sparsity constraint acts as a form of regularization
  • Some questions worth thinking about:
    Is there an implicit sparsity-promotion in training network?
    How would classification results change if we replaced the learned first-layer filters with analytically defined wavelets, e.g. Gabors?
    Filters in the first layer are spatially localized, oriented and bandpass. What properties do filters in remaining layers satisfy?
    Can we derive them mathematically?
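The Gabor filters mentioned above (spatially localized, oriented, bandpass, like learned first-layer filters) are easy to write down analytically. A minimal sketch of such a filter bank, with illustrative sizes and frequencies chosen here rather than taken from the lecture:

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, lam=4.0):
    """Real part of a 2-D Gabor: Gaussian envelope times an oriented cosine."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    xr = xs * np.cos(theta) + ys * np.sin(theta)    # rotate coordinates
    yr = -xs * np.sin(theta) + ys * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))  # localized
    carrier = np.cos(2 * np.pi * xr / lam)                # bandpass
    return envelope * carrier

# A small bank of oriented filters, analogous to first-layer CNN filters
bank = [gabor_kernel(11, theta)
        for theta in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving an image with such a bank gives fixed, wavelet-like first-layer features that could be compared against learned ones.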
  • VGG;
  • ResNet:
    Standard architectures: increasingly abstract features at each layer
    ResNet: a group of successive layers iteratively refine an estimated representation [Klaus Greff et al. '17]
    Could we formulate a cost function that is being minimized in these successive layers?
    What is the relation between this cost function and standard architectures?
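The iterative-refinement view of ResNet can be made concrete: each residual block computes x + f(x), so with small residual branches every layer only nudges the running representation. A numpy sketch under assumed small random weights (dimensions and scales are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_block(x, W1, W2):
    """Residual update: x <- x + W2 @ relu(W1 @ x)."""
    return x + W2 @ np.maximum(W1 @ x, 0.0)

d = 8
x = rng.standard_normal(d)
# Small-weight residual branches: each block is an incremental refinement
blocks = [(0.1 * rng.standard_normal((d, d)), 0.1 * rng.standard_normal((d, d)))
          for _ in range(5)]
for W1, W2 in blocks:
    x_next = residual_block(x, W1, W2)
    # relative change per block stays small -> "refinement", not replacement
    step = np.linalg.norm(x_next - x) / np.linalg.norm(x)
    x = x_next
```

Contrast this with a plain feedforward layer x <- W x, which replaces the representation wholesale at every depth; the question above asks whether these small refinement steps are descent steps of some cost function.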
  • Linear separation
    Besides increasing depth, one can increase the width of each layer to improve performance [Zagoruyko and Komodakis '17]
    Is there a reason for increasing depth over width or vice versa?
    Is having many filters in same layer somehow detrimental?
    Is having many layers not beneficial after some point?
    Inputs are not linearly separable but their deepest representations are
    What happens during forward pass that makes linear separation possible?
    Is separation happening gradually with depth or abruptly at a certain point?
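A tiny worked example of the separability point above: XOR is the classic input that no linear classifier can handle, yet a single ReLU layer already maps it to a linearly separable representation. The weights below are hand-picked for illustration, not learned:

```python
import numpy as np

# XOR: not linearly separable in input space
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 1, 1, 0])

# One ReLU layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
W = np.array([[1., 1.], [1., 1.]])
b = np.array([0., -1.])
H = np.maximum(X @ W.T + b, 0.0)

# In feature space a single linear readout h1 - 2*h2 separates the classes
scores = H @ np.array([1., -2.])
preds = (scores > 0.5).astype(int)
```

The forward pass folds the input space so that the two classes end up on opposite sides of a hyperplane; the open question is whether deep networks achieve this folding gradually across layers or abruptly.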
  • Transfer learning
    Filters learned in first layers of a network are transferable from one task to another
    When solving another problem, there is no need to retrain the lower layers; just fine-tune the upper ones
    Is this simply due to the large amount of images in ImageNet?
    Does solving many classification problems simultaneously result in features that are more easily transferable?
    Does this imply filters can be learned in unsupervised manner?
    Can we characterize filters mathematically?
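The freeze-and-fine-tune recipe above can be sketched in numpy. Here the "pretrained" lower layer is stood in for by fixed random ReLU features (an assumption for illustration), and only a logistic-regression readout is retrained on the new task:

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_features(X, W_low):
    """Lower-layer features, pretrained elsewhere and kept frozen."""
    return np.maximum(X @ W_low, 0.0)

def finetune_head(H, y, steps=300, lr=0.5):
    """Retrain only the linear readout (logistic regression) on the new task."""
    w = np.zeros(H.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w)))
        w -= lr * H.T @ (p - y) / len(y)   # gradient of the logistic loss
    return w

# Hypothetical transfer setup: W_low stands in for filters from a source task
W_low = rng.standard_normal((4, 16))
X = rng.standard_normal((200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # the "new" target task
H = frozen_features(X, W_low)
w_head = finetune_head(H, y)
acc = np.mean(((H @ w_head) > 0.0) == (y == 1.0))
```

In practice the frozen layers would come from a network trained on ImageNet rather than random weights; the question is precisely how much of the transfer benefit those learned lower layers carry beyond this random-feature baseline.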
  • Adversarial examples
    Small but malicious perturbations can result in severe misclassification
    Malicious examples generalize across different architectures
    What is the source of the instability?
    Can we robustify the network?
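The "small but malicious perturbation" can be illustrated with the fast gradient sign method (FGSM) on a toy fixed linear model standing in for a network (the weights, input, and epsilon below are illustrative):

```python
import numpy as np

# FGSM: perturb the input in the sign of the loss gradient w.r.t. the input.
w = np.array([1.0, -2.0, 0.5])      # a fixed linear "network" f(x) = w . x

def loss_grad_x(x, y, w):
    """Gradient of the logistic loss -log p(y|x) with respect to x."""
    p = 1.0 / (1.0 + np.exp(-(w @ x)))
    return (p - y) * w

x = np.array([2.0, 0.5, 1.0])       # correctly classified as y = 1
y = 1.0
eps = 0.9
x_adv = x + eps * np.sign(loss_grad_x(x, y, w))   # ascend the loss

clean_pred = int(w @ x > 0)         # 1: correct
adv_pred = int(w @ x_adv > 0)       # flips to 0 after the perturbation
```

Because the perturbation aligns with the loss gradient coordinate-wise, even a small epsilon moves the input far in the direction the model is most sensitive to, which also hints at why such examples transfer across architectures with similar gradients.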
  • Geometry of images
    Activation maximization seeks input image maximizing activation of certain neuron
    Could we span all images that excite a certain neuron?
    What geometrical structure would these images create?
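Activation maximization as described above is just gradient ascent on the input. A minimal sketch for a single hypothetical ReLU neuron, with the input projected back onto the unit sphere to keep the "image" bounded (neuron, step size, and constraint are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10)         # weights of a hypothetical neuron

def activation(x):
    """One ReLU unit: the 'certain neuron' being maximized."""
    return max(w @ x, 0.0)

x = rng.standard_normal(10)
x /= np.linalg.norm(x)
if w @ x < 0:
    x = -x                          # start on the active side of the ReLU
for _ in range(100):
    grad = w                        # d activation / d x on the active side
    x = x + 0.1 * grad              # gradient ascent on the input
    x /= np.linalg.norm(x)          # project back onto the unit sphere

# On the sphere the unique maximizer is w / ||w||, with activation ||w||
```

For this single neuron the set of sphere-constrained maximizers is a single point; for a real deep network the analogous set of maximizing images is high-dimensional, and its geometry is exactly what the question above asks about.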

The questions raised in this lecture are worthy of careful consideration and study.
