Lecture 02: Theories of Deep Learning

STATS385

  • Perceptrons; single-layer neural networks; multilayer neural networks (multilayer perceptrons)
  • Backpropagation
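A minimal sketch of backpropagation (my illustration, not from the lecture): the chain rule applied layer by layer in a one-hidden-layer ReLU network, with the hand-derived gradient checked against a finite-difference estimate. All weights and data here are made-up toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # hidden-layer weights (toy values)
w2 = rng.normal(size=3)        # output weights
x = np.array([0.5, -1.0])      # a single input example
t = 1.0                        # its target

# Forward pass
h_pre = W1 @ x                 # hidden pre-activation
h = np.maximum(h_pre, 0.0)     # ReLU
y = w2 @ h                     # scalar output
loss = 0.5 * (y - t) ** 2      # squared-error loss

# Backward pass (chain rule, layer by layer)
dy = y - t                     # dL/dy
dw2 = dy * h                   # dL/dw2
dh = dy * w2                   # dL/dh
dh_pre = dh * (h_pre > 0)      # ReLU passes gradient only where active
dW1 = np.outer(dh_pre, x)      # dL/dW1

# Sanity check: compare one entry of dW1 with a finite difference
eps = 1e-6
i, j = 0, 1
W1p = W1.copy(); W1p[i, j] += eps
lp = 0.5 * (w2 @ np.maximum(W1p @ x, 0.0) - t) ** 2
fd = (lp - loss) / eps
```

The finite-difference check is the standard way to validate a backprop implementation before trusting it.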
  • Convolutional neural networks: LeNet, AlexNet, ReLU, max pooling, dropout (from CS231n)
  • Regularization has a counterpart in the human visual system: extracting image features while enforcing sparsity effectively induces regularization
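A toy sketch of how a sparsity-promoting penalty works (my example, not the lecture's): ISTA, i.e. proximal gradient descent with soft-thresholding, for the L1-regularized least-squares problem min_w 0.5‖Xw − y‖² + λ‖w‖₁. The data and λ are made-up; the point is that the recovered weights come out sparse.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
w_true = np.zeros(10); w_true[[2, 7]] = [1.5, -2.0]   # truly sparse signal
y = X @ w_true + 0.01 * rng.normal(size=50)

lam = 0.5
step = 1.0 / np.linalg.norm(X, 2) ** 2                # 1/L with L = ||X||_2^2
w = np.zeros(10)
for _ in range(500):
    grad = X.T @ (X @ w - y)                          # gradient of the smooth part
    z = w - step * grad
    w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold

n_nonzero = int(np.sum(np.abs(w) > 1e-3))             # most entries end up exactly 0
```

The soft-thresholding step is exactly where sparsity enters: small coordinates are zeroed out on every iteration.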
  • Some questions worth thinking about:
    Is there an implicit sparsity promotion in training networks?
    How would classification results change if we replaced the learned filters in the first layer with analytically defined
    wavelets, e.g. Gabors?
    Filters in the first layer are spatially localized, oriented and bandpass. What properties do filters in the remaining layers satisfy?
    Can we derive them mathematically?
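For concreteness, here is a sketch of the kind of analytically defined filter the question refers to: a 2-D Gabor filter (Gaussian envelope times an oriented sinusoid), which is spatially localized, oriented and bandpass by construction. Size, bandwidth and frequency values below are arbitrary choices of mine.

```python
import numpy as np

def gabor(size=9, sigma=2.0, theta=0.0, freq=0.25):
    """Real part of a 2-D Gabor: Gaussian envelope times an oriented sinusoid."""
    ax = np.arange(size) - size // 2
    yy, xx = np.meshgrid(ax, ax, indexing="ij")
    xr = xx * np.cos(theta) + yy * np.sin(theta)     # rotate coordinates by theta
    env = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))  # spatial localization
    g = env * np.cos(2 * np.pi * freq * xr)          # orientation + bandpass
    return g - g.mean()                              # remove DC so it is bandpass

# A small bank at four orientations, as a first-layer stand-in
bank = [gabor(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Such a fixed bank could be swapped in for the learned first-layer filters to test the question empirically.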
  • VGG;
  • ResNet:
    Standard architectures: increasingly abstract features at each layer
    ResNet: a group of successive layers iteratively refines an estimated representation [Klaus Greff et al. '17]
    Could we formulate a cost function that is being minimized in these successive layers?
    What is the relation between this cost function and standard architectures?
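The "iterative refinement" reading can be sketched as follows (a toy example with made-up weights, not the lecture's): each residual block computes x_{k+1} = x_k + f_k(x_k), so with small residual branches each layer changes the representation only slightly.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, depth = 4, 6
Ws = [0.1 * rng.normal(size=(dim, dim)) for _ in range(depth)]  # small residual weights

def res_block(x, W):
    # Identity shortcut plus a ReLU residual branch: x_{k+1} = x_k + relu(W x_k)
    return x + np.maximum(W @ x, 0.0)

x = rng.normal(size=dim)
trace = [x]                       # representation after each block
for W in Ws:
    x = res_block(x, W)
    trace.append(x)

# Per-layer update sizes: how much each block refines the estimate
steps = [np.linalg.norm(trace[k + 1] - trace[k]) for k in range(depth)]
```

Plotting `steps` against depth is one way to probe whether successive layers are minimizing some cost a little at a time.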
  • Linear separation
    Besides increasing depth, one can increase the width of each layer to improve performance [Zagoruyko and Komodakis '17]
    Is there a reason for increasing depth over width or vice versa?
    Is having many filters in the same layer somehow detrimental?
    Does having many layers stop being beneficial after some point?
    Inputs are not linearly separable but their deepest representations are
    What happens during forward pass that makes linear separation possible?
    Is separation happening gradually with depth or abruptly at a certain point?
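The classic miniature instance of this question is XOR (my toy example, not from the lecture): the labels are not linearly separable in input space, but one hidden ReLU layer, here with hand-picked weights, makes them separable by a linear threshold.

```python
import numpy as np

# XOR: no line in the input plane separates class 0 from class 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Hand-picked hidden layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, -1.0])
H = np.maximum(X @ W.T + b, 0.0)          # hidden representation

# In feature space, the linear readout h1 - 2*h2 separates the classes
scores = H @ np.array([1.0, -2.0])
pred = (scores > 0.5).astype(int)
```

The forward pass has folded the input space so that a single hyperplane now suffices, which is the phenomenon the questions above ask about at scale.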
  • Transfer learning
    Filters learned in first layers of a network are transferable from one task to another
    When solving another problem, there is no need to retrain the lower layers; just fine-tune the upper ones
    Is this simply due to the large amount of images in ImageNet?
    Does solving many classification problems simultaneously result in features that are more easily transferable?
    Does this imply filters can be learned in unsupervised manner?
    Can we characterize filters mathematically?
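The "freeze lower layers, train only the top" recipe can be sketched in miniature (my toy stand-in, not an actual pretrained network): a fixed random ReLU layer plays the role of transferred features, and only a linear logistic readout is trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
w_star = rng.normal(size=5)
y = (X @ w_star > 0).astype(float)          # a made-up binary task

W_frozen = rng.normal(size=(50, 5))         # stands in for pretrained lower layers
H = np.maximum(X @ W_frozen.T, 0.0)         # fixed features: never updated

w = np.zeros(50)                            # only the top readout is trained
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(H @ w)))      # predicted P(y = 1)
    w -= 0.1 * H.T @ (p - y) / len(y)       # logistic-loss gradient step

acc = np.mean(((H @ w) > 0) == (y == 1))    # training accuracy of the readout
```

That a linear readout on frozen features already fits the task well is the miniature version of why fine-tuning only the upper layers often suffices.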
  • Adversarial examples
    Small but malicious perturbations can result in severe misclassification
    Malicious examples generalize across different architectures
    What is the source of the instability?
    Can we robustify the network?
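A minimal sketch of how such a perturbation is constructed (my toy example on a fixed logistic-regression "network", in the style of the fast gradient sign method): step the input in the sign of the loss gradient, and the loss increases even though the perturbation is small in every coordinate.

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])              # fixed, made-up model weights
b = 0.1
x = np.array([0.3, -0.4, 1.2])              # a clean input
y = 1.0                                     # its true label

def loss(x):
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted P(y = 1)
    return -np.log(p)                       # cross-entropy for label 1

p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
grad_x = (p - y) * w                        # dL/dx for the cross-entropy loss
eps = 0.2
x_adv = x + eps * np.sign(grad_x)           # perturbation bounded by eps per coordinate
```

Because the step follows the loss gradient exactly, a budget of eps per coordinate is spent as harmfully as possible, which is why small perturbations can be so damaging.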
  • Geometry of images
    Activation maximization seeks input image maximizing activation of certain neuron
    Could we span all images that excite a certain neuron?
    What geometrical structure would these images create?
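Activation maximization itself can be sketched in a few lines (a toy setup of my own, not the lecture's): gradient ascent on the input to maximize one neuron's activation, with the input projected back onto the unit sphere after each step so it stays bounded.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(size=(3, 5))                  # one tanh layer with 3 neurons (toy weights)

x = rng.normal(size=5)
x /= np.linalg.norm(x)                       # start from a random unit-norm "image"
for _ in range(1000):
    pre = W @ x
    grad = (1.0 - np.tanh(pre[0]) ** 2) * W[0]   # d tanh(W[0]·x) / dx
    x = x + 0.1 * grad                       # ascend the neuron's activation
    x /= np.linalg.norm(x)                   # project back onto the unit sphere

# For a monotone activation, the maximizer on the sphere is x ∝ W[0]
cos = (x @ W[0]) / np.linalg.norm(W[0])
```

In this linear-neuron toy the set of maximizing inputs is a single direction; for deep nonlinear networks the geometry of that set is exactly the open question above.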

All of the questions raised in this lecture deserve careful thought and study.

Reposted from blog.csdn.net/qq_36356761/article/details/80052338