Convolutional Neural Network

Copyright notice: no book you read is ever wasted; every page counts. https://blog.csdn.net/chengyq116/article/details/86513159


1. What kind of features does each layer of a Convolutional Neural Network extract?

Visualizing and Understanding Convolutional Networks:
The projections from each layer show the hierarchical nature of the features in the network. Layer 2 responds to corners and other edge/color conjunctions. Layer 3 has more complex invariances, capturing similar textures (e.g. mesh patterns (Row 1, Col 1); text (R2, C4)). Layer 4 shows significant variation, but is more class-specific: dog faces (R1, C1); bird’s legs (R4, C2). Layer 5 shows entire objects with significant pose variation, e.g. keyboards (R1, C1) and dogs (R4).

2. Reducing Overfitting?

ImageNet Classification with Deep Convolutional Neural Networks:
Below, we describe the two primary ways in which we combat overfitting.
1. Data Augmentation
2. Dropout
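The paper gives no code for these two techniques; the sketch below is a minimal NumPy illustration of both ideas. Note the assumptions: it uses the "inverted" dropout variant common today (survivors are rescaled at training time, whereas AlexNet instead scaled activations at test time), and only the horizontal-flip part of AlexNet's augmentation (the paper also uses random crops and PCA color jitter).

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    """Inverted dropout: zero each unit with probability p, rescale survivors
    by 1/(1-p) so the expected activation is unchanged. Identity at test time."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def augment_flip(images):
    """Horizontal-flip augmentation: mirror each image along its last axis."""
    return images[..., ::-1]

x = np.ones((4, 8))
y = dropout(x, p=0.5)          # entries are either 0.0 or 1/(1-0.5) = 2.0

img = np.arange(6).reshape(2, 3)
flipped = augment_flip(img)    # each row reversed left-to-right
```

Flipping doubles the effective training set at essentially zero storage cost, which is why the paper calls data augmentation the cheapest way to enlarge the dataset.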

3. What are the top-1 and top-5 error rates?

ImageNet Classification with Deep Convolutional Neural Networks:
On ImageNet, it is customary to report two error rates: top-1 and top-5, where the top-5 error rate is the fraction of test images for which the correct label is not among the five labels considered most probable by the model.
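The definition above translates directly into code. The following NumPy sketch (not from the paper) computes the top-k error rate from a matrix of class scores; the toy scores and labels are made up for illustration.

```python
import numpy as np

def topk_error(logits, labels, k):
    """Fraction of examples whose true label is NOT among the k
    highest-scoring predictions (top-k error rate)."""
    topk = np.argsort(-logits, axis=1)[:, :k]        # indices of the k largest scores
    hits = (topk == labels[:, None]).any(axis=1)     # true label found in top k?
    return 1.0 - hits.mean()

# 3 examples, 5 classes; row 2's true label (1) is not the argmax (0)
logits = np.array([[0.1, 0.5, 0.2, 0.9, 0.3],
                   [0.8, 0.1, 0.05, 0.03, 0.02],
                   [0.2, 0.3, 0.9, 0.1, 0.4]])
labels = np.array([3, 1, 2])

top1 = topk_error(logits, labels, 1)   # 1 of 3 examples missed -> 1/3
top5 = topk_error(logits, labels, 5)   # with only 5 classes, top-5 error is 0
```

Top-5 is the headline ImageNet metric because many of its 1000 classes are fine-grained (e.g., dog breeds), so near-misses among visually similar classes are forgiven.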

4. Why do the depth and breadth of convolutional neural networks (CNNs) matter?

ImageNet Classification with Deep Convolutional Neural Networks:
Convolutional neural networks (CNNs) constitute one such class of models [16, 11, 13, 18, 15, 22, 26]. Their capacity can be controlled by varying their depth and breadth, and they also make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies).

Our final network contains five convolutional and three fully-connected layers, and this depth seems to be important: we found that removing any convolutional layer (each of which contains no more than 1% of the model’s parameters) resulted in inferior performance.
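The "no more than 1% of the parameters" claim can be roughly checked by counting weights. The sketch below uses approximate AlexNet weight shapes (biases omitted, the two-GPU channel split reflected in the reduced input channels of conv2/4/5); the exact bookkeeping is my reconstruction, not the paper's.

```python
# Conv weight shapes: (out_channels, in_channels_per_group, kH, kW)
conv_shapes = [(96, 3, 11, 11),    # conv1
               (256, 48, 5, 5),    # conv2 (grouped across 2 GPUs)
               (384, 256, 3, 3),   # conv3
               (384, 192, 3, 3),   # conv4 (grouped)
               (256, 192, 3, 3)]   # conv5 (grouped)
# Fully-connected weight shapes: (out_features, in_features)
fc_shapes = [(4096, 256 * 6 * 6),  # fc6
             (4096, 4096),         # fc7
             (1000, 4096)]         # fc8

conv_params = [o * i * kh * kw for (o, i, kh, kw) in conv_shapes]
fc_params = [o * i for (o, i) in fc_shapes]
total = sum(conv_params) + sum(fc_params)   # about 61 million weights

conv_fracs = [p / total for p in conv_params]
# every conv layer is a small slice of the total; the FC layers dominate
```

Even the largest conv layer (conv3) holds only about 1.5% of the weights, so removing one sacrifices little capacity on paper, yet the paper reports it still hurts accuracy: the depth itself matters, not just the parameter count.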

5. Overlapping Pooling vs. Non-overlapping Pooling?

ImageNet Classification with Deep Convolutional Neural Networks:
Pooling layers in CNNs summarize the outputs of neighboring groups of neurons in the same kernel map. Traditionally, the neighborhoods summarized by adjacent pooling units do not overlap (e.g., [17, 11, 4]). To be more precise, a pooling layer can be thought of as consisting of a grid of pooling units spaced s pixels apart, each summarizing a neighborhood of size z × z centered at the location of the pooling unit. If we set s = z, we obtain traditional local pooling as commonly employed in CNNs. If we set s < z, we obtain overlapping pooling. This is what we use throughout our network, with s = 2 and z = 3. This scheme reduces the top-1 and top-5 error rates by 0.4% and 0.3%, respectively, as compared with the non-overlapping scheme s = 2, z = 2, which produces output of equivalent dimensions. We generally observe during training that models with overlapping pooling find it slightly more difficult to overfit.
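A minimal NumPy sketch (not from the paper) makes the s/z relationship concrete. On a 13 × 13 map, the size AlexNet pools at conv5, the overlapping scheme (z = 3, s = 2) and the non-overlapping scheme (z = 2, s = 2) both yield a 6 × 6 output, matching the "equivalent dimensions" remark.

```python
import numpy as np

def max_pool(x, z, s):
    """Max-pool a 2-D map with a z*z window and stride s (no padding)."""
    h, w = x.shape
    oh = (h - z) // s + 1
    ow = (w - z) // s + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i*s:i*s+z, j*s:j*s+z].max()
    return out

x = np.arange(13 * 13, dtype=float).reshape(13, 13)
overlap = max_pool(x, z=3, s=2)   # s < z: overlapping pooling, as in AlexNet
plain = max_pool(x, z=2, s=2)     # s = z: traditional non-overlapping pooling
# both are 6 x 6; each overlapping window contains the matching plain window,
# so every overlapping output is >= the corresponding non-overlapping output
```

Because adjacent 3 × 3 windows share a row or column, each input value can influence up to four pooling outputs instead of one, which is the usual intuition for the mild regularization effect the paper observes.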
