New research Hinton et al: how best to measure the neural network representing the similarity

https://www.toutiao.com/a6692998683081835012/

 

Recently, many studies have attempted to understand the behavior of neural networks by comparing their representations. A new study from Google Brain by Simon Kornblith, Geoffrey Hinton, and colleagues introduces centered kernel alignment (CKA) as a similarity index, analyzes the relationship between CKA, linear regression, canonical correlation analysis (CCA), and other related methods, and shows that CKA outperforms these other similarity indices.

In many machine learning tasks, deep neural networks automatically learn powerful feature representations from data. Although deep neural networks have made impressive progress on a wide variety of tasks, how to understand and characterize the representations they learn from data has not been adequately studied. Previous work (e.g., Advani & Saxe (2017), Amari et al. (2018), Saxe et al. (2013)) has made some progress toward a theoretical understanding of the dynamics of neural network training. Although insightful, these studies have a fundamental limitation: they ignore the complex interaction between the training dynamics and structured data. In fact, neural network representations can provide more information about the interaction between machine learning algorithms and data than the loss function does.

This Google Brain paper studies the problem of measuring similarity between deep neural network representations. An effective way to measure representational similarity could help answer many interesting questions, including: (1) Do deep neural networks with the same architecture, trained from different random initializations, learn similar representations? (2) Can a correspondence be established between layers of different network architectures? (3) How similar are the representations learned by the same architecture trained on different datasets?

The main contributions of this paper are:

  • A discussion of the invariance properties of similarity indices and their implications for measuring similarity between neural network representations.
  • The introduction of centered kernel alignment (CKA) as a similarity index, and an analysis of the relationship between CKA, linear regression, canonical correlation analysis (CCA), and other related methods.
  • A demonstration that CKA can determine the correspondence between hidden layers of neural networks trained from different random initializations and with different widths, scenarios in which previously proposed similarity indices fail.
  • Verification that wider networks learn more similar representations, and that the similarity of early layers saturates sooner than that of later layers. The study also shows that early layers, but not later layers, learn similar representations on different datasets.

 

Problem Description

Let X∈R^(n × p_1) denote a matrix of activations of p_1 neurons for n examples, and Y∈R^(n × p_2) denote a matrix of activations of p_2 neurons for the same n examples. Each column of these matrices is assumed to have been centered to zero mean. Without loss of generality, assume p_1 ≤ p_2.

To visualize and understand the effect of different factors in deep learning, the researchers design and analyze a scalar similarity index s(X, Y), which can be used to compare representations within and between neural networks.
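As a concrete illustration of this setup, the activation matrices and the column-centering assumption can be sketched in NumPy; the shapes below are arbitrary placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activations: n examples, p1 and p2 neurons (p1 <= p2).
n, p1, p2 = 100, 32, 64
X = rng.standard_normal((n, p1))
Y = rng.standard_normal((n, p2))

# Center each column (neuron) to zero mean, as the paper assumes.
X = X - X.mean(axis=0, keepdims=True)
Y = Y - Y.mean(axis=0, keepdims=True)
```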

Paper: Similarity of Neural Network Representations Revisited


Paper link: https://arxiv.org/pdf/1905.00414.pdf

Some recent work has tried to understand the behavior of neural networks by comparing representations between layers and between different trained models. This paper examines comparisons of neural network representations based on canonical correlation analysis (CCA). It shows that CCA belongs to a class of statistical methods for measuring multivariate similarity, but that CCA and other statistics invariant to invertible linear transformation cannot measure similarity between representations whose dimensionality exceeds the number of data points.

This study describes a similarity index that measures the relationship between representational similarity matrices and does not suffer from the limitation above. This similarity index is equivalent to centered kernel alignment (CKA) and is also closely related to CCA. Unlike CCA, CKA can reliably identify correspondences between representations learned from different random initializations.

To which transformations should a similarity index be invariant?

The invariances of a similarity index have a decisive influence on what it measures about neural networks. The study argues that both the intuitive notion of similarity and the dynamics of neural network training call for a similarity index that is invariant to orthogonal transformation and isotropic scaling, but not to arbitrary invertible linear transformation.
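These invariance requirements can be checked numerically. The sketch below uses the linear (dot-product kernel) form of CKA on random placeholder data; it is an illustration of the invariance properties, not the paper's experimental code:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered columns."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    return (np.linalg.norm(Y.T @ X, 'fro') ** 2
            / (np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro')))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
Y = rng.standard_normal((100, 10))
base = linear_cka(X, Y)

Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))  # a random orthogonal matrix
assert np.isclose(linear_cka(X @ Q, Y), base)       # invariant to orthogonal transform
assert np.isclose(linear_cka(2.5 * X, Y), base)     # invariant to isotropic scaling
```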

Comparing Similarity Structure

Unlike methods that directly compare the multivariate features of an example across two representations (for example, via regression), the key idea of this study is to first measure the similarity between every pair of examples within each representation, and then compare these similarity structures. In neuroscience, the matrix of similarities between examples within a representation is called the representational similarity matrix (Kriegeskorte et al., 2008a). The following shows that, if the dot product is used as the similarity measure, similarity between representational similarity matrices reduces to another intuitive notion of pairwise feature similarity.

Dot product-based similarity. A simple formula relates dot products between examples to dot products between features:

⟨vec(XX^T), vec(YY^T)⟩ = tr(XX^T YY^T) = ||Y^T X||_F^2    (1)
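The dot-product identity ⟨vec(XX^T), vec(YY^T)⟩ = tr(XX^T YY^T) = ||Y^T X||_F^2 is purely algebraic, so it can be verified directly on random data; the sketch below is only a numerical check with arbitrary shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
Y = rng.standard_normal((50, 20))

lhs = np.dot((X @ X.T).ravel(), (Y @ Y.T).ravel())  # <vec(XX^T), vec(YY^T)>
mid = np.trace(X @ X.T @ Y @ Y.T)                   # tr(XX^T YY^T)
rhs = np.linalg.norm(Y.T @ X, 'fro') ** 2           # ||Y^T X||_F^2
assert np.allclose(lhs, mid) and np.allclose(mid, rhs)
```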

Hilbert-Schmidt Independence Criterion (HSIC). From Equation (1), for X and Y with centered columns:

(1 / (n − 1)^2) tr(XX^T YY^T) = ||cov(X^T, Y^T)||_F^2    (2)

HSIC generalizes this quantity to arbitrary kernels. Let K_ij = k(x_i, x_j) and L_ij = l(y_i, y_j), where k and l are two kernels. The empirical estimator of HSIC is:

HSIC(K, L) = (1 / (n − 1)^2) tr(KHLH)    (3)

where H_n = I_n − (1/n)11^T is the centering matrix.
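A minimal NumPy sketch of the empirical HSIC estimator, together with a check that, for linear kernels K = XX^T and L = YY^T, it reduces to the centered dot-product similarity; the data and shapes are placeholders:

```python
import numpy as np

def hsic(K, L):
    """Empirical HSIC: tr(KHLH) / (n - 1)^2, with H the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
Y = rng.standard_normal((30, 8))
n = X.shape[0]

# With linear kernels, HSIC matches ||Yc^T Xc||_F^2 / (n - 1)^2 on centered data.
K, L = X @ X.T, Y @ Y.T
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
assert np.isclose(hsic(K, L),
                  np.linalg.norm(Yc.T @ Xc, 'fro') ** 2 / (n - 1) ** 2)
```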

Centered kernel alignment. HSIC is not invariant to isotropic scaling, but it can be normalized to obtain such invariance. The normalized index is called centered kernel alignment (Cortes et al., 2012; Cristianini et al., 2002):

CKA(K, L) = HSIC(K, L) / √(HSIC(K, K) · HSIC(L, L))    (4)
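The normalization above can be sketched directly on top of the HSIC estimator; the check below confirms two basic properties (self-similarity of 1, and the isotropic-scaling invariance the normalization is designed to provide) on placeholder data:

```python
import numpy as np

def hsic(K, L):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def cka(K, L):
    """CKA(K, L) = HSIC(K, L) / sqrt(HSIC(K, K) * HSIC(L, L))."""
    return hsic(K, L) / np.sqrt(hsic(K, K) * hsic(L, L))

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 6))
K = X @ X.T
assert np.isclose(cka(K, K), 1.0)        # a representation is maximally similar to itself
assert np.isclose(cka(K, 3.0 * K), 1.0)  # normalization gives isotropic-scaling invariance
```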

Related Similarity Indices

In the context of measuring similarity between neural network representations, the researchers briefly review linear regression, canonical correlation analysis, and other related methods. Table 1 summarizes the formulas for the indices used in the experiments and their invariance properties.


Table 1: Summary of various similarity metrics.

Q_X and Q_Y are orthonormal bases for the columns of X and Y, respectively. U_X and U_Y are the left singular vectors of X and Y, sorted in descending order of the corresponding singular values. || · ||∗ denotes the nuclear norm. T_X and T_Y are truncated identity matrices that select the left singular vectors of X and Y needed to reach a given cumulative-variance threshold.

Linear regression. Linear regression is a simple way to relate neural network representations. It fits every feature in Y as a linear combination of the features in X. A suitable summary statistic is the proportion of variance explained by the fit:

R^2_LR = 1 − min_B ||Y − XB||_F^2 / ||Y||_F^2 = ||Q_X^T Y||_F^2 / ||Y||_F^2    (5)
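A sketch of this statistic using the orthonormal-basis form, with a sanity check that a Y lying entirely in the column span of X gets R^2 = 1; the data is a random placeholder:

```python
import numpy as np

def r2_lr(X, Y):
    """R^2_LR = ||Q_X^T Y||_F^2 / ||Y||_F^2, with Q_X an orthonormal basis for X."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Q_X, _ = np.linalg.qr(X)
    return np.linalg.norm(Q_X.T @ Y, 'fro') ** 2 / np.linalg.norm(Y, 'fro') ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
B = rng.standard_normal((10, 5))
assert np.isclose(r2_lr(X, X @ B), 1.0)  # Y in X's span: all variance explained
```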

Canonical correlation analysis (CCA). CCA finds bases for two matrices such that, when the original matrices are projected onto these bases, the correlation is maximized. For 1 ≤ i ≤ p_1, the i-th canonical correlation coefficient ρ_i is given by:

ρ_i = max over w_X^(i), w_Y^(i) of corr(X w_X^(i), Y w_Y^(i)),
subject to: for all j < i, X w_X^(i) ⊥ X w_X^(j) and Y w_Y^(i) ⊥ Y w_Y^(j)    (6)
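The canonical correlations can be computed as the singular values of Q_Y^T Q_X. The sketch below also checks the invariance the paper highlights: because CCA is invariant to invertible linear transformation, comparing X with an invertible linear transformation of itself yields correlations of 1 (random placeholder data):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations rho_i: the singular values of Q_Y^T Q_X."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Q_X, _ = np.linalg.qr(X)
    Q_Y, _ = np.linalg.qr(Y)
    return np.linalg.svd(Q_Y.T @ Q_X, compute_uv=False)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
A = rng.standard_normal((5, 5))          # invertible with probability 1
rho = canonical_correlations(X, X @ A)

# All correlations are 1, so the summary statistic R^2_CCA = sum(rho_i^2) / p_1
# is also 1: CCA cannot distinguish X from an invertible linear map of X.
assert np.allclose(rho, 1.0)
assert np.isclose((rho ** 2).sum() / X.shape[1], 1.0)
```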

SVCCA. CCA is sensitive to perturbation when the condition number of X or Y is large. To improve robustness, singular vector CCA (SVCCA) performs CCA on truncated singular value decompositions of X and Y.
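A minimal sketch of the SVCCA idea under the stated assumptions: truncate each representation's SVD at a cumulative-variance threshold, then run CCA on the truncations. The threshold value and the shapes below are arbitrary placeholders:

```python
import numpy as np

def svcca(X, Y, threshold=0.99):
    """Truncated-SVD CCA: keep the top singular directions covering `threshold`
    of the variance in each representation, then compute CCA on them."""
    def truncate(A):
        A = A - A.mean(axis=0)
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        frac = np.cumsum(s ** 2) / np.sum(s ** 2)
        k = int(np.searchsorted(frac, threshold)) + 1
        return U[:, :k] * s[:k]          # rank-k projection of A
    Xt, Yt = truncate(X), truncate(Y)
    Q_X, _ = np.linalg.qr(Xt)
    Q_Y, _ = np.linalg.qr(Yt)
    return np.linalg.svd(Q_Y.T @ Q_X, compute_uv=False)

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 12))
assert np.allclose(svcca(X, X), 1.0)     # identical inputs: all correlations are 1
```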

Projection-weighted CCA. Morcos et al. (2018) propose a different strategy to reduce CCA's sensitivity to perturbation, which they call projection-weighted canonical correlation analysis (PWCCA):

ρ_PW = (Σ_i α_i ρ_i) / (Σ_i α_i),  where α_i = Σ_j |⟨h_i, x_j⟩|    (7)

with h_i = X w_X^(i) and x_j the j-th column of X.

Results

The study first examines a VGG-like convolutional network based on All-CNN-C. Figure 2 and Table 2 show that only CKA passes this sanity check, while the other methods perform poorly.


Figure 2: CKA reveals consistent relationships between the layers of CNNs trained from different random initializations, whereas CCA, linear regression, and SVCCA do not.


Table 2: Accuracy with which different methods identify corresponding layers, based on maximum similarity, among 10 architecturally identical 10-layer CNNs trained from different initializations, excluding the logits layer.

CKA can reveal pathology in neural network representations. Figure 3 shows CKA between layers of convolutional networks of varying depth, in which each layer is repeated 2, 4, or 8 times. Doubling the depth improves accuracy, but increasing it further reduces accuracy.


Figure 3: CKA reveals pathology in neural networks that are too deep. Top: linear CKA between layers of individual networks of different depths trained on CIFAR-10. The title of each panel shows the accuracy of the network. In the network with 8x depth, later layers are similar to the last layer. Bottom: the accuracy of logistic regression classifiers trained on the layers of the same networks agrees with CKA.

CKA can also be used to compare networks trained on different datasets. Figure 7 shows that models trained on CIFAR-10 and CIFAR-100 produce similar representations in their early layers. These representations require training: their similarity to the representations produced by untrained networks is much lower.

 


Figure 7: CKA shows that models trained on different datasets (CIFAR-10 and CIFAR-100) produce similar representations, and that these representations differ markedly from those of untrained models. The left panel shows the similarity between the same layers of different models on the CIFAR-10 test set; the right panel shows the similarity on the CIFAR-100 test set. CKA values are averaged over 10 models of each type (45 pairs).

From a visualization standpoint, representational similarity matrices (RSMs) can be more informative than the CKA summary statistic, since they do not reduce similarity to a single number and thus provide more complete information than CKA. Figure 8 shows that XX^T and YY^T act similarly on the eigenvectors with the largest eigenvalues, but the subspace these eigenvectors span has dimensionality far below that of the activations.

 


Figure 8: The subspace shared by two 10-layer networks trained from different random initializations is spanned primarily by the eigenvectors with the largest eigenvalues. Each row corresponds to a different network layer. Note that the average-pooling layer has only 64 neurons.


Origin www.cnblogs.com/think90/p/11482945.html