ICLR 2020 | How to Tell Whether Two Neural Networks Have Learned Consistent Knowledge


The top AI conference ICLR 2020 will be held on April 26 in Addis Ababa, Ethiopia. Of the 2,594 papers submitted, 687 were accepted, an acceptance rate of 26.5%. This article introduces one of the accepted papers, from Quanshi Zhang's team at Shanghai Jiao Tong University: "Knowledge Consistency between Neural Networks and Beyond". In this paper, the researchers propose a method to evaluate and interpret the consistency, reliability, and knowledge blind spots of the feature representations of neural networks.

Paper link: https://arxiv.org/pdf/1908.01581.pdf

Overview

Deep neural networks (DNNs) have demonstrated powerful abilities on many tasks, but there is still a lack of mathematical tools to diagnose the representation capacity of their intermediate layers, for example to discover defects in a representation or to identify reliable and unreliable features. Because of data leakage or shifts in the dataset, traditional evaluation based on test accuracy cannot deeply assess the correctness of a DNN's representations.

Therefore, in this paper the researchers from Shanghai Jiao Tong University propose to diagnose the representation capacity of a DNN's intermediate layers in terms of knowledge consistency. That is, given two DNNs trained for the same task (whether or not they share the same architecture), the goal is to test whether the intermediate layers of the two DNNs encode similar visual concepts.

The study achieves the following: (1) it defines and quantifies knowledge consistency of different orders between the representations of neural networks; (2) it analyzes the representation power of intermediate layers of neural networks; (3) it diagnoses intermediate-layer features and further improves a network's classification accuracy without additional annotated training samples; and (4) it provides a new way of thinking about explaining network compression and knowledge distillation.

The Algorithm

The paper defines consistency between two neural networks at the level of knowledge representation, that is, it analyzes whether two independently trained neural networks model the same or similar knowledge. The researchers are concerned with the similarity of the knowledge modeled by the two networks, rather than the similarity of the features themselves. For example, if the channels of an intermediate-layer feature of a convolutional neural network are shuffled, and the convolution kernels of the next layer are reordered accordingly, the upper-layer features after the convolution remain identical to those of the original network; in this case the two networks have different intermediate-layer features but in fact model the same knowledge.

On the other hand, knowledge consistency can be used to directly evaluate the reliability of a neural network's internal feature representations, without requiring any additional annotation for new supervision; the evaluation criterion is also independent of the specific task. If there are no reliable mathematical tools for evaluating the reliability of a network's features, and a network can only be judged by its final classification accuracy, that will not be enough for the future development of deep learning.

Thus, for several neural networks trained for the same task, this study quantifies how far the knowledge expressed by the networks coincides, and decomposes the features into the corresponding components. Specifically, let f_A and f_B denote intermediate-layer features of neural networks A and B. If f_B can be obtained from f_A by a linear transformation, f_A and f_B are considered 0-order consistent; if f_B can be obtained from f_A through one nonlinear transformation, f_A and f_B are considered 1-order consistent; similarly, if f_B can be obtained from f_A through n nonlinear transformations, f_A and f_B are considered n-order consistent.
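To make the orders concrete, below is a minimal PyTorch sketch of this idea: a reconstruction network g whose k-th branch passes f_A through k nonlinear layers (plus a final linear layer), trained so that g(f_A) approximates f_B. The 1×1-convolution branches, the choice K = 2, and the training details here are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class OrderBranch(nn.Module):
    """k nonlinear layers followed by a linear map, so the branch output
    is at most k-order consistent with its input."""
    def __init__(self, dim, k):
        super().__init__()
        layers = []
        for _ in range(k):
            layers += [nn.Conv2d(dim, dim, kernel_size=1), nn.ReLU()]
        layers.append(nn.Conv2d(dim, dim, kernel_size=1))  # final linear layer
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class Disentangler(nn.Module):
    """g(f_A) = sum of order-0..K components, trained so that g(f_A) ≈ f_B;
    the residual delta = f_B - g(f_A) is the inconsistent component."""
    def __init__(self, dim, K=2):
        super().__init__()
        self.branches = nn.ModuleList([OrderBranch(dim, k) for k in range(K + 1)])

    def forward(self, fA):
        comps = [branch(fA) for branch in self.branches]  # order-k components
        return sum(comps), comps

# Training sketch: fA, fB stand in for intermediate-layer features of
# networks A and B computed on the same images (batch x channels x H x W).
dim = 64
g = Disentangler(dim, K=2)
opt = torch.optim.Adam(g.parameters(), lr=1e-3)
fA = torch.randn(8, dim, 14, 14)  # dummy features, for illustration only
fB = torch.randn(8, dim, 14, 14)
for _ in range(200):
    recon, _ = g(fA)
    loss = ((fB - recon) ** 2).mean()  # reconstruct B's features from A's
    opt.zero_grad(); loss.backward(); opt.step()
```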

As shown below, the intermediate-layer feature f_A of one neural network can be decomposed by another neural network into feature components of 0- to K-order consistency, plus an inconsistent feature component.


Low-order consistent components often represent relatively reliable features, while the inconsistent component represents noise signals modeled by the neural network.

At the application level, knowledge consistency can be used to discover a neural network's unreliable features and knowledge blind spots. Taking the knowledge representation of a high-performance deep network as the standard, one can diagnose defects in the knowledge representation of a relatively shallow network (shallow networks have their own value, for example for deployment on mobile devices). When the features of the shallow network (DNN A) are used to reconstruct the features of the deep network (DNN B), the inconsistent feature component of the deep network, δ = f_B − g(f_A), often represents the knowledge blind spots of the shallow network; correspondingly, when the features of the deep network are used to reconstruct those of the shallow network, the inconsistent feature component of the shallow network, δ = f_A − g(f_B), often represents the unreliable components of the shallow network's features.
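Under the same assumptions as the sketch above, these two diagnostic quantities could be computed as follows, with g_ab and g_ba being two separately trained Disentangler instances, one per reconstruction direction (again a sketch, not the paper's released code):

```python
# g_ab: trained to reconstruct deep features fB from shallow features fA;
# g_ba: trained in the opposite direction (fB -> fA).
g_ab, g_ba = Disentangler(dim), Disentangler(dim)
# ... train g_ab on (fA -> fB) and g_ba on (fB -> fA) as in the loop above ...

blind_spots = fB - g_ab(fA)[0]  # deep-net knowledge the shallow net lacks
unreliable  = fA - g_ba(fB)[0]  # shallow components the deep net does not encode
```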

Experimental results

The figure below shows the knowledge blind spots and unreliable features of a shallow neural network as predicted by the algorithm.


The table below analyzes the stability of neural network training from the perspective of knowledge consistency. When training samples are relatively few, shallower neural networks train with greater stability.


As shown below, consistent feature components often represent more reliable information and can be used to further improve a neural network's classification accuracy. That is, knowledge consistency further improves the model's classification accuracy without any additional annotated training samples.
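One plausible way to use this, sketched under the same assumptions as above (the paper's exact refinement procedure may differ): keep only the component of network A's features that network B can reconstruct, and retrain a classifier head on it.

```python
num_classes = 10  # hypothetical target task
with torch.no_grad():
    consistent_fA = g_ba(fB)[0]  # the part of fA's knowledge confirmed by net B
head = nn.Linear(dim, num_classes)  # new classifier over the consistent features
logits = head(consistent_fA.mean(dim=(2, 3)))  # global average pool, then classify
```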


The knowledge consistency algorithm can also remove redundant features from a neural network. Pre-trained networks (such as networks trained on ImageNet) often model information for a large number of categories. When the target application only involves a small number of categories, the parts of the pre-trained features that express irrelevant categories can be regarded as redundant components. As shown below, the knowledge consistency algorithm can effectively remove the redundant feature components irrelevant to the target application and further improve performance on the target task.
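The same machinery sketches this pruning of redundancy (the reconstruction direction and all names here are assumptions): train a disentangler to reconstruct the pre-trained features from the features of a network trained only on the target categories; whatever cannot be reconstructed is treated as task-irrelevant.

```python
# f_pre: features of the ImageNet-pretrained net; f_task: features of a net
# trained only on the target categories (dummy tensors stand in here).
f_pre  = torch.randn(8, dim, 14, 14)
f_task = torch.randn(8, dim, 14, 14)
g_t = Disentangler(dim)
# ... train g_t on (f_task -> f_pre) as in the loop above ...
relevant  = g_t(f_task)[0]    # part of f_pre explainable by the target task
redundant = f_pre - relevant  # category-irrelevant component, to be removed
```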


In addition, the knowledge consistency algorithm can analyze the consistent and inconsistent features of models trained for different tasks. As shown below, the researchers trained network A for 320-class fine-grained classification (200 bird classes from CUB plus 120 dog classes from Stanford Dogs) and trained network B for simple binary classification (bird vs. dog). By reconstructing each network's features from the other's, it can be seen that network A models more knowledge: network A's features reconstruct network B's features better than the reverse.
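In this setting, the direction comparison could look like the following, reusing the disentanglers sketched above; the asymmetry in reconstruction error is what signals which network models more knowledge (an assumed comparison criterion, not the paper's exact metric).

```python
# The network whose features better reconstruct the other's features
# models richer knowledge.
err_A_to_B = ((fB - g_ab(fA)[0]) ** 2).mean()  # A explaining B
err_B_to_A = ((fA - g_ba(fB)[0]) ** 2).mean()  # B explaining A
print(f"A->B error: {err_A_to_B.item():.4f} | B->A error: {err_B_to_A.item():.4f}")
```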


The knowledge consistency algorithm can also be used to analyze the information lost in network compression. The researchers used the features of the compressed model to reconstruct the features of the original model; the inconsistent feature component often corresponds to the knowledge discarded during compression. In the figure below (left), by quantifying this discarded part, they found that compressed models that lose less knowledge achieve higher classification accuracy.

In addition, knowledge distillation can also be interpreted with the knowledge consistency algorithm. In the figure below (right), by quantifying the inconsistent feature components of networks distilled over different numbers of generations, one finds that as the number of distillation generations increases, the unreliable feature components gradually decrease.
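Both analyses reduce to quantifying the inconsistent component. One simple normalization, shown below, is the fraction of the original features' energy that the reconstruction misses; this exact metric is an assumption for illustration, not the paper's definition.

```python
def discarded_knowledge(f_orig, f_new, g):
    """Fraction of f_orig's energy that the new (compressed or distilled)
    model's features cannot reconstruct; smaller means less knowledge lost."""
    delta = f_orig - g(f_new)[0]
    return (delta ** 2).sum() / (f_orig ** 2).sum()
```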




Source: blog.csdn.net/xixiaoyaoww/article/details/104548750