"Concept Whitening", a new technology that provides the interpretability of neural networks


Author: Ben Dickson (software engineer, founder of TechTalks)

Translator: hhhnoone

Original article: Deep learning doesn't need to be a black box

The success of deep neural networks comes from their extremely large and complex sets of parameters, but this complexity also has a drawback: the inner workings of a neural network are usually a mystery, even to its creators. Since deep learning became popular in the early 2010s, this problem has continued to plague the artificial intelligence community.

As deep learning is applied and expanded across different fields, interest has grown in techniques that can explain a neural network's inner workings by examining its outputs and learned parameters.

Recently, a paper published in Nature Machine Intelligence introduced a promising new approach. Scientists at Duke University have proposed a technique called "concept whitening" that can help guide a neural network to learn specific concepts without sacrificing performance. Instead of searching for answers among millions of trained parameters, concept whitening builds interpretability into the deep learning model itself, and the results so far are encouraging.


Features and the latent space of deep learning models

Given sufficiently many high-quality training examples, a deep learning model with a reasonable architecture should be able to distinguish different types of inputs. In a computer vision task, for example, a trained neural network can map the pixel values of an image to its corresponding category. Concept whitening was proposed in this context of image recognition.

During training, each layer of a deep learning model encodes the features of the training images into a set of numerical values and stores them in its parameters; this representation is called the latent space of the AI model. Generally speaking, the lower layers of a multi-layer convolutional neural network learn basic features such as corners and edges, while the higher layers learn to detect more complex features such as faces, objects, and complete scenes.
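
To make the idea of layer-wise features concrete, here is a minimal sketch (not from the paper) that records what each stage of a standard torchvision ResNet-18 produces for one input, using forward hooks; the random tensor stands in for a preprocessed image.

```python
import torch
import torchvision.models as models

# weights=None keeps the example offline; in practice you would load pretrained weights.
model = models.resnet18(weights=None)
model.eval()

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Hook the four residual stages: lower stages tend to encode edges and textures,
# higher stages tend to encode object-level features.
for name in ["layer1", "layer2", "layer3", "layer4"]:
    getattr(model, name).register_forward_hook(make_hook(name))

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
with torch.no_grad():
    model(x)

for name, act in activations.items():
    print(name, tuple(act.shape))  # e.g. layer1 -> (1, 64, 56, 56)
```

These intermediate activations are the raw material that the interpretation techniques discussed below try to make sense of.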


Figure: Each layer of the neural network encodes specific features of the input image.

Ideally, the latent space of a neural network would represent concepts that are relevant to the image categories the network is meant to detect, but we usually cannot tell whether it does, and deep learning models tend to latch onto the most discriminative features, even when those features are the wrong ones.

For example, consider a dataset in which every image containing a kitten also happens to have a logo in the lower-right corner. A person would easily recognize that the logo has nothing to do with the target and ignore it, but a deep learning model may find that the logo in the lower-right corner is the simplest and most effective way to distinguish cats from other animals. Similarly, if all the sheep images in your training set contain large green pastures, your neural network may learn to detect green pastures instead of sheep.


Figure: During training, the machine learning algorithm searches for the most discriminative features that associate pixels with labels.

Therefore, beyond a deep learning model's performance on the training and test datasets, it is also important to understand which concepts and features it has learned to detect. This is where classic interpretation techniques come into play.

Neural network attribution

Many deep learning interpretation techniques are post hoc: they try to make sense of a trained neural network by examining its outputs and parameter values. One common technique, for example, masks different parts of an input image and observes how these changes affect the output of the deep learning model, revealing which parts or features of the image the network relies on. This technique produces a heat map that highlights the image features most relevant to the neural network's decision.
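
The following is a minimal occlusion-style sketch of that idea (an illustrative implementation, not the exact method of any particular paper). It assumes `model` is any classifier that maps a `(1, 3, H, W)` tensor to class logits, and it slides a zeroed-out patch over the image to see how much the predicted class score drops.

```python
import torch

def occlusion_heatmap(model, image, patch=32, stride=32):
    """image: (1, 3, H, W) tensor; returns a coarse grid of importance scores."""
    model.eval()
    with torch.no_grad():
        base_logits = model(image)
        target = base_logits.argmax(dim=1)          # predicted class
        base_score = base_logits[0, target]

        _, _, H, W = image.shape
        heat = torch.zeros(H // stride, W // stride)
        for i, y in enumerate(range(0, H - patch + 1, stride)):
            for j, x in enumerate(range(0, W - patch + 1, stride)):
                occluded = image.clone()
                occluded[:, :, y:y + patch, x:x + patch] = 0.0   # mask one patch
                score = model(occluded)[0, target]
                heat[i, j] = (base_score - score).item()         # big drop = important region
    return heat
```

Upsampling `heat` and overlaying it on the image gives the familiar heat-map visualization.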


Figure: Example of a saliency map

Other post hoc techniques include turning different artificial neurons on and off and examining how these changes affect the output of the AI model. Such methods help reveal relationships between features and the latent space. Although these methods are useful, they still treat deep learning models as black boxes and cannot give a clear account of how a neural network works.
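
A minimal sketch of that kind of ablation probe, again using a torchvision ResNet-18 with random weights as a stand-in model (the choice of layer and channel is arbitrary and purely illustrative):

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()
x = torch.randn(1, 3, 224, 224)

def ablate_channel(channel):
    def hook(module, inputs, output):
        output[:, channel] = 0.0   # silence one feature channel ("neuron")
        return output
    return hook

with torch.no_grad():
    baseline = model(x)
    handle = model.layer3.register_forward_hook(ablate_channel(5))
    ablated = model(x)
    handle.remove()

print("max logit change:", (baseline - ablated).abs().max().item())
```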

"Existing interpretation methods are usually summary statistics of performance (for example, local approximations or general trends in node activation) rather than actual explanations of the model's computation," the authors of the concept whitening paper write. The problem with saliency maps, for example, is that they often fail to reveal what the neural network may have learned incorrectly. And when a network's features are scattered across the latent space, it becomes very difficult to interpret the role of a single neuron.


Figure: Saliency-map explanations do not accurately describe how a black-box AI model works.

Zhi Chen, a doctoral student in computer science at Duke University and first author of the concept whitening paper, said: "Deep neural networks (DNNs) are very powerful at image recognition, but because of their complexity, we do not know what a DNN learns in its hidden layers. This lack of interpretability makes neural networks hard to trust and hard to troubleshoot." Many previous works have tried to explain what a neural network has learned, for example the concept learned by each neuron, but these works rely heavily on the assumption that such concepts really are learned by the network (which may not be the case) and that each is concentrated in a single neuron.

Cynthia Rudin, a professor of computer science at Duke University, is a co-author of the concept whitening paper. She has previously warned about the dangers of trusting black-box interpretation techniques and has shown that such methods can provide misleading explanations of neural networks. In an earlier paper, also published in Nature Machine Intelligence, Rudin encouraged the use and development of AI models that are inherently interpretable.

The concept whitening technique proposed here aims to align the latent space of a neural network with concepts that are relevant to the task the network is trained for. This makes the deep learning model interpretable and makes it much easier to trace the relationship between the features of an input image and the output of the network. Rudin said: "Our work directly alters the neural network to disentangle the latent space so that its axes are aligned with known concepts."


Deep learning models are usually trained on a single dataset of labeled examples. Concept whitening introduces a second dataset that contains examples of concepts related to the model's main task. For example, if your deep learning model is meant to detect bedrooms, the relevant concepts would include beds, lights, windows, doors, and so on.

"Representative samples can be selected manually, because they constitute our definition of interpretability," Chen said. "Machine learning practitioners can collect these samples in any way they like to build a concept dataset suited to their application. For example, doctors can be asked to select representative X-ray images to define medical concepts."
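
As a concrete illustration, here is one way such a concept dataset could be organized on disk and loaded. The folder names and the scene-classification main task are hypothetical placeholders; the point is simply that each concept gets its own set of example images, with a separate loader per concept so each concept's examples can be fed to the network on their own.

```python
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Hypothetical layout: concepts/bed/*.jpg, concepts/lamp/*.jpg, concepts/window/*.jpg, ...
concept_data = datasets.ImageFolder("concepts", transform=preprocess)

# One loader per concept, so each concept's examples can be passed through the
# network separately during the alignment step.
concept_loaders = {}
for name, idx in concept_data.class_to_idx.items():
    sample_ids = [i for i, (_, label) in enumerate(concept_data.samples) if label == idx]
    concept_loaders[name] = DataLoader(Subset(concept_data, sample_ids),
                                       batch_size=64, shuffle=True)

# The main task data (for example, bedroom/scene classification) is loaded as usual.
main_data = datasets.ImageFolder("scenes/train", transform=preprocess)
main_loader = DataLoader(main_data, batch_size=64, shuffle=True)
```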

With concept whitening, the team runs two parallel training cycles for the deep learning model. While the neural network adjusts its overall parameters to represent the classes of the main task, concept whitening adjusts specific neurons in each layer to align those neurons with the classes contained in the concept dataset.
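
A deliberately simplified sketch of how those two interleaved cycles could be organized. This is not the authors' algorithm: the real CW module updates an orthogonal rotation of the whitened axes from the concept activations, which is only indicated by a comment here, and `model`, `main_loader`, and `concept_loaders` are assumed from the earlier sketches.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, main_loader, concept_loaders, optimizer,
                    align_every=30, device="cpu"):
    criterion = nn.CrossEntropyLoss()
    model.train()
    for step, (images, labels) in enumerate(main_loader):
        # Cycle 1: ordinary supervised update for the main task.
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()

        # Cycle 2: every few batches, pass each concept's examples through the
        # network so the CW layer can align one latent axis per concept.
        if step % align_every == 0:
            model.eval()
            with torch.no_grad():
                for axis, (name, loader) in enumerate(concept_loaders.items()):
                    concept_images, _ = next(iter(loader))
                    _ = model(concept_images.to(device))
                    # Real CW: accumulate the activations for axis `axis` and
                    # update the whitening/rotation matrix so that this axis
                    # responds maximally to concept `name`.
            model.train()
```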

The result is a disentangled latent space in which concepts are neatly separated in each layer and the activation of each neuron corresponds to its concept. "Such disentanglement gives us a much clearer understanding of how the network gradually learns concepts across layers," Chen said. (Disentanglement here means that different parts of the latent space represent different concepts.)


To evaluate the effectiveness of concept whitening, the researchers ran a series of validation images through deep learning models with concept whitening modules inserted at different layers. They then sorted the images according to the concept neurons they activated at each layer. In the lower layers, the concept whitening module captures low-level features such as color and texture. For example, the lower layers can learn that blue images containing white objects are closely associated with the concept "airplane," while images with warm colors are more likely to contain the concept "bed." In the higher layers, the network learns to classify images by the concept itself.
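
A minimal sketch of such a probe (hypothetical setup: `cw_layer` is a layer whose output axis `concept_axis` has been aligned with a concept such as "airplane"), which ranks a batch of validation images by how strongly they activate that axis:

```python
import torch

def rank_by_concept(model, cw_layer, images, concept_axis, top_k=5):
    """images: (N, 3, H, W) tensor; returns indices of the top-k activating images."""
    captured = {}
    handle = cw_layer.register_forward_hook(
        lambda module, inputs, output: captured.update(z=output.detach()))
    with torch.no_grad():
        model(images)
    handle.remove()
    # Average the spatial activation map of the chosen axis into one score per image.
    scores = captured["z"][:, concept_axis].mean(dim=(1, 2))
    return scores.topk(top_k).indices
```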


Figure: Concept whitening captures low-level information (such as color and texture) in the lower layers and high-level information (such as objects and people) in the higher layers.

One benefit of disentangling and aligning concepts is that the neural network becomes less prone to obvious mistakes. As an image passes through the network, the concept neurons in the higher layers correct errors that may have occurred in the lower layers. For example, in the image below, the lower layers of the network mistakenly associate the image with the concept "airplane" because of the dense presence of blue and white pixels. But as the image moves through the higher layers, the concept neurons steer the result in the right direction (as shown in the figure).


Figure: As an image moves from the lower layers of the neural network to the higher layers, concept whitening corrects wrongly activated concepts.

Previous work in the AI field has included building classifiers that try to infer concepts from the values in a neural network's latent space. However, according to Chen, without a disentangled latent space the concepts these methods learn are not pure, because the prediction scores of the concept classifiers can be correlated. "People have tried to address the entanglement problem with supervised approaches, but they have not truly disentangled the latent space. Concept whitening, on the other hand, uses a whitening transformation to remove correlations between the axes and truly disentangles these concepts."
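
For intuition, here is a minimal numerical sketch of the whitening transformation that gives the method its name (this is plain ZCA whitening; the authors' module additionally learns an orthogonal rotation that aligns the decorrelated axes with the chosen concepts):

```python
import torch

def zca_whiten(z, eps=1e-5):
    """z: (batch, features) matrix of latent activations."""
    z = z - z.mean(dim=0, keepdim=True)
    cov = z.T @ z / (z.shape[0] - 1)                    # feature covariance
    eigvals, eigvecs = torch.linalg.eigh(cov)           # symmetric eigendecomposition
    w = eigvecs @ torch.diag((eigvals + eps).rsqrt()) @ eigvecs.T
    return z @ w                                        # decorrelated, unit-variance axes

z = torch.randn(512, 64)
z = z + 0.5 * z.roll(1, dims=1)                         # introduce correlation between axes
z_white = zca_whiten(z)

cov_after = torch.cov(z_white.T)
print("max deviation from identity:", (cov_after - torch.eye(64)).abs().max().item())
```

After whitening, the covariance of the latent axes is (numerically) the identity matrix, which is what makes a clean one-axis-per-concept alignment possible.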

Application of Concept whitening in deep learning

Concretely, concept whitening is a module that can be inserted into a convolutional neural network in place of the batch normalization module. Introduced in 2015, batch normalization is a popular technique that adjusts the distribution of the data used to train a neural network, which speeds up training and helps avoid artifacts such as overfitting. Most convolutional neural networks use batch normalization in every layer.

In addition to performing the normalization of batch norm, concept whitening also aligns the data along axes that represent the relevant concepts.

A benefit of the concept whitening architecture is that it can easily be integrated into many existing deep learning models. In their research, the team modified several popular pretrained deep learning models by replacing the batch normalization module with concept whitening, and achieved the expected results with only a single epoch of training. (An epoch is one complete pass over the training set; a deep learning model trained from scratch usually requires many epochs.)
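
To show what "replacing batch normalization" can look like in code, here is a minimal sketch with a hypothetical ConceptWhitening2d placeholder. It only whitens the channel activations; the authors' actual module also learns the concept-aligning rotation and maintains running statistics.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ConceptWhitening2d(nn.Module):
    """Placeholder CW-style layer: whitens channel activations per batch."""
    def __init__(self, num_features, eps=1e-2):
        super().__init__()
        self.num_features = num_features
        self.eps = eps   # a relatively large eps keeps small batches numerically stable

    def forward(self, x):
        n, c, h, w = x.shape
        z = x.permute(0, 2, 3, 1).reshape(-1, c)      # treat every spatial location as a sample
        z = z - z.mean(dim=0, keepdim=True)
        cov = z.T @ z / (z.shape[0] - 1)
        eigvals, eigvecs = torch.linalg.eigh(cov)
        w_mat = eigvecs @ torch.diag((eigvals + self.eps).rsqrt()) @ eigvecs.T
        z = z @ w_mat                                  # whitened activations
        return z.reshape(n, h, w, c).permute(0, 3, 1, 2)

model = models.resnet18(weights=None)                  # use pretrained weights in practice
old_bn = model.layer4[1].bn2                           # an existing BatchNorm2d layer
model.layer4[1].bn2 = ConceptWhitening2d(old_bn.num_features)

with torch.no_grad():
    out = model(torch.randn(2, 3, 224, 224))           # the forward pass still works
print(out.shape)                                       # torch.Size([2, 1000])
```

After such a swap, the modified network would be fine-tuned together with the concept-alignment step, which the article notes takes roughly one epoch.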

"CW can be applied to fields such as medical imaging, where interpretability is very important," Rudin said.

In their experiments, the researchers applied concept whitening to a deep learning model for diagnosing skin lesions. "Measuring concept importance scores on the CW latent space can provide practical insights into which concepts may be more important in the diagnosis of skin lesions," they wrote in the paper.

Chen said: "In order to further develop, we plan to not rely on predefined concepts, but to discover these concepts from the data set, especially the undiscovered, useful, and undefined concepts, and then decouple them in the neural network. These discovered concepts are clearly expressed in the hidden space of the network to better explain (the working principle of neural networks)".

Another direction for their future research is to organize concepts in a hierarchy and to disentangle clusters of concepts rather than individual concepts.

Implications for deep learning research

As deep learning models have grown larger and more complex year after year, views on how to deal with the transparency of neural networks have become increasingly diverse.

One major line of argument holds that we should observe the behavior of AI models rather than trying to look inside the black box, in the same way that we study animal and human brains by running experiments and recording neural activity. Proponents of this view argue that any attempt to impose interpretability constraints on a neural network's design will result in lower-quality models. If the brain evolved through billions of iterations without an intelligent top-down design, then neural networks, too, should reach their best performance through a purely evolutionary path.

Concept whitening challenges this view and shows that it is possible to impose top-down design constraints on neural networks without causing performance loss. Interestingly, the experiments show that a deep learning model equipped with concept whitening modules gains interpretability while its accuracy on the main task is not significantly reduced.

Rudin said: "Concept whitening and many other work in our laboratory (and many other laboratories) clearly show that it is possible to build an interpretable model without compromising performance. We hope this work can change people The assumption that a black box is necessary for good performance, and hope that this work will attract more people to build interpretable models in their respective fields."

References:
1. https://bdtechtalks.com/2021/01/11/concept-whitening-interpretable-neural-networks/
2. https://www.nature.com/articles/s42256-020-00265-z

