NeurIPS 2019 | Teaching CNNs to imitate the mouse visual system

(Author: Yang Xiaofan | Reprinted from AI Technology Review)

Current CNN-based image recognition models achieve high accuracy on many tasks and are already used in many practical applications. However, their generalization and robustness are still far inferior to human vision. Slightly modified or noisy images leave human recognition almost unaffected, while a CNN's accuracy may fluctuate greatly; changes in scene and viewpoint can also significantly degrade CNN performance, to say nothing of learning to recognize objects from only a few samples.

If the mark of "visual intelligence" is an invariant neural representation together with the ability to generalize, so that images remain recognizable and processable after complex transformations, then the human/biological visual system clearly has visual intelligence and modern CNNs clearly do not. Studying the source of visual intelligence in biological visual systems, and trying to reproduce it in artificially designed visual systems, has been a hot research topic in neuroscience and machine learning in recent years.

Recently, the NeurIPS 2019 paper "Learning From Brains How to Regularize Machines" made an interesting attempt at this problem. Earlier, we covered work by Japanese researchers who decoded the images seen by the eyes from fMRI scans of the human brain. This new method, however, lets the brain's neural activity directly shape how an artificial neural network learns and represents images, and thereby its performance on classification tasks, which is quite novel and interesting.

Original paper: https://arxiv.org/abs/1911.05072


1. Another attempt to imitate the biological visual system

In this paper, the authors focus on one aspect of visual intelligence: robustness to adversarial attacks and noise. An adversarial attack makes small modifications to a given image that cause a CNN to recognize the object as another category with high confidence, while noise generally degrades CNN recognition accuracy. Facing the same perturbations, the biological visual system is almost unaffected. This likely indicates not only that CNNs lack the advanced scene-understanding abilities of biological visual systems, but also that the visual features CNNs use to recognize objects may be completely different from those used by biological vision.

Unfortunately, biological and artificial neural networks work very differently. Even if we can decode visual features at different levels from the biological visual system, it is difficult to copy them directly into an artificial neural network. Some researchers have studied biological visual systems directly; for example, New York University professor Eero Simoncelli has run studies and experiments on texture and hierarchical perception (which he presented in an ICLR 2017 invited talk), but there is no way to use these results to improve CNNs directly.

However, if we want to imitate characteristics of biological neural networks, we are not completely helpless: training a neural network already introduces various implicit inductive biases, and different regularization methods constrain the network's parameter space and guide how the model learns and uses features, which ultimately affects its robustness and generalization. Although at this stage it is hard to predict the effect of each such bias before training, and the patterns a model learns often fail to generalize beyond the training data, this is at least a promising place to start.

Another encouraging sign is that many recent studies have shown that the perceptual representations of CNNs trained on visual tasks resemble the representation signals measured in primate brains. The authors therefore boldly hypothesized that if the representations of an artificial neural network could be made more similar to those reflected in the stimulus-evoked neural activity of a biological visual system, this might benefit the CNN's performance; for example, like the biological visual system, it might remain stable when faced with noisy or altered images.

2. Learning from the mouse visual system

The authors of this paper introduce an additional bias that regularizes (guides) the model so that the representations it learns are more similar to those of the biological visual system. Specifically, they directly measured the neural responses of the visual cortex while mice viewed a variety of complex natural scenes. Then, when training the CNN, instead of using only the traditional recognition-centered objective, they also encourage the activation patterns of the convolutional features to be closer to the patterns of the biological neural signals, i.e., they make the CNN learn representations closer to those of the biological visual system.

Measuring the mouse visual cortex

The authors repeatedly scanned the primary visual cortex of multiple mice over several days. During the experiments, each mouse's head was fixed while its body could run on a treadmill. The researchers took 5,100 different images from the ImageNet dataset, converted them to grayscale, and reduced the resolution to 64x36 before showing them to the mice (mouse vision is weaker than human vision and not sensitive to color). Of these, 5,000 images were shown only once (measured once) and 100 images were shown 10 times (measured 10 times), yielding 6,000 sets of measurements per mouse per session.
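As a rough illustration of this stimulus preparation (the paper describes the exact pipeline), here is a minimal Python sketch that converts an image to grayscale and downsamples it to 64x36; the function name and the [0, 1] normalization are assumptions for illustration only.

```python
from PIL import Image
import numpy as np

def prepare_stimulus(path, size=(64, 36)):
    """Load an image, convert it to grayscale, and resize it to the display resolution used for the mice."""
    img = Image.open(path).convert("L")   # "L" = single-channel grayscale
    img = img.resize(size)                # (width, height) = (64, 36)
    return np.asarray(img, dtype=np.float32) / 255.0  # scale pixel values to [0, 1]

# Hypothetical usage: 5,000 images shown once plus 100 images shown 10 times
# would give the 6,000 presentations per session mentioned above.
```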

The main reason for choosing mice as the experimental animal is that genetic modification techniques for mice are mature, allowing the authors to record roughly 8,000 neural units simultaneously. Of course, the visual systems of primates, which are complex enough and close enough to humans, would be the ideal subjects, but such experiments are difficult; the visual system is still an important sensory pathway for mice, so the measurements remain meaningful.

The authors computed the signal-to-noise ratio of the measured signals, averaged the responses collected across repeated presentations, and designed a denoising model based on image-response pairs, effectively reducing the noise in the raw measurements and obtaining reliable neural signal features for the subsequent experiments. They then built a 5000x5000 similarity matrix from the denoised data and used it as the target for regularizing the CNN.
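To make the idea of an image-by-image similarity matrix concrete, here is a minimal sketch that builds such a matrix from (already denoised) neural responses. Using the Pearson correlation between population response vectors is an assumption for illustration; the paper defines its own denoising model and similarity measure.

```python
import numpy as np

def similarity_matrix(responses):
    """responses: (n_images, n_neurons) array -> (n_images, n_images) similarity matrix."""
    centered = responses - responses.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    normalized = centered / np.clip(norms, 1e-8, None)
    return normalized @ normalized.T  # entry (i, j) compares the responses to images i and j

# Hypothetical shapes: ~5,000 images x ~8,000 recorded units would yield
# the 5000x5000 target matrix used to regularize the CNN.
```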


Similarity matrix: each dot indicates how similar the neural signals evoked by the image on the horizontal axis are to those evoked by the image on the vertical axis. Lighter dots indicate higher similarity, i.e., two images that look more alike to the biological visual system; darker dots indicate lower similarity.

CNN training

The authors chose an 18-layer ResNet as the backbone of the mouse-imitating CNN and had it imitate the activation patterns of the mouse visual system while learning the image classification task.


In typical CNN classification training, only a single task loss is optimized, such as the cross-entropy loss. To make the CNN imitate the mouse, the authors added a similarity loss. It works as follows:

• The network can receive one or two images as input

• If the input is a single image, the model outputs a classification result through an additional fully connected layer, and the cross-entropy loss is computed

• If the input is a pair of images, the model computes the convolutional activations of both images and measures the similarity of the activations at layers 1, 5, 9, 13, and 17. A regularized weighted combination of these gives a final similarity score, which is compared with the similarity measured in the mouse to compute the similarity loss, guiding the network to learn the mouse's neural signal patterns

In this way, the authors use regularization to let an otherwise ordinary CNN learn visual representations that are more similar to those of the biological visual system.
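The sketch below illustrates this two-part objective in PyTorch: the usual cross-entropy on single images, plus a similarity loss that compares weighted pairwise feature similarities from several intermediate layers with the similarity measured from the mouse data. The cosine similarity, the softmax layer weighting, and all function names are illustrative assumptions rather than the authors' exact code.

```python
import torch
import torch.nn.functional as F

def feature_similarity(feats_a, feats_b):
    """Cosine similarity between two images' activations at one layer."""
    a = feats_a.flatten(start_dim=1)
    b = feats_b.flatten(start_dim=1)
    return F.cosine_similarity(a, b, dim=1)  # one value per image pair in the batch

def similarity_loss(layer_feats_a, layer_feats_b, layer_weights, neural_sim):
    """layer_feats_*: lists of activations (e.g. from layers 1, 5, 9, 13, 17)."""
    weights = torch.softmax(layer_weights, dim=0)                  # learned mixing weights
    sims = torch.stack([feature_similarity(a, b)
                        for a, b in zip(layer_feats_a, layer_feats_b)])
    model_sim = (weights[:, None] * sims).sum(dim=0)               # weighted combination
    return F.mse_loss(model_sim, neural_sim)                       # match the mouse similarity

def total_loss(logits, labels, layer_feats_a, layer_feats_b,
               layer_weights, neural_sim, alpha=1.0):
    """Task loss on single images plus the brain-similarity regularizer on image pairs."""
    return F.cross_entropy(logits, labels) + alpha * similarity_loss(
        layer_feats_a, layer_feats_b, layer_weights, neural_sim)
```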

3. Experimental results


In the comparison of results, the authors first compared the "biological vision CNN" (based on ResNet-18) trained with the similarity loss against several normally trained models, testing them on CIFAR images converted to grayscale. A ResNet-18 with no additional regularization achieved the highest accuracy on noise-free images, but as noise was added its accuracy dropped rapidly; at the highest noise level tested, the "biological vision CNN" still reached 50% accuracy, much higher than the other models. In other words, the model's robustness improved significantly.
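A minimal sketch of this kind of noise-robustness test is shown below: add Gaussian noise of increasing strength to the test images and track how classification accuracy degrades. The noise levels and evaluation loop are illustrative assumptions, not the paper's exact protocol.

```python
import torch

@torch.no_grad()
def accuracy_under_noise(model, loader, noise_std, device="cpu"):
    """Classification accuracy when Gaussian noise of a given strength is added to the inputs."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        noisy = (images + noise_std * torch.randn_like(images)).clamp(0.0, 1.0)
        preds = model(noisy).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

# e.g. for noise_std in [0.0, 0.05, 0.1, 0.2, 0.4]:
#     print(noise_std, accuracy_under_noise(model, test_loader, noise_std))
```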

As a control, the authors also tested other sources of "biological vision", such as regularizing with a random similarity matrix or with a similarity matrix from the conv3-1 layer of VGG19. These still performed worse than the "biological vision CNN", with the VGG19-based variant coming closest.

In another control experiment, the authors regularized directly with the mouse neural measurements without denoising and found that the model's robustness improved very little. They attribute this to the high variability/randomness of the raw neural signals, which underlines the importance of the denoising step.

In adversarial image recognition tests covering a variety of attack methods, the "biological vision CNN" also performed much better than the other models.
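For reference, below is a minimal sketch of one standard adversarial attack (FGSM) of the kind that could be used in such a comparison; the paper evaluates several attack methods, and this particular choice and its epsilon value are only illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.03):
    """Perturb each image in the direction that increases the classification loss."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()  # single signed-gradient step
    return adversarial.clamp(0.0, 1.0).detach()
```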

4. Conclusion and discussion

We often say that neuroscience can inspire machine learning, but we have long lacked a way to turn neurophysiological findings directly into artificial neural networks. In this paper, the authors demonstrate a regularization method based on measured neural signals that biases a neural network so that the representations it learns are more similar to the visual representations of the mouse brain (a biological visual system), improving the network's robustness at inference time. The authors believe that if the similarity between artificial visual representations and those of visual cortex areas beyond V1 can be further increased, the robustness and generalization of models can continue to improve. Representations learned by directly imitating biological brains may help bring machine learning algorithms closer to the performance of the human visual system.

On methodology, the representational similarity approach used by the authors is fairly general, and their signal denoising method also helps improve the estimation of biological visual representations (separating visually driven signals from stimulus-independent activity and turning them into reliable, denoised neural signals). Another way to make a CNN imitate biological neural representations is to jointly train a linear readout from an intermediate layer of the task-trained CNN and have it predict biological neural responses directly from image features. The authors argue, however, that their chosen method imposes stronger constraints and guidance, because many transformations inside the CNN can be compensated for by the linear readout: the readout can improve the accuracy of predicting neural responses while having very little effect on the representations the CNN itself learns.
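For contrast, here is a minimal sketch of that alternative: a linear readout attached to pooled activations from a middle layer, trained to predict the recorded neural responses. The class name, layer choice, and dimensions are illustrative assumptions.

```python
import torch.nn as nn

class LinearReadout(nn.Module):
    """Linear map from pooled CNN features to predicted responses of the recorded units."""
    def __init__(self, feature_dim, n_neurons):
        super().__init__()
        self.readout = nn.Linear(feature_dim, n_neurons)

    def forward(self, features):
        # features: (batch, feature_dim) pooled activations from an intermediate layer
        return self.readout(features)

# Such a readout would typically be trained jointly with the task, e.g. by adding an
# MSE loss between predicted and measured responses to the cross-entropy objective.
```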

In this paper, the authors mainly measured the robustness of the "biological vision CNN", but the long-term goal of this line of research is clearly to use visual representations that imitate biological vision to bring improvements in domain transfer, few-shot learning, and other areas. The authors also plan to explore other properties of the visual system and other similarity constraints in future work.

In addition, although the method design and experimental results show that imitating biological visual features brings improvements, exactly which aspect of the biological visual representation has been learned remains an open question, and it is the most important one behind this research. If the specific principles at work can be identified, one could design and train machine learning models based on those principles directly, without relying on large-scale neural recording experiments; that is the ultimate goal of this research direction.

Related Reading:

Interpretation! A roundup of 8 NeurIPS 2019 papers from BUPT, Xidian University, DeepMind, and others
