Multimodal Neurons in Artificial Neural Networks


  In 2005, a letter published in the journal Nature described how individual human neurons respond to specific people, such as Jennifer Aniston or Halle Berry. The exciting thing is not just that they select for a particular person, but that they do so regardless of whether they are shown a photograph, a drawing, or even an image of the person's written name. The neurons are multimodal. As the lead author put it: "You are looking at the far end of the transformation from metric, visual shapes to conceptual ... information."

  We report the existence of similar multimodal neurons in artificial neural networks. These include neurons that select for prominent public figures or fictional characters, such as Lady Gaga or Spider-Man. Like the biological multimodal neurons, these artificial neurons respond to the same subject in photographs, drawings, and images of their name:

Figure: Comparison of a biological neuron (probed via depth electrode), a neuron from the penultimate layer of CLIP RN50-x4, and a generic person-detector neuron from Inception v1, across three kinds of stimuli:

- Realistic photos: the biological neuron responds to photos of Halle Berry, including in costume ✓; the CLIP neuron responds to photos of Spider-Man in costume and of spiders ✓; the Inception v1 neuron responds to people's faces ✓.
- Conceptual drawings: the biological neuron responds to sketches of Halle Berry ✓; the CLIP neuron responds to comics and drawings of Spider-Man and spider-themed icons ✓; the Inception v1 neuron shows no obvious response to drawings of faces ✕.
- Images of text: the biological neuron responds to the text "Halle Berry" ✓; the CLIP neuron responds to the text "spider" and related words ✓; the Inception v1 neuron shows no appreciable response to text ✕.

Note that the images shown are higher-resolution substitutes from Quiroga et al., which are themselves substitutes for the original stimuli.

  These person-detecting neurons only scratch the surface of the highly abstract neurons we found. Some neurons correspond to topics you might find in a preschool curriculum: weather, seasons, letters, counting, or primary colors. All of these features, even the seemingly trivial ones, are richly multimodal; for example, the yellow neuron responds to images of the words "yellow", "banana" and "lemon" in addition to the color itself.

  We identified these multimodal neurons in the recent CLIP models, although similar, as-yet-undiscovered multimodal neurons may exist in earlier models. CLIP consists of two sides, a ResNet vision model and a Transformer language model, trained with a contrastive loss to align image-text pairs from the internet. There are several CLIP models of different sizes; we found multimodal neurons in all of them, but focused our analysis on the mid-sized RN50-x4 model. We refer readers to the CLIP blog post and paper for a more detailed discussion of CLIP's architecture and performance. Our analysis focuses on the vision side of CLIP, so when we say a multimodal neuron responds to text, we mean the model "reads" text rendered in the image.
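As a concrete starting point for this kind of analysis, here is a minimal sketch of how one might load the RN50-x4 model and record the activation of a single unit on its vision side. It assumes the open-source `clip` package and PyTorch; the hooked layer name and the unit index are illustrative placeholders, not the specific neurons discussed in this article.

```python
import torch
import clip  # open-source CLIP package from OpenAI (assumption)
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50x4", device=device)

activations = {}

def hook(module, inputs, output):
    # Cache the feature map produced by the hooked layer.
    activations["feats"] = output.detach()

# Hook the last residual stage of the vision backbone. The attribute name
# `visual.layer4` is an assumption about the open-source implementation's layout.
model.visual.layer4.register_forward_hook(hook)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # hypothetical image
with torch.no_grad():
    model.encode_image(image)

unit = 42  # hypothetical channel index
# Spatially average the channel to get a single scalar activation for this image.
print(activations["feats"][0, unit].mean().item())
```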

  CLIP's abstract visual features can be seen as a natural consequence of aligning vision and text. We expect word embeddings (and language models more generally) to learn abstract "topic" features. For the two sides to align, either the side of the model that handles captions (the "language side") must discard these features, or its counterpart, the "vision side", must build visual analogs of them. But even though these features seem natural in retrospect, they are qualitatively different from (for example) the neurons previously studied in vision models. They also have real-world consequences: the models are vulnerable to a kind of "typographic attack," in which adding adversarial text to an image can cause it to be systematically misclassified.

Figure: A typographic attack. With adversarial text added to the image, the model's top classifications become: iPod 95.5%, weasel 0.5%, remote 0.4%, hamster 0.4%, mongoose 0.2%, meerkat 0.1%.
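A rough sketch of how such a typographic attack can be reproduced with CLIP's zero-shot classification follows, assuming the open-source `clip` package; the image path, label set, and overlaid text are illustrative placeholders, not the exact setup behind the figure above.

```python
import torch
import clip
from PIL import Image, ImageDraw

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50x4", device=device)

labels = ["an apple", "an iPod", "a weasel", "a remote control"]  # illustrative labels
text = clip.tokenize([f"a photo of {label}" for label in labels]).to(device)

def classify(img):
    image = preprocess(img).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1)[0]
    return {label: round(float(p), 3) for label, p in zip(labels, probs)}

original = Image.open("apple.jpg").convert("RGB")  # hypothetical image
attacked = original.copy()
ImageDraw.Draw(attacked).text((10, 10), "iPod", fill="black")  # adversarial text overlay

print("original:", classify(original))
print("attacked:", classify(attacked))
```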


A Guided Tour of Neuron Families

  What features exist in CLIP models? In this section, we examine neurons found in the final convolutional layer on the vision side of four models. Most of these neurons appear to be interpretable. 9 Each layer contains thousands of neurons, so for our initial analysis we looked at feature visualizations, examples of dataset images that most activate each neuron, and the English words whose rendered images most activate it. This revealed an incredible diversity of features, a sample of which we share below.
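As an illustration of the "most-activating English words" heuristic mentioned above, the sketch below rasterizes a small vocabulary of words as images and ranks them by how strongly they drive a chosen unit. The layer attribute, unit index, vocabulary, and rendering details are all illustrative assumptions, not the authors' exact procedure.

```python
import torch
import clip
from PIL import Image, ImageDraw

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50x4", device=device)

feats = {}
# The attribute name `visual.layer4` is an assumption about the implementation.
model.visual.layer4.register_forward_hook(lambda m, i, o: feats.update(out=o.detach()))

def render_word(word, size=288):
    # Rasterize a word as black text on a white background.
    img = Image.new("RGB", (size, size), "white")
    ImageDraw.Draw(img).text((20, size // 2), word, fill="black")
    return img

def unit_activation(word, unit):
    image = preprocess(render_word(word)).unsqueeze(0).to(device)
    with torch.no_grad():
        model.encode_image(image)
    return feats["out"][0, unit].mean().item()  # spatial average of the channel

vocab = ["yellow", "banana", "lemon", "blue", "spider", "happy"]  # illustrative vocabulary
unit = 42  # hypothetical unit index
print(sorted(vocab, key=lambda w: -unit_activation(w, unit)))
```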

  These neurons do not select only for a single object. They also fire (more weakly) for associated stimuli, such as a Barack Obama neuron firing for Michelle Obama, or a morning neuron firing for images of breakfast. They also tend to be maximally inhibited by stimuli that could be seen as their opposite in a highly abstract way.

How should we think about these neurons? From an interpretability perspective, they can be viewed as extreme examples of "multi-faceted neurons" that respond to multiple distinct cases. Looking to neuroscience, they may sound like "grandmother neurons," 12 but their associative nature distinguishes them from how many neuroscientists interpret that term. The term "concept neuron" is sometimes used for biological neurons of a similar nature, but that framing may encourage over-interpretation of these artificial neurons. Instead, the authors generally think of these neurons as visual versions of topic features, activating for features we might expect to be similar in a word embedding.


Many of these neurons deal with sensitive topics, from politicians to emotions. Some neurons explicitly represent, or are strongly associated with, protected characteristics: age, gender, race, religion, sexual orientation, disability and mental health status, pregnancy and parental status. 14 These neurons may reflect bias in the "associated" stimuli they respond to, or may be used downstream to produce biased behavior. There are also a handful of person detectors for individuals involved in crimes against humanity, and a "toxic" neuron that responds to hate speech and sexual content. Having neurons corresponding to sensitive topics does not necessarily mean a network will be biased. One can even imagine explicit representations helping in some cases: the toxic neuron might help the model match hateful images with captions that refute them. But they are warning signs of a variety of possible biases, and studying them may help us find underlying biases we might otherwise pay less attention to. 15

CLIP contains a large number of interesting neurons. For a detailed examination, we focus on the three "neuron families" shown above: person neurons, emotion neurons, and region neurons. We invite you to explore other neurons in OpenAI Microscope.

Person Neurons

  This section discusses neurons representing present-day and historical figures. Our discussion aims to be descriptive and candid about what the model has learned from the internet data it was trained on, not an endorsement of the associations it makes or of the figures in question, who include politicians and people who committed crimes against humanity. This content may be disturbing to some readers.

  To caption images from the internet, humans rely on cultural knowledge. If you try to caption popular images from an unfamiliar place, you'll quickly discover that object and scene recognition alone is not enough. You can't caption a photo at a stadium without knowing the sport, and you may even need to recognize a specific player to get the caption right. Photos of politicians and celebrities speaking, some of the most popular images on the internet, are much harder to caption if you don't know who is speaking and what they are talking about. And some public figures elicit strong reactions that can dominate online discussion and captions, regardless of the other content.

  With that in mind, it is perhaps unsurprising that the model devotes significant capacity to representing specific public and historical figures, especially those who are emotionally charged or provocative. A Jesus Christ neuron detects Christian symbols such as crosses and crowns of thorns, paintings of Jesus, his written name, and its feature visualization shows him as a baby in the arms of the Virgin Mary. A Spider-Man neuron recognizes the masked hero and has learned his secret identity, Peter Parker; it also responds to images, words, and drawings of heroes and villains from the past half-century of Spider-Man films and comics. A Hitler neuron learned to detect his face and body, symbols of the Nazi party, relevant historical documents, and other loosely related concepts such as German food. Its feature visualization shows swastikas and what appears to be Hitler giving a Nazi salute.

Figure: Faceted feature visualizations of the Jesus neuron and the Hitler neuron (facets: any, face, text, logo, pose, nature, architecture, indoor).

Case Study: The Donald Trump Neuron

  Which people the model develops dedicated neurons for seems somewhat arbitrary, but correlates with how prevalent a person is in the dataset and how strongly people respond to them. The one person we found in every CLIP model was Donald Trump. The neuron responds strongly to images of him in a wide variety of settings, including portraits and caricatures in many artistic media, and more weakly to people who worked closely with him, such as Mike Pence and Steve Bannon. It also responds to his political symbols and messaging (e.g. "The Wall" and "Make America Great Again" hats). On the other hand, it is most *negatively* activated by musicians like Nicki Minaj and Eminem, video games like Fortnite, civil rights activists like Martin Luther King Jr., and LGBT symbols like the rainbow flag.

  To better understand this neuron, we asked human labelers to help estimate conditional probabilities for several classes of images at different activation levels.

Figure 2: To gain a deeper understanding of the Trump neuron, we collected about 650 images that caused it to fire by varying amounts and hand-labeled them with classes we created. This lets us estimate the conditional probability of a label at a given activation level. See the appendix for details. Since the Black/LGBT category contains only a small number of images, as such images are infrequent in the dataset, we verified through further experiments that they cause negative activations rather than relying on this plot alone. Across all categories, we see that high activations of the Trump neuron are highly selective: more than 90% of images activating it by over 30 standard deviations relate to Donald Trump.
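A minimal sketch of the conditional-probability estimate described in this caption follows, assuming a hand-labeled table with each image's activation expressed in dataset standard deviations; the column names, labels, and bin edges are illustrative placeholders.

```python
import pandas as pd

# Hypothetical hand-labeled data: one row per image.
df = pd.DataFrame({
    "activation_sd": [-4.2, 0.5, 8.1, 15.3, 31.0, 36.7],
    "label": ["music", "other", "politics", "Trump-related", "Trump-related", "Trump-related"],
})

# Bin images by activation level and estimate P(label | activation bin).
bins = [-10, 0, 10, 20, 30, 40]
df["bin"] = pd.cut(df["activation_sd"], bins)
cond_prob = df.groupby("bin", observed=True)["label"].value_counts(normalize=True)
print(cond_prob)
```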

  While labeling images for the previous experiment, it became apparent that the neuron fires by different amounts for different people. We can study this by searching the internet for images of specific people and measuring how strongly each person's images activate the neuron.

Figure 3: To see how the Trump neuron responds to different individuals, we searched Google Images for the query "X speaking into a microphone" for various people (Nelson Mandela, Martin Luther King Jr., Jeff Bezos, Steve Jobs, Hitler, Obama, Ted Cruz, Hillary Clinton, Steve Bannon, Mike Pence, Donald Trump), and manually cleaned the data to exclude photos that were not clearly of the individual's face. The bar for each person shows the median activation for that person's photos, in units of the neuron's standard deviation over the dataset, and the range on the bar shows the standard deviation of activations across that person's photos.
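A rough sketch of the normalization behind this kind of plot, assuming raw activations are available for a reference dataset and for each person's search-result images; the data and names here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw activations of the neuron over a reference dataset.
dataset_acts = rng.normal(size=10_000)
mu, sigma = dataset_acts.mean(), dataset_acts.std()

# Hypothetical per-person activations collected from image search results.
per_person = {
    "Person A": np.array([1.2, 0.8, 1.5, 0.3]),
    "Person B": np.array([30.1, 28.4, 35.0, 41.2]),
}

for name, acts in per_person.items():
    z = (acts - mu) / sigma  # activations in units of dataset standard deviations
    print(f"{name}: median = {np.median(z):.1f} sd, spread = {z.std():.1f} sd")
```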

Person neurons presumably exist in other models as well, such as facial recognition models. What makes these neurons unique is that they respond to a person across modalities and associations, placing them in a cultural context. In particular, we were struck by how the neurons' responses track informal intuitions about which people are associated with one another. In this sense, a person neuron can be thought of as a landscape of associations around a person, of which the person themselves is simply the highest peak.

Emotion Neurons

  This section discusses neurons that represent emotions, as well as a neuron that represents "mental illness." Our discussion is intended to be descriptive and candid about what the model has learned from the internet data it was trained on, not an endorsement. This content may be disturbing to some readers.

  Since a small change in someone's expression can fundamentally change the meaning of a picture, emotional content is crucial to the captioning task. The model dedicates dozens of neurons to this task, each representing a different emotion.

  These emotion neurons don't just respond to emotion-related facial expressions: they are flexible, responding to body language and facial expressions in humans and animals, to drawings, and to text. For example, the neuron we think of as the happiness neuron responds to smiles as well as words like "happy." The surprise neuron fires even when most of the face is occluded; it responds to slang like "OMG!" and "WTF," and its text feature visualization produces similar words of shock and surprise. There are even emotion neurons that respond to scenes that evoke an emotional "atmosphere," such as a creativity neuron that responds to art studios. Of course, these neurons respond only to cues associated with an emotion and do not necessarily correspond to the mental state of the subject in the image.

Figure: Faceted feature visualization of the surprise/shock neuron (facets: any, face, text, logo, pose, nature, architecture, indoor).

  In addition to these emotion neurons, we also found neurons for which emotion is a secondary role but which primarily respond to something else. As we'll see in later sections, a neuron that primarily responds to prisons and incarceration helps represent emotions such as "persecuted." Likewise, a neuron that primarily detects pornography appears to have a secondary role representing arousal. And the neuron that responds most strongly to question marks helps represent "curiosity."

Figure: Faceted feature visualizations of the incarceration neuron and the question mark neuron (facets: any, face, text, logo, pose, nature, architecture, indoor).

Figure 4: Emotion neurons respond to a variety of stimuli: facial expressions, body language, words, and so on. We can use faceted feature visualization to see some of these different facets. In particular, the face facets show expressions corresponding to different emotions, such as smiling, crying, or wide-eyed shock. Click on any neuron to open it in Microscope and view more information, including dataset examples.

  While most emotion neurons are fairly abstract, there are also neurons that simply respond to specific body and facial expressions, such as a silly expression neuron. It is best activated by the internet-born duck face and peace sign, both of which we'll see again later among the captions it corresponds to most strongly.

Figure: Faceted feature visualization of the silly expression neuron (facets: any, face, text, logo, pose, nature, architecture, indoor).

Case Study: The Mental Illness Neuron

  One neuron represents not a single emotion but a high-level category of mental state: what we conceptualize as a "mental illness" neuron. This neuron fires for images containing words associated with negative mental states (e.g. "depression," "anxiety," "lonely," "stress"), words associated with clinical mental health treatment ("psychology," "mental," "disorder," "therapy"), or derogatory terms related to mental health. It also fires, more weakly, for images of drugs, facial expressions that look sad or stressed, and the names of negative emotions.

Figure: Faceted feature visualization of the mental illness neuron (facets: any, face, text, logo, pose, nature, architecture, indoor).

  We don't usually think of mental illness as a dimension of emotion. However, several things make this neuron important in emotional contexts. First, in its middle and low activation range, it represents common negative emotions such as sadness. Second, words like "depression" are often used colloquially to describe non-clinical states. Finally, as we'll see in later sections, this neuron plays an important role in the captioning of emotions, combining with other emotion neurons to distinguish between "healthy" and "unhealthy" versions of an emotion.

  To better understand this neuron, we again estimated the conditional probabilities of various classes as a function of activation magnitude. The strongest positive activations correspond to concepts related to mental illness. In contrast, the strongest negative activations correspond to activities such as exercise, sports, and music.

Figure 5: To gain a deeper understanding of the mental illness neuron, we collected images that caused it to fire by varying amounts and hand-labeled them with classes we created. This lets us estimate the conditional probability of a label at a given activation level. See the appendix for details. During labeling, we could not see how strongly each image fired the neuron. We see that the strongest activations all belong to labels corresponding to low-valence mental states. On the other hand, many images with negative pre-ReLU activations are what we would usually consider high-valence scenes, such as pet photos, travel photos, and photos of sporting events.

Region Neurons

  This section discusses neurons representing regions of the world, and indirectly ethnicity. The model's representations are learned from the internet and may reflect biases, stereotypes, sensitive regional situations, and colonialism. Our discussion is intended to be descriptive and candid about what the model has learned from the internet data it was trained on, not an endorsement of the model's representations or associations. This content may be disturbing to some readers.

  From local weather and food, to travel and immigration, to language and ethnicity: geography is an important implicit or explicit context in a great deal of online discourse. A blizzard is more likely to be discussed in Canada. Vegemite is more likely to be found in Australia. Discussion of China is more likely to be in Chinese.

We found that CLIP models develop region neurons that respond to geographic areas. These neurons might be seen as visual analogs of the geographic information in word embeddings. They respond to many forms and facets associated with a particular place: country and city names, architecture, prominent public figures, faces of the most common ethnicities, distinctive clothing, wildlife, and local scripts (when the region does not use the Roman alphabet). When shown a world map, these neurons fire selectively over the relevant regions of the map, even without labels.

  Region neurons vary widely in scale, from neurons corresponding to entire hemispheres, such as a Northern Hemisphere neuron that responds to bears, moose, taiga, and the entire northern third of a world map, down to sub-national regions such as the US West Coast. Which regions the model dedicates neurons to appears somewhat arbitrary and varies across the models we examined.

  Not all region neurons fire on a map of the entire globe. In particular, neurons encoding smaller countries or regions (e.g. New York, Israel/Palestine) may not. This means that visualizing behavior on a world map underestimates the true number of region neurons in CLIP. Using the most-activating English words as a heuristic, we estimate that about 4% of neurons are regional.

  In addition to pure region neurons, we found many other neurons that appear to be "secondarily regional." These neurons do not have a region as their main focus, but carry some geographic information and fire weakly on the regions of the world map they are associated with: for example, an entrepreneurship neuron firing for California, or a cold neuron firing for the Arctic. Other neurons link concepts to regions of the world in seemingly US-centric or even racist ways: an immigration neuron responding to Latin America, and a terrorism neuron responding to the Middle East.

Case Study: African Neurons

  Despite examples of these neurons learning US-centric caricatures, in some areas the model appears somewhat more nuanced than one might fear, especially given that CLIP was trained only on English-language data. For example, rather than collapsing all of Africa into one undifferentiated entity, the RN50-x4 model develops neurons for three regions of Africa. This is nowhere near as detailed as its representation of many Western countries, which sometimes have individual country or even sub-national region neurons, but it still struck us.

Figure: RN50-x4 has several Africa neurons. Coloring countries by how strongly their names activate each neuron suggests that the neurons select for different regions, roughly central, southern, and eastern Africa.

  In our early explorations, it quickly became apparent that these neurons "know more" about Africa than the authors do. For example, one of the first feature visualizations of a Southern Africa region neuron drew the text "Imbewu", which we learned is a South African TV series.

  We selected the East Africa neuron for more careful study, again using conditional probability plots. It fires most strongly for flags, country names, and other strong national associations. Surprisingly, the distribution of moderate activations, which are far more common, was markedly different and appeared to relate mainly to ethnicity. Perhaps this is because ethnicity is implicit in every image of a person, providing weak evidence of a region, whereas features like flags are far less frequent but provide strong evidence when they do occur. This is the first neuron we have scrutinized that shows a clear regime change between moderate and strong activations.

Figure 7: We labeled over 400 images that caused the neuron most responsive to the word "Ghana" to fire at different activation levels, without access to how strongly each image fired the neuron while labeling. See the appendix for details. It fires most strongly for people of African descent, as well as African words such as country names. Pre-ReLU activation is negative for symbols associated with other countries (such as the Tesla logo or the Union Jack) and for people of non-African descent. Many of its strongest negative activations are for weapons such as military vehicles and pistols; Ghana, the country name it responds to most strongly, has a Global Peace Index ranking better than most African countries, so the neuron may have learned this as a counter-association.

  We also investigated the activations of the two other Africa neurons. We suspect they have interesting differences beyond detecting different country names and flags (otherwise, why would the model dedicate three neurons?), but we lack the cultural knowledge to appreciate their subtleties.

Feature Properties

  So far, we have studied individual neurons to understand the kinds of features present in CLIP models. There are several properties worth noting that can be missed when discussing individual features:

  Image-based word embeddings: Although it is a vision model, CLIP's vision side can be used to produce "image-based word embeddings" by rasterizing words into images, feeding those images into the model, and subtracting off the average over words. As with ordinary word embeddings, the nearest neighbors of a word tend to be semantically related. 32 Word arithmetic, e.g.

V(Img("King")) - V(Img("Man")) + V(Img("Woman")) ≈ V(Img("Queen"))

works in some cases, if non-semantic lexical neurons (e.g. "-ing" detectors) are masked out. Mixed arithmetic over words and images also seems like it should be possible.
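A rough sketch of image-based word embeddings and the word arithmetic above, assuming the open-source `clip` package; the word rendering and vocabulary are simplified placeholders, and the masking of non-semantic lexical neurons described above is omitted, so this illustrates the idea rather than the authors' exact procedure.

```python
import torch
import clip
from PIL import Image, ImageDraw

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50x4", device=device)

def embed_word_image(word, size=288):
    # Rasterize a word and embed the image with CLIP's vision side.
    img = Image.new("RGB", (size, size), "white")
    ImageDraw.Draw(img).text((20, size // 2), word, fill="black")
    with torch.no_grad():
        v = model.encode_image(preprocess(img).unsqueeze(0).to(device))[0].float()
    return v / v.norm()

vocab = ["king", "man", "woman", "queen", "prince", "girl"]  # illustrative vocabulary
embs = {w: embed_word_image(w) for w in vocab}
mean = torch.stack(list(embs.values())).mean(0)
centered = {w: v - mean for w, v in embs.items()}  # subtract off the average over words

query = centered["king"] - centered["man"] + centered["woman"]
sims = {w: torch.cosine_similarity(query, v, dim=0).item()
        for w, v in centered.items() if w not in ("king", "man", "woman")}
print(max(sims, key=sims.get))  # ideally "queen"
```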

  Limited multilingual behavior: Although CLIP's training data was filtered to English, many features exhibit limited multilingual responsiveness. For example, a "positive" neuron responds to images of "Thank you" in English, "Merci" in French, "Danke" in German, and "Gracias" in Spanish, as well as "Congratulations" in English, "Gratulieren" in German, "Felicidades" in Spanish, and "Selamat" in Indonesian. As the Indonesian example shows, the model can recognize some words from non-Romance/Germanic languages. However, we were unable to find any example of the model mapping words in non-Latin scripts to their semantic meanings. It can recognize many scripts (Arabic, Chinese, Japanese, etc.) and activates the corresponding region neurons, but it does not seem to map words in those scripts to their meanings.

  Bias: Certain types of bias appear to be embedded in these representations, analogous to classic biases in word embeddings (for example). Perhaps the most striking examples involve racial and religious prejudice. As mentioned when discussing region neurons, there appears to be a "terrorism/Islam" neuron that responds to images of words like "terrorism," "attack," and "terror," as well as "Islam," "Allah," "Muslim," and related words. This isn't just an artifact of looking at individual neurons: the image-based word embedding for "terrorist" has a cosine similarity of 0.52 to "Muslim," the highest value we observed for any word not containing "terror." 34 Similarly, the "illegal immigration" neuron selects for Latin American countries. (We'll see more examples of bias in the next section.)

  Polysemantic and conjoined neurons: Our qualitative experience is that individual neurons are easier to interpret than random directions; this mirrors observations made in prior work. While we focused on neurons that seemed to respond to a well-defined concept, many CLIP neurons are "polysemantic," responding to multiple unrelated features. Unusually, polysemantic neurons in CLIP often have suspicious connections between the different concepts they respond to. For example, we observed a Philadelphia/Philippines/Philip neuron, a Christmas/ass neuron, and an actor/velociraptor neuron. The concepts in these neurons appear to be "conjoined": overlapping superficially along one dimension and then generalizing out in multiple directions. We have not ruled out that these are mere coincidences, since there are many ways in which aspects of concepts might overlap. But if conjoined features do exist, they suggest new potential explanations for polysemanticity.
