AI evolves again, reconstructs brain activity, and reproduces what you are thinking


Article source: Xinzhiyuan

[Introduction] A recent study claims to use Stable Diffusion to reconstruct brain activity into high-resolution, high-fidelity images. The paper has been accepted to CVPR 2023 and caused an uproar among netizens. Is AI mind reading already within reach?

Even without Hogwarts magic, you can see what other people are thinking!

The method is surprisingly simple: use Stable Diffusion to visualize images from brain activity.

For example, here are the bears, airplanes, and trains a subject actually saw.

[Image: original stimuli showing a bear, an airplane, and a train]

When the AI sees only the brain signals, the images it generates look like the following. As you can see, all the essential elements are there.

[Animation: images reconstructed from the corresponding brain signals]

This AI brain-reading technique has just been accepted to CVPR 2023, instantly sending social media feeds into a collective "braingasm."


So wild! Forget prompt engineering; now you just need to "think" the pictures with your brain.


Imagine: using Stable Diffusion to reconstruct visual images from fMRI data could eventually develop into a non-invasive brain-computer interface.

AI would then bypass human language entirely and perceive what the human brain is thinking.


By then, even Musk's Neuralink would have to chase this AI ceiling.

Using AI to reproduce what you are thinking, no fine-tuning required


So how is AI brain reading actually achieved?

The latest research comes from a research team at Osaka University in Japan.


Paper address: https://sites.google.com/view/stablediffusion-with-brain/

Researchers at the Graduate School of Frontier Biosciences at Osaka University and at CiNet, NICT, Japan, reconstructed visual experience from fMRI data based on a latent diffusion model (LDM), specifically Stable Diffusion.

The framework of the whole pipeline is also very simple: an image encoder, an image decoder, and a semantic decoder.

[Figure: overview of the reconstruction framework]

By doing so, the team eliminated the need to train and fine-tune complex AI models.

All that needs to be trained are simple linear models that map fMRI signals from lower and higher visual brain areas each to a single Stable Diffusion component.

Specifically, the researchers mapped brain activity to the inputs of the image and text encoders: lower visual regions were mapped to the image encoder and higher regions to the text encoder. This allows the system to use both image composition and semantic content for the reconstruction.
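To make the idea concrete, here is a minimal sketch of that decoding step, assuming the fMRI responses and the Stable Diffusion latents have already been extracted and aligned per trial. The file names, shapes, and the choice of ridge regression are illustrative assumptions, not the authors' released code.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Per-trial fMRI responses and Stable Diffusion targets (shapes are assumptions):
#   X_early  (n_trials, n_voxels_early)  - early/lower visual cortex voxels
#   X_higher (n_trials, n_voxels_higher) - higher visual cortex voxels
#   Z        (n_trials, dim_z)           - flattened image latents from the image encoder
#   C        (n_trials, dim_c)           - flattened text embeddings from the text encoder
X_early, X_higher = np.load("fmri_early.npy"), np.load("fmri_higher.npy")
Z, C = np.load("image_latents.npy"), np.load("text_embeddings.npy")

# The only training in the whole pipeline: one linear map per SD component.
decode_z = Ridge(alpha=100.0).fit(X_early, Z)   # early visual cortex -> z
decode_c = Ridge(alpha=100.0).fit(X_higher, C)  # higher visual cortex -> c

# At test time, decode latents from a held-out scan.
X_early_test = np.load("fmri_early_test.npy")
X_higher_test = np.load("fmri_higher_test.npy")
z_hat = decode_z.predict(X_early_test)   # stands in for the image encoder's output
c_hat = decode_c.predict(X_higher_test)  # stands in for the text encoder's output
```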

The first part is the decoding analysis. The LDM used in the study consists of an image encoder ε, an image decoder D, and a text encoder τ.

The latent representation z of the presented image and the latent representation c of the associated text were decoded from fMRI signals in early and higher visual cortex, respectively. Used as inputs, they drive the diffusion process, and the image decoder then produces the reconstructed image X_zc.
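Below is a hedged sketch of how those decoded latents could drive the generation step, written against a diffusers-style U-Net, scheduler, and VAE. The img2img-style noising schedule, the parameter values, and the variable names are assumptions for illustration, not the paper's actual implementation (which also handles details such as latent rescaling that are omitted here).

```python
import torch

@torch.no_grad()
def reconstruct(z_hat, c_hat, unet, scheduler, vae, strength=0.8, steps=50):
    """Turn a decoded image latent z_hat and text embedding c_hat into X_zc."""
    scheduler.set_timesteps(steps)
    # Noise z_hat up to an intermediate timestep, as in image-to-image generation.
    t_start = int(steps * (1 - strength))
    t0 = scheduler.timesteps[t_start]
    latents = scheduler.add_noise(z_hat, torch.randn_like(z_hat), t0)

    # Denoise while conditioning the U-Net on the decoded text embedding c_hat.
    for t in scheduler.timesteps[t_start:]:
        eps = unet(latents, t, encoder_hidden_states=c_hat).sample
        latents = scheduler.step(eps, t, latents).prev_sample

    # Decode the final latent back to pixel space: this is the reconstruction X_zc.
    return vae.decode(latents).sample
```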


Next, the researchers built encoding models that predict fMRI signals from different components of the LDM, in order to probe its inner workings.
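The encoding direction can be sketched just as simply: linear models that run the other way, from LDM features to voxel responses, scored by cross-validated prediction accuracy. The feature extraction, file names, and regularization strength here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

features = np.load("ldm_features.npy")   # (n_trials, n_features): z, c, or U-Net activations
voxels = np.load("voxel_responses.npy")  # (n_trials, n_voxels): measured fMRI responses

# One encoding model per voxel; comparing scores across feature types shows
# which LDM component best explains which part of visual cortex.
scores = np.array([
    cross_val_score(Ridge(alpha=1.0), features, voxels[:, v], cv=5).mean()
    for v in range(voxels.shape[1])
])
print("best-predicted voxel:", scores.argmax(), "accuracy:", scores.max())
```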


The researchers conducted experiments using fMRI data from the Natural Scenes Dataset (NSD) and tested whether they could use Stable Diffusion to reconstruct what subjects saw.

It can be seen that the encoding models predict brain activity from the LDM latents with high accuracy, with the last model producing the highest prediction accuracy in posterior visual cortex.

[Figure: prediction accuracy of the encoding models across visual cortex]

Visual reconstructions for one subject show that images reconstructed using only z were visually consistent with the original but failed to capture semantic content.

Images reconstructed using only c, by contrast, had better semantic fidelity but poorer visual consistency; images reconstructed from z and c together achieved both high semantic fidelity and high resolution.

[Figure: reconstructions using z only, c only, and z plus c]

Reconstructions of the same image across all subjects show that the results were stable and reasonably accurate from subject to subject.

Differences in specific details likely stem from individual differences in perceptual experience or in data quality, rather than from errors in the reconstruction process.

[Figure: reconstructions of the same image across subjects]

Finally, the results of the quantitative evaluation are plotted below.

The results show that the method captures not only the low-level visual appearance but also the high-level semantic content of the original stimuli.

[Figure: quantitative evaluation results]

Taken together, the experiments show that combining image and text decoding yields accurate reconstructions.

Accuracy differed between subjects, but these differences correlated with the quality of the fMRI images, the researchers said. According to the team, the reconstruction quality is comparable to current state-of-the-art methods, but without the need to train the AI models those methods rely on.

At the same time, the team used the models derived from fMRI data to study the building blocks of Stable Diffusion itself, such as how semantic content emerges during reverse diffusion, or what happens inside the U-Net.

Early in the denoising process, the U-Net's bottleneck layer (orange) yields the highest predictive performance. As denoising progresses, the early layers (blue) come to predict activity in early visual cortex, while the bottleneck layer shifts toward higher visual cortex.

That is, at the beginning of the diffusion process, image information is compressed into the bottleneck layer; as denoising proceeds, a functional separation between U-Net layers emerges across visual cortex.
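One way to run this kind of layer-wise analysis is to harvest U-Net activations with forward hooks at each denoising step and feed them to the encoding models above. The block choice and model id below are assumptions for illustration, not the paper's setup.

```python
import torch
from diffusers import UNet2DConditionModel

# Load a Stable Diffusion U-Net (the model id is an example).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

activations = {}

def grab(name):
    def _hook(module, inputs, output):
        # Some diffusers blocks return tuples; keep the hidden states only.
        out = output[0] if isinstance(output, tuple) else output
        activations[name] = out.detach().flatten(1)
    return _hook

# Hook an early down-block and the bottleneck (mid) block; recording these at
# every denoising step yields the per-layer, per-step features whose encoding
# accuracy can be compared across early and higher visual cortex.
unet.down_blocks[0].register_forward_hook(grab("early"))
unet.mid_block.register_forward_hook(grab("bottleneck"))
```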

[Figure: predictive performance of U-Net layers across denoising steps]

In addition, the team is working on a quantitative interpretation of how images transform at different stages of diffusion. In this way, the researchers hope to contribute a biological perspective on diffusion models, which are widely used but still poorly understood.

Have images in the human brain already been decoded by AI?

For years, researchers have been using artificial intelligence models to decode information from the human brain.

At the core of most approaches, pre-recorded fMRI data are used as input to generative AI models for text or images.


For example, in early 2018, a group of researchers from Japan showed how a neural network could reconstruct images from fMRI recordings.

In 2019, a group reconstructed images from the activity of monkey neurons, and Meta's research group, led by Jean-Rémi King, has since published new work such as deriving text from fMRI data.


In October 2022, a team at the University of Texas at Austin showed that a GPT model can infer text from fMRI scans describing the semantic content a person sees in a video.

In November 2022, researchers at the National University of Singapore, the Chinese University of Hong Kong, and Stanford University used the MinD-Vis diffusion model to reconstruct images from fMRI scans with significantly higher accuracy than methods available at the time.


Pushing back even further, some netizens pointed out that "images generated from brain waves have been around since at least 2008. It is simply ridiculous to imply that Stable Diffusion can read people's minds in some way."

That paper, published in Nature by researchers at the University of California, Berkeley, showed that a person's brain activity could be translated into images using a visual decoder.


Going back even further in history, some people dug out a 1999 study on reconstructing images from the cerebral cortex, later shared by Stanford's Fei-Fei Li.


Fei-Fei Li also commented and shared it, saying that she was still a university intern at the time.


And in 2011, a UC Berkeley study used functional magnetic resonance imaging (fMRI) and computational models to produce early reconstructions of "dynamic visual images" from the brain.


That is, they recreated clips that people had seen.

Compared with the latest research, though, those reconstructions were far from "high-definition"; they were almost unrecognizable.

About the authors

Yu Takagi

Yu Takagi is an assistant professor at Osaka University. His research interests are at the intersection of computational neuroscience and artificial intelligence.

During his Ph.D. at ATR's Brain Communication Research Laboratory, he studied techniques for predicting individual differences from whole-brain functional connectivity measured with functional magnetic resonance imaging (fMRI).

Most recently, he has used machine learning techniques to understand dynamic computations in complex decision-making tasks, at the Oxford Centre for Human Brain Activity at the University of Oxford and at the Department of Psychology at the University of Tokyo.


Shinji Nishimoto

Shinji Nishimoto is a professor at Osaka University. His research concerns the quantitative understanding of visual and cognitive processing in the brain.


More specifically, the research of Prof. Nishimoto's group focuses on understanding neural processing and representation by building predictive models of brain activity evoked under natural perceptual and cognitive conditions.


Some netizens asked the authors whether this research could be used to interpret dreams.

"It is possible to apply the same technique to brain activity during sleep, but the accuracy of such an application is currently unknown."


After reading this research: Legilimency is officially real.


References:

https://sites.google.com/view/stablediffusion-with-brain/

https://www.biorxiv.org/content/10.1101/2022.11.18.517004v2
