Article source: Xinzhiyuan
[Introduction] A recent study claims to reconstruct brain activity into high-resolution, high-fidelity images using Stable Diffusion. The paper has been accepted to CVPR 2023 and caused an uproar among netizens. Is AI mind-reading already within reach?
Even without Hogwarts magic, you can see what other people are thinking!
The method is simple: use Stable Diffusion to visualize images from brain activity.
For example, when a subject looks at a bear, an airplane, or a train, the images look like this.
And the images the AI generates from the corresponding brain signals look like the following: all the essential elements are there.
This AI brain-reading technique has just been accepted to CVPR 2023, and it instantly set social media abuzz.
So wild! Forget prompt engineering; now you just need to "think" the picture in your head.
Imagine reconstructing visual images from fMRI data with Stable Diffusion: this could one day develop into a non-invasive brain-computer interface, letting AI bypass human language and directly perceive what the brain is thinking.
By then, even Musk's Neuralink would be racing to catch up with this AI.
No fine-tuning needed: using AI to directly reproduce what you are thinking
So how is AI brain-reading actually achieved?
The latest research comes from a research team at Osaka University in Japan.
Paper address: https://sites.google.com/view/stablediffusion-with-brain/
Researchers at Osaka University's Graduate School of Frontier Biosciences and at CiNet of Japan's NICT reconstructed visual experience from fMRI data using a latent diffusion model (LDM), specifically Stable Diffusion.
The framework of the whole pipeline is very simple: one image encoder, one image decoder, and one semantic decoder.
This let the team skip training and fine-tuning complex AI models.
All that needs to be trained are simple linear models that map fMRI signals from early and higher visual brain areas to individual Stable Diffusion components.
Specifically, the researchers mapped brain activity onto the inputs of the image and text encoders: early (lower) visual regions were mapped to the image encoder, and higher visual regions to the text encoder. This allows the system to use both image composition and semantic content for the reconstruction.
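At its core, the mapping described above is plain linear regression from fMRI voxels into Stable Diffusion's two latent spaces. A minimal sketch with synthetic data follows; all array sizes, the ridge penalty, and the variable names are illustrative assumptions, not the paper's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative assumptions, not the paper's values)
n_trials, n_voxels = 200, 500
z_dim, c_dim = 64, 48  # stand-ins for the SD image-latent and text-embedding sizes

X_early = rng.standard_normal((n_trials, n_voxels))   # early visual cortex voxels
X_higher = rng.standard_normal((n_trials, n_voxels))  # higher visual cortex voxels
Z = rng.standard_normal((n_trials, z_dim))            # image latents z of seen images
C = rng.standard_normal((n_trials, c_dim))            # text embeddings c of captions

def ridge_fit(X, Y, lam=10.0):
    """Closed-form ridge regression: W = (X'X + lam*I)^-1 X'Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

W_z = ridge_fit(X_early, Z)    # early visual cortex -> image latent z
W_c = ridge_fit(X_higher, C)   # higher visual cortex -> text embedding c

z_pred, c_pred = X_early @ W_z, X_higher @ W_c
print(z_pred.shape, c_pred.shape)  # (200, 64) (200, 48)
```

In the actual pipeline, the decoded z and c would then be handed to the frozen Stable Diffusion model; nothing in the diffusion model itself is trained.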
The first step is the decoding analysis. The LDM used in the study consists of an image encoder ε, an image decoder D, and a text encoder τ.
The latent representation z of the presented image and the latent representation c of the associated text were decoded from fMRI signals of the early and higher visual cortex, respectively; fed into the LDM as inputs, they yield the reconstructed image Xzc via the autoencoder.
Next, to probe the LDM's inner workings, the researchers built encoding models that predict fMRI signals from the LDM's different components.
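An encoding model runs in the opposite direction of the decoder: it predicts voxel responses from model features and is scored by per-voxel correlation on held-out trials. A schematic sketch with synthetic data (the feature and voxel counts, the ridge penalty, and the train/test split are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic encoding analysis; all sizes are illustrative assumptions.
n_train, n_test, n_feat, n_voxels = 300, 50, 80, 120
F = rng.standard_normal((n_train + n_test, n_feat))   # per-trial LDM features (e.g. z or c)
true_W = rng.standard_normal((n_feat, n_voxels))
Y = F @ true_W + 0.5 * rng.standard_normal((n_train + n_test, n_voxels))  # "fMRI" voxels

# Fit a ridge encoding model on the training trials
lam = 1.0
W = np.linalg.solve(F[:n_train].T @ F[:n_train] + lam * np.eye(n_feat),
                    F[:n_train].T @ Y[:n_train])

# Evaluate with per-voxel Pearson correlation on held-out trials
Y_hat = F[n_train:] @ W
A = Y_hat - Y_hat.mean(0)
B = Y[n_train:] - Y[n_train:].mean(0)
r = (A * B).sum(0) / (np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0))
print(r.shape)  # one correlation per voxel: (120,)
```

Mapping these per-voxel accuracies back onto the cortical surface is what produces the kind of accuracy maps the study reports.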
The researchers conducted experiments using fMRI images from the Natural Scene Dataset (NSD) and tested whether they could use Stable Diffusion to reconstruct what subjects saw.
The results show that the encoding models' predictions correlate with measured brain activity, with the highest prediction accuracy in the posterior visual cortex.
Visual reconstructions for one subject show that an image reconstructed from z alone is visually consistent with the original but fails to capture its semantic content.
Images reconstructed from c alone have better semantic fidelity but poorer visual consistency, while images reconstructed from both (zc) achieve high semantic fidelity and high resolution at once.
Reconstruction results of the same image from all subjects showed that the effect of reconstruction was stable and relatively accurate across different subjects.
Differences in specific details likely stem from individual differences in perceptual experience or in data quality, rather than from errors in the reconstruction process.
Finally, the results of the quantitative assessment are plotted in a graph.
Various results show that the methods employed in the study can capture not only the low-level visual appearance, but also the high-level semantic content of the original stimuli.
In sum, the experiments show that combining image and text decoding yields accurate reconstructions.
There were differences in accuracy between subjects, but these differences correlated with the quality of the fMRI images, the researchers said. According to the team, the quality of the reconstruction is comparable to current state-of-the-art methods, but without the need to train the AI models used in them.
At the same time, the team used the models fit to fMRI data to study the building blocks of Stable Diffusion, such as how semantic content emerges during the reverse diffusion process, or what happens inside U-Net.
U-Net's bottleneck layers (orange) yield the highest predictive performance early in the denoising process; as denoising progresses, the early layers (blue) come to predict activity in early visual cortex, while the bottleneck layers shift toward higher visual cortex.
In other words, at the start of the diffusion process image information is compressed into the bottleneck layers, and as denoising proceeds a functional separation between U-Net layers emerges across visual cortex.
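Procedurally, this kind of analysis amounts to fitting one encoding model per (denoising step, U-Net layer) pair and comparing where in cortex each predicts best. A sketch of the bookkeeping with random stand-in features; the step values, layer names, and sizes are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

steps = [0, 10, 20]                       # denoising steps (illustrative)
layers = ["early", "bottleneck", "late"]  # stand-ins for U-Net layer groups
n_trials, n_feat, n_voxels = 100, 30, 40

Y = rng.standard_normal((n_trials, n_voxels))  # fMRI responses (synthetic)

def encoding_accuracy(F, Y, lam=1.0):
    """Mean per-voxel correlation of a ridge encoding model (in-sample, for brevity)."""
    W = np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ Y)
    P = F @ W
    Pc, Yc = P - P.mean(0), Y - Y.mean(0)
    r = (Pc * Yc).sum(0) / (np.linalg.norm(Pc, axis=0) * np.linalg.norm(Yc, axis=0))
    return float(r.mean())

# In the real analysis, F would hold U-Net activations at step t, layer l;
# random features here just illustrate the procedure, not the findings.
acc = {(t, l): encoding_accuracy(rng.standard_normal((n_trials, n_feat)), Y)
       for t in steps for l in layers}
print(len(acc))  # 9 (step, layer) pairs
```

Keeping per-voxel (rather than mean) accuracies in `acc` would yield the cortical maps from which the bottleneck-to-higher-cortex shift is read off.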
In addition, the team is working on a quantitative interpretation of how images transform at different stages of diffusion. In this way, the researchers hope to contribute a biological perspective on diffusion models, which are widely used yet still poorly understood.
Has AI already decoded pictures from the human brain?
For years, researchers have been using artificial intelligence models to decode information from the human brain.
At the core of most approaches is the use of pre-recorded fMRI data as input to generative AI models for text or images.
For example, in early 2018, a group of researchers from Japan showed how a neural network could reconstruct images from fMRI recordings.
In 2019, another group reconstructed images from monkey neurons, and Meta's research group led by Jean-Rémi King published new work deriving text from fMRI data.
In October 2022, a team at the University of Texas at Austin showed that a GPT model can infer text from fMRI scans that describes the semantic content a person sees in a video.
In November 2022, researchers at the National University of Singapore, the Chinese University of Hong Kong, and Stanford University used the MinD-Vis diffusion model to reconstruct images from fMRI scans with significantly higher accuracy than methods available at the time.
Pushing back further, some netizens pointed out that "images generated from brain waves have existed since at least 2008. It is simply ridiculous to imply that Stable Diffusion can somehow read people's minds."
That paper, published in Nature by researchers at the University of California, Berkeley, showed that a person's brain activity could be translated into images using a visual decoder.
Digging even further back into history, some pulled out a 1999 study on reconstructing images from the cerebral cortex involving Stanford's Fei-Fei Li.
Fei-Fei Li herself retweeted it, noting that she was still an undergraduate intern at the time.
Also in 2011, a UC Berkeley study used functional magnetic resonance imaging (fMRI) and computational models to initially reconstruct a "dynamic visual image" of the brain.
That is, they recreate clips that people have seen.
But compared with the latest research, that reconstruction was far from "high-definition": the results were almost unrecognizable.
About the authors
Yu Takagi
Yu Takagi is an assistant professor at Osaka University. His research interests are at the intersection of computational neuroscience and artificial intelligence.
During his Ph.D., he studied techniques for predicting individual differences from whole-brain functional connectivity using functional magnetic resonance imaging (fMRI) in the Brain Communication Research Laboratory of ATR.
Most recently, he has used machine learning techniques to understand dynamic computation in complex decision-making tasks at the Oxford Center for Human Brain Activity at the University of Oxford and at the Department of Psychology at the University of Tokyo.
Shinji Nishimoto
Shinji Nishimoto is a professor at Osaka University. One focus of his research is the quantitative understanding of visual and cognitive processing in the brain.
More specifically, the research of Prof. Nishimoto's group focuses on understanding neural processing and representation by building predictive models of brain activity evoked under natural perceptual and cognitive conditions.
Some netizens asked the authors: could this research be used to interpret dreams?
"It is possible to apply the same technique to brain activity during sleep, but the accuracy of such an application is currently unknown."
One takeaway after reading this research: Legilimency is real.
References:
https://sites.google.com/view/stablediffusion-with-brain/
https://www.biorxiv.org/content/10.1101/2022.11.18.517004v2