CMOS image sensor senses and processes optical images simultaneously

Overview

In recent years, machine vision technology has made huge leaps and has become an integral part of various intelligent systems, including autonomous vehicles and robots. Typically, visual information is captured by a frame-based camera, converted to digital format, and then processed with machine learning algorithms such as artificial neural networks (ANNs). However, the large amount of (mostly redundant) data passed through the entire signal chain results in low frame rates and high power consumption. Various visual-data preprocessing techniques have been developed to improve the efficiency of the downstream ANN processing. Here, we demonstrate that the image sensor itself can constitute a neural network that simultaneously senses and processes optical images, without latency. Our device is based on a reconfigurable two-dimensional (2D) semiconductor photodiode array, with the network's synaptic weights stored in a continuously tunable photoresponsivity matrix. We demonstrate both supervised and unsupervised learning and train the sensor to classify and encode images that are optically projected onto the chip, with a throughput of 20 million bins per second.

The CMOS sensor in a camera can itself constitute a neural network that simultaneously senses and processes optical images

Cameras have become a focus of competition among manufacturers: phone cameras today have reached 100 million pixels. The camera's photosensitive device is itself a typical semiconductor chip, essentially a diode. What happens when such a precisely structured device is used for neural network computation? The latest study in the journal Nature tells us: it is thousands of times faster than traditional processing methods.
For computer vision, the camera is the eye: it captures rich visual information and passes it to a processing unit, which then delivers the various visual capabilities. This is the most conventional paradigm of CV, and the one most in line with our intuition, but it has two major problems.
First, moving the data from the camera to the processing unit is not a trivial step, especially when cloud computing is involved, because the visual data stream is so large. Second, the processing itself is not trivial either: compute-intensive vision models are often prohibitively expensive.
In this latest Nature study, the researchers showed that a neural network can be "engraved" into the image sensor itself, so that the sensor performs photosensing and image processing at the same time, without delay. More importantly, this machine vision chip is thousands of times faster than a traditional convolutional neural network pipeline. With just one chip, a camera instantly becomes an intelligent terminal.


 Like the human brain, the new chip can perceive and classify simple images at nanosecond speeds.

Besides being fast, the chip needs no external power for sensing: because its basic component is a photodiode, the incident light alone generates the currents that carry out the computation. The running speed is limited only by the speed of the electronics in the circuit.
The research, from the Vienna University of Technology (TU Wien), was published in the journal Nature on March 4.

Engraving a neural network onto the chip

To "imprint" the neural network onto the image sensor, the researchers built a network of photodiodes on the chip. These photodiodes are tiny and highly sensitive to light, and the sensitivity of each diode can be tuned by changing a voltage, increasing or decreasing its response to light.
In effect, this photosensor network is itself a neural network that can perform simple computational tasks. Changing a photodiode's light responsivity changes a connection strength in the network, analogous to a weight in a neural network. The chip thus neatly combines optical sensing with neuromorphic computation.
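
To make the weight analogy concrete, here is a minimal sketch (in Python; an illustration, not the authors' code) of a single photodiode "synapse", whose tunable responsivity scales the incident optical power the way a weight scales an input:

```python
# Minimal sketch of one photodiode "synapse" (illustrative, not the
# authors' code). A gate voltage sets the responsivity R; the
# photocurrent I = R * P then scales the incident optical power P the
# way a synaptic weight scales an input.

def photocurrent(optical_power: float, responsivity: float) -> float:
    """Return the photocurrent I = R * P for one subpixel."""
    return responsivity * optical_power

# The same light level read through two differently tuned diodes: if the
# device supports negative responsivity, one subpixel can realize
# excitatory weights and another inhibitory ones.
print(photocurrent(1.0, 0.8))    # strongly, positively weighted input
print(photocurrent(1.0, -0.2))   # negatively weighted input
```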


a: the photodiode array of the neural network; subpixels of the same color are connected in parallel. b: circuit diagram of a single pixel in the photodiode array. c and d: the familiar neural network models that are "embedded" into the chip.

The sensor consists of a set of pixels, each representing a neuron; each pixel in turn consists of several subpixels, each representing a synapse. Every photodiode is based on a layer of tungsten diselenide (WSe2), a two-dimensional semiconductor whose photoresponsivity can be tuned. This tunable light responsivity plays the role of the weights in a neural network.
The photodiodes are arranged in a square array of 9 pixels, each containing 3 diodes. When an image is projected onto the chip, the diodes generate currents that are combined, so the analog computation is carried out directly by the hardware array. In other words, the instant light hits the sensor, the on-chip "neural network" is already computing.
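
As a rough model of this analog computation (the shapes N = 9 pixels and M = 3 subpixels come from the description above; everything else is an assumption), summing the currents of like subpixels across pixels amounts to a matrix-vector product:

```python
import numpy as np

# Rough model of the array's analog computation (a sketch, not the
# authors' code). Wiring subpixel m of every pixel to a common line sums
# their currents, so line m carries I_m = sum_n R[m, n] * P[n]: a
# matrix-vector product evaluated by Kirchhoff's current law rather
# than by a processor.

N, M = 9, 3                         # pixels (inputs), output lines (neurons)
rng = np.random.default_rng(0)

R = rng.uniform(-1.0, 1.0, (M, N))  # responsivity matrix = synaptic weights
P = rng.uniform(0.0, 1.0, N)        # optical power falling on each pixel

I = R @ P                           # the three summed output-line currents
print(I)
```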

Training the neural network

The entire array can be trained to perform visual tasks. Initially, the currents generated by the array will not match the desired outputs, so the researchers compare the two on a computer, adjust the weights there, and write the updated weights back to the on-chip network. Training takes time and computing resources, but once it is complete, the chip can handle visual tasks very quickly.
Based on the connections between these photodiodes, the researchers built neural networks and trained them to classify images as the letters "n", "v", or "z".
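
A hedged sketch of that training loop follows (Python; the 3×3 letter patterns are my own guesses, not the paper's exact stimuli): the array's output currents are simulated, compared with one-hot targets, and the responsivity matrix is updated by gradient descent, which on the real chip would mean reprogramming gate voltages.

```python
import numpy as np

# Hedged sketch of the off-chip training loop (not the authors' code).
# The array's output currents are simulated as I = X @ R.T, compared
# with one-hot targets, and the responsivity matrix R is updated by
# gradient descent. The 3x3 letter patterns below are guesses.

letters = {
    "n": [1, 1, 1, 1, 0, 1, 1, 0, 1],
    "v": [1, 0, 1, 1, 0, 1, 0, 1, 0],
    "z": [1, 1, 1, 0, 1, 0, 1, 1, 1],
}
X = np.array(list(letters.values()), dtype=float)  # (3 samples, 9 pixels)
Y = np.eye(3)                                      # one-hot class targets

rng = np.random.default_rng(1)
R = rng.normal(0.0, 0.1, (3, 9))  # responsivity matrix (the weights)
lr = 0.5

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for _ in range(200):
    I = X @ R.T                          # currents the chip would output
    probs = softmax(I)                   # read-out classifier
    grad = (probs - Y).T @ X / len(X)    # cross-entropy gradient w.r.t. R
    R -= lr * grad                       # update -> reprogram gate voltages

print(softmax(X @ R.T).round(2))  # rows should now be close to one-hot
```
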
"Our image sensors don't consume any power when they work." Mennel said, "The sensed photons provide electricity."


In the experiment, the researchers used a laser to project the letters "v" and "n" onto the neural network image sensor. Traditional computer vision technology can usually process 100 frames per second, and some faster systems can manage 1,000 frames per second. By comparison, Mennel said, "our system can process almost 20 million frames per second."


a: the experimental configuration for training the classifier and the autoencoder. b: the setup for time-resolved measurements. c: a close-up photo of the optical experiment.

Mennel noted that the system's speed is limited only by the speed of the electronics in the chip; in principle, this strategy could operate on picosecond timescales, 3 to 4 orders of magnitude faster than existing vision methods. Besides the letter recognition and classification model, the researchers also tested an autoencoder model in the experiment. Even in the presence of signal noise, the sensing-and-computing array can learn the key features of an image and decode them to reconstruct an image close to the original. Once training is complete, this unsupervised generative model is also very fast at inference.
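
Below is an illustrative sketch of that autoencoder mode under stated assumptions: the chip serves as the encoder, compressing each 9-pixel image into a 3-value code of output currents; a decoder matrix trained jointly off-chip reconstructs the image; and noise injected during training forces the network to learn robust features. All shapes, noise levels, and training details are my own choices, not the paper's exact setup.

```python
import numpy as np

# Illustrative sketch of the autoencoder mode (assumptions throughout).
# The chip acts as the encoder: its output currents compress each
# 9-pixel image into a 3-value code. A decoder matrix, trained jointly
# off-chip by gradient descent on the reconstruction error, maps the
# code back to a 9-pixel image; input noise makes the features robust.

rng = np.random.default_rng(2)
P = rng.integers(0, 2, (64, 9)).astype(float)  # toy binary training images

R_enc = rng.normal(0.0, 0.3, (3, 9))  # on-chip encoder responsivities
W_dec = rng.normal(0.0, 0.3, (9, 3))  # off-chip decoder weights
lr = 0.05

for _ in range(2000):
    noisy = P + rng.normal(0.0, 0.2, P.shape)  # corrupt inputs (denoising)
    C = noisy @ R_enc.T                        # code currents (encoder)
    P_hat = C @ W_dec.T                        # reconstruction (decoder)
    err = P_hat - P                            # squared-error residual
    grad_dec = err.T @ C / len(P)
    grad_enc = (err @ W_dec).T @ noisy / len(P)
    W_dec -= lr * grad_dec
    R_enc -= lr * grad_enc

print(np.abs(P_hat - P).mean())  # mean reconstruction error after training
```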

So, what is the use of the chip?
What are the uses of such sensors? "At this stage, the technology is mainly suited to specific scientific applications, such as fluid dynamics, combustion processes, or mechanical failure monitoring," Mennel said. "For more complex tasks, such as machine vision in autonomous driving, higher complexity may be required."
This kind of sensing-plus-computing chip still has a long way to go before practical use. Real visual information is three-dimensional, dynamic, and extended along a video timeline, but current image sensing technology compresses those three dimensions down to two, so the chip loses a great deal of information.
The authors also note that the chip would need to be redesigned for dim conditions to widen the range of detectable light intensities, and such a redesign requires high voltages and consumes a lot of energy. Finally, there is the question of semiconductor fabrication: such ultrathin semiconductors are difficult to produce over large areas and difficult to process.
Still, despite the many obstacles, the combination is fascinating: a neural network fused with light sensing, generating current as it receives light and completing visual tasks as it generates current.
