All-in-one AI chip

Storage and Computing Integrated AI Chip
Recently, Samsung Electronics published the world's first MRAM (magnetoresistive random access memory)-based in-memory computing research in the top academic journal Nature: "A crossbar array of magnetoresistive memory devices for in-memory computing" (January 2022).
In-memory computing greatly reduces the power consumption of AI computing because it does not require moving data between the memory and the processor, and it is regarded as cutting-edge research in edge AI computing.
The new computing architecture fills the MRAM gap in a storage-computing integration landscape where multiple memory media are blooming in parallel. Classified by memory medium, current mainstream R&D on storage-computing integrated chips focuses on:
• Volatile memory, such as SRAM, DRAM
• Non-volatile memory, such as RRAM, PCM, MRAM and flash memory, etc.
The more mature media are SRAM and MRAM. A representative general-purpose near-memory computing architecture usually adopts a homogeneous many-core design, in which each storage-computing core (MPU) contains:
• Processing Engine (PE)
• Cache (Cache)
• Control (CTRL)
• Input and output (Input/Output, I/O)
Non-volatile RRAM (resistive random access memory) and PRAM (phase-change random access memory) are the two most commonly used memory types for in-memory computing. Compared with other memories:
• MRAM magnetoresistive memory has obvious advantages in operating speed, lifespan, and mass production
• Power consumption is also much lower than traditional DRAM
• It has the characteristics of non-volatile, that is, data will not be lost when power is turned off
However, MRAM has long been difficult to use for in-memory computing, because its low-power benefits cannot be exploited in standard in-memory computing architectures: the small resistance of individual MRAM cells makes the standard current-sum readout draw large currents.
Researchers at Samsung Electronics have built a new MRAM-based in-memory computing architecture that fills this gap. Through structural innovation, in-memory computing (In-Memory Computing) based on MRAM (Magnetoresistive Random Access Memory) is realized, which further expands the frontier field of Samsung's next-generation low-power artificial intelligence chip technology.
The Samsung research team designed a new in-memory computing architecture called "resistance sum" to replace the standard "current sum" architecture, and demonstrated it with an MRAM array chip in the paper "A crossbar array of magnetoresistive memory devices for in-memory computing".
This array successfully solves the problem of the small resistance of individual MRAM devices, reduces power consumption, and realizes MRAM-based in-memory computing. In AI tasks, the MRAM in-memory computing chip achieved 98% accuracy in handwritten-digit recognition and 93% accuracy in face detection.
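To make the two readout ideas concrete, here is a toy numerical sketch. This is not Samsung's actual circuit; the resistance values, binary weight mapping, and function names are all assumptions for illustration only:

```python
import numpy as np

R_LOW, R_HIGH = 5e3, 10e3   # two assumed MRAM resistance states (ohms)

def current_sum_mvm(weights, inputs, v_read=0.2):
    """Standard 'current sum': cells act as parallel conductances, so the
    column current is I_j = sum_i V_i * G_ij. With low-resistance MRAM
    cells these aggregate currents (and hence power) become large."""
    R = np.where(weights > 0, R_LOW, R_HIGH)   # map binary weights to resistances
    G = 1.0 / R
    V = inputs * v_read                        # input vector as read voltages
    return V @ G                               # column currents (amps)

def resistance_sum_mvm(weights, inputs):
    """'Resistance sum' (the idea behind Samsung's architecture): active cells
    in a column contribute series resistances, so the result is encoded in the
    column's total resistance rather than in a large aggregate current."""
    R = np.where(weights > 0, R_LOW, R_HIGH)
    # Only rows driven by a '1' input add their resistance to the chain.
    return (inputs[:, None] * R).sum(axis=0)   # column resistances (ohms)

W = np.array([[1, 0], [1, 1], [0, 1]])         # 3 inputs x 2 columns
x = np.array([1, 1, 0])
print(current_sum_mvm(W, x))      # small currents summed per column
print(resistance_sum_mvm(W, x))   # resistances summed per column
```

Both functions compute the same binary matrix-vector product in spirit; the difference is which physical quantity carries the sum, which is exactly where the power saving comes from.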
The research in the paper does not compete with other memory-based in-memory computing architectures. So far, no single memory type has dominated electronics, as each type of memory has its own advantages and disadvantages. In-memory computing based on different memories may also develop into different architectures. Samsung Electronics contributes to the development of in-memory computing by filling in the gap of in-memory computing architecture based on MRAM memory.

As shown in the figure below, the cache here can be SRAM, MRAM or similar high-speed random access memory. Each MPU is connected through a Network-on-Chip (NoC). Each MPU accesses its own cache, enabling high-performance parallel computing.
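The many-core organization described above can be sketched in a few lines of Python. This is a toy model only; the class and method names are illustrative, and the "compute" step is a placeholder for the PE's real work:

```python
from dataclasses import dataclass, field

@dataclass
class MPU:
    """One storage-computing core: processing engine + local cache + control + I/O."""
    core_id: int
    cache: dict = field(default_factory=dict)  # stands in for the local SRAM/MRAM cache

    def load(self, key, value):
        self.cache[key] = value                # data stays local to the core

    def compute(self, key):
        # The PE operates on locally cached data -- no trip to shared DRAM.
        return self.cache[key] * 2             # placeholder operation

class NoC:
    """Network-on-chip connecting the MPUs for parallel execution."""
    def __init__(self, n_cores):
        self.cores = [MPU(i) for i in range(n_cores)]

    def parallel_map(self, data):
        # Scatter one chunk per core, compute locally, gather the results.
        for core, x in zip(self.cores, data):
            core.load("x", x)
        return [core.compute("x") for core in self.cores]

noc = NoC(4)
print(noc.parallel_map([1, 2, 3, 4]))  # [2, 4, 6, 8]
```

The point of the sketch is the data placement: each core touches only its own cache, which is what makes the parallelism cheap.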

Cache-based general near-memory computing architecture
There are two main solutions for MRAM-based memory-computing integration:
• The first solution uses auxiliary peripheral circuits, similar to the SRAM storage-computing integration described above, as shown in Figure (a). This is a typical reconfigurable storage-computing integrated implementation, which can switch between pure-storage applications and storage-computing integrated applications.
• The second solution directly uses the storage cells to implement Boolean logic computation, as shown in Figure (b). This scheme performs logical operations directly through the input and output operations of the storage cells.
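A minimal sketch of the idea behind scheme (b): activating two word lines at once lets the combined bitline current be thresholded into a logic result. The resistance values and thresholds below are illustrative assumptions, not any vendor's actual cell parameters:

```python
R_LOW, R_HIGH = 1e3, 100e3   # assumed: logic 1 -> low resistance, logic 0 -> high
V_READ = 0.2                 # assumed read voltage (volts)

def cell_r(bit):
    return R_LOW if bit else R_HIGH

def read_two_cells(a, b):
    """Activate two word lines at once: the cells conduct in parallel,
    so the bitline current reflects both stored bits."""
    return V_READ / cell_r(a) + V_READ / cell_r(b)

def in_memory_or(a, b):
    # One low-resistance cell is enough to pull the current above this threshold.
    return read_two_cells(a, b) > 0.5 * V_READ / R_LOW

def in_memory_and(a, b):
    # Only two low-resistance cells together exceed the higher threshold.
    return read_two_cells(a, b) > 1.5 * V_READ / R_LOW
```

The same read operation thus yields OR or AND depending only on where the sense amplifier's threshold is set, which is why this scheme needs no extra compute circuitry.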

The basic principle of memory-computing integration based on RRAM/PCM/MRAM
(a) using the peripheral circuit scheme
(b) using the memory cell scheme

One of the important challenges for MRAM arrays that may run in-memory computing in the future is building an AI SoC (system on chip) that integrates many arrays, data converters, and digital electronics. Memory arrays can not only be used to compute neural network algorithms, but may also serve as potential carriers of biological neural networks.
In September 2021, Samsung Electronics and Harvard jointly published a paper titled "Neuromorphic electronics based on copying and pasting the brain" in Nature Electronics, a sister journal of Nature, proposing the possibility of "copying and pasting" the brain's neuronal wiring map onto a high-density 3D storage network.

Samsung's previous "copy and paste" brain research (Credit: Nature)
Seungchul Jung, lead author of the MRAM array study, said that in-memory computing is similar to computation in the human brain, because human computation also occurs within networks of memory, i.e. synapses. While the current purpose of the MRAM array is not to mimic the brain, this solid-state storage network may one day serve as a platform for simulating brain synapses.
Why put forward the integration of storage and computing?
As early as 1992, Xu Juyan, an academician of the Chinese Academy of Engineering, predicted that between 2014 and 2017 humanity would reach an inflection point on the life curve of silicon technology and soon enter the "post-Moore era". The existing von Neumann computing system separates storage from computing, and its "storage wall" and "power wall" bottlenecks severely restrict improvements in system computing power and energy efficiency. The development of artificial intelligence has thus been constrained by insufficient computing power and low energy efficiency.

Moore's Law and the Evolution of AI Algorithm Computing Power
In the von Neumann architecture, the memory and the processor are two completely separate units. The processor reads data from memory according to instructions, completes the operation, and stores the result back into memory. The narrow data-exchange path between the two, and the high energy consumption it causes, form the "storage wall" between storage and computing.

Under data-driven AI computing, the "storage wall" and "power wall" challenges of the von Neumann architecture stand out. More than half a century later, is there a way to climb over these two walls?
As computing power grows, the number of processor cores increases while the bandwidth available per core decreases, limiting overall speed. Moving data has become a considerable bottleneck.

Without storage optimization, the computing power provided by the chip will be greatly reduced.
Current computing processors, whether CPUs, GPUs, or AI-specific chips, are all designed on the von Neumann architecture. Roughly 80% of their power consumption occurs in data transfer, and up to 99% of execution time is consumed by memory reads and writes; the energy and time actually spent on computation are very small.
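A back-of-envelope model shows why data movement dominates. The per-operation energies below are rough public estimates for an older (~45 nm) process, often cited from Horowitz's ISSCC 2014 talk; the workload numbers and function name are illustrative assumptions:

```python
# Rough per-operation energies (picojoules), ~45 nm class estimates.
E_MAC_PJ = 1.0      # one 32-bit multiply-accumulate
E_SRAM_PJ = 5.0     # 32-bit read from on-chip SRAM cache
E_DRAM_PJ = 640.0   # 32-bit read from off-chip DRAM

def layer_energy_pj(n_macs, dram_reads, sram_reads):
    """Split a layer's energy into compute vs. data movement."""
    compute = n_macs * E_MAC_PJ
    movement = dram_reads * E_DRAM_PJ + sram_reads * E_SRAM_PJ
    return compute, movement

# Toy fully-connected layer: 1000x1000 weights streamed from DRAM once.
compute, movement = layer_energy_pj(n_macs=1_000_000,
                                    dram_reads=1_000_000,
                                    sram_reads=2_000_000)
print(f"compute: {compute/1e6:.1f} uJ, movement: {movement/1e6:.1f} uJ")
print(f"movement share: {movement / (compute + movement):.1%}")
```

Even in this crude model the movement share lands above 99%, consistent with the figures quoted above.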
With the rapid development of artificial intelligence, AI algorithms place far more stringent demands on information exchange between logic units and storage units than traditional tasks. AI computing is data-centric, and massive data movement leads to high power consumption; by 2025, global data centers are projected to consume 20% of the world's electricity.
AlphaGo defeated humans at Go, but the human brain runs on only 20 watts while AlphaGo consumed 20,000 watts. If more mental work were handed over to machines, the heat from the chips would make the planet scalding hot.
Only large computing power based on low power consumption is sustainable.

The most fundamental solution to the storage wall is to integrate storage and computing, and use storage units for computing.
Storage-computing integration moves computation from the central processor into the memory, performing operations directly in the storage cells to ease data movement; this greatly reduces both the data-exchange time and the data-access energy consumption of the computation process.
The integration of storage and computing has become an effective way to achieve high bandwidth, low power consumption, and computing requirements at this stage.
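The traffic saving can be counted directly. The sketch below tallies memory transfers for an N x N matrix-vector multiply under the two architectures; the transfer accounting is a simplification (it ignores caching and batching), and the function names are illustrative:

```python
import numpy as np

def von_neumann_mvm(W, x):
    """Processor fetches every weight from memory, computes, writes results back."""
    n, m = W.shape
    transfers = n * m + n + m   # read all weights + read inputs + write outputs
    return W @ x, transfers

def in_memory_mvm(W, x):
    """Weights stay resident inside the array; only inputs go in, outputs come out."""
    n, m = W.shape
    transfers = n + m           # stream inputs in, read outputs out
    return W @ x, transfers

W, x = np.ones((1024, 1024)), np.ones(1024)
_, t_vn = von_neumann_mvm(W, x)
_, t_im = in_memory_mvm(W, x)
print(f"von Neumann: {t_vn} transfers, in-memory: {t_im} transfers "
      f"({t_vn / t_im:.0f}x fewer)")
```

The arithmetic is identical in both cases; what changes is that the N x N weight traffic disappears, which is the whole premise of storage-computing integration.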

Comparison of the von Neumann architecture and in-memory computing

Storage-computing integration has ushered in an explosive moment
Limited by the complexity of chip design and by manufacturing costs, and lacking a killer big-data application to drive it, storage-computing integration had long been tepid.
PIM (processing in memory) is regarded as a core of artificial intelligence innovation. By organically combining storage and computing and performing computation directly in the storage cells, it largely eliminates the overhead of data movement and addresses the "storage wall" and "power wall" problems that traditional chips face when running AI algorithms, improving AI computing efficiency tenfold or even a hundredfold while reducing cost.
In particular, a large number of domestic storage-computing integration companies have surfaced along with financing news, while Samsung and Mythic abroad are also dedicated to this field.
• Mythic's Series C round raised US$70 million, bringing its total funding to US$165 million.
• On June 10, Zhicun Technology announced the completion of a 100-million-yuan A3 round for product-line expansion and new-product mass production; together with its two previous rounds, Zhicun Technology has raised nearly 300 million yuan in Series A financing.
• On June 25, Jiutian Ruixin received 100 million yuan in financing for new product development and team expansion.
• On July 2, Hangzhou Zhixin Technology completed an angel round of nearly 100 million yuan to continue building its team and begin the next stage of ACIM technology R&D and market expansion.
• On August 24, Houmo Smart announced the completion of a 300-million-yuan Pre-A round to accelerate chip product R&D, team building, early market layout, and commercialization.
• Also on August 24, Apple Chip Technology completed a Pre-A round of nearly US$10 million.
This eagerness to jump in fully proves that capital favors storage-computing integration. Why is the market so optimistic about storage-computing integrated chips?
• First, computing power demand and data volumes grow exponentially every year, but Moore's Law is approaching its limit, and each chip generation brings only a 10-20% performance improvement.
• Second, the computing power of the von Neumann architecture is held back by memory; only by solving the memory-wall problem can computing power be further improved.
• Third, in-memory computing is highly compatible with the basic operators of deep learning models, so chips based on this architecture achieve an order-of-magnitude improvement in computing efficiency (TOPS/W) over existing AI accelerator chips on the market.
• Fourth, general-purpose computing chips have no cost-effectiveness advantage for specific AI algorithms, and among the various alternatives, in-memory computing is the most direct and efficient.

Summary
In the era of intelligence, from wearables to autonomous driving, computing efficiency under power-consumption constraints is an eternal theme, and in-memory computing is one of the most powerful weapons for unleashing computing power and improving energy efficiency. Storage-computing integration upends the traditional von Neumann architecture and is a trend for the future, but spreading from the consumer-level to the enterprise-level market may take ten years or even longer of consolidating foundations and upgrading.
The development of in-memory computing is an important technical route toward energy-efficient computing, and effectively managing the in-memory computing interface is an important challenge. Whoever owns an in-memory computing hardware architecture that balances computing density and storage density holds the golden key to unlocking high-efficiency computing.
"Integration of storage and computing" breaks the 70-year-old Von Neumann architecture and will become the mainstream computing architecture in the AI ​​era. At present, the integration of storage and computing is in its infancy at home and abroad, and the integration of storage and computing is in a critical period of migration from academia to industry, so this may be another important direction for the development of domestic chips.

Reference link
https://mp.weixin.qq.com/s/aH-_YiXppNeB0xzGxZ-26Q

Origin blog.csdn.net/wujianing_110117/article/details/123541106