Research progress and applications of in-memory computing chips

The development history of integrated storage and computing technology 

The integration of storage and computing includes near-memory computing and in-memory computing. The concept was first proposed in 1969 [9,10], and scholars from various countries subsequently carried out a series of related research at the circuit, algorithm, computing architecture, operating system and system application levels. For example, in 1997, the literature [11] demonstrated an Intelligent RAM solution that integrates a processor and DRAM on a single chip; its computing power reached 5 times that of the most advanced Cray vector processor of the time (Cray T-90). In 1999, the literature [12] proposed FlexRAM, a memory with embedded computing functions; simulation results showed that the architecture could improve computing performance by 25-40 times. However, due to the lack of application demand for big data processing in the early days, coupled with expensive chip manufacturing costs and complex design, integrated storage and computing technology stayed at the research stage for many years. Since 2015, owing to the gradual breakdown of Moore's Law, the increasingly obvious limitations of the von Neumann architecture, the drive of big data applications and the continuous improvement of process technology, storage and computing integration has attracted renewed attention and become a research hotspot. For example, at Micro 2017, a top annual microprocessor conference, many universities and companies, including ETH Zurich, University of California Santa Barbara, NVIDIA, Intel, Microsoft and Samsung, presented their storage and computing integrated chips or system prototypes [13–15]. In 2019, the literature [16] proposed an integrated storage and computing chip that realizes neural network convolution with binary weights. In 2020, the literature [17] demonstrated a ReRAM-based integrated storage and computing chip that reduces computing latency and greatly improves energy efficiency. In 2021, the literature [18] proposed a ternary DRAM-based integrated storage and computing architecture that accelerates neural network operations. In 2022, the literature [19] proposed an integrated storage and computing chip based on multiple chiplets. The literature [20–24] published a series of research results on SRAM/ReRAM-based storage and computing integrated devices, chips and systems. So far, a series of related research work has emerged based on various storage media such as SRAM, DRAM, Flash, ReRAM, PCM, FeFET and MRAM. Research units include Samsung, TSMC, MIT, Princeton University, Tsinghua University, Peking University, Fudan University, the University of Chinese Academy of Sciences and other top international universities and enterprises.
Research on integrated storage and computing chips is booming, as shown in Figure 2. In particular, ISSCC, the international top conference known as the Olympics of the chip field, included more than 20 papers related to storage and computing integration in 2021-2022 [25–38].







 

Although research on storage and computing integrated chips based on various storage media is flourishing, each medium still faces some problems and challenges before large-scale industrialization. More specifically: SRAM has mature technology and good scalability, but it is a volatile memory (data is lost after power failure), its unit area is large and its cost is high, making it difficult to realize large-scale, high-computing-power in-memory computing chips at low cost. DRAM also has a mature process and a small unit area, but it is likewise a volatile memory; its devices need to be refreshed regularly and suffer from leakage, making it difficult to implement high-precision in-memory computing chips, and in recent years it has mainly been used for near-memory computing. ReRAM is a non-volatile memory that can realize large-scale cross-point arrays and is one of the potential media for future in-memory computing chips; however, its process is not yet mature, the multi-bit precision of the memory cell is low (below 8 bit), and its consistency/robustness is poor. Flash is a non-volatile memory in which data is not lost after power-off; its process is mature, its cost is low, and mass-produced chips have already been achieved (such as Mythic's M1076 and Zhicun Technology's WTM2101); there are certain challenges in terms of scaling, but fortunately, with the rapid development of 2.5D/3D advanced packaging technology, compatible integration with advanced logic processes can be achieved. MRAM is a non-volatile memory with the advantages of high endurance, high speed and low power consumption; its technology is relatively mature and its scalability is good, but the device resistance (about several thousand ohms) and the high/low resistance ratio (approximately 250%) are relatively small, so there are certain challenges in implementing multi-bit in-memory computing chips. PCM is a non-volatile memory that can implement large-scale cross-point arrays, but it consumes more power, is slower and is less durable. FeFET can realize non-volatile storage and cross-point arrays, but its process is not yet mature. In summary, the performance comparison of storage and computing integrated chips based on different memory media is shown in Table 1.
 

Integrated storage and computing technology is also progressing very rapidly in the industry.
Many domestic and foreign companies are actively researching and developing it, such as TSMC in Taiwan, Samsung of South Korea, Toshiba of Japan, Mythic of the United States and Zhicun Technology of China. Currently, the companies closest to industrialization are TSMC, Mythic and Zhicun Technology. Since 2019, TSMC, benefiting from its strong process capabilities, has published a series of integrated storage and computing chip research results based on SRAM and ReRAM [40,41], and has mass production and foundry capabilities. Mythic launched the M1076, a mass-production in-memory computing chip based on NOR Flash, in 2021; it supports 80 MB of neural network weights, a single chip delivers 25 TOPS of computing power, and it mainly targets edge-side intelligent scenarios. Zhicun Technology launched the WTM2101, an in-memory computing SoC chip based on NOR Flash, in 2021; its computing power is two orders of magnitude higher than that of similar chips on the market, its power consumption is less than 1 mW, and it mainly targets end-side low-power, low-cost application scenarios.

 Research status of in-memory computing chips


Due to differences in computing paradigms and storage media, in-memory computing chips can be classified in different ways. By computing paradigm, they are mainly divided into two types: analog and digital. Analog in-memory computing means that the signals inside the storage unit or around the array are processed in the form of analog signals, while digital in-memory computing means that during actual operation these signals are processed in digital form. Many research works include both analog and digital computing methods. At the same time, according to the storage medium, in-memory computing chips can be divided into those based on traditional memories and those based on new non-volatile memories. Traditional memories include SRAM, DRAM and Flash; new non-volatile memories include ReRAM, PCM, FeFET and MRAM. Among them, the chips closest to industrialization are in-memory computing chips based on NOR Flash and SRAM.

SRAM in-memory computing

The SRAM-based in-memory computing chip is built on the typical 6T (6-transistor) basic cell, as shown in Figure 3(a). Since SRAM is a binary memory, binary multiply-accumulate is equivalent to an XNOR-and-accumulate operation and can be used for binary neural network computation. The core idea is that the network weights are stored in the SRAM cells, the excitation signals are input from the word lines, and peripheral circuits then implement the XNOR-and-accumulate operation, with the results output through counters or as analog currents/voltages. To implement multi-bit precision operations, multiple cells usually need to be combined, which inevitably brings area overhead. A simple modification of the 6T basic cell is to split the word lines, as shown in Figure 3(b). In addition, in order to solve the problem of read-write disturbance, an 8T basic cell can be used, but the layout area increases significantly, as shown in Figure 3(c). In-memory computing technology based on SRAM has attracted great attention from industry due to its mature process and good scalability, and in recent years many related papers have been reported at the ISSCC conference. For example, in 2021 there were two sub-sessions on in-memory computing with a total of 8 papers, 5 of which were about SRAM in-memory computing chips. At ISSCC 2022, Peking University proposed an SRAM in-memory computing chip based on dynamic logic and without analog-to-digital converters [42]. The main application difficulty of SRAM in-memory computing technology is to achieve high computing power and small area while ensuring computing accuracy.
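To make the XNOR-and-accumulate principle concrete, the following minimal Python sketch emulates a small SRAM-based binary multiply-accumulate: weights are stored as ±1 values in an array, inputs are applied as ±1 word-line excitations, and the per-bit-line popcount of matches corresponds to what the counter or analog readout circuit would report. The array size and variable names are illustrative assumptions, not taken from any specific chip.

```python
import numpy as np

def xnor_accumulate(weights_pm1, inputs_pm1):
    """Emulate a binary in-memory MAC: XNOR of stored weights and input
    excitations, accumulated per bit line (what a counter/ADC would report)."""
    # With +/-1 encoding, multiplication is equivalent to XNOR on {0,1} bits.
    matches = (weights_pm1 * inputs_pm1[:, None] + 1) // 2   # 1 where bits agree
    return matches.sum(axis=0)                               # popcount per bit line

rng = np.random.default_rng(0)
W = rng.choice([-1, 1], size=(64, 16))   # binary weights stored in a 64x16 SRAM block
x = rng.choice([-1, 1], size=64)         # word-line excitations

popcount = xnor_accumulate(W, x)
mac = 2 * popcount - W.shape[0]          # map popcount back to the signed dot product
assert np.array_equal(mac, x @ W)        # matches the ideal binary dot product
print(popcount)
```

The mapping `2 * popcount - N` simply converts the number of matching bits back into the signed dot product, which is why a counter at the bottom of each bit line suffices for binary networks.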


DRAM in-memory computing
The DRAM-based in-memory computing chip hierarchy can be divided into arrays, sub-arrays and cells. An array consists of several sub-arrays and the peripheral circuits for read and write operations, and each sub-array contains several rows of 1T1C (1-transistor-1-capacitor) cells, sense amplifiers and local decoders. The basic principle is to utilize the charge-sharing mechanism between DRAM cells [13,43]. Figure 4 shows a typical implementation scheme [43]: when multiple rows of cells are activated at the same time, differences in the data stored in the cells produce charge exchange and sharing, thereby realizing logic operations. There are two main difficulties with the DRAM in-memory computing solution. First, DRAM is a volatile memory and computing operations destroy the stored data, which must be refreshed after each operation, causing power consumption problems. Second, it is difficult to ensure computing accuracy when implementing large-array operations.
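A rough way to see how charge sharing turns into logic is to model three simultaneously activated 1T1C cells sharing charge on one bit line: the sense amplifier resolves the averaged voltage into a majority vote, from which AND and OR follow by presetting one operand. The sketch below is a behavioral approximation under idealized assumptions (equal capacitances, no leakage), not a model of any particular DRAM design.

```python
def charge_share_majority(a, b, c):
    """Triple-row activation: three cells (values 0/1) share charge on the
    bit line; the sense amplifier resolves the averaged level to 0 or 1."""
    shared_voltage = (a + b + c) / 3.0       # idealized charge sharing, equal capacitances
    return 1 if shared_voltage > 0.5 else 0  # sense-amplifier decision

def in_dram_and(a, b):
    return charge_share_majority(a, b, 0)    # third cell preset to 0 -> majority acts as AND

def in_dram_or(a, b):
    return charge_share_majority(a, b, 1)    # third cell preset to 1 -> majority acts as OR

for a in (0, 1):
    for b in (0, 1):
        assert in_dram_and(a, b) == (a & b)
        assert in_dram_or(a, b) == (a | b)
```

Because the source cells are left at the shared intermediate level after the operation, the original data must be written back or refreshed, which is exactly the power-consumption difficulty noted above.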


ReRAM/PCM in-memory computing
The basic principle of ReRAM/PCM in-memory computing is to exploit the analog multi-bit characteristics of the storage cell, using Ohm's law and Kirchhoff's current law to perform matrix multiply-accumulate operations. There are two main implementation schemes: the 1T1R (1-transistor-1-resistor) structure and the crossbar array structure, as shown in Figure 5(a) and Figure 5(b). ReRAM can realize large-scale cross-point arrays, making it a hot research direction in academia. Since ReRAM was first experimentally demonstrated in 2008, research on in-memory computing based on ReRAM has emerged one after another.
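To make the Ohm's-law/Kirchhoff's-law principle concrete, the following Python sketch models an idealized crossbar: weights are mapped to cell conductances, input voltages are applied on the rows, and each column current is an analog dot product. The conductance window, noise level, differential mapping and array size are illustrative assumptions rather than parameters of any reported chip.

```python
import numpy as np

rng = np.random.default_rng(1)

# Map a small weight matrix to conductances (Ohm's law: I = G * V per cell).
W = rng.uniform(-1, 1, size=(32, 8))                       # logical weights
G_max, G_min = 100e-6, 1e-6                                 # assumed conductance window (siemens)
G_pos = np.where(W > 0,  W, 0) * (G_max - G_min) + G_min    # column pair for positive weights
G_neg = np.where(W < 0, -W, 0) * (G_max - G_min) + G_min    # column pair for negative weights

v_in = rng.uniform(0, 0.2, size=32)                         # row (word-line) read voltages

# Kirchhoff's current law: each column current is the sum of per-cell currents.
read_noise = 1e-7
i_pos = v_in @ G_pos + rng.normal(0, read_noise, 8)
i_neg = v_in @ G_neg + rng.normal(0, read_noise, 8)
analog_dot = (i_pos - i_neg) / (G_max - G_min)              # differential readout, rescaled

print(np.round(analog_dot, 3))
print(np.round(v_in @ W, 3))                                # ideal digital reference
```

The differential column pair is one common way to represent signed weights with strictly positive conductances; the residual gap between the two printed vectors illustrates how read noise and the limited conductance window bound the achievable precision.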

In particular, in 2020 Tsinghua University developed an in-memory computing system based on multiple ReRAM arrays. The system's recognition accuracy on a handwritten digit set reached 96.19%, equivalent to the recognition accuracy of software, demonstrating the feasibility of a fully hardware implementation of the in-memory computing architecture; the test chip is shown in Figure 5(c) [24]. ReRAM in-memory computing technology has very large application potential, but its main current difficulties are that the process is not yet mature, multi-bit precision is difficult to achieve, and consistency/robustness is poor.

MRAM in-memory computing
There are two main technical solutions for MRAM in-memory computing: (1) digital in-memory computing based on read/write operations; (2) analog in-memory computing based on Kirchhoff's current law and Ohm's law. Most early MRAM in-memory computing work was based on digital solutions. For example, in 2015, Tohoku University in Japan proposed implementing a variety of Boolean logic operations based on read operations and verified the scheme through tape-out, achieving a 48.3% energy efficiency improvement [44]. In 2019, Beihang University proposed a digital MRAM in-memory computing solution based on a single write operation, which enables in-situ storage of computing results while reducing latency and power consumption [45–47]. The difficulty of analog in-memory computing based on MRAM is that the device resistance (about several thousand ohms) and the high/low resistance ratio (about 250%) are relatively small, making it difficult to achieve multi-bit precision. In recent years, thanks to innovative breakthroughs at the computing paradigm, device and circuit levels, MRAM analog in-memory computing has developed rapidly. In 2021, Princeton University verified the first STT-MRAM-based analog in-memory computing macro through circuit-level optimization and tape-out [48]. In 2022, Samsung of South Korea published a prototype MRAM analog in-memory computing chip based on a resistance-accumulation scheme in Nature, achieving an energy efficiency of up to 405 TOPS/W [49]; the layout, micrograph and structure of the array are shown in Figure 6.
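As a rough illustration of the read-operation-based digital scheme mentioned above, the sketch below models two MRAM cells activated on one bit line: the three possible combined resistance levels are compared against two different sense references, yielding AND or OR. The resistance values (a few kilo-ohms, roughly the 250% high/low ratio quoted above) and the reference placement are assumptions for illustration, not the circuit of any cited paper.

```python
def parallel_resistance(r1, r2):
    return 1.0 / (1.0 / r1 + 1.0 / r2)

# Assumed MRAM cell states: logic 1 -> low (parallel) resistance, logic 0 -> high (antiparallel).
R_P, R_AP = 3e3, 7.5e3                    # illustrative values, ~250 % high/low ratio

def read_two_cells(a, b):
    """Activate two cells on one bit line and return the combined resistance."""
    ra = R_P if a else R_AP
    rb = R_P if b else R_AP
    return parallel_resistance(ra, rb)

# Place sense references between the three possible resistance levels.
R_11 = parallel_resistance(R_P, R_P)      # both cells store 1
R_10 = parallel_resistance(R_P, R_AP)     # exactly one cell stores 1
R_00 = parallel_resistance(R_AP, R_AP)    # both cells store 0
REF_AND = (R_11 + R_10) / 2               # resistance below this -> both cells are 1
REF_OR  = (R_10 + R_00) / 2               # resistance below this -> at least one cell is 1

for a in (0, 1):
    for b in (0, 1):
        r = read_two_cells(a, b)
        assert (r < REF_AND) == bool(a & b)
        assert (r < REF_OR) == bool(a | b)
```

The narrow spacing between the three levels in this toy model also hints at why the small high/low resistance ratio makes analog multi-bit accumulation over many cells so challenging.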


NOR Flash in-memory computing
The principle of in-memory computing based on NOR Flash is similar to that of ReRAM, as shown in Figure 7(a). At present, NOR Flash in-memory computing chip technology is relatively mature and achieved mass production in 2021. Mythic in the United States and Zhicun Technology in China have both launched NOR Flash in-memory computing chip products: Mythic has launched the M1076 chip (as shown in Figure 7(b)), and Zhicun Technology has launched the WTM2101 mass-production SoC chip (as shown in Figure 7(c)).


In-memory computing based on other media
In addition, academia has published work on in-memory computing based on NAND Flash and new nanodevices (such as FeFET and skyrmion devices). Their basic principles are similar to the schemes above, but they are currently only at the conceptual stage and are not detailed here.

Application status of in-memory computing chips: taking WTM2101 as an example
With the continuous development of the Internet of Everything, smart devices mainly fall into three categories: cloud, edge and terminal. The requirements for cloud equipment are mainly high computing power, large throughput and high reliability; current in-memory computing progress still struggles to meet these demands. Edge devices, such as security and autonomous driving, have relatively comprehensive requirements for computing power, latency, power consumption and security, while terminal devices mainly focus on power consumption, cost and privacy. At present, the application of in-memory computing chips is still in its infancy.
This section takes the mass-produced SoC chip WTM2101 launched by Zhicun Technology as an example to discuss its application at the edge and in terminals, focusing on voice scenarios, and introduces its core circuit and chip architecture, performance and application scenarios.


Core circuit and chip architecture
In the NOR Flash in-memory computing chip, the vector-matrix multiplication operation is physically implemented based on the device current-voltage (transconductance) characteristics and Kirchhoff's current law, as shown in Figure 7(a). The core is therefore to design the NOR Flash cell array to support large-scale, energy-efficient vector-matrix multiplication. On top of this core circuit, the chip architecture is designed according to the characteristics of the algorithm so as to fully exploit the neural network data flow and achieve parallelization and pipelining on chip. In a traditional NOR Flash array, programming a specific device inevitably changes the state of other devices in the same row, which is called row disturbance. For in-memory computing applications, NOR Flash programming requires separate operations on each device: each device stores more than 8 bits (256 quantized states) of information, and even slight disturbance will cause state changes. Therefore, a program-disturb-resistant array structure is needed to eliminate programming disturbance. In addition, NOR Flash stores information as the number of electrons on the floating gate. Over time, electrons leak, causing the threshold voltage to drift. NOR Flash devices used for storage applications usually store only 1 to 2 bits of information (corresponding to 2 to 4 different states); the margin between states is relatively large, and information can be retained for more than 10 years without special design. In in-memory computing applications, however, NOR Flash devices need to store more than 8 bits (256 different states) of information, the margin between states is very small, and the entire array works simultaneously, so the impact of threshold voltage drift is very large. WTM2101 suppresses the impact of threshold voltage drift on computing accuracy through special circuit design. In addition, in order to achieve both low-power computing and low-power control, WTM2101 combines a RISC-V core with the NOR Flash in-memory computing array. Its array structure and chip architecture are shown in Figure 8, including a 1.8 MB NOR Flash in-memory computing array, a RISC-V core, a digital computing accelerator group, 320 kB of RAM and a variety of peripheral interfaces.
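To illustrate in rough terms why threshold-voltage drift matters for 8-bit storage, the sketch below quantizes a weight into one of 256 threshold-voltage states and shows how a drift comparable to the state margin causes misreads. The voltage window, drift magnitude and code are invented for illustration only; the actual WTM2101 compensation circuits are not modeled here.

```python
import numpy as np

VT_MIN, VT_MAX, LEVELS = 1.0, 5.0, 256                 # assumed threshold-voltage window, 8-bit states
STEP = (VT_MAX - VT_MIN) / (LEVELS - 1)                # margin between adjacent states

def program(code):
    """Map an 8-bit weight code to a target threshold voltage."""
    return VT_MIN + code * STEP

def read_back(vt):
    """Quantize a (possibly drifted) threshold voltage back to a weight code."""
    code = int(np.rint((vt - VT_MIN) / STEP))
    return min(max(code, 0), LEVELS - 1)

rng = np.random.default_rng(2)
codes = rng.integers(0, LEVELS, size=10_000)
vt_programmed = program(codes)

# Charge leakage modeled as random threshold-voltage drift comparable to the state margin.
drift = rng.normal(0.0, 0.4 * STEP, size=codes.size)
read_codes = np.array([read_back(v) for v in vt_programmed + drift])

error_rate = np.mean(read_codes != codes)
print(f"state margin: {STEP * 1e3:.1f} mV, misread fraction after drift: {error_rate:.1%}")
```

With 256 states squeezed into the same voltage window that comfortably holds 2-4 states for ordinary storage, even a drift of a fraction of the state margin flips a noticeable share of cells, which is why drift-suppression circuitry is essential for in-memory computing use.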

Performance and application scenarios
WTM2101 is taped out on a 40 nm process. A single NOR Flash device can store an 8-bit weight, so the chip can perform matrix multiply-accumulate operations with 8-bit precision. Figure 9 shows the relationship between the input signal and the output current; both the cell and the chip show a good linear relationship. WTM2101 has four major advantages: (1) based on the in-memory computing architecture, it can efficiently realize neural network voice activity detection and recognition of hundreds of voice command words; (2) it implements neural network environmental noise reduction algorithms and health monitoring and analysis algorithms with ultra-low power consumption; (3) in typical application scenarios, the operating power consumption is at the microwatt level; (4) it adopts an extremely small package size. Based on these advantages, WTM2101 can be used in smart wearable devices, smart homes, security monitoring, toy robots and so on, and is suitable for a variety of applications such as speech recognition, voice noise reduction/enhancement, lightweight visual recognition, health monitoring and voiceprint recognition. Figure 10 shows a headset product equipped with WTM2101 and its automated deployment process. Figure 11 shows the waveform and spectrum comparison before and after headphone noise reduction based on WTM2101. Table 2 shows the cumulative cosine similarity of each layer of the neural network deployed in WTM2101 (that is, the cosine similarity of the in-memory computation results relative to 8-bit quantized computation); it can be seen that even after 8 layers of neural network computation, the cosine similarity remains above 0.99. Table 3 shows the comparison between WTM2101 and similar products on the market in terms of voice activity detection, voice wake-up, command word recognition, environmental denoising and voiceprint recognition.
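The per-layer cumulative cosine similarity metric quoted from Table 2 can be reproduced conceptually as follows: run the same input through an ideal reference path and through a path with small analog perturbations, and compare the activations after each layer. Everything in this sketch (layer count, sizes, noise level, linear layers) is a made-up illustration of the metric, not the WTM2101 measurement.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(3)
# Toy 8-layer network with norm-preserving random weights (linear layers for simplicity).
layers = [rng.normal(0.0, 1.0 / np.sqrt(64), size=(64, 64)) for _ in range(8)]

x_ref = rng.normal(size=64)     # activations of the 8-bit quantized reference path (idealized)
x_imc = x_ref.copy()            # activations of the in-memory computing path

for i, W in enumerate(layers, 1):
    x_ref = W @ x_ref
    x_imc = W @ x_imc + rng.normal(0.0, 0.02, size=64)   # assumed analog computation error
    print(f"layer {i}: cumulative cosine similarity = {cosine_similarity(x_ref, x_imc):.4f}")
```

The interesting point the metric captures is accumulation: each layer's analog error is carried forward into the next, so a similarity that stays above 0.99 after 8 layers indicates the per-layer error is small enough not to compound harmfully.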

Application prospects and challenges of in-memory computing chips 

In-memory computing chip technology, with its advantages of high computing power, low power consumption and low cost, can provide energy-efficient hardware solutions for future intelligent application scenarios with massive data, such as the Internet of Things, big data and artificial intelligence. However, many challenges remain before large-scale industrialization. (1) It is difficult to improve the precision of analog computing. The precision of analog in-memory computing is limited by the signal-to-noise ratio and is difficult to push beyond 8 bits. Digital in-memory computing is not limited by the signal-to-noise ratio, but its energy efficiency, area and cost need to be weighed comprehensively. In recent years, a good compromise between precision, cost and power consumption has been achieved through mixed digital-analog design, which is an important direction for the development of in-memory computing. (2) The tool chain needs further improvement to establish a good ecosystem. The industrialization of in-memory computing chips is in its infancy and currently faces insufficient support from related tool chains, making it difficult for algorithm/application vendors to port their workloads. With the rapid development of in-memory computing technology and continued corporate investment in this field, the corresponding compilation and optimization tool chains can progress rapidly, and a preliminary application ecosystem is expected to be established. (3) Cross-layer collaborative design needs to be further strengthened. In-memory computing chips involve multi-level cross-layer collaboration across device, chip, process, algorithm and application. The layers are interlocked and inseparable, and cross-layer collaboration is required to optimize performance (accuracy, power consumption, latency, reliability, etc.) and cost.
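The signal-to-noise-ratio limit on analog precision mentioned in challenge (1) can be estimated with the standard ideal-quantizer relation ENOB = (SNR_dB - 1.76) / 6.02. The back-of-the-envelope calculation below shows why sustaining 8-bit output precision requires roughly 50 dB of signal-to-noise ratio on the analog read-out path, which is demanding for a dense memory array; it is a generic estimate, not a figure from any specific chip.

```python
def enob_from_snr(snr_db: float) -> float:
    """Effective number of bits for a given SNR (ideal-quantizer relation)."""
    return (snr_db - 1.76) / 6.02

def snr_needed_for_bits(bits: float) -> float:
    """SNR (dB) required to sustain a given effective precision."""
    return 6.02 * bits + 1.76

for snr_db in (30, 40, 50, 60):
    print(f"SNR {snr_db} dB -> about {enob_from_snr(snr_db):.1f} effective bits")
print(f"8-bit precision needs roughly {snr_needed_for_bits(8):.1f} dB SNR on the read-out path")
```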
 

 
