On-chip SRAM storage overview and production practice (Part 1)

Chips span an enormous range of scale, from mixed-signal chips with only hundreds of gates (e.g. PMUs, sensors) all the way up to complex high-end digital chips with tens of billions of transistors (e.g. Apple A-series, Kirin 9xx, MediaTek Dimensity, or the various huge NP network chips).
The same chip can also be viewed along different dimensions, such as logical function (core, peripheral, interface, etc.) or gate-level function (registers, combinational logic, memory, PHY, etc.).
Here, let's start from the gate-level view of the chip and look at the details and generation practice of on-chip storage. This series is divided into three parts.

On-chip memory classification

To support the functions and applications of a chip, there are many usage scenarios for data storage, which can usually be summarized in the following table:

| Use | Storage medium | Description |
| --- | --- | --- |
| Main on-chip data storage | SRAM, DRAM, FF | Holds the chip's working data; different types trade off density and speed. |
| Fixed content for boot | ROM | Read-only, fixed content; once the chip is taped out it cannot be changed, except by ECO. |
| On-chip programmable fixed storage | e-Flash | Stores user-customized programs; supports write-protection design; good application scalability. |
| Chip status and flag bits | e-fuse, OTP | Usually stores ATE test results, e.g. memory bad-bit/repair information and hard operating limits (such as the amount of usable on-chip storage). Typically programmed only by the chip vendor with special means. |

For large chips, all of the above may be present, but the most common and largest by proportion is the first category. The following table breaks down the characteristics of the three storage types:

| Type | Advantages | Drawbacks |
| --- | --- | --- |
| SRAM | Relatively high density, fast, many variants, supported by many vendors including the FAB itself | Specific PPA may vary across vendors |
| DRAM | Simple physical structure and higher density than SRAM; mostly used for large-capacity storage, e.g. DDR DRAM | Needs periodic refresh and relatively complex control circuitry; usually built as a separate die that is hard to integrate with other logic on the same chip, so it often has to be combined with other dies via advanced packaging |
| FF (register) | Serves the same on-chip storage purpose as SRAM; for small capacities it can have an area advantage over SRAM | Much lower density than SRAM; layout is harder to control; clock skew is harder to manage |

The comparison above shows that for large storage requirements, and from the perspective of simplicity, SRAM is the best choice.

SRAM storage structure

The core storage device of an SRAM is usually called a bit cell, as shown in the figure below.

[Figure: 6T SRAM bit cell schematic]

External logic writes and reads the bit cell through the control signals BL (bit line) and WL (word line). This is the so-called six-transistor (6T) structure, consisting of four NMOS and two PMOS transistors.

write operation

  1. First place the data to be written on BL. To write logic '1', drive BL to logic '1' and ~BL to logic '0'.
  2. Then drive WL to logic '1'; this turns on the access transistors M5/M6, and the values on BL/~BL are forced onto Q and ~Q, completing the write of logic '1'. Writing logic '0' is similar.

read operation

  1. Precharge BL and ~BL to a high level.
  2. Then drive WL high to turn on M5/M6.
  3. If Q = '1', transistor M1 is on and ~BL is pulled low.
  4. On the other side, since ~Q = '0', transistors M4 and M6 conduct and BL is held at a high level through VDD. This completes the read of a logic '1' onto BL; reading a logic '0' is similar.
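The write and read steps above can be sketched as a small behavioral model. This is purely illustrative (no electrical behavior, no precharge timing); the class and method names are hypothetical:

```python
# Behavioral sketch of a 6T SRAM bit cell (illustrative only).
class BitCell:
    """Stores one bit at the cross-coupled nodes Q / ~Q."""

    def __init__(self):
        self.q = 0  # node Q; ~Q is always its complement

    def write(self, bl, wl):
        # The driver has placed the value on BL (and its complement on ~BL).
        # WL = 1 opens the access transistors M5/M6, so BL overwrites Q.
        if wl == 1:
            self.q = bl

    def read(self, wl):
        # Precharge of BL/~BL is implicit here; with WL = 1 the cell
        # pulls one of the two bit lines low, revealing Q.
        if wl == 1:
            return self.q
        return None  # WL = 0: cell not selected, bit lines unchanged


cell = BitCell()
cell.write(bl=1, wl=1)           # write logic '1'
assert cell.read(wl=1) == 1
cell.write(bl=0, wl=1)           # write logic '0'
assert cell.read(wl=1) == 0
assert cell.read(wl=0) is None   # deselected cell drives nothing
```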

Under the same process, bit cells of different functional types may have different sizes. Take the memory bit cells of TSMC 7nm as an example:

[Figure: TSMC 7nm bit cell sizes for HD / RF / DP variants]

For HD, RF, and DP variants, the bit cell sizes differ, and each bit cell here is the complete device that stores 1 bit. On this basis, the area advantage over a DFF that also stores 1 bit is obvious, as shown below:

| 7nm SRAM bit cell area (um²) | 7nm single FF register area (um²) | Area advantage |
| --- | --- | --- |
| 0.0342 | 0.547 | ~16x |
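The ~16x figure follows directly from the two areas in the table above:

```python
# Area advantage of an SRAM bit cell over a single flip-flop,
# using the 7nm numbers from the table above.
sram_bitcell_um2 = 0.0342  # 7nm SRAM bit cell area
dff_um2 = 0.547            # 7nm single FF register area

advantage = dff_um2 / sram_bitcell_um2
print(round(advantage, 1))  # → 16.0
```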

Structural disassembly of on-chip SRAM

For on-chip SRAM, the bit cells are tiled into a matrix whose rows and columns are addressed by WL and BL. This matrix of bit cells is usually called the memory array.

A typical SRAM consists of the following two parts:

| Part | Description | PPA impact |
| --- | --- | --- |
| Main storage (memory array) | The storage cells built from bit cells, plus the cells' charge/discharge circuitry | The main contributor to memory PPA. Area: over 90%; performance: the charge/discharge circuit design affects access performance; power: over 80% |
| Peripheral control (periphery) | Peripheral control circuitry built from standard cells (std-cells): address decoding, column mux, DFT-related logic, etc. | A slight impact on interface timing; timing can usually be fine-tuned by swapping the std-cell VT types here. Different memory configs significantly change this std-cell structure and the memory's interface pins. |

The general schematic diagram of SRAM is as follows

[Figure: general SRAM block diagram]

Here the control logic is clearly visible: from the perspective of external access, it surrounds the memory array.

A more realistic SRAM structure is shown in the figure below.

[Figure: realistic SRAM structure with decoders, column mux, and sense amplifiers]

The basic access steps of an SRAM are as follows:

  1. First determine whether the access is a read or a write, and drive BL accordingly. For details, see the bit cell read and write operations defined above.
  2. Set the address lines to select the corresponding WL.
  3. For a read, the data of the whole row of bit cells on that WL appears on Data; for a write, Data is written into the bit cells of the selected WL.
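The access steps above can be sketched as a minimal word-oriented model. This is a hypothetical illustration (class and parameter names are invented), not the interface of any real memory compiler:

```python
# Behavioral sketch of whole-row SRAM access: the address selects one WL,
# reads return the whole row, writes replace the whole row.
class SramModel:
    def __init__(self, depth, width):
        self.width = width
        self.rows = [0] * depth  # one integer word per word line

    def read(self, addr):
        # The entire row of bit cells on the selected WL appears on Data.
        return self.rows[addr]

    def write(self, addr, data):
        # Data is written into all bit cells of the selected WL.
        self.rows[addr] = data & ((1 << self.width) - 1)


mem = SramModel(depth=256, width=8)   # a hypothetical 256 x 8 instance
mem.write(0x10, 0xA5)
assert mem.read(0x10) == 0xA5
mem.write(0x11, 0x1FF)                # extra bits beyond the width are dropped
assert mem.read(0x11) == 0xFF
```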

The access flow above highlights the following key points:

  • Read and write operations are normally performed on a whole WL (row) at a time.

  • If the number of bit cells per WL stays unchanged, the address depth directly determines the physical height of the SRAM.

  • SRAM capacity is usually given by NW * NB:

    • NW: Number of Words, the SRAM depth
    • NB: Number of Bits, the SRAM width
    • Bit cell count: NW * NB, the SRAM capacity
  • Address decoding and data paths are usually built directly from std-cells and are a factor in interface timing.

  • An SRAM that supports bit-write can control the write of a single bit by gating the BL of the corresponding bit cell.
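The capacity formula and the bit-write behavior can be made concrete with a short sketch. The instance size and the `masked_write` helper are hypothetical, chosen only for illustration:

```python
# Capacity from the NW/NB definitions above, for a hypothetical 1024 x 32 SRAM.
NW, NB = 1024, 32               # depth (words) and width (bits)
capacity_bits = NW * NB
assert capacity_bits == 32768   # 32 Kb

# Bit-write SRAM: only the BLs whose mask bit is 1 actually drive their
# bit cells; masked-off bits keep their old value.
def masked_write(old_word, new_word, mask):
    return (old_word & ~mask) | (new_word & mask)

result = masked_write(0b1111_0000, 0b0000_1111, 0b0011_0011)
assert result == 0b1100_0011    # only the masked bits changed
```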

Combined with the bit cell read mechanism described above, readers can think about this: why does an SRAM read not offer bit selection?

For ordinary users, SRAMs with a large depth are common, for example 256K x 8 and the like. But given the SRAM structure above, simply stacking bit cells vertically causes the following problems:

  • The SRAM becomes too slender, which is unfriendly to the layout and unbalances the power/ground (PG) contact points.
  • Since the address and data I/O of an SRAM are usually planned at the center, an overly tall SRAM will inevitably hit drive-strength problems beyond a certain height.

Hence the column mux (CM) solution: when NW >> NB, without changing the SRAM capacity, the CM factor can be increased to mitigate these problems.

[Figure: SRAM array folded with column mux, CM = 2]

It can be seen that the tall, narrow SRAM is folded in half and the halves placed side by side, effectively reducing the SRAM height.
With CM = 2, the user also pays for the extra combinational logic of the CM decoder. A simple conversion formula:

Depth * Width = (Depth/2) * (2 * Width)

Part of the address lines (the high bits here) participates directly in CM decoding, so power-of-two folding effectively reduces the SRAM height. Because the bit cells are more concentrated, the interface timing also improves, and the corresponding area increase (the CM decoder) is negligible for a large SRAM.
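The address split under a column mux can be sketched as follows. Which address bits feed the CM decoder is compiler-specific; here, matching the text, the high bits select the column group, and the function name is illustrative:

```python
# Address split with a column mux: part of the address selects the WL in
# the folded array, the rest feeds the CM decoder to pick the column group.
def split_addr(addr, depth, cm):
    phys_rows = depth // cm       # physical rows after folding
    row = addr % phys_rows        # word-line index in the folded array
    col = addr // phys_rows       # column-group select (high address bits)
    return row, col


# A logical 512-deep SRAM folded with CM = 2 -> 256 physical rows.
assert split_addr(300, depth=512, cm=2) == (44, 1)
# Every address still maps inside the folded array:
assert max(split_addr(a, 512, 2)[0] for a in range(512)) == 255
assert max(split_addr(a, 512, 2)[1] for a in range(512)) == 1
```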

Vocabulary in this chapter

| Term | Explanation |
| --- | --- |
| SRAM | Static Random Access Memory |
| bit cell | The SRAM storage element, usually a 6T structure |
| column mux | Folds the bit cell array to effectively reduce the SRAM height |
| bit line (BL) | SRAM bit line control signal |
| word line (WL) | SRAM word line control signal |

[Key takeaways]

Starting from basic theory, we have learned how SRAM works and understood its read and write principles, laying the foundation for the chapters that follow.

References

- Neil H. E. Weste, David Money Harris, *CMOS VLSI Design: A Circuits and Systems Perspective*
- TSMC, *TSMC N7 SRAM Compiler Databook*

Origin blog.csdn.net/i_chip_backend/article/details/121130370