A chip is a complex system, and chips span a very wide range of scales: from digital-analog hybrid chips with only hundreds of gates (e.g. PMUs and sensors) all the way to high-end digital chips with tens of billions of transistors (e.g. Apple Ax, Kirin 9xx, the MediaTek Dimensity series, or large NP network chips). The same chip can also be viewed along different dimensions, such as logical function (core, peripheral, interface, etc.) or gate-level function (registers, combinational logic, memory, PHY, etc.).
Here we start from the gate-level functions of the chip and look at the details and generation practice of on-chip storage. This series is divided into three parts: upper, middle, and lower.
On-chip memory classification
To support the chip's functions and applications, there are many usage scenarios for data storage, which can usually be described by the following table:
Use | Storage medium | Description |
---|---|---|
On-chip main data storage | SRAM, DRAM, FF | Holds the data the chip works on; different types differ in density and speed |
Fixed content for boot/mount | ROM | Fixed, read-only content; cannot be changed once the chip is fabricated, except by ECO |
On-chip programmable fixed-information storage | e-Flash | Stores user-customized programs; write-protection can be designed in, giving better application scalability |
Chip status and flag-bit storage | e-fuse, OTP | Usually stores ATE test results, such as memory bad-point information, and hard limits of chip operation (e.g. the amount of usable on-chip storage); usually only the chip manufacturer can program these, using special means |
For a large chip, all of the above may be used, but the most common, and by far the largest in proportion, is the first category. The following table breaks down the characteristics of its three storage types.
Type | Advantages | Drawbacks |
---|---|---|
SRAM | Relatively high density, fast, many variants, supported by many vendors including the FAB itself | The specific PPA may vary across vendors |
DRAM | Simple physical structure; density higher than SRAM; mostly used for bulk storage such as DDR DRAM | Needs periodic refresh and relatively complex control circuitry; usually built as a separate die that is hard to integrate with other logic, so it often has to be combined with other dies through advanced packaging |
FF (register) | Serves the same on-chip storage purpose as SRAM; for small amounts of storage it has an area advantage over SRAM | Much lower density than SRAM; layout is harder to control, and clock skew is less predictable |
From the comparison above, for large storage requirements, and from the perspective of simplicity, SRAM is the best choice for such scenarios.
SRAM storage structure
The core storage device of SRAM is usually called a bit cell, as shown in the figure below.
External logic writes and reads the bit cell through the control signals BL (bit line) and WL (word line). This is the so-called six-transistor (6T) structure, consisting of four NMOS and two PMOS transistors.
Write operation
- First load the data to be written onto BL. To write logic '1', set BL to logic '1' and ~BL to logic '0'
- Then set WL to logic '1', turning on M5/M6 so that the corresponding values are driven onto Q and ~Q, completing the write of logic '1'. Writing logic '0' is similar
Read operation
- Precharge BL and ~BL to a high level
- Then set WL high to turn on M5/M6
- If Q = '1', transistor M1 turns on and pulls ~BL low
- On the other side, because ~Q = '0', transistors M4 and M6 are on, and BL is held high through VDD. This completes reading logic '1' onto BL; reading logic '0' is similar
For bit cells on the same process, cells with different functions may differ in size. Here we take the TSMC 7nm memory bit cells as an example.
As can be seen, HD, RF, and DP cells come in different sizes, and each bit cell here is the complete device that stores 1 bit. On this basis, its area advantage over a DFF that also stores 1 bit is obvious, as shown below:
7nm SRAM bit cell area (um²) | 7nm single FF register area (um²) | Area advantage |
---|---|---|
0.0342 | 0.547 | ~16× |
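The "~16×" figure follows directly from the two areas in the table:

```python
# Sanity check of the area-advantage figure in the table above
# (values taken from the table; "~16x" is 0.547 / 0.0342, rounded).
sram_bitcell_um2 = 0.0342   # 7nm SRAM bit cell area
dff_um2 = 0.547             # 7nm single flip-flop area
advantage = dff_um2 / sram_bitcell_um2
print(f"{advantage:.1f}x")  # -> 16.0x
```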
Structural breakdown of on-chip SRAM
On-chip SRAM splices bit cells into a matrix, whose horizontal and vertical dimensions are controlled by BL and WL. This matrix of bit cells is usually called the memory array.
A typical SRAM consists of the following two parts:
Part | Description | PPA impact |
---|---|---|
Main storage (memory array) | Storage cells composed of bit cells, plus the cells' charge/discharge circuitry | The main contributor to memory PPA. Area: more than 90%; performance: the charge/discharge circuit design affects access performance; power: more than 80% |
Peripheral control (periphery) | Peripheral control circuits built from standard cells (std-cells), including decode logic, column mux, DFT-related logic, etc. | Slight impact on interface timing; timing can usually be fine-tuned by swapping the std-cell VT types here. Different memory configs clearly affect the std-cell structure here as well as the memory's interface pins |
The general schematic of an SRAM is as follows.
Here the control logic is clearly visible; from the perspective of external access, it surrounds the memory array.
A more realistic SRAM structure is shown in the figure below.
The basic access steps of an SRAM are:
- First determine read vs. write and drive BL accordingly (see the bit-cell read and write operations defined above)
- Configure the address lines to select the corresponding WL
- For a read: the data of the entire row of bit cells on the selected WL appears on Data; for a write: Data is written into the bit cells of the selected WL
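The steps above can be sketched as a behavioral model of whole-row (one-WL) access. All identifiers here are illustrative; NW and NB are defined in the key points that follow.

```python
# Behavioral sketch of one-WL SRAM access: NW words of NB bits each.
# Capacity = NW * NB bit cells.

NW, NB = 16, 8                          # depth (words) x width (bits)
array = [[0] * NB for _ in range(NW)]   # the memory array

def access(addr, write=False, data=None):
    # Steps 1-2: decode addr to select one word line (one row)
    assert 0 <= addr < NW
    if write:
        # Step 3 (write): drive BL/~BL so the whole selected row is written
        array[addr] = list(data)
        return None
    # Step 3 (read): the whole row on the selected WL appears on Data
    return list(array[addr])

access(3, write=True, data=[1, 0, 1, 1, 0, 0, 1, 0])
print(access(3))  # -> [1, 0, 1, 1, 0, 0, 1, 0]
```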
These accesses highlight the following key points:
- Read and write operations are usually performed one WL (one row) at a time
- If the number of bit cells per WL stays fixed, the address depth directly determines the physical height of the SRAM
- The capacity of an SRAM is usually NW × NB
  - NW: number of words, the SRAM depth
  - NB: number of bits, the SRAM width
  - bit cell count: NW × NB, the SRAM capacity
- Address decoding and the data path are usually built directly from std-cells, and are a factor in the interface timing
- An SRAM that supports bit-write can control the write of a single bit by controlling the BL of the corresponding bit cell during a write
Combined with the bit-cell read method, readers can think about this: why do SRAM reads have no bit-select function?
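The bit-write point can be made concrete with a small sketch of a write mask. The function name and interface are illustrative, assuming one mask bit per column.

```python
# Sketch of bit-write support: on a write, only the columns whose mask
# bit is set drive their bit lines; masked-off columns keep old data.
def masked_write(row, data, mask):
    # row, data, mask are equal-length lists of 0/1 bits
    return [d if m else old for old, d, m in zip(row, data, mask)]

row = [0, 0, 0, 0]
row = masked_write(row, data=[1, 1, 1, 1], mask=[1, 0, 1, 0])
print(row)  # -> [1, 0, 1, 0]
```

As for the question above: on a read, every BL of the selected WL is precharged and sensed anyway, so there is nothing to save by selecting bits; the unwanted bits can simply be ignored at the output.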
For ordinary users, SRAMs with a large depth are common, for example 256K × 8 and the like. But given the SRAM structure above, simply stacking bit cells in height causes the following problems:
- The SRAM becomes too slender, which is unfriendly to floorplanning, and its PG (power/ground) contact points are unbalanced
- Since the address and data ports of an SRAM are usually planned around the center, an excessively tall SRAM will inevitably run into drive-strength problems beyond a certain height
Therefore the column mux (CM) scheme is introduced: when NW >> NB, the CM can be increased to solve this kind of problem without changing the SRAM's capacity.
As can be seen, the vertically long SRAM is folded in half and the halves are placed side by side, which effectively reduces the SRAM's height.
With CM=2, the user also pays for the added combinational logic of the CM decoder. A simple conversion formula is:
Depth × Width = (Depth/2) × Width + (Depth/2) × Width
i.e. the physical array becomes Depth/2 rows of 2 × Width columns.
Part of the address lines (the high bits) participates directly in the CM decoding, so the height of the SRAM can be effectively reduced by power-of-2 folding. Because the bit cells are more concentrated, the interface timing also improves, and the corresponding area increase (the CM decoder) is negligible for a large SRAM.
Vocabulary in this chapter
Vocabulary | Explanation |
---|---|
SRAM | Static Random Access Memory |
bit cell | The SRAM storage device, usually a 6T structure |
Column Mux | Folds the array to effectively reduce SRAM height |
bit line | SRAM bit-line control signal |
word line | SRAM word-line control signal |
[Knocking on the blackboard: key points]
This chapter covered SRAM from basic theory, explaining the principles of SRAM reads and writes and laying the foundation for later use.
References
- Neil H. E. Weste, David Money Harris, CMOS VLSI Design: A Circuits and Systems Perspective
- TSMC N7 SRAM Compiler Databook