ChatGPT: Storage-computing integration, the next stage of computing power

ChatGPT has started a computing-power arms race. Large-model parameter counts have grown exponentially, triggering demand for massive computing power. The growth rate of model computation far exceeds the growth rate of AI hardware computing power, and it also places higher demands on data-transmission speed. The XPU, memory, and hard disk together form a complete von Neumann system. Taking a general-purpose server as an example, the chipset plus storage accounts for more than 70% of the cost; the chipset, internal storage, and external storage are the core components. Storage is a core part of the computer architecture: "memory" is in fact the middleman between the hard disk and the CPU. By volatility, storage can be classified into ROM and RAM.
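To make the scale concrete, a back-of-the-envelope estimate of how much memory a large model's parameters alone occupy (the parameter counts below are illustrative, and the byte width assumes FP16 weights):

```python
# Back-of-the-envelope: memory needed just to hold model parameters.
# Parameter counts are illustrative; bytes_per_param = 2 assumes FP16.

def param_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory footprint of the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30

for name, n in [("1B-parameter model", 1e9),
                ("175B-parameter model", 175e9)]:
    print(f"{name}: ~{param_memory_gb(n):.0f} GiB in FP16")
```

A hundred-billion-parameter model thus needs hundreds of GiB for its weights alone, before activations or optimizer state, which is why data movement rather than raw arithmetic dominates the cost.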

In "ChatGPT: Thoughts on Baidu's Wenxin Yiyan" we argued that data, platform, and computing power are the necessary foundation for building a large-model ecosystem, and that computing power is the underlying engine for training large models: a strong computing-power base gives the training and inference of large models (AI algorithms) an efficiency advantage. In "ChatGPT Launches an AI Computing-Power 'Arms Race'" we argued that computing power is the "ticket" to the AI technology competition, with AI servers and AI chips as the core products. In addition, in "ChatGPT: NVIDIA DGX Ignites AI 'Nuclear Fusion'" we argued that technology companies represented by NVIDIA are rapidly filling global demand for AI computing power and adding the necessary "fuel" to large models.

The number of large model parameters and the scale of training data are growing rapidly

The growth trend of parameter size of large models in recent years

In the past two decades, computing power has developed much faster than storage, and the "storage wall" has become a defining challenge of the accelerated-computing era.

In the AI era, storage bandwidth constrains the effective bandwidth of computing systems, and chip computing power struggles to grow. Storage-computing integration is therefore expected to break through the von Neumann architecture and is an inevitable choice for the AI era. Storage-computing integration means that data storage and computation are fused in the same area of the same chip, making it extremely suitable for large-scale parallel application scenarios with large data volumes. It has significant advantages and is known as the "all-round warrior" of AI chips, offering low energy consumption, low cost, and high computing power. By computation method, storage-computing integration is divided into digital computing and analog computing, and it has a wide range of application scenarios; SRAM and RRAM are expected to become the mainstream media for cloud-side storage-computing integration.

Demand for storage-computing integration is strong and is expected to drive the next stage of artificial intelligence. We believe the main drivers are AI computing power, parallel computing, and neural-network computation; with the rise of large models, storage-computing integration suits both cloud and device-side workloads. On the device side, artificial intelligence emphasizes timely response, i.e., "input" immediately yields "output", and storage-computing integration can already complete high-precision calculations. On the cloud side, with the emergence of large models whose parameter counts have reached the hundred-billion scale, storage-computing integration is expected to become a new generation of computing-power factor. It is suitable for a wide variety of AI scenarios, such as wearable devices, mobile terminals, smart driving, and data centers. We believe storage-computing integration is the next-generation technology trend and is expected to be widely used in AI neural-network applications, sensing-storage-computing integration, multi-modal AI computing, brain-inspired computing, and other scenarios.

CPU server disassembly

Server composition: Taking a general-purpose server as an example, a server mainly consists of hardware such as the motherboard, memory, chipset, disks, network cards, graphics cards, power supplies, and chassis; among these, the chipset, internal storage, and external storage are the core components.

H3C UniServer R4900 G5 server hardware structure disassembly

GPU servers have significant advantages: the powerful computing capability of a GPU server can be applied to massive data-processing workloads, such as big-data search and recommendation and intelligent input methods. Compared with general-purpose servers, GPU servers have order-of-magnitude efficiency advantages in both data volume and computation volume. In addition, GPUs can serve as a training platform for deep learning. The advantages are: 1. a GPU server can directly accelerate computing services and can also communicate directly with the outside world; 2. a GPU server is used together with a cloud server, with the cloud server as the primary node and the GPU server providing the computing platform; 3. object storage (COS) can provide large-scale cloud storage services for GPU servers.

The value and cost share of AI-server chipsets stand out:

Taking a general-purpose server as an example, the motherboard/chipset accounts for the highest share of cost, more than 50%, and storage (internal + external) accounts for about 20%. In addition, according to data from Wind and Xinyu, compared with high-performance servers and basic servers, AI servers use more expensive chipsets (CPU + GPU): chipset cost accounts for as much as 83% of an AI training server and about 50% of an AI inference server, far higher than the chipset share in a general-purpose server.


| No. | Name | No. | Name |
|---|---|---|---|
| 3 | Mid-chassis GPU module | 12 | Memory |
| 5, 6 | Network card | 13 | Motherboard |
| 7 | Riser card | 18 | Power module |
| 8 | GPU card | 23 | Hard disk |
| 9 | Storage controller card | 25 | Supercapacitor |
| 10 | CPU | 27 | Encryption module |
| 12 | Memory | 28 | System battery |

H3C UniServer R4900 G5 server hardware structure notes

Storage is an important component of a computer: memory is the component used to store programs and data; only with memory can a computer have a memory function and operate normally. By purpose, memory is divided into main memory and auxiliary memory. Main memory is also called internal memory (memory for short); auxiliary memory is also called external memory (external storage for short).

Memory: the storage on the motherboard that communicates directly with the CPU and stores the data and programs currently in use (i.e., being executed). Once power is cut off, the programs and data in it are lost.

External storage: magnetic media or optical discs, such as hard disks, floppy disks, and CDs, which can store information for a long time and do not rely on electricity to retain it.

XPU, memory, and hard disk form a complete von Neumann system: "memory" is in fact the middleman between the hard disk and the CPU. If the CPU fetched data directly from the hard disk, the latency would be far too long; instead, memory acts as the middleman, pulling data from the hard disk so the CPU can access it directly and compute in memory, which is orders of magnitude faster than fetching from the hard disk. Inside the CPU there is storage called registers: during operation, the CPU loads data from memory into registers, operates on the values held in the registers, and then stores the result back to memory. Thus, in speed, register > memory > hard disk; the faster the level, the higher the price and the smaller the capacity.
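The hierarchy above can be illustrated with a tiny sketch. The latency figures are rough, commonly cited orders of magnitude, not measurements of any specific hardware:

```python
# Illustrative sketch of the memory hierarchy: the latency numbers are
# rough orders of magnitude (assumptions), not specs of real hardware.

LATENCY_NS = {
    "register": 0.5,          # inside the CPU
    "memory":   100,          # DRAM access
    "hard_disk": 10_000_000,  # spinning-disk seek + read
}

def total_time_ns(accesses: int, level: str) -> float:
    """Time for `accesses` reads served entirely from one level."""
    return accesses * LATENCY_NS[level]

if __name__ == "__main__":
    n = 1_000_000
    for level in ("register", "memory", "hard_disk"):
        print(f"{level:9s}: {total_time_ns(n, level) / 1e9:.4f} s")
```

With these assumed numbers, a million reads from DRAM cost a tenth of a second, while the same reads from disk would take over a day, which is exactly why memory sits between the disk and the CPU.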

Storage classified by volatility: ROM is short for Read-Only Memory, and RAM is short for Random Access Memory. ROM retains its data when the system powers off, while RAM usually loses its data after a power failure. A typical example of RAM is the computer's main memory.

RAM (random access memory) is widely used as computer memory: it is internal memory that exchanges data directly with the CPU, can be read and written at any time, and is very fast; it usually serves as temporary storage for the operating system and running programs. RAM is divided into static SRAM and dynamic DRAM. SRAM is extremely fast, currently the fastest read/write storage device, but expensive, so it is used only where performance is critical, such as the CPU's L1 and L2 caches. DRAM retains data only briefly and is slower than SRAM, though faster than any ROM; because DRAM is cheaper, most computer memory uses a DRAM architecture.

Flash-based ROM is widely used as hard-disk media: the storage characteristics of Flash memory are comparable to a hard disk's. It combines the advantages of ROM and RAM: it is electrically erasable and programmable, does not lose data on power failure, and can be read quickly. In recent years Flash has completely replaced the role of traditional ROM in embedded systems. There are two main types of Flash: NOR Flash and NAND Flash. NAND Flash offers large capacity and fast rewriting and suits large amounts of data, so it is widely used in memory cards, USB drives, SSDs, eMMC, and other high-capacity devices; NOR Flash is characterized by execute-in-place, so it is used in many consumer-electronics applications.

| Comparison | Dynamic RAM (DRAM) | Static RAM (SRAM) |
|---|---|---|
| Storage principle | Charge stored on a capacitor | Flip-flop (latch) |
| Integration density | High | Low |
| Chip pins | Few | Many |
| Power consumption | Low | High |
| Price | Low | High |
| Speed | Slow | Fast |
| Refresh | Required | Not required |

NOR Flash array structure

In NOR Flash, programs and data can be stored on the same chip. It has independent data and address buses, supports fast random reads, and allows the system to execute code directly from Flash without first copying it to RAM. It supports single-byte or single-word programming, but not single-byte erasing: erase operations must be performed per block or on the whole chip, and a block (or the whole chip) must be erased before the memory is reprogrammed.

NOR Flash connects memory cells in parallel and has separate control, address, and data lines; it reads quickly and provides execute-in-place, but writing and erasing take a long time, capacity is low, and price is high. NOR Flash is therefore mostly used for code storage in mobile phones, BIOS chips, and embedded systems.
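The erase-before-write rule described above can be caricatured in a few lines. This is a toy model of Flash programming semantics, not any real controller's API:

```python
# Toy model of NOR-Flash programming rules: programming can only clear
# bits (1 -> 0); restoring 1s requires erasing an entire block.

BLOCK_SIZE = 4  # bytes per erase block (tiny, for illustration)

class ToyNorFlash:
    def __init__(self, blocks: int):
        self.mem = bytearray([0xFF] * blocks * BLOCK_SIZE)  # erased = all 1s

    def program(self, addr: int, value: int) -> None:
        # Programming ANDs in the new value: bits can only go 1 -> 0.
        self.mem[addr] &= value

    def erase_block(self, block: int) -> None:
        start = block * BLOCK_SIZE
        self.mem[start:start + BLOCK_SIZE] = b"\xFF" * BLOCK_SIZE

    def read(self, addr: int) -> int:
        return self.mem[addr]

flash = ToyNorFlash(blocks=2)
flash.program(0, 0xA5)
assert flash.read(0) == 0xA5
flash.program(0, 0xFF)        # programming cannot set bits back to 1
assert flash.read(0) == 0xA5  # value unchanged
flash.erase_block(0)          # only a block erase restores the 1s
assert flash.read(0) == 0xFF
```

The asymmetry in the model (byte-level programming, block-level erase) is exactly why Flash writes and erases are slow relative to reads.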

Take a voltage-mode analog multiplier based on Flash cells as an example. The multiplier consists of two Flash cells: the gates (G) of the two Flash transistors are connected in parallel to a fixed voltage, the drains (D) are connected to the input voltage VDS, and the currents at the sources (S) are subtracted to form the output current ID. With externally supplied input data, two Flash transistors operating in the linear region can implement analog multiplication; the resulting output current is then converted into a digital signal by an ADC (analog-to-digital converter).


Voltage analog multiplier structure diagram
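The multiplication relies on the standard linear-region (triode) current equation of a MOS transistor. The original does not spell out the equation, so the following is the textbook relation, with the usual device symbols:

```latex
I_D = \mu_n C_{ox}\frac{W}{L}\left[(V_{GS}-V_{th})V_{DS}-\tfrac{1}{2}V_{DS}^{2}\right]
\;\approx\; \mu_n C_{ox}\frac{W}{L}\,(V_{GS}-V_{th})\,V_{DS},
\qquad V_{DS}\ll 2(V_{GS}-V_{th})
```

With the stored threshold voltage $V_{th}$ encoding a weight and the drain voltage $V_{DS}$ encoding the input, $I_D$ is proportional to their product; subtracting the currents of the two cells cancels the common terms and yields a signed multiplication.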

In the traditional computer architecture, storage and computing are separated, and the storage unit serves the computing unit, so the two are designed with different priorities. Now, with the arrival of massive data and accelerating AI, the best way to collect, transmit, and process data must be reconsidered, and the storage wall, bandwidth wall, and power wall have become the primary challenges. Multi-core parallel acceleration can still raise computing power, but in the post-Moore era storage bandwidth constrains the effective bandwidth of the computing system, and chip computing power grows only with difficulty.

Storage-computing integration embeds computing capability in the memory and uses a new computing architecture to perform matrix multiply/accumulate operations in two and three dimensions. The advantage of in-memory computing and in-memory logic, i.e., storage-computing integration, is that the memory itself is used for data processing or calculation, fusing data storage and computation in the same area of the same chip. This can eliminate the von Neumann bottleneck and is especially suited to large-scale parallel application scenarios with large data volumes, such as deep-learning neural networks.
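The in-memory multiply/accumulate is often realized as a resistive crossbar: weights are stored as conductances, inputs are applied as row voltages, and Ohm's and Kirchhoff's laws produce the matrix-vector product as column currents. A minimal numerical sketch (idealized, with no device non-idealities):

```python
import numpy as np

# Idealized crossbar: G[i][j] is the conductance (stored weight) at the
# crossing of row i and column j. Applying voltages V on the rows makes
# each column sum currents I_j = sum_i V_i * G[i][j] (Kirchhoff's law),
# so the whole array computes the matrix-vector product in one step.

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(4, 3))   # 4x3 conductance array
V = np.array([0.2, 0.5, 0.1, 0.9])       # input voltages on the rows

I = V @ G                                # column currents = MVM result

# Sanity check against the explicit per-column current sums:
assert np.allclose(I, [sum(V[i] * G[i, j] for i in range(4))
                       for j in range(3)])
print("column currents:", I)
```

The key point is that the weights never move: the data flows through the array, which is what removes the von Neumann data-transfer bottleneck for this operation.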

Computing power develops faster than memory


Storage wall bottleneck

In recent years, storage-computing integration has developed rapidly, driven by artificial intelligence: breakthroughs in semiconductor manufacturing and the rise of computing-intensive applications such as AI have provided the technology with new manufacturing platforms and industry driving forces. In 2016, a team at the University of California proposed PRIME, a deep-learning neural-network accelerator with an in-memory computing architecture built on RRAM; compared with a traditional von Neumann solution, PRIME can cut power consumption by about 20x and increase speed by about 50x. In 2017, NVIDIA, Microsoft, Samsung, and others presented in-memory computing prototypes; in the same year the first batch of domestic storage-computing integrated chip companies, such as Zhicun Technology, was founded, and more domestic players have since entered the market, such as Qianxin Technology, Zhixin Micro, Pingxin Technology, and Houmo Intelligence.

By computation method, storage-computing integration is divided into digital computing and analog computing:

Analog computing: analog storage-computing integration usually uses non-volatile media such as Flash, RRAM, and PRAM as storage devices. It offers high storage density and high parallelism, but is susceptible to environmental noise and very sensitive to temperature. The model weights are kept in the memory, input data flows into the memory, analog multiply-accumulate is performed with currents or voltages, and peripheral circuits convert the output to digital. Because the analog architecture achieves low-power, low-bit-width integer multiply-accumulate, it is well suited to edge AI scenarios.
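The current-domain multiply-accumulate plus ADC readout described above can be caricatured in a few lines; the noise level and ADC resolution here are arbitrary assumptions chosen for illustration:

```python
import numpy as np

# Caricature of an analog in-memory MAC: the dot product is computed
# "in current", perturbed by analog noise, then digitized by an N-bit
# ADC. Noise level and ADC resolution are illustrative assumptions.

rng = np.random.default_rng(42)

def analog_mac(x, w, noise_std=0.01, adc_bits=8, full_scale=1.0):
    ideal = float(np.dot(x, w))                 # ideal dot product
    noisy = ideal + rng.normal(0.0, noise_std)  # additive analog noise
    # N-bit ADC: quantize to 2**adc_bits levels over [-fs, +fs]
    step = 2 * full_scale / (2 ** adc_bits)
    return np.clip(round(noisy / step) * step, -full_scale, full_scale)

x = np.array([0.1, 0.2, 0.3])
w = np.array([0.5, -0.25, 0.125])
print("ideal :", np.dot(x, w))
print("analog:", analog_mac(x, w))
```

The sketch shows why the analog path trades precision for efficiency: every readout passes through noise and quantization, which is acceptable for low-bit-width edge inference but hard to reconcile with high-precision cloud training.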

Digital computing: as the complexity and scope of AI tasks grow, high-precision, large-scale AI models keep emerging. These models are trained and served in cloud AI scenarios such as data centers, producing huge computing-power requirements. Compared with edge AI, cloud AI has more diverse task requirements, so cloud AI chips must balance energy efficiency, precision, and flexibility to support all kinds of large-scale AI inference and training. Digital storage-computing integration mainly uses SRAM and RRAM as storage devices, built on advanced logic processes; it offers high performance, high precision, and good noise immunity and reliability, so it is better suited to cloud scenarios demanding large computing power and high energy efficiency.

The application scenarios of storage-computing integration are extremely broad. Device-side, small-computing-power scenarios need roughly 16 TOPS to 100 TOPS, such as smart wearables, smart security, mobile terminals, and AR/VR; large-computing-power scenarios need upwards of 1000 TOPS, such as cloud-computing data centers, autonomous driving, and large models like GPT-4. We believe cloud and edge computing-power scenarios are where storage-computing integrated chips have the most core influence and competitiveness; Zhicun Technology is currently the only in-memory computing company on this track with commercialized products in intelligent applications.

Storage-computing integration has broad and steadily improving prospects

The rise of "large models" such as ChatGPT is in essence neural-network and deep-learning computation, so we believe demand for computing power is strong. On the device side, artificial intelligence emphasizes timely response, i.e., "input" immediately yields "output", and with the development of storage-computing integration, in-memory computing and in-memory logic can now complete high-precision calculations. On the cloud side, with the emergence of large models whose parameter counts reach the hundred-billion scale, energy consumption per unit of computing power is judged ever more strictly. As technologies such as SRAM and PRAM mature further, storage-computing integration is expected to become a new generation of computing-power factor, thereby advancing the artificial intelligence industry.

Cloud side and edge side: large-computing-power equipment for cloud computing and edge computing is the advantage area of storage-computing integrated chips. We believe storage-computing integration is highly applicable on the cloud and edge because it offers high computing power, low power consumption, and high cost-effectiveness, and large-computing-power scenarios such as intelligent driving and data centers place high demands on reliability and computing power. Moreover, players in the cloud-computing market are relatively concentrated, so we believe storage-computing integration is likely to land in the cloud market before the device-side market.

We believe storage-computing integration is clearly the next-generation technology trend: at present it is in its initial stage both at home and abroad, and the gap between players is not large, because the innovation happens at the chip-design level. Storage-computing integration fuses the computing system with the storage system; its design is more complex than analog IP or memory IP and depends on experience accumulated through multiple memory tape-outs. We therefore believe companies with strengths and experience in storage have a first-mover advantage.

At present, the industry is mainly divided into two paths: small-computing-power scenarios, such as audio and terminal applications; and large-computing-power scenarios, such as cloud computing, intelligent driving, and robotics.

Storage-computing integration is an inevitable choice for the development of artificial intelligence. It has an energy-consumption advantage and can greatly reduce the cost of use: the weights for the large volume of multiply-accumulate operations in AI computation can be kept in the storage cells, so that data input and computation happen where the weights are read, completing the convolution operation in place. Storage-computing integration is also the key technical cornerstone of sensing-storage-computing integration, multi-modal AI computing, and brain-inspired computing.
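As a sketch of "weights stay put, data flows in": a convolution can be lowered to a matrix-vector product (the im2col trick), which is exactly the operation a crossbar-style memory array performs in place. Shapes and values below are illustrative:

```python
import numpy as np

# A 1-D convolution lowered to a matrix-vector product: each output is
# a dot product between the (stationary) kernel and a sliding window.
# In a compute-in-memory array the kernel would be the stored weights.

def conv1d_as_matmul(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    n_out = len(x) - len(k) + 1
    # im2col: stack the sliding windows of x into a matrix
    windows = np.stack([x[i:i + len(k)] for i in range(n_out)])
    return windows @ k      # one MVM yields all output taps at once

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
k = np.array([1.0, 0.0, -1.0])           # simple edge-detect kernel

out = conv1d_as_matmul(x, k)
# Cross-check against NumPy's convolution (which flips the kernel):
assert np.allclose(out, np.convolve(x, k[::-1], mode="valid"))
print(out)   # each tap computes x[i] - x[i+2]
```

Because the kernel never leaves the array, only the input stream crosses the chip boundary, which is the energy saving the paragraph above describes.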


Sources:


【1】Semiconductor industry perspective: https://xueqiu.com/3261990793/224984615

【2】H3C UniServer R4900 G5 Technical White Paper, West China Securities Research Institute


Origin blog.csdn.net/m0_58966968/article/details/135022962