A brief history of the development of AI chips

As the field of artificial intelligence continues to make breakthroughs, AI chips, as an important cornerstone of AI technology, carry enormous industrial value and strategic significance. As a key link in the AI industry chain and its hardware foundation, AI chips also present extremely high barriers to research, development, and innovation. Judging from the trend of chip development, AI chips are still at an early stage; the coming years will be an important period for their development, with huge room for innovation in both architecture and design concepts.

1. The development history of chips

At the 1956 Dartmouth Conference, the scientists John McCarthy, Claude Shannon, and Marvin Minsky coined the term "artificial intelligence." In the late 1950s, Arthur Samuel coined the term "machine learning" after developing a checkers program that could learn from its mistakes and, once trained, play better than the person who wrote it.

In that era of rapid progress in computer technology, the optimistic climate led researchers to believe that AI would be "conquered" within a short time. Scientists studied whether computation modeled on the functions of the human brain could solve real-world problems, giving rise to the concept of the "neural network." In 1970, Marvin Minsky told Life magazine, "In three to eight years we will have a machine with the general intelligence of an average human being."

In the 1980s, AI moved out of the laboratory and toward commercialization, setting off a wave of investment. By the late 1980s the AI bubble had burst, AI retreated into research, and scientists continued to explore its potential. Some industry observers called AI a technology ahead of its time; others called it a technology of the future. After a long so-called "AI winter," commercial development eventually started up again.

In 1986, Geoffrey Hinton and his colleagues published a landmark paper describing an algorithm called "back-propagation" that significantly improved the performance of multilayer, or "deep," neural networks. In 1989, Yann LeCun and other researchers at Bell Labs created a neural network that could be trained to recognize handwritten postal codes, demonstrating an important real-world application of the new technology; training their deep convolutional neural network (CNN) took only about three days. Fast forward to 2009: Rajat Raina, Anand Madhavan, and Andrew Ng of Stanford University published a paper showing that modern GPUs far exceed multi-core CPUs in computing power for deep learning. AI was ready to set sail again.
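As a side note, the core of back-propagation is just the chain rule applied layer by layer. The sketch below is a toy NumPy illustration on a tiny fully connected network, not the 1986 formulation or LeCun's CNN; the data, layer sizes, and learning rate are arbitrary choices for the example.

```python
import numpy as np

# Toy two-layer network trained with back-propagation (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 8))                      # 64 samples, 8 features
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)  # toy binary labels

W1 = 0.1 * rng.standard_normal((8, 16))
W2 = 0.1 * rng.standard_normal((16, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(1000):
    # Forward pass
    h = sigmoid(X @ W1)                  # hidden activations
    p = sigmoid(h @ W2)                  # predicted probabilities
    # Backward pass: push the error gradient back layer by layer (chain rule)
    grad_logits = (p - y) / len(X)       # gradient of the logistic loss
    grad_W2 = h.T @ grad_logits
    grad_hidden = (grad_logits @ W2.T) * h * (1.0 - h)
    grad_W1 = X.T @ grad_hidden
    # Gradient-descent weight update
    W2 -= 1.0 * grad_W2
    W1 -= 1.0 * grad_W1

print("final training accuracy:", ((p > 0.5) == (y > 0.5)).mean())
```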

Thanks to Moore's Law over the past two decades, enough computing power has become available to run artificial intelligence algorithms at an acceptable price, power consumption, and time. Comparing Intel processor capability against retail price, the computing power that can be bought per unit of money has increased roughly 15,000-fold, enough for the general-purpose central processing unit (CPU) to support a wide range of AI tasks. It is fair to say that the time is ripe for chip technology to greatly accelerate AI research and development. However, because the CPU must be designed and optimized for hundreds of different tasks, it cannot sacrifice flexibility to optimize for one class of application, so it is not necessarily the best choice for every AI algorithm. For this reason, a variety of heterogeneous computing solutions that pair CPUs with dedicated chips have emerged to address bottlenecks in computing resources and memory access. In addition, research on "brain-like" computing, which differs from the "brain-inspired" deep neural network approach, has produced advanced neuromorphic chips that support natural learning with ultra-high energy efficiency.

2. The development stages of AI chips

The core computing chips for artificial intelligence have gone through four major stages.

Before 2007, artificial intelligence research and applications had been through several ups and downs and had not yet grown into a mature industry; at the same time, limited by the algorithms and data of the day, AI did not place especially heavy demands on chips, and general-purpose CPUs could provide enough computing power. Later, driven by high-definition video, gaming, and other industries, GPU products made rapid breakthroughs; it was also found that the parallel-computing characteristics of GPUs happened to match the requirements of AI algorithms and big-data parallel computing, raising computing efficiency by roughly 9 to 72 times, so researchers began trying GPUs for AI computation. After 2010, cloud computing became widely available, and AI researchers could use large numbers of CPUs and GPUs for hybrid computation through the cloud; in fact, cloud computing is still the main computing platform for AI today. However, the AI industry's demand for computing power kept growing rapidly, so from 2015 onward the industry began developing chips dedicated to AI, aiming to improve computing efficiency by roughly another 10 times through better hardware and chip architectures.
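The CPU-versus-GPU gap behind these figures is easy to see in miniature. The sketch below is purely illustrative: it assumes PyTorch is installed and a CUDA-capable GPU is present, and the matrix size and any measured speedup are arbitrary rather than a reproduction of the 9x to 72x figures above.

```python
import time
import torch  # assumes PyTorch; the GPU path also needs a CUDA device

# Run the same large matrix multiplication on the CPU and, if available, the GPU.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

t0 = time.time()
c_cpu = a @ b                            # a handful of CPU cores
print(f"CPU time: {time.time() - t0:.3f} s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()             # finish the copies before timing
    t0 = time.time()
    c_gpu = a_gpu @ b_gpu                # thousands of GPU cores in parallel
    torch.cuda.synchronize()             # wait for the kernel to complete
    print(f"GPU time: {time.time() - t0:.3f} s")
```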

(1) FPGA-based semi-custom artificial intelligence chip

When demand for chips has not yet reached large scale, and deep learning algorithms are still unstable and need continuous iterative improvement, using reconfigurable FPGA chips to build semi-custom artificial intelligence chips is the best choice.

An outstanding representative of this type of chip comes from the domestic start-up Shenjian Technology. The company designed a "Deep Processing Unit" (DPU) chip, aiming to achieve better-than-GPU performance at ASIC-level power consumption; its first batch of products is based on the FPGA platform. Although this kind of semi-custom chip relies on the FPGA platform, its abstraction of the instruction set and compiler allows rapid development and iteration, giving it clear advantages over dedicated FPGA accelerator products.

(2) Fully customized artificial intelligence chips for deep learning algorithms

This type of chip is fully customized using the ASIC design approach, with performance, power consumption, and area optimized for deep learning algorithms. Google's TPU and the Cambricon deep learning processor from the Institute of Computing Technology, Chinese Academy of Sciences, are typical representatives.

Cambricon pioneered the direction of deep learning processors internationally. The Cambricon series currently includes three prototype processor architectures: Cambricon No. 1 (DianNao, targeting neural networks), Cambricon No. 2 (DaDianNao, targeting large-scale neural networks), and Cambricon No. 3 (PuDianNao, targeting a variety of machine learning algorithms).

The Cambricon chip is planned for industrialization within the year. Cambricon No. 2, built on a 28 nm process, runs at 606 MHz with a die area of 67.7 mm² and a power consumption of about 16 W. Its single-chip performance is more than 21 times that of a mainstream GPU, while its energy consumption is only 1/330 of the GPU's; a high-performance computing system built from 64 such chips can outperform a mainstream GPU by as much as 450 times, with a total energy consumption of only 1/150.
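Taking these figures at face value, and assuming the "1/330" number refers to the energy consumed for the same workload, the implied power and performance-per-watt ratios can be worked out as follows; the performance-per-watt advantage simply equals the energy ratio for the same job.

```latex
% Same workload: 1/330 of the energy, finished 21x faster.
\frac{P_{\text{chip}}}{P_{\text{GPU}}}
  = \frac{E_{\text{chip}}/t_{\text{chip}}}{E_{\text{GPU}}/t_{\text{GPU}}}
  = \frac{1}{330}\times 21 \approx 0.064
\qquad\Longrightarrow\qquad
\frac{(\mathrm{perf}/\mathrm{W})_{\text{chip}}}{(\mathrm{perf}/\mathrm{W})_{\text{GPU}}}
  = \frac{21}{0.064} \approx 330
```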

(3) Brain-inspired computing chips

The purpose of this type of chip is no longer limited to accelerating deep learning algorithms; rather, at the level of the chip's basic structure and even its devices, the hope is to develop a new brain-like computer architecture, for example by using new devices such as memristors and ReRAM to improve storage density. Research on such chips is still far from producing a mature technology that can be widely deployed in the market, and it even carries great risk, but in the long run brain-inspired chips may bring about a revolution in computing systems.

A typical representative of this type of chip is IBM's TrueNorth. The TrueNorth processor consists of 5.4 billion interconnected transistors forming an array of 1 million digital neurons, which communicate with one another through 256 million electrical synapses.

3. AI chip research and development directions

In recent years, AI application scenarios have begun to shift toward mobile devices, such as autonomous driving in cars and face recognition on phones. Industry demand has driven technical progress, and AI chips, as the foundation of the industry, must deliver stronger performance, higher efficiency, and smaller size to complete the migration of AI from the cloud to end devices.

At present, there are two main R&D directions for AI chips: one is FPGA (field-programmable gate array) and ASIC (application-specific integrated circuit) chips based on the traditional von Neumann architecture; the other is brain-like chips designed to imitate the structure of human brain neurons. FPGA and ASIC chips have already reached a certain scale in both R&D and application, while brain-like chips are still at an early stage of research; they nonetheless hold great potential and may become the industry mainstream in the future.

The main difference between the two routes is that the former follows the von Neumann architecture while the latter uses a brain-like architecture. Virtually every computer you see today uses the von Neumann architecture, whose core idea is to separate the processor from memory, hence the CPU (central processing unit) and the memory. A brain-like architecture, as the name suggests, mimics the neuron structure of the human brain, so the compute, memory, and communication components are all integrated together.

4. Technical characteristics and representative products of AI chips

From GPUs to FPGAs and ASIC chips

Before 2007, due to factors such as algorithms and data at that time, AI did not have a particularly strong demand for chips, and general-purpose CPU chips could provide sufficient computing power.

Later, driven by the rapid development of the high-definition video and gaming industries, GPU (graphics processing unit) chips advanced quickly. Because the GPU has far more logic units for processing data, it is a highly parallel structure and holds an advantage over the CPU in handling graphics data and complex algorithms; and because AI deep learning involves many model parameters, large-scale data, and heavy computation, the GPU displaced the CPU for a time and became the mainstream AI chip.

GPUs have far more arithmetic logic units (ALUs) than CPUs

However, the GPU is, after all, a graphics processor rather than a chip designed specifically for deep learning, so it naturally has shortcomings. For example, when running AI applications its parallel structure cannot be fully utilized, which results in high energy consumption.

At the same time, AI applications are multiplying by the day, and AI can now be seen in fields such as education, healthcare, and autonomous driving. The high energy consumption of GPU chips cannot meet the industry's needs, so FPGA and ASIC chips are being adopted instead.

So what are the technical characteristics of these two kinds of chips, and what are their representative products?

"Universal chip" FPGA

FPGA (field-programmable gate array) is a further development built on programmable devices such as PAL, GAL, and CPLD.

An FPGA can be understood as a "universal chip." The user defines the gate circuits and the connections between them and the memories by burning a configuration file into the FPGA, designing the FPGA's hardware circuit with a hardware description language (HDL). Each time programming is completed, the hardware circuit inside the FPGA has a definite connection pattern and thus a definite function; input data then simply passes through the gate circuits in turn to produce the output.

In plain English, a "universal chip" is one whose functions you define yourself: whatever functions you need, those are the functions it can have.
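One way to picture this is that an FPGA is essentially a fabric of small look-up tables (LUTs) plus programmable interconnect, and the configuration file fills in those truth tables. The toy Python model below is only a conceptual illustration; a real FPGA flow uses an HDL and vendor tools, and the make_lut helper is a hypothetical stand-in for the configuration step.

```python
# Conceptual model of one FPGA look-up table (LUT): the "configuration" is the
# truth table written into the device, and the configured fabric then simply
# maps inputs to outputs.
def make_lut(truth_table):
    """truth_table holds the outputs for inputs (0,0), (0,1), (1,0), (1,1)."""
    def lut(a, b):
        return truth_table[(a << 1) | b]
    return lut

xor_gate = make_lut([0, 1, 1, 0])   # the same fabric configured as XOR...
and_gate = make_lut([0, 0, 0, 1])   # ...or reconfigured as AND

print(xor_gate(1, 0), and_gate(1, 1))   # -> 1 1
```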

Despite the name "universal chip," FPGAs are not without flaws. Because of the FPGA's highly flexible structure, its per-unit cost in mass production is higher than that of an ASIC, and in terms of performance, an FPGA's speed and energy consumption also fall short of an ASIC's.

In other words, although the "universal chip" is an all-rounder, its performance is not as good as an ASIC chip's, and its price is higher.

However, when demand for chips has not yet reached large scale and deep learning algorithms still need continuous iterative improvement, reconfigurable FPGA chips are more adaptable, so using FPGAs to implement semi-custom artificial intelligence chips is undoubtedly a safe choice.

At present, the FPGA chip market is dominated by the American manufacturers Xilinx and Altera. According to statistics from Marketwatch, the former holds about 50% of the global market and the latter about 35%; together the two occupy 85% of the market and hold more than 6,000 patents.

Xilinx's FPGA chips are divided into four series from low end to high end: Spartan, Artix, Kintex, and Virtex, with process technology ranging from 45 nm down to 16 nm; the more advanced the process, the smaller the chip. Spartan and Artix mainly target the civilian market, with applications including autonomous driving and smart homes; Kintex and Virtex mainly target the military market, with applications including national defense and aerospace.

Xilinx's Spartan series FPGA chip

Then there is Xilinx's old rival, Altera. Altera's mainstream FPGA chips fall into two categories: one focuses on low-cost applications, with medium capacity and performance sufficient for general needs, such as the Cyclone and MAX series; the other focuses on high-performance applications, with large capacity and performance that can meet various high-end needs, such as the Stratix and Arria series. Altera's FPGA chips are mainly used in consumer electronics, wireless communications, military and aerospace, and other fields.

ASIC

Before AI industry applications took off at scale, using general-purpose chips suited to parallel computing, such as FPGAs, to provide acceleration avoided the high investment and risk of developing custom chips such as ASICs.

But as noted above, because general-purpose chips were not originally designed for deep learning, FPGAs inevitably run into bottlenecks in performance and power consumption. As the scale of artificial intelligence applications expands, these problems become increasingly prominent. In other words, all of our good ideas about artificial intelligence need chips that can keep pace with its rapid development; if chips cannot keep up, they become a bottleneck for AI itself.

Therefore, with the rapid development of artificial intelligence algorithms and applications in recent years, as well as the gradual maturity of research and development results and technology, ASIC chips are becoming the mainstream of artificial intelligence computing chip development.

ASIC chips are dedicated chips customized for specific needs. Although they sacrifice versatility, ASICs outperform FPGA and GPU chips in performance, power consumption, and size, which matters especially for mobile devices, such as our phones, that demand high performance, low power consumption, and small size.

However, because of its low versatility, the high R&D cost of an ASIC chip can also bring high risk. Even so, taking market factors into account, ASIC chips are the general direction of the industry.

Why? Because from servers and computers to driverless cars, drones, and the many appliances of the smart home, a huge number of devices need to incorporate artificial intelligence computing, perception, and interaction capabilities. For reasons of real-time responsiveness and training-data privacy, these capabilities cannot depend entirely on the cloud; they must be supported by local software and hardware platforms. The ASIC's high performance, low power consumption, and small size meet exactly these requirements.

Origin blog.csdn.net/leyang0910/article/details/130089262