FPGA Technology Analysis

FPGA (Field Programmable Gate Array) technology grew out of earlier programmable devices such as PAL (Programmable Array Logic) and GAL (Generic Array Logic). It emerged as a semi-custom circuit within the application-specific integrated circuit (ASIC) field, addressing the drawbacks of fully custom circuits while overcoming the limited gate counts of earlier programmable devices.
FPGA design is not simply chip research; more often, FPGA devices of a particular model are used to design products in other industries. Unlike ASICs, FPGAs are widely used in the communications industry. By analyzing the global FPGA product market and its suppliers, together with China's current situation and its leading FPGA products, the future direction of the related technologies can be identified, which plays an important role in raising China's overall level of science and technology.
Compared with the traditional chip design model, FPGA work is not limited to researching and designing the chip itself; with the help of a specific chip model, products in many fields can be optimized. From the device perspective, an FPGA is a typical semi-custom integrated circuit containing digital management modules, embedded units, output units, and input units. On this basis, designers focus on comprehensive optimization of the FPGA, adding new functions by improving the existing design, thereby simplifying the overall chip structure and improving performance.

FPGAs can handle many computation-intensive tasks. Relying on a pipeline-parallel structure, FPGAs hold a technical advantage over GPUs and CPUs in the latency of returning computation results.
Computation-intensive tasks: matrix operations, machine vision, image processing, search-engine ranking, asymmetric encryption, and similar operations are computation-intensive. Such tasks can be offloaded from the CPU to the FPGA for execution.
FPGA performs computationally intensive tasks:
• Computational performance relative to the CPU: for example, a Stratix-series FPGA performing integer multiplication delivers performance roughly equivalent to a 20-core CPU, and for floating-point multiplication roughly equivalent to an 8-core CPU.
• Computational performance relative to the GPU: for integer and floating-point multiplication, FPGA performance is about an order of magnitude below that of a GPU, but the gap can be narrowed by configuring more multipliers and floating-point units.
The core advantage of FPGAs for computation-intensive tasks: tasks such as search-engine ranking and image processing have strict limits on result-return time, so the latency of each computing step must be reduced. Under a traditional GPU acceleration scheme, data is processed in large batches and latency can reach the millisecond level. Under an FPGA acceleration scheme, PCIe latency can be reduced to the microsecond level, and as the technology matures, the data transmission delay between CPU and FPGA can be brought below 100 nanoseconds.
An FPGA can build one pipeline stage for each processing step of a data packet (pipeline parallelism), and a packet can be output as soon as it has passed through the pipeline. The GPU's data-parallel mode relies on different processing units handling different packets, and those units must have consistent inputs and outputs. For streaming computation tasks, the FPGA's pipeline-parallel structure therefore has a natural latency advantage.
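To make the pipeline-parallel structure concrete, the following is a minimal Verilog sketch (not from the source; the stage operations, widths, and signal names are illustrative assumptions) of a three-stage streaming pipeline: a new data word can enter every clock cycle while earlier words are still in flight, so after the initial fill latency one result leaves per cycle.

```verilog
// Minimal 3-stage streaming pipeline sketch (illustrative only).
// A new data word can enter every clock; after a 3-cycle fill
// latency, one result leaves per clock -- the pipeline parallelism
// described above.
module stream_pipeline #(
    parameter WIDTH = 16
) (
    input  wire             clk,
    input  wire             rst,
    input  wire             in_valid,
    input  wire [WIDTH-1:0] in_data,
    output reg              out_valid,
    output reg  [WIDTH-1:0] out_data
);
    // One register per stage: each stage holds a different in-flight word.
    reg [WIDTH-1:0] s1_data, s2_data;
    reg             s1_valid, s2_valid;

    always @(posedge clk) begin
        if (rst) begin
            {s1_valid, s2_valid, out_valid} <= 3'b000;
        end else begin
            // Stage 1: example operation -- scale (result truncated to WIDTH).
            s1_data  <= in_data * 3;
            s1_valid <= in_valid;
            // Stage 2: example operation -- add an offset.
            s2_data  <= s1_data + 7;
            s2_valid <= s1_valid;
            // Stage 3: register the result to the output.
            out_data  <= s2_data;
            out_valid <= s2_valid;
        end
    end
endmodule
```

Because every stage works on a different word in the same cycle, per-word latency is just the pipeline depth, rather than the batch-accumulation delay of a data-parallel scheme.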

FPGAs can process communication-intensive tasks without being limited by the network card; they outperform CPU solutions in packet throughput and latency, and their latency is highly stable.
Communication-intensive tasks: operations such as symmetric encryption, firewalls, and network virtualization are communication-intensive. Their data processing is less complex than that of computation-intensive tasks, but it is easily limited by the communication hardware.
Advantages of FPGA for performing communication-intensive tasks:
① Throughput advantage: a CPU solution must receive data through the network card, so communication-intensive tasks are easily limited by NIC performance (few network cards can process 64-byte packets at line rate, and the number of PCIe NIC slots on the CPU and motherboard is limited). A GPU solution offers high computing performance but has no network ports of its own; it must rely on the network card to collect packets, so data throughput is limited by the CPU and NIC and latency is long. An FPGA can connect directly to 40 Gbps and 100 Gbps network cables and process packets of all sizes at line rate, reducing the cost of network cards and switches.
② Latency advantage: a CPU solution collects packets through the network card and sends results back through it, so it is limited by NIC performance; even under the DPDK packet-processing framework, CPU handling of communication-intensive tasks incurs a latency of nearly 5 microseconds, and that latency is not very stable. An FPGA needs no instructions, so it can guarantee stable, extremely low latency, and an FPGA working with a CPU in a heterogeneous mode extends FPGA solutions to complex end devices.

FPGA deployment methods include clustered and distributed modes, and deployments have gradually shifted from centralized to distributed. Server communication efficiency and the way faults propagate differ across deployment methods.
Power overhead of embedded FPGAs: embedding FPGAs has little impact on overall server power consumption. Taking Microsoft's Catapult FPGA-accelerated machine translation project as an example, the total computing power of the acceleration modules reaches 103 Tops/W, equivalent to the computing power of 100,000 GPUs, while embedding a single FPGA raises overall server power consumption by only about 30 W.
Features and limitations of FPGA deployment methods:
① Features and limitations of cluster deployment: FPGA chips form a dedicated cluster, effectively a supercomputer built from FPGA accelerator cards (for example, early Virtex-series experiment boards carried 6 FPGAs per board, and a single server carried 4 such boards).
• The dedicated cluster mode cannot realize communication between FPGAs of different machines;
• Other machines in the data center need to send tasks to the FPGA cluster in a centralized manner, which may cause network delays;
• A single point of failure can limit the overall acceleration capability of the data center.
② Distributed deployment with network-cable connections: to preserve the homogeneity of servers in the data center (which ASIC solutions cannot provide), this scheme embeds FPGAs in different servers and connects them through a dedicated network, which mitigates problems such as single-point-of-failure propagation and network delay.
• Similar to the cluster deployment mode, this mode does not support communication between FPGAs of different machines;
• Servers equipped with FPGA chips are highly customized, so operation and maintenance costs are high.
③ Shared server-network deployment: in this mode, the FPGA sits between the network card and the switch, which greatly accelerates network functions and enables storage virtualization. The FPGA provides a virtual network card for each virtual machine, and the data-plane functions of the virtual switch are moved into the FPGA, so neither the CPU nor the physical NIC needs to participate in sending and receiving network packets. This solution significantly improves virtual machine network performance (to 25 Gbps) while cutting data transmission latency (by roughly a factor of 10).

In the shared server-network deployment mode, the FPGA accelerator helps reduce data transmission latency, keeps data center latency stable, and significantly improves virtual machine network performance.
FPGA-accelerated Bing search ranking in the shared server-network deployment mode: in this mode, Bing search ranking communicates over dedicated 10 Gbps network cables, with each group consisting of 8 FPGAs. Some FPGAs extract signal features, some compute feature expressions, and some compute document scores, together forming a Robot-as-a-Service (RaaS) platform. Under the FPGA acceleration scheme, Bing search latency drops sharply and its distribution is stable (approximately normal). In this deployment mode, the remote FPGA communication latency is negligible compared with the search latency.

FPGA deployment in Azure servers: to address the high cost of network and storage virtualization, Azure adopts the shared server-network FPGA deployment mode. As network speeds reach 40 Gbps, the CPU cost of network and storage virtualization soars (a single CPU core can handle only about 100 Mbps of throughput, implying hundreds of cores just to keep up with the link). By deploying FPGAs between the network cards and the switches, network connectivity is extended throughout the data center. With a lightweight transport layer, latency within the same server rack can be kept under 3 microseconds, and latency to any FPGA rack in the same data center under 20 microseconds.
Relying on its high bandwidth and low latency, the FPGA can form a data center acceleration layer between the network switching layer and server software, achieving super-linear performance gains as the scale of the distributed accelerators grows.
Data center acceleration layer: the FPGAs are embedded in a data center acceleration plane that sits between the network switching layer (top-of-rack, first-layer, and second-layer switches) and traditional server software (the software running on the CPUs).
Advantages of the acceleration layer:
• The FPGA acceleration layer is responsible for providing network acceleration and storage virtualization acceleration support for each server (providing cloud services). The remaining resources of the acceleration layer can be used for computing tasks such as deep neural networks (DNN).
• As the scale of the FPGA accelerator expands in the distributed network mode, the performance improvement of the virtual network exhibits super-linear characteristics.
Principle of the acceleration layer's performance gain: with a single FPGA, on-chip memory is not large enough to hold the full model, so weights must be fetched from DRAM continuously and performance is bounded by DRAM bandwidth. In the acceleration layer, a large number of FPGAs each take on one layer (or part of one layer) of the network model, so the on-chip memory can hold all of that portion's weights; this breaks the DRAM bottleneck and lets the FPGA's computing performance be fully exploited. The acceleration layer must also avoid splitting computing tasks too finely, which would unbalance computation and communication.
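As a rough illustration of that principle, the hedged Verilog sketch below (sizes, names, and the load interface are assumptions, not from the source) preloads one layer slice's weights into on-chip RAM once; the multiply-accumulate loop then reads a weight every cycle without touching external DRAM, so compute is no longer bounded by DRAM bandwidth.

```verilog
// Sketch: the weights of one layer slice live entirely in on-chip RAM
// (preloaded once), so the multiply-accumulate loop never waits on
// external DRAM. Sizes and port names are illustrative assumptions.
module onchip_mac #(
    parameter N = 256,                  // number of weights held on chip
    parameter W = 16
) (
    input  wire                  clk,
    input  wire                  rst,
    // one-time weight preload interface
    input  wire                  wr_en,
    input  wire [$clog2(N)-1:0]  wr_addr,
    input  wire signed [W-1:0]   wr_weight,
    // streaming activations
    input  wire                  act_valid,
    input  wire signed [W-1:0]   act_data,
    output reg  signed [2*W+7:0] acc     // running dot product
);
    // On-chip memory holding the weights (no DRAM access during compute).
    reg signed [W-1:0] weight_ram [0:N-1];
    reg [$clog2(N)-1:0] rd_addr;

    always @(posedge clk) begin
        if (wr_en)
            weight_ram[wr_addr] <= wr_weight;      // preload phase
        if (rst) begin
            acc     <= 0;
            rd_addr <= 0;
        end else if (act_valid) begin
            // Compute phase: one weight read and one MAC per cycle.
            acc     <= acc + act_data * weight_ram[rd_addr];
            rd_addr <= rd_addr + 1'b1;
        end
    end
endmodule
```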

Embedded eFPGA technology is superior to traditional embedded-FPGA solutions in performance, cost, power consumption, and profitability, and can provide flexible options for different application scenarios and market segments.
Drivers of eFPGA technology: rising device design complexity, combined with the economic trend of falling per-function device costs, drives market demand for eFPGA technology.
Rising device design complexity: software tools for SoC design and implementation are becoming more complex (for example, Imagination Technologies provides the PowerVR graphics interface and an Eclipse-based integrated development environment to meet customers' need for complete development solutions), engineering time is growing (compile, synthesis, and mapping times all increase, and the larger the FPGA, the longer the compilation), and tooling costs are rising (an FPGA chip can cost 100 times as much as an ASIC chip of the same specification).
Falling per-function device cost: at the end of the 20th century, the average selling price of an FPGA was high (over 1,000 yuan), and in the traditional model the combined design of an FPGA with an ASIC increased the die area, size, and complexity of the ASIC, so early hybrid devices were costly. In the 21st century, FPGAs are used more for prototyping and pre-production designs than for mass-produced hybrid devices; compared with traditional integration, their cost keeps falling (to a minimum of roughly 100 yuan) and they are flexible to apply.
Advantages of eFPGA technology:
Better quality: compared with the traditional approach of embedding a standalone FPGA alongside an ASIC, an SoC design that combines eFPGA IP cores with other functional modules performs better in power consumption, performance, size, and cost.
More convenient: market demand in downstream applications changes quickly, and the reprogrammability of eFPGA helps design engineers update SoCs so products stay competitive for longer, significantly improving revenue and profitability. An SoC built with eFPGA can be updated quickly to support new interface standards on the one hand, and gain new functions quickly to serve segmented markets on the other.
More power efficient: embedding eFPGA technology into an SoC design can improve overall performance while reducing overall power consumption. Using eFPGA reprogrammability, engineers can reconfigure hardware-based solutions for specific problems, improving design performance and lowering power consumption.

FPGA technology does not rely on instructions or shared memory; it provides low-latency streaming communication within cloud-computing network interconnection systems and can broadly meet acceleration needs between virtual machines and between processes.
FPGA cloud-computing task execution flow: mainstream data centers use the FPGA as an accelerator card for computation-intensive tasks, and Xilinx and Altera have launched high-level programming models based on OpenCL. In this model, the CPU writes a task into the FPGA's DRAM and notifies the FPGA to execute it; the FPGA completes the computation, writes the result back to its DRAM, and the result is finally transferred to the CPU.
Room for FPGA cloud-computing performance upgrades: limited by current engineering practice, communication between the FPGA and the CPU in today's data centers is mostly mediated by DRAM, following a program-DRAM, start-kernel, read-DRAM sequence (transfers through FPGA DRAM are slower than through CPU DRAM), with a latency of nearly 2 milliseconds (OpenCL with memory shared between multiple kernels). There is room to improve CPU-FPGA communication latency: PCIe DMA can provide efficient direct communication, cutting the minimum latency to about 1 microsecond.
A new model of FPGA cloud-computing communication scheduling: in the new communication mode, the FPGA and CPU no longer need a shared-memory structure; the accelerator logic and host software communicate at high speed through pipes. Cloud data center tasks are relatively uniform and highly repetitive, consisting mainly of virtualized network and storage for the platform (communication tasks) plus machine learning and symmetric/asymmetric encryption and decryption (computation tasks), with fairly complex algorithms. Under the new scheduling model, CPU computing tasks become more fragmented; in the long term, cloud computing centers may be dominated by FPGAs, with complex computing tasks offloaded from the FPGA to the CPU (the reverse of the traditional model, in which the CPU offloads tasks to the FPGA).
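A minimal sketch of what such a pipe-style channel can look like on the FPGA side, assuming a common valid/ready streaming handshake (the signal names and interface are my assumption, not the source's API): data words flow directly from producer to consumer with back-pressure, and no shared DRAM buffer sits in the path.

```verilog
// Sketch of a pipe-style streaming channel endpoint using a
// valid/ready handshake: words flow directly from producer to
// consumer with back-pressure, with no shared DRAM buffer in between.
module stream_passthrough #(
    parameter W = 32
) (
    input  wire         clk,
    input  wire         rst,
    // upstream side (e.g. facing the host interface)
    input  wire         s_valid,
    output wire         s_ready,
    input  wire [W-1:0] s_data,
    // downstream side (e.g. facing the accelerator kernel)
    output reg          m_valid,
    input  wire         m_ready,
    output reg  [W-1:0] m_data
);
    // Accept a new word whenever the output register is free or is
    // being drained in this cycle.
    assign s_ready = ~m_valid | m_ready;

    always @(posedge clk) begin
        if (rst)
            m_valid <= 1'b0;
        else if (s_ready) begin
            m_valid <= s_valid;
            m_data  <= s_data;
        end
    end
endmodule
```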

The global FPGA market is dominated by four giants: Xilinx, Intel (which acquired Altera), Lattice, and Microsemi. These four manufacturers hold more than 9,000 patents between them, giving them command of the industry.
Since the FPGA chip industry took shape, more than 70 companies worldwide have entered the competition, and new startups keep emerging (such as Achronix Semiconductor and MathStar). Product innovation drives the industry forward: beyond traditional programmable logic devices (purely digital logic), innovation in new programmable logic devices (mixed-signal and analog) is accelerating. For example, Cypress Semiconductor developed the configurable mixed-signal circuit PSoC (Programmable System-on-Chip), and Actel launched Fusion (a programmable mixed-signal chip). In addition, some startups have introduced Field Programmable Analog Arrays (FPAA).
As demand in intelligent markets evolves, highly customized chips (SoC ASICs) face increased market risk because of their large non-recurring investment and long R&D cycles. By comparison, FPGAs have advantages for parallel computing tasks and can replace some ASICs in high-performance, multi-channel applications. Demand for multi-channel computing in artificial intelligence is pushing FPGA technology into the mainstream.
Given the FPGA's advantages in small-volume production (roughly 50,000 units as the threshold) and in special-purpose multi-channel computing equipment (radar, aerospace), some downstream application markets are replacing ASIC solutions with FPGAs.

Representative Chinese FPGA chip developers include Ziguang Tongchuang, China Microelectronics, Chengdu Huawei Electronics, Anlu Technology, Zhiduo, Gowin Semiconductor, Shanghai Fudan Microelectronics, and Jingwei Qili. In terms of products, the hardware performance of Chinese FPGAs still lags far behind Xilinx and Intel. Tsinghua Unigroup is currently the only company in the Chinese market capable of developing and manufacturing high-performance FPGAs of tens of millions of gates with independent intellectual property rights, and Shanghai Fudan Microelectronics launched a 100-million-gate FPGA product with independent intellectual property rights in May 2018. Chinese FPGA companies are keeping pace with the major vendors, targeting markets such as artificial intelligence and autonomous driving, and building complete high-end, mid-range, and low-end product lines.
Competitive breakthroughs for Chinese FPGA companies: at this stage, Chinese FPGA manufacturers' chip design software and application software are not unified, which tends to waste resources on the customer side. Leading manufacturers could take the lead in concentrating industry-chain resources and raising the overall competitiveness of the industry.
Advantages and disadvantages
Advantages
The advantages of FPGA are as follows:
(1) An FPGA is composed of hardware resources such as logic units, RAM, and multipliers. By organizing these resources appropriately, hardware circuits such as multipliers, registers, and address generators can be realized.
(2) An FPGA can be designed using schematics (block diagrams) or Verilog HDL, from simple gate circuits up to FIR or FFT circuits (see the sketch after this list).
(3) An FPGA can be reprogrammed an unlimited number of times, loading a new design in a few hundred milliseconds, and reconfiguration can reduce hardware overhead.
(4) The operating frequency of an FPGA is determined by the chip and the design; demanding requirements can be met by modifying the design or switching to a faster chip (of course, the operating frequency cannot be raised without limit and is constrained by current IC technology and other factors).
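As a hedged illustration of item (2) above, the sketch below is a small transposed-form FIR filter in Verilog built from the multiplier and register resources listed in item (1); the tap count, coefficients, and widths are illustrative assumptions, not values from the source.

```verilog
// Small 4-tap transposed-form FIR filter sketch (illustrative
// coefficients and widths). Each tap uses one multiplier and one
// register, i.e. exactly the hardware resources listed above.
module fir4 #(
    parameter W = 16
) (
    input  wire                  clk,
    input  wire                  rst,
    input  wire signed [W-1:0]   x,   // one input sample per clock
    output wire signed [2*W+1:0] y    // filtered output
);
    // Example coefficients (assumed values).
    localparam signed [W-1:0] C0 = 3, C1 = 7, C2 = 7, C3 = 3;

    reg signed [2*W+1:0] d0, d1, d2, d3;

    always @(posedge clk) begin
        if (rst) begin
            d0 <= 0; d1 <= 0; d2 <= 0; d3 <= 0;
        end else begin
            // Transposed structure: partial sums shift toward the output.
            d3 <= x * C3;
            d2 <= x * C2 + d3;
            d1 <= x * C1 + d2;
            d0 <= x * C0 + d1;
        end
    end

    // Dot product of the last four samples with C0..C3, one register late.
    assign y = d0;
endmodule
```

The transposed form keeps each adder chain short, so the structure pipelines naturally and maps directly onto the FPGA's multiplier and register resources.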
Disadvantages
The disadvantages of FPGA are as follows:
(1) All FPGA functions are implemented in hardware, so operations such as conditional branches and jumps cannot be implemented directly.
(2) FPGAs are essentially limited to fixed-point operations; floating-point arithmetic must be built from fixed-point resources at considerable cost.
Summary: an FPGA relies on hardware to implement all of its functions; its speed can rival that of a special-purpose chip, but its design flexibility is far below that of a general-purpose processor.
Design language and platform
A programmable logic device is the hardware carrier through which EDA technology realizes the specified functions and technical targets of an electronic application system, and the FPGA is one of the mainstream devices for this approach: highly versatile, easy to use, and quick to test and implement in hardware.
Hardware Description Language (HDL) is a language used to design digital logic systems and describe digital circuits. The commonly used languages are VHDL, Verilog HDL, SystemVerilog, and SystemC.
VHDL, the VHSIC (Very High Speed Integrated Circuit) Hardware Description Language, is a comprehensive hardware description language that is independent of specific hardware circuits and design platforms. It has broad descriptive power, does not depend on particular devices, and can express complex control logic in rigorous, concise code; it is supported by many EDA vendors and widely used in electronic design.
VHDL is a high-level language for circuit design. Compared with other hardware description languages, it is concise, flexible, and independent of device-level design, which makes it a general-purpose hardware description language for EDA and easier for designers to master.
Verilog HDL is a widely used hardware description language, which can be used in multiple stages such as modeling, synthesis, and simulation of the hardware design process.
Advantages of Verilog HDL: similar to the C language, easy to use, and flexible; case-sensitive; well suited to writing test stimulus and models. Disadvantage: many errors cannot be caught at compile time.
Advantages of VHDL: rigorous syntax and clear hierarchy. Disadvantages: takes longer to become familiar with and is less flexible.
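To show the C-like flavor of Verilog and the ease of writing test stimulus mentioned above, here is a hedged sketch of a 2-to-1 multiplexer with a tiny testbench; the module and signal names are illustrative.

```verilog
// A 2-to-1 multiplexer plus a tiny testbench, showing Verilog's
// C-like syntax and stimulus written in an initial block.
module mux2 (
    input  wire a,
    input  wire b,
    input  wire sel,
    output wire y
);
    assign y = sel ? b : a;   // C-style conditional expression
endmodule

module mux2_tb;
    reg  a, b, sel;
    wire y;

    mux2 dut (.a(a), .b(b), .sel(sel), .y(y));

    initial begin
        // Stimulus: drive the inputs, wait, and print the output.
        a = 0; b = 1; sel = 0; #10;
        $display("sel=%b y=%b (expect 0)", sel, y);
        sel = 1; #10;
        $display("sel=%b y=%b (expect 1)", sel, y);
        $finish;
    end
endmodule
```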
Quartus II is a complete multi-platform design environment developed by Altera that meets the design needs of a wide range of FPGAs and CPLDs; it is a comprehensive environment for programmable system-on-chip design.
The Vivado Design Suite is an integrated design environment released by FPGA manufacturer Xilinx in 2012. It includes a highly integrated design environment and a new generation of system-to-IC-level tools, built on a shared, extensible data model and a common debug environment. FIFO IP cores are available in the Xilinx Vivado Design Suite and are easy to use in designs.
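For context, the sketch below is not the Xilinx FIFO IP itself but a minimal hand-written synchronous FIFO in Verilog, included only to illustrate the kind of buffering such an IP core provides; the depth, width, and port names are assumptions.

```verilog
// Minimal synchronous FIFO sketch (not the Xilinx FIFO IP core).
// Depth, width, and port names are illustrative assumptions.
module sync_fifo #(
    parameter W     = 8,
    parameter DEPTH = 16                 // power of two
) (
    input  wire         clk,
    input  wire         rst,
    input  wire         wr_en,
    input  wire [W-1:0] din,
    input  wire         rd_en,
    output reg  [W-1:0] dout,
    output wire         full,
    output wire         empty
);
    localparam A = $clog2(DEPTH);

    reg [W-1:0] mem [0:DEPTH-1];
    reg [A:0]   wr_ptr, rd_ptr;          // extra MSB distinguishes full/empty

    assign empty = (wr_ptr == rd_ptr);
    assign full  = (wr_ptr == {~rd_ptr[A], rd_ptr[A-1:0]});

    always @(posedge clk) begin
        if (rst) begin
            wr_ptr <= 0;
            rd_ptr <= 0;
        end else begin
            if (wr_en && !full) begin
                mem[wr_ptr[A-1:0]] <= din;
                wr_ptr <= wr_ptr + 1'b1;
            end
            if (rd_en && !empty) begin
                dout   <= mem[rd_ptr[A-1:0]];
                rd_ptr <= rd_ptr + 1'b1;
            end
        end
    end
endmodule
```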

Reference link
https://mp.weixin.qq.com/s/uGBYtdHM1jkGNhGDTzbizg
https://baike.baidu.com/item/FPGA/935826?fr=aladdin
