Amazon Cloud Technology's self-developed chips improve cost performance for enterprise cloud services

From June 27 to 28, the 2023 Amazon Cloud Technology China Summit was held in Shanghai. The summit made clear why Amazon Cloud Technology has kept its leading position even as competitors in cloud computing grow more mature: for more than a decade, guided by its "Day One" philosophy of rapidly developing products and iterating on technology based on customer needs, the company has continuously pursued innovation at the infrastructure level.

Enterprises urgently need to improve the cost-effectiveness of using the cloud

With growing demand for digital transformation and intensifying market competition, enterprises need to migrate business and data to the cloud to run production and services more efficiently and adapt to market changes. Accordingly, more and more enterprises are moving to the cloud. The barrier to entry is not low, however: technical capability, security and compliance risk, cost, and user experience all have to be weighed. Many corporate CTOs put it plainly: "I want to move to the cloud, but I don't have the capacity." Most companies therefore voice an urgent need to lower the threshold for adopting cloud services and to improve the cost-effectiveness of using the cloud.

To improve the cost-effectiveness of enterprise cloud services, Amazon Cloud Technology offers users comprehensive, in-depth compute support, including Intel, AMD, and NVIDIA processors as well as its own silicon. Most notable are its four families of self-developed chips: Nitro, Graviton, Inferentia, and Trainium.

Nitro was Amazon Cloud Technology's first self-developed chip. It has three main highlights: first, it enables highly lightweight virtualization; second, it isolates network and storage data paths; third, it provides encryption at the hardware level. With Nitro, Amazon Cloud Technology greatly strengthens the security of EC2 instances and the applications running on them, lets each component evolve independently, and keeps all EC2 instances stable. Nitro has also sharply reduced the complexity of launching new EC2 instance types, allowing the instance portfolio to grow very quickly, which further lowers customer costs and helps enterprises cut costs while raising efficiency. The latest-generation Nitro v5 chip improves markedly on its predecessor, with a faster packet-forwarding rate, lower latency, and 40% better performance per watt.
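For users, the Nitro System surfaces mainly through the EC2 instance types built on it. As a small illustration (not from the summit material), the EC2 DescribeInstanceTypes API can filter instance types by hypervisor, which distinguishes Nitro-based types from older Xen-based ones. A minimal sketch, assuming boto3 is installed and AWS credentials and a default region are configured:

```python
import boto3

ec2 = boto3.client("ec2")

# DescribeInstanceTypes supports a "hypervisor" filter; Nitro-based
# instance types report "nitro" rather than the older "xen".
paginator = ec2.get_paginator("describe_instance_types")
nitro_types = []
for page in paginator.paginate(
    Filters=[{"Name": "hypervisor", "Values": ["nitro"]}]
):
    nitro_types.extend(t["InstanceType"] for t in page["InstanceTypes"])

print(f"{len(nitro_types)} Nitro-based instance types, e.g. {nitro_types[:5]}")
```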

Graviton is a general-purpose server processor based on the Arm architecture. Since 2018, Amazon Cloud Technology has launched three successive generations of Graviton server CPUs, the most recent being the Graviton3E. Over the series' history, Graviton3 delivered 25% higher compute performance, 2x the floating-point performance, and 2x the performance on cryptographic workloads; Graviton3E focuses on vector computing, where it is 35% faster than the previous generation, an improvement that matters greatly for applications such as HPC (high-performance computing).
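In practice, adopting Graviton mostly means choosing an arm64 machine image and a Graviton-backed instance family (for Graviton3, families such as c7g). A minimal boto3 sketch, offered as an illustration rather than summit material; the AMI ID below is a placeholder and would need to be replaced with a real arm64 image:

```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder for illustration only: any arm64 AMI
# (e.g. Amazon Linux 2023 for arm64) would go here.
ARM64_AMI = "ami-0123456789abcdef0"

# c7g instances are built on Graviton3; this launches a single node.
resp = ec2.run_instances(
    ImageId=ARM64_AMI,
    InstanceType="c7g.xlarge",
    MinCount=1,
    MaxCount=1,
)
print(resp["Instances"][0]["InstanceId"])
```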

In concrete benchmarks, Graviton3E improves performance by 35% on HPL (a linear-algebra benchmark), by 12% on GROMACS (molecular dynamics), and by 30% on financial option-pricing workloads. Compared with comparable x86-based EC2 instances, Graviton3E also cuts energy consumption by up to 60%.

Today, the strong performance of the Graviton series has been amply validated. At the 2023 Amazon Cloud Technology China Summit, the Formula One World Championship ("F1") case cited by Chen Xiaojian showed off Amazon Cloud Technology's compute resources and data-storage capabilities. F1 uses Graviton3 to run aerodynamic simulations, letting it develop its next-generation car 70% faster than before and cut the downforce loss a car suffers when following another from 50% to 15%, which makes overtaking easier and gives fans more wheel-to-wheel battles. F1 has also collected more than 550 million data points from over 5,000 single-car and multi-car simulations to help optimize the next generation of cars. As the F1 team put it, "Graviton3 makes the system 40% faster; you can run simulations overnight and have the results the next morning."

On the machine-learning front, Amazon Cloud Technology has developed three generations of machine-learning chips. Its Trainium and Inferentia accelerators cover the training and inference scenarios and can offer enterprises the best cost performance. As a result, many leading generative AI startups, such as AI21 Labs, Hugging Face, Runway, and Stability AI, have chosen Inferentia and Trainium as the platform for their R&D and applications.

In machine-learning training, the metrics that matter most are training efficiency and cost performance. Taking the Hugging Face BERT model as an example, Trn1 instances built on the Trainium accelerator perform very well. In training throughput, compared with a comparable GPU instance, a single node delivers a 1.2x improvement and multiple nodes deliver 1.5x; in cost, a single node is 1.8x cheaper and a cluster is 2.3x cheaper.
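To give a rough sense of what this looks like in code (an illustration, not the summit material): the AWS Neuron SDK exposes Trainium through PyTorch/XLA, so a BERT training step on a Trn1 instance resembles an ordinary PyTorch loop running on an XLA device. A minimal sketch, assuming torch-neuronx/torch-xla and transformers are installed on a Trn1 instance, with a toy single-sentence batch standing in for a real DataLoader:

```python
import torch
import torch_xla.core.xla_model as xm  # PyTorch/XLA, used by AWS Neuron on Trn1
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# On a Trn1 instance with the Neuron SDK, xla_device() resolves to a NeuronCore.
device = xm.xla_device()

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.to(device)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy batch for illustration; a real job would iterate over a DataLoader.
batch = tokenizer(["an example sentence"], return_tensors="pt", padding=True)
batch = {k: v.to(device) for k, v in batch.items()}
labels = torch.tensor([1]).to(device)

optimizer.zero_grad()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
xm.mark_step()  # triggers XLA graph compilation and execution on the device
```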

As models grow more complex, single-node training often cannot meet users' needs, and distributed training across a very large cluster becomes necessary. With Trainium, a super-scale cluster can be built with up to 30,000 Trainium chips, giving enterprises up to 6 exaflops of supercomputer-class performance in the cloud. Many innovations sit behind this, such as a faster EFA network and petabit-scale non-blocking network interconnect.
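As a hedged sketch of how such scale-out looks from the framework side (again illustrative, not from the article): PyTorch/XLA provides a multiprocessing launcher that drives one worker per device on a node, with gradient all-reduction happening across workers, and over the EFA fabric between nodes in a cluster job:

```python
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

def _train_worker(index):
    # Each worker drives one device; a real job would build its model and
    # DataLoader here and use xm.optimizer_step(optimizer), which also
    # all-reduces gradients across workers.
    device = xm.xla_device()
    print(f"worker {xm.get_ordinal()} of {xm.xrt_world_size()} on {device}")

if __name__ == "__main__":
    # Spawns one process per available device on this node; multi-node
    # cluster runs are typically launched by a scheduler or torchrun.
    xmp.spawn(_train_worker)
```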

In machine-learning inference, both latency and throughput have to be considered. Enterprises want higher throughput for better cost performance, but higher throughput usually comes with higher latency, so developers constantly trade the two off against each other. Inferentia2 is designed to optimize both at once. Testing an Inferentia2-based instance with BERT, a model common in natural language processing, Inferentia2 achieves up to 3x higher throughput, 8.1x lower latency, and 4x cost savings, letting enterprise developers have the best of both worlds.
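To make the workflow concrete (an illustrative sketch, not the summit material): on Inf2 instances the Neuron SDK compiles a model ahead of time with torch_neuronx.trace, fixing the input shapes so the NeuronCores can execute it efficiently. A minimal BERT example, assuming torch-neuronx and transformers are installed on an Inf2 instance:

```python
import torch_neuronx  # AWS Neuron SDK for Inf2/Trn1
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# torchscript=True makes the model return plain tuples, which trace expects.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", torchscript=True
)
model.eval()

# Example input with a fixed shape; Neuron compiles for static shapes.
inputs = tokenizer("compile me for Inferentia2", return_tensors="pt")
example = (inputs["input_ids"], inputs["attention_mask"])

# Ahead-of-time compilation of the model for NeuronCores.
neuron_model = torch_neuronx.trace(model, example)

# The compiled artifact is invoked like a normal TorchScript module.
logits = neuron_model(*example)[0]
print(logits)
```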

It is also worth noting that Inferentia2 stands out on large language models. Take the OPT family as a test: on the mid-sized OPT-30B, Inferentia2 delivers 65% higher throughput and 52% lower inference cost than a comparable general-purpose GPU EC2 instance; on the 66-billion-parameter OPT-66B, the general-purpose GPU instance runs out of memory, while Inferentia2 still sustains a throughput of 351 tokens per second.


Source: blog.csdn.net/m0_72810605/article/details/131472559