Detailed analysis of AMD Instinct MI300: AMD's moment of glory

On June 13 (US time), AMD held an event called "Data Center and AI Technology Premiere" in San Francisco, California, introducing its data center solutions in the keynote speech.

Among them, AMD announced the "AMD Instinct MI300 series accelerator" (hereinafter, the Instinct MI300 series) as its new data center GPU/APU line, comprising the GPU "AMD Instinct MI300X" (hereinafter, MI300X) and the APU "AMD Instinct MI300A" (hereinafter, MI300A). Both products use 3D chiplet technology, and a single common architecture is configured as either a GPU or an APU by loading only GPU chiplets or a mix of CPU and GPU chiplets.

At the same time, AMD announced the "AMD Instinct Platform", an AI computing platform equipped with eight Instinct MI300 series accelerators. It is positioned to compete with eight-GPU AI training platforms such as NVIDIA's "DGX H100".


AMD is serious about catching up with NVIDIA in the data center AI GPU market... That is my frank impression after watching the "Data Center and AI Technology Premiere" keynote.

Until now, AMD has released the Instinct series as GPUs for data centers, but frankly it did not seem to be competing seriously with NVIDIA in the GPU-based AI training market.

Of course, the previous "Instinct MI200" series was notable as the first GPU to adopt chiplet technology, integrating two GPU dies in one package and delivering higher performance than earlier products.


This package also employs OAM (OCP Accelerator Module), an industry-standard module specification, allowing multi-GPU expansion. However, systems that scale up to 8 GPUs using the MI200 series have not yet appeared in widely available form.

At the end of the keynote, AMD Chair and CEO Lisa Su revealed the "AMD Instinct Platform", which can carry up to eight Instinct MI300 series accelerators. In other words, it is a "reference design" for OEMs and ODMs. Going forward, by releasing the specifications to OEMs and ODMs, AMD will make it easy for them to build GPU servers equipped with eight Instinct MI300 series accelerators.

In the Instinct Platform, the eight Instinct MI300s are interconnected via Infinity Fabric, AMD's chip-to-chip connection technology (the required number of links is provided by the Infinity Fabric ports implemented on each GPU chip). Unlike NVIDIA's DGX series, no external NVLink Switch chip is required, so the system can be expanded to 8 GPUs at relatively low cost (for configurations of 9 or more GPUs, horizontal scaling over InfiniBand, Ethernet and the like can be used).
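As a back-of-the-envelope illustration (my own arithmetic, not AMD documentation) of why having enough links on each GPU can replace an external switch chip: a fully connected topology of N accelerators needs N-1 links per device and N(N-1)/2 links in total, which at N = 8 is still small enough to wire directly.

```python
# Illustrative only: link counts for a fully connected (all-to-all) topology.
# Not AMD data; just the combinatorics behind switchless 8-GPU node designs.

def full_mesh_links(n_gpus: int) -> tuple[int, int]:
    """Return (links_per_gpu, total_links) for an all-to-all mesh."""
    links_per_gpu = n_gpus - 1                 # one direct link to every peer
    total_links = n_gpus * links_per_gpu // 2  # each link is shared by 2 GPUs
    return links_per_gpu, total_links

per_gpu, total = full_mesh_links(8)
print(per_gpu, total)  # 7 links per GPU, 28 links in the whole node
```

Beyond a handful of devices the per-GPU link count grows linearly, which is one reason larger clusters fall back to switched fabrics such as InfiniBand or Ethernet.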


The clear goal of these reference designs is to compete with "AI supercomputers" such as the DGX H100 and DGX A100 offered by NVIDIA. NVIDIA offers the H100 GPU (Hopper) and A100 GPU (Ampere) as PCI Express cards, but also as NVIDIA's own module standard called SXM (whose open counterpart is OAM, adopted by AMD and Intel); eight such modules are installed in DGX series systems like the DGX H100 and DGX A100.

AMD's Instinct Platform is a design AMD itself created to compete with the DGX series and provides to OEM/ODM vendors. For those vendors, it offers a new option to present to customers who are put off by the price of NVIDIA products. It could even serve as a "weapon" in negotiations with NVIDIA: "Aren't NVIDIA's GPUs too expensive?"


The Instinct MI300 series is the GPU/APU that makes up this Instinct Platform. AMD has prepared two products: the MI300X as a pure GPU and the MI300A as an APU (CPU+GPU).

Both products are characterized by 3D chiplet technology: the IODs (base dies) sit at the bottom of the stack, and the CPU and GPU chiplets, which need to be cooled, are layered on top of them, visible at the surface.

In addition, HBM3 with a maximum capacity of 192GB is installed around the CPU/GPU chiplets. In its use of 3D chiplets, the structure resembles "Ponte Vecchio", announced by Intel and released last year as the Intel Data Center GPU Max. However, its construction is simpler than Ponte Vecchio's, and the MI300X is believed to be superior in ease of manufacture.

The Instinct MI300 series is unique in that both the GPU and the APU are implemented from one 3D chiplet design. The APU MI300A consists of 6 GPU chiplets and 3 CPU chiplets. The MI300X, by contrast, drops the 3 CPU chiplets and adds 2 more GPU chiplets, increasing compute resources to a total of 8 GPU chiplets.

In addition, the MI300A carries 128GB of HBM3, while the MI300X's capacity is increased to 192GB of HBM3, making it suitable for large-scale AI training/inference that requires larger memory capacity.
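A quick sizing check (my own arithmetic, consistent with AMD's claim elsewhere in this article that a Falcon-40B-class model fits on a single MI300X): at FP16, each parameter takes 2 bytes, so the weights alone of a 40B-parameter model need about 80 GB.

```python
# Rough sizing of model weights at FP16 (2 bytes per parameter).
# Ignores activations, KV cache and optimizer state, which add more on top.

def weights_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for raw weights, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

print(weights_gb(40e9))  # 80.0 GB: a 40B model's weights fit in 192 GB HBM3
print(weights_gb(70e9))  # 140.0 GB: exceeds 128 GB, but still fits in 192 GB
```

This is why the extra 64GB of the MI300X over the MI300A matters for the largest language models: it decides whether a model fits on one accelerator or must be sharded across several.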

Furthermore, in the MI300A APU, memory is shared: the CPU and GPU use a single memory address space. In a traditional architecture, the CPU's memory and the GPU's memory occupy separate address spaces, so when the GPU operates on data held by the CPU, the data must be copied from CPU memory to GPU memory. Each such transfer consumes bandwidth on the internal interconnect between CPU and GPU, degrading performance.

Because the MI300A shares one memory address space between CPU and GPU at the architectural level, the CPU only needs to pass a memory address for the GPU to process the data; no copy is required.
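The difference can be sketched in plain Python (an analogy using Python's buffer types, not AMD's actual API): with separate address spaces the "GPU" works on a snapshot copy that can go stale, while with shared memory both sides reference the same underlying bytes.

```python
# Analogy for discrete vs. unified memory, using Python buffers.
# A copy (bytes) stands in for discrete GPU memory; a memoryview stands in
# for the MI300A-style shared address space where no copy is needed.

cpu_buffer = bytearray(b"input data")

# Discrete model: the accelerator receives a private copy ...
gpu_copy = bytes(cpu_buffer)        # explicit transfer (costs bandwidth)
cpu_buffer[0:5] = b"fresh"          # ... so later CPU writes are invisible
print(gpu_copy[:5])                 # still b'input'

# Unified model: the accelerator holds a view of the very same memory.
shared = memoryview(cpu_buffer)     # zero-copy; same underlying storage
cpu_buffer[0:5] = b"brand"
print(bytes(shared[:5]))            # b'brand' - the change is visible at once
```

The analogy also captures the cost model: `bytes(cpu_buffer)` takes time proportional to the data size on every transfer, while `memoryview(cpu_buffer)` is constant-time regardless of how large the buffer is.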

However, AMD said little this time about CDNA 3, the GPU architecture of the Instinct MI300 series. It is essentially thought to be the data-center counterpart of RDNA 3, but details such as what extensions it adds for AI training and inference, and whether it can use the AI engine implemented in RDNA 3, remain completely unknown.

Performance also wasn't discussed in detail this time, other than the claim of 2.4 times the memory density and 1.6 times the memory bandwidth of NVIDIA's H100.

As for availability, as previously reported, samples of the MI300A have already shipped, and samples of the MI300X are expected to start shipping in the third quarter.

Meta's PyTorch 2.0 announces standard support for AMD GPUs in conjunction with the latest version of ROCm

This time, AMD also introduced "ROCm", its software development environment for AI data centers. ROCm has been offered for the Instinct series for some time and has now reached version 5 (ROCm 5). Put simply, ROCm can be thought of as AMD's counterpart to CUDA.


Such a development environment is essential for building supercomputers that scale up and scale out across GPUs, flexibly supporting massively parallel environments. In the words of NVIDIA's Jensen Huang, the whole system can be used as "a giant GPU": with CUDA, developers can achieve enormous performance without having to understand hardware details that are hard for AI developers to grasp.

AMD's ROCm works the same way: from a single GPU, to the 8-GPU Instinct Platform, and even scaled out to hundreds of GPUs, the whole system can be treated as one GPU through ROCm.
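Conceptually (a toy sketch, not ROCm code), "treating many GPUs as one" means the runtime splits a large workload into shards, dispatches each shard to a device, and gathers the results so that user code sees a single logical accelerator:

```python
# Toy model of scaling one job across N accelerators transparently.
# Real ROCm/CUDA stacks do this with device kernels and collectives;
# here each "device" is just a function applied to a shard of the data.

def run_on_device(shard: list[float]) -> list[float]:
    return [x * 2.0 for x in shard]   # stand-in for a GPU kernel

def run_as_one_gpu(data: list[float], n_devices: int) -> list[float]:
    # Split into near-equal interleaved shards, "dispatch", then gather.
    shards = [data[i::n_devices] for i in range(n_devices)]
    results = [run_on_device(s) for s in shards]
    # Reassemble the interleaved shards back into the original order.
    out = [0.0] * len(data)
    for dev, shard in enumerate(results):
        out[dev::n_devices] = shard
    return out

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(run_as_one_gpu(data, n_devices=8))  # same answer as a single device
```

The point of the sketch is that the caller never mentions devices: the same `run_as_one_gpu` call gives an identical result whether the "cluster" has 1, 8 or 100 devices, which is the property the article attributes to ROCm.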

In addition, tools are provided to convert CUDA code to ROCm code, so companies and developers with existing CUDA-based AI software assets can migrate from a CUDA + NVIDIA GPU environment to a ROCm + AMD GPU environment.
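In practice this conversion (AMD's HIPIFY tools) is largely a systematic renaming of API calls, because HIP mirrors the CUDA runtime API. Below is a deliberately simplified sketch of the idea, with a small hand-picked mapping table rather than the real tool's full rule set:

```python
# Simplified illustration of CUDA -> HIP source translation.
# AMD's actual hipify tools (hipify-clang / hipify-perl) use far more
# complete rules, including kernel-launch syntax; this shows the core idea.

CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
}

def hipify(source: str) -> str:
    # Replace longest names first so cudaMemcpyHostToDevice is not
    # half-rewritten by the shorter cudaMemcpy rule.
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return source

cuda_src = "cudaMalloc(&d_a, n); cudaMemcpy(d_a, a, n, cudaMemcpyHostToDevice);"
print(hipify(cuda_src))
# hipMalloc(&d_a, n); hipMemcpy(d_a, a, n, hipMemcpyHostToDevice);
```

Because the translated code calls HIP, which compiles for both AMD and (via a CUDA backend) NVIDIA GPUs, a ported codebase does not have to give up the original platform.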

AMD President Victor Peng (left) and Soumith Chintala, founder of PyTorch and Vice President at Meta
During the event, AMD President Victor Peng took the stage to explain the company's collaborations with its software development partners. He invited Soumith Chintala, founder of PyTorch and Vice President at Meta, on stage to explain the partnership between AMD and PyTorch.

PyTorch is a deep learning framework that grew out of an open source project launched internally at Facebook. Many AI developers use PyTorch to build AI training and inference applications.

This time, AMD and Meta announced that with the upcoming "ROCm 5.4.2", PyTorch 2.0 will support Instinct accelerators as standard through ROCm: developers will be able to use Instinct accelerators simply by installing ROCm as their development environment, without special porting work.
AMD President Victor Peng (left) and Hugging Face CEO Clément Delangue
Similarly, Hugging Face, a provider of AI models, announced that the AI models it hosts will be optimized for AMD's CPUs, GPUs, FPGAs and other processors. Customers using Hugging Face AI models will be able to run high-performance training and inference on AMD processors as-is.

AMD's pitch this time targets NVIDIA's real strength: many developers are familiar with CUDA as a software development environment, and that familiarity is what keeps them choosing NVIDIA GPUs as their AI computing hardware (this is NVIDIA's strength right now). That is why AMD emphasizes easy porting from CUDA, and why it stresses that AMD CPUs/GPUs/FPGAs can be used in an integrated fashion from frameworks familiar to AI developers (such as PyTorch), without developers having to think about anything particularly difficult. All of this is aimed at overcoming that situation.

CUDA is about 15 years old, while ROCm is only about 7, and it began to be used seriously for AI much later than CUDA. In that sense it will not be easy for AMD to close the gap, but this is a "chicken and egg" relationship: if AMD's GPUs turn out to be much faster than NVIDIA's, for example, many developers may treat them as the "chicken" and start writing a lot of "egg" code, which in turn hatches the next chicken, and so on.

It can be said that the Instinct MI300 series shown this time has the potential to become that "chicken", and where it goes from here, including the Instinct Platform that could become an "NVIDIA DGX killer", will be worth watching.

[Official Release] AMD Expands Leading Data Center Portfolio with New EPYC CPUs, Shares Details of Next-Gen AMD Instinct Accelerators and Generative AI Software Support

SANTA CLARA, Calif., June 13, 2023 (GLOBE NEWSWIRE) -- Today, at the "Data Center and AI Technology Premiere," AMD (NASDAQ: AMD) announced the products, strategy and ecosystem partners that will shape the future of computing, highlighting the next phase of data center innovation. AMD was joined on stage by executives from Amazon Web Services (AWS), Citadel, Hugging Face, Meta, Microsoft Azure and PyTorch, showcasing the technology partnerships with industry leaders that will bring the next generation of high-performance CPU and AI accelerator solutions to market.

"Today, we took another major step forward in our data center strategy as we expanded our 4th Gen EPYC™ processor family with new leadership solutions for cloud and technical computing workloads and announced new public instances and on-premises deployments with the largest cloud providers," said Dr. Lisa Su, Chair and CEO of AMD. "AI is the defining technology shaping the next generation of computing and AMD's largest strategic growth opportunity. We are focused on accelerating large-scale deployments of AMD AI platforms in the data center, led by the planned launch of the Instinct MI300 accelerators later this year and a growing ecosystem of enterprise-ready AI software optimized for our hardware."

Computing infrastructure optimized for the modern data center

AMD released a series of updates to its 4th Gen EPYC family designed to provide customers with the workload specialization needed to meet the unique needs of enterprises.

  • Advancing the world's best data center CPU. AMD highlighted how the 4th Gen AMD EPYC processors continue to drive leadership performance and energy efficiency. With AWS, AMD previewed the next-generation Amazon Elastic Compute Cloud (Amazon EC2) M7a instances, powered by 4th Gen AMD EPYC processors ("Genoa"). Outside the event, Oracle announced plans to offer new Oracle Cloud Infrastructure (OCI) E5 instances powered by 4th Gen AMD EPYC processors.
  • Cloud-native computing without compromise. AMD launched the 4th Gen AMD EPYC 97X4 processors, formerly codenamed "Bergamo". With 128 "Zen 4c" cores per socket, these processors deliver the highest vCPU density1 and industry-leading2 performance for applications running in the cloud, with leadership energy efficiency. Meta also took part, discussing how these processors are well suited to mainstream applications such as Instagram and WhatsApp; how Meta achieved impressive performance gains with 4th Gen AMD EPYC 97x4 processors compared with 3rd Gen AMD EPYC across a variety of workloads, along with significant TCO improvements; and how AMD and Meta optimized the EPYC CPUs for Meta's power-efficiency and compute-density requirements.
  • Better products through technical computing. AMD introduced the 4th Gen AMD EPYC processors with AMD 3D V-Cache™ technology, the world's highest-performing x86 server CPU for technical computing3. Microsoft announced the general availability of Azure HBv4 and HX instances powered by 4th Gen AMD EPYC processors with AMD 3D V-Cache technology.

Click here to learn about the latest 4th Gen AMD EPYC processors, and click here to read what AMD customers have to say.

AMD AI Platforms – A Vision for AI Everywhere

Today, AMD made a series of announcements showcasing its AI platform strategy, giving customers a hardware portfolio spanning cloud, edge and endpoint, together with deep industry software collaborations to develop scalable and pervasive AI solutions.

  • Introducing the world's most advanced generative AI accelerator4. AMD announced new details about its AMD Instinct™ MI300 series accelerator family, including the launch of the AMD Instinct MI300X accelerator, the world's most advanced accelerator for generative AI. Based on the next-generation AMD CDNA™ 3 accelerator architecture and supporting up to 192 GB of HBM3 memory, the MI300X delivers the compute and memory efficiency required for large language model training and inference in generative AI workloads. With the large memory of the AMD Instinct MI300X, customers can now fit large language models such as Falcon-40B (a 40B-parameter model) on a single MI300X accelerator5. AMD also introduced the AMD Instinct™ Platform, which combines eight MI300X accelerators in an industry-standard design to provide the ultimate solution for AI inference and training. The MI300X will be sampling to lead customers starting in the third quarter. AMD also announced that the AMD Instinct MI300A, the world's first APU accelerator for HPC and AI workloads, is now sampling to customers.
  • Bringing an open, proven and ready AI software platform to market. AMD showcased its ROCm™ software ecosystem for data center accelerators, highlighting its readiness and collaborations with industry leaders to build an open AI software ecosystem. PyTorch discussed the work between AMD and the PyTorch Foundation to fully upstream the ROCm software stack, providing immediate "day zero" support for PyTorch 2.0 with ROCm version 5.4.2 on all AMD Instinct accelerators. This integration gives developers access to a wide range of PyTorch-powered AI models that are compatible with AMD accelerators and work "out of the box." Hugging Face, the leading open platform for AI builders, announced that it will optimize thousands of Hugging Face models on AMD platforms, from AMD Instinct accelerators to AMD Ryzen™ and AMD EPYC processors, AMD Radeon™ GPUs, and AMD Versal™ and Alveo™ adaptive processors.

Powerful Networking Portfolio for Cloud and Enterprise

AMD showcased a robust networking portfolio including the AMD Pensando™ DPU, AMD Ultra Low Latency NICs and AMD Adaptive NICs. AMD Pensando DPUs combine a robust software stack with "zero trust security" and a leadership programmable packet processor to create the world's most intelligent and highest-performing DPU. AMD Pensando DPUs are deployed at scale by cloud partners such as IBM Cloud, Microsoft Azure and Oracle Cloud Infrastructure. In the enterprise, they are deployed in the HPE Aruba Networking CX 10000 Series Switches, with customers such as the leading IT services company DXC, and as part of VMware vSphere® Distributed Services Engine™ to accelerate application performance for customers.

AMD highlighted its next-generation DPU roadmap, code-named "Giglio," which is designed to bring customers increased performance and power efficiency compared to the current generation, and is expected to be available in late 2023.

AMD also announced the AMD Pensando Software-in-Silicon Development Kit (SSDK), which enables customers to rapidly develop or migrate services for deployment on the AMD Pensando P4 programmable DPU, in coordination with the rich set of features already implemented on the Pensando platform. The AMD Pensando SSDK lets customers harness the power of the leading AMD Pensando DPU and tailor network virtualization and security features within their own infrastructure.

About AMD

For more than 50 years, AMD has been driving innovation in high-performance computing, graphics and visualization technologies. Billions of people around the world, leading Fortune 500 companies and cutting-edge scientific institutions rely on AMD technology every day to improve the way they live, work and play. AMD employees are focused on creating leading-edge, high-performance and adaptable products that push the boundaries of what's possible. For more information on how AMD enables today and inspires tomorrow, visit the AMD (NASDAQ: AMD ) website, blog, LinkedIn and Twitter pages.

AMD, the AMD Arrow Logo, EPYC, AMD Instinct, ROCm, Ryzen, Radeon, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other names are for informational purposes only and may be trademarks of their respective owners.

Cautionary statement

This press release contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD), such as the features, functionality, performance, availability, timing and expected benefits of AMD products, including the 4th Gen AMD EPYC™ processor family, the AMD Instinct™ MI300 series accelerator family (including the AMD Instinct™ MI300X and AMD Instinct™ MI300A) and the AMD Pensando DPU code-named "Giglio", which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as "will," "may," "expects," "believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this press release are based on current beliefs, assumptions and expectations, speak only as of the date of this press release and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other future events to differ materially from the forward-looking information and statements, express or implied.
Material factors that could cause actual results to differ materially from current expectations include, but are not limited to: Intel Corporation's dominance of the microprocessor market and its aggressive business practices; global economic uncertainty; the cyclical nature of the semiconductor industry; market conditions of the industries in which AMD products are sold; loss of a significant customer; the impact of the COVID-19 pandemic on AMD's business, financial condition and results of operations; the competitive markets in which AMD's products are sold; quarterly and seasonal sales patterns; AMD's ability to adequately protect its technology or other intellectual property; unfavorable currency exchange rate fluctuations; the ability of third-party manufacturers to manufacture AMD's products on a timely basis in sufficient quantities and using competitive technologies; the availability of essential equipment, materials, substrates or manufacturing processes; the ability to achieve expected manufacturing yields; AMD's ability to introduce products on a timely basis with expected features and performance levels; AMD's ability to generate revenue from its semi-custom SoC products; potential security vulnerabilities; potential security incidents, including IT outages, data loss, data breaches and cyber-attacks; potential difficulties in upgrading and operating AMD's new enterprise resource planning system; uncertainties involving the ordering and shipment of AMD's products; AMD's reliance on third-party intellectual property to design and introduce new products in a timely manner; AMD's reliance on third-party companies for the design, manufacture and supply of motherboards, software and other computer platform components; AMD's reliance on Microsoft and other software vendors' support to design and develop software that runs on AMD's products; AMD's reliance on third-party distributors and add-in-board partners; the impact of modification or interruption of AMD's internal business processes and information systems; the compatibility of AMD's products with some or all industry-standard software and hardware; costs related to defective products; the efficiency of AMD's supply chain; AMD's ability to rely on third-party supply-chain logistics functions; AMD's ability to effectively control the sales of its products on the gray market; the impact of government actions and regulations, such as export administration regulations, tariffs and trade protection measures; AMD's ability to realize its deferred tax assets; potential tax liabilities; current and future claims and litigation; the impact of environmental laws, conflict minerals regulations and other laws or regulations; the impact of acquisitions, joint ventures and/or investments on AMD's business and AMD's ability to integrate acquired businesses; the impact of any impairment of the combined company's assets on the combined company's financial position and results of operations; restrictions imposed by agreements governing AMD's notes, the guarantees of Xilinx's notes and the revolving credit facility; AMD's indebtedness; AMD's ability to generate sufficient cash to meet its working capital requirements, or to generate sufficient revenue and operating cash flow to make all of its planned research and development or strategic investments; political, legal and economic risks and natural disasters; future impairments of goodwill and technology license purchases; AMD's ability to attract and retain qualified personnel; AMD's stock price volatility; and worldwide political conditions. Investors are urged to review in detail the risks and uncertainties in AMD's filings with the US Securities and Exchange Commission, including but not limited to AMD's most recent reports on Form 10-K and Form 10-Q.

1 EPYC-049: The AMD EPYC 9754 is a 128-core, 2-thread-per-core CPU. In a 2-socket server with 1 thread per vCPU, it delivers 512 vCPUs per EPYC-enabled server, more than any Ampere-based or 4-socket Intel CPU-based server as of 05/23/2023.
2 SP5-143A: SPECrate®2017_int_base comparison based on published system scores at www.spec.org as of June 13, 2023. The 2P AMD EPYC 9754 scores 1950 SPECrate®2017_int_base (http://www.spec.org/cpu2017/results/res2023q2/cpu2017-20230522-36617.html), higher than all other 2P servers. The 1P AMD EPYC 9754 scores 981 SPECrate®2017_int_base (981.4 per socket, http://www.spec.org/cpu2017/results/res2023q2/cpu2017-20230522-36613.html), higher per socket than all other servers. SPEC®, SPEC CPU® and SPECrate® are registered trademarks of Standard Performance Evaluation Corporation. See www.spec.org for more information.
3 The application test cases simulate the average uplift of a 2P server running the 96-core EPYC 9684X versus a top-performing general-purpose 2P 56-core Intel Xeon Platinum 8480+ or top-of-stack 60-core Xeon 8490H based server, for technical computing performance leadership. AMD defines "technical computing" or "technical computing workloads" to include electronic design automation, computational fluid dynamics, finite element analysis, seismic tomography, weather forecasting, quantum mechanics, climate research, molecular modeling or similar workloads. Results may vary based on factors such as silicon version, hardware and software configuration, and driver versions. SPEC®, SPECrate® and SPEC CPU® are registered trademarks of Standard Performance Evaluation Corporation. See www.spec.org for more information.
4 MI300-09: The AMD Instinct™ MI300X accelerator is based on AMD CDNA™ 3 5nm FinFET process technology with 3D chiplet stacking, features high-speed AMD Infinity Fabric technology, offers 192 GB of HBM3 memory capacity (versus 80 GB for the Nvidia Hopper H100) and delivers 5.218 TB/s peak sustained memory bandwidth, exceeding the maximum bandwidth of the Nvidia Hopper H100 GPU.
5 MI300-07K: Measurements by AMD's internal performance lab as of June 2, 2023, based on current specifications and/or internal engineering calculations. Calculated the minimum number of GPUs required to run a Falcon (40B-parameter) large language model (LLM) at FP16 precision. Tested configuration: AMD lab system consisting of 1 x EPYC 9654 (96-core) CPU and 1 x AMD Instinct™ MI300X (192GB HBM3, OAM module) 750W accelerator, tested at FP16 precision. Server manufacturers' configurations may vary, yielding different results.

Origin blog.csdn.net/LingLing1301/article/details/131472952