CXL 2.0 White Paper Interpretation & Translation: What Does Compute Express Link Improve over CXL 1.1, and What Does It Require?

Original work:
Dr. Debendra Das Sharma - Intel Fellow and Director of I/O Technology and Standards
and Siamak Tavallaei - Chief Architect for Microsoft Azure Hardware Architecture and Co-Chair of the CXL™ Consortium Technical Working Group


Compute Express Link (CXL) is an open industry-standard interconnect that provides high-bandwidth, low-latency connectivity between host processors and devices such as accelerators, memory buffers, and smart I/O devices. It is designed to address growing high-performance computing workloads by supporting heterogeneous processing and memory systems, with applications in artificial intelligence, machine learning, analytics, cloud infrastructure, cloudification of the network and edge, communication systems, and high-performance computing. It does this by enabling coherency and memory semantics on top of PCI Express® (PCIe®) 5.0-based I/O semantics to optimize performance in evolving usage models.
This is increasingly important because processing the data in these emerging applications requires a diverse mix of scalar, vector, matrix, and spatial architectures deployed across CPUs, GPUs, FPGAs, smart NICs, and other accelerators.

CXL 1.0 debuted in March 2019, supporting dynamic multiplexing between a rich set of protocols
including I/O (CXL.io, based on PCIe), caching (CXL.cache), and memory (CXL.memory) semantics. CXL maintains a unified, coherent memory space between the CPU (host processor) and the memory on any attached CXL device.
This allows the CPU and the device to share resources and operate on the same memory region for higher performance, reduced data movement, and reduced software-stack complexity, leading to three main usages as shown in Figure 1.
Furthermore, since the CPU is primarily responsible for coherency management, CXL reduces device cost and complexity, as well as the
overhead traditionally associated with coherency across an I/O link. CXL 1.1, released in July 2019,
added compliance testing details.

Building on the industry success and acceptance of CXL, demonstrated by more than 130 actively participating member companies, we are pleased to announce the availability of CXL 2.0, approximately one year after CXL 1.1, enabling additional usage models while maintaining full backward compatibility with CXL 1.1 and CXL 1.0.

CXL 2.0 enhances the CXL 1.1 experience in three main areas: CXL switching, support
for persistent memory, and security.

A new feature of CXL 2.0 is support for single-level switching, which enables fan-out to multiple
devices, as shown in Figure 2. This will allow many devices in a platform to migrate to CXL while maintaining CXL's backward compatibility and low latency.
Figure 2: CXL 2.0 switches support fanout to multiple devices while maintaining backward compatibility

One of the important aspects of the CXL 2.0 feature set is support for pooling of Multiple Logical Devices
(MLDs), as well as single logical devices, via a CXL switch connected to multiple hosts (root ports).

This feature enables servers to pool resources, such as accelerators and/or memory, which can be allocated to different servers based on workload.

Say a server needs two FPGAs and a GPU: it can request these resources from a resource manager in the rack, claim them when they become available, and relinquish them when the work is done.
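To make that flow concrete, here is a minimal sketch in C of how a host might talk to such a rack-level resource manager. Every name here (fm_request, fm_release, fm_resource_t) is invented for illustration: CXL 2.0 standardizes the fabric manager's role, not this particular client API, and the stubs below merely simulate assignment.

```c
#include <stdio.h>

/* Hypothetical resource kinds a rack-level resource manager might pool. */
typedef enum { FM_FPGA, FM_GPU, FM_MEMORY } fm_kind_t;
typedef struct { fm_kind_t kind; int handle; } fm_resource_t;

static int next_handle = 1;

/* Hypothetical client call: ask the rack's resource manager for a device
 * of the given kind and bind it into this host's CXL hierarchy. */
static int fm_request(fm_kind_t kind, fm_resource_t *out)
{
    out->kind = kind;
    out->handle = next_handle++;   /* stub: pretend one was available */
    printf("assigned kind=%d handle=%d\n", (int)kind, out->handle);
    return 0;
}

/* Hypothetical client call: return the device to the pool. */
static void fm_release(fm_resource_t *res)
{
    printf("released handle=%d\n", res->handle);
}

int main(void)
{
    fm_resource_t fpga[2], gpu;

    /* Request two FPGAs and one GPU, as in the example above. */
    for (int i = 0; i < 2; i++)
        if (fm_request(FM_FPGA, &fpga[i]) != 0) return 1;
    if (fm_request(FM_GPU, &gpu) != 0) return 1;

    /* ... run the workload on the assigned devices ... */

    /* Relinquish the resources so other hosts can claim them. */
    fm_release(&gpu);
    for (int i = 0; i < 2; i++)
        fm_release(&fpga[i]);
    return 0;
}
```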

Likewise, memory can be flexibly allocated and released to different servers (aka nodes or hosts) as needed.
This allows system designers to avoid overprovisioning every server in the rack while still achieving the best performance. CXL 2.0 supports attaching Type-3 Multiple Logical Devices (MLDs) through a CXL switch, as shown in Figure 3 below. Each color of a host (H) represents a domain, or server, that defines a hierarchy.

A CXL 2.0 switch can handle multiple domains (up to 16 such hierarchies can reach any one MLD).

A Type-3 MLD device can support up to 16 domains on each of its CXL ports. It is also possible for a Type-3 MLD device to partition its CXL resources and connect directly to multiple hosts, each over a dedicated CXL link, as shown in Figure 3.

This helps achieve the same performance as a direct connection because switch latency is removed, which is important for memory access.
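Returning to the domain limits above: as a purely illustrative model (the struct and field names below are invented, not taken from the specification), an MLD exposing up to 16 logical devices might track which host domain owns each of its partitions like this:

```c
#include <stdint.h>
#include <stdio.h>

#define MLD_MAX_DOMAINS 16   /* up to 16 hierarchies per CXL port */
#define DOMAIN_NONE     0xFF /* partition not assigned to any host */

/* Illustrative only: one entry per logical device (LD) the MLD exposes,
 * recording which host domain (0..15) currently owns it. */
typedef struct {
    uint64_t capacity_bytes; /* memory behind this logical device */
    uint8_t  owner_domain;   /* host hierarchy the LD is bound to */
} mld_partition_t;

int main(void)
{
    mld_partition_t ld[MLD_MAX_DOMAINS];

    /* Start with all partitions unassigned. */
    for (int i = 0; i < MLD_MAX_DOMAINS; i++)
        ld[i] = (mld_partition_t){ 64ULL << 30, DOMAIN_NONE };

    /* Bind partition 0 to host domain 3, partition 1 to domain 7. */
    ld[0].owner_domain = 3;
    ld[1].owner_domain = 7;

    for (int i = 0; i < 2; i++)
        printf("LD %d -> domain %u\n", i, ld[i].owner_domain);
    return 0;
}
```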

CXL 2.0 achieves these goals by defining protocol enhancements that enable pooling while
maintaining the quality of service and reliability isolation requirements of different domains.

It also defines the managed hotplug process to add/remove resources.

Most importantly, CXL 2.0 defines a standardized fabric manager, which ensures users have the same experience regardless of the type of pooling and of the devices, hosts, switches, or usage models involved.


Figure 3: CXL device pool with and without CXL 2.0 switches

A key innovation in the industry revolves around non-volatile memory, which is approaching the latency and bandwidth characteristics of DRAM while offering the advantages of high capacity and persistence. This enables many applications to load their entire data set into these high-performance memories.

One of the challenges with a load-store interconnect such as CXL is guaranteeing persistence (the commitment of stored data to the persistent memory).

CXL 2.0 addresses this challenge with an architected flow and a standard memory-management interface for software, enabling persistent memory to move from a controller-based approach to direct memory management (Figure 4).
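The architected flow itself lives in the protocol and platform, but the problem it solves can be illustrated at the CPU level. The sketch below is x86-specific and only an analogy (it assumes a platform where flushed lines reach the persistence domain): it shows why a store must be explicitly committed, because data is not durable until the cache line leaves the volatile hierarchy and the flush is ordered by a fence.

```c
#include <immintrin.h>  /* _mm_clflushopt, _mm_sfence */
#include <stdint.h>

/* Store a value and commit it toward the persistence domain.
 * CLFLUSHOPT evicts the cache line toward the memory controller, and
 * SFENCE orders the flush before any later stores. CXL 2.0 defines an
 * architected flow so software gets an equivalent guarantee for a
 * CXL-attached persistent-memory device. */
static void store_and_commit(uint64_t *pmem_addr, uint64_t value)
{
    *pmem_addr = value;                /* store lands in the CPU cache */
    _mm_clflushopt((void *)pmem_addr); /* push the line out of the cache */
    _mm_sfence();                      /* order the flush before later ops */
}

int main(void)
{
    static uint64_t cell;  /* stand-in for a persistent-memory mapping */
    store_and_commit(&cell, 42);
    return 0;
}
```

Compile with -mclflushopt (gcc/clang); on real persistent memory the pointer would come from a DAX or CXL memory mapping rather than a static variable.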

Figure 4: CXL 2.0 addresses the key challenge of guaranteeing persistence for persistent memory

Given how prevalent exploits are, security is a cornerstone of any technology's success. CXL has made great strides in this regard, working with other industry standards bodies such as PCI-SIG and DMTF to ensure a seamless user experience while deploying the best-known security mechanisms. CXL 2.0 enables encryption on the link, integrating seamlessly with existing security mechanisms such as device TLBs, as shown in Figure 5.
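The link encryption referenced here is integrity and data encryption (IDE), which, like its PCIe counterpart, builds on AES-GCM authenticated encryption. As a purely host-side illustration of that primitive (not the hardware IDE flow itself, and with a placeholder key and IV; real IDE keys are negotiated per link), here is a minimal AES-256-GCM sketch using OpenSSL's EVP API:

```c
#include <openssl/evp.h>
#include <stdio.h>

int main(void)
{
    unsigned char key[32] = {0}, iv[12] = {0};   /* demo values only */
    unsigned char msg[] = "CXL flit payload";
    unsigned char ct[sizeof msg], tag[16];
    int len = 0, ctlen = 0;

    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv);
    EVP_EncryptUpdate(ctx, ct, &len, msg, sizeof msg);
    ctlen = len;
    EVP_EncryptFinal_ex(ctx, ct + ctlen, &len);
    ctlen += len;
    /* The GCM tag provides the integrity check alongside encryption. */
    EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, sizeof tag, tag);
    EVP_CIPHER_CTX_free(ctx);

    printf("ciphertext bytes: %d\n", ctlen);
    return 0;
}
```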

Figure 5: Security enhancements in CXL 2.0

In a year and a half, the CXL Consortium has published two versions of the specification. Based on feedback from Consortium members, the industry, and the end-user community, we are hard at work on the next revision, CXL 3.0, to enable more usage models and deliver higher performance. Please join us on this exciting journey to make the user experience even better.


Interpretation:

CXL 2.0 is a new technology that allows different devices such as CPUs, GPUs, memory, and accelerators to communicate faster and more efficiently. It uses a common protocol set called CXL that supports different types of data and operations. It also allows devices to share memory and resources so they can work better together. CXL 2.0 is compatible with previous versions of CXL and with PCIe, which is widely used in today's computers. CXL 2.0 also introduces new features such as switching, persistent memory support, and security to enable more flexible and secure designs. CXL 2.0 is designed for high-performance applications such as artificial intelligence, machine learning, cloud computing, and edge computing.

CXL 2.0 is fully compatible with CXL 1.1 and 1.0 and uses the same PCIe 5.0 physical layer for communication, so a full-width CXL link likewise occupies 16 PCIe 5.0 lanes.
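Since the physical layer is unchanged, the raw link bandwidth follows directly from PCIe 5.0 arithmetic: 32 GT/s per lane with 128b/130b encoding across 16 lanes gives roughly 63 GB/s in each direction. A quick check:

```c
#include <stdio.h>

int main(void)
{
    const double gt_per_s  = 32e9;          /* PCIe 5.0: 32 GT/s per lane */
    const int    lanes     = 16;            /* x16 link                   */
    const double encoding  = 128.0 / 130.0; /* 128b/130b line encoding    */

    double bytes_per_s = gt_per_s * lanes * encoding / 8.0;
    printf("x16 PCIe 5.0 raw bandwidth: %.1f GB/s per direction\n",
           bytes_per_s / 1e9);              /* ~63.0 GB/s */
    return 0;
}
```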


Source: blog.csdn.net/LingLing1301/article/details/129773299