Cutting-Edge Data Center Technology: What Is NPO/CPO?

Hello everyone, I am Xiaozao Jun.

In today's article, let's talk about two of the latest cutting-edge technologies in the data center: NPO and CPO.

The story still has to start from the beginning.

Last year, the state released the "East Data, West Computing" strategy, which attracted the attention of the whole society.

"East Data, West Computing" is essentially an adjustment of the division of labor among data centers: part of the computing power demand of the eastern coastal areas is shifted to data centers in the western regions.

The reason is that the western regions have abundant energy resources and a cooler climate, which can greatly reduce electricity bills and carbon emissions.

We all know that the data center is the carrier of computing power. At this stage, we are engaged in digital transformation and digital economy, and we cannot do without computing power and data centers. However, the power consumption problem of the data center cannot be ignored.

According to the data, the total electricity consumption of data centers nationwide in 2021 was 216.6 billion kWh, accounting for 2.6% of the country's total electricity consumption. That is equivalent to the annual power generation of two Three Gorges hydropower stations, or 1.8 times the total electricity consumption of Beijing.

Such enormous power consumption puts a lot of pressure on achieving the "dual carbon" goals.

As a result, the industry has stepped up research on how to reduce the energy consumption of data centers.

Figure: Data center (IDC)

As you probably know, data centers have an important metric: PUE (Power Usage Effectiveness).

PUE = total energy consumption of the data center / energy consumption of IT equipment. Here, the total energy consumption of the data center includes the energy consumption of IT equipment as well as that of other systems such as cooling and power distribution.
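To make the formula concrete, here is a small Python sketch. The function name `pue` and all of the energy figures are hypothetical, chosen only to show the arithmetic; they are not measurements from any real data center.

```python
def pue(it_kwh: float, cooling_kwh: float, other_kwh: float) -> float:
    """PUE = total data center energy / IT equipment energy."""
    if it_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    total_kwh = it_kwh + cooling_kwh + other_kwh
    return total_kwh / it_kwh


# Hypothetical annual figures (illustrative only).
it_kwh = 1000.0       # servers, storage, network equipment
cooling_kwh = 350.0   # heat dissipation
other_kwh = 50.0      # power distribution, lighting, etc.

value = pue(it_kwh, cooling_kwh, other_kwh)
overhead_share = (cooling_kwh + other_kwh) / (it_kwh + cooling_kwh + other_kwh)

print(f"PUE = {value:.2f}")
print(f"Non-IT share of total energy = {overhead_share:.1%}")
```

A PUE of exactly 1 would mean that every kilowatt-hour goes to the IT equipment; the closer the value is to 1, the less energy is spent on overhead.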

We can see that, in addition to the power used by the main equipment, a large part of the energy goes to heat dissipation and lighting.

Therefore, there are two approaches to saving energy and reducing emissions in data centers:

1. Reduce the power consumption of the main equipment

2. Reduce the power consumption of heat dissipation and lighting (mainly heat dissipation)

█The power consumption challenge of the main equipment

When it comes to the main equipment, most people immediately think of servers. That's right: the server is the most important equipment in the data center. It carries all kinds of business services, contains hardware such as CPUs and memory, and delivers computing power.

But the main equipment also includes another important category: network equipment, that is, switches, routers, firewalls, and so on.

At present, the accelerated implementation of AI/ML (artificial intelligence/machine learning), coupled with the rapid development of the Internet of Things, has increased the business pressure of data centers.

This pressure is not only reflected in computing power requirements, but also in network traffic. The network access bandwidth standard of the data center has been raised from 10G and 40G in the past to 100G, 200G and even 400G now.

To meet the demands of traffic growth, network devices themselves need to be upgraded iteratively. As a result, more powerful switching chips and higher-speed optical modules have come into use.

Let's look at the switching chip first.

The switching chip is the heart of the network equipment, and its processing capability directly determines the capability of the equipment. In recent years, the power consumption of switching chips has increased, as shown in the figure below:

Figure: Power consumption trend of switching chips

It is worth mentioning that although the overall power consumption of network equipment continues to increase, the power consumption per bit continues to decrease. In other words, energy efficiency keeps improving.

Next, let's look at the optical module.

Optical modules play an important role in the field of optical communications and directly determine the bandwidth of network communications.

As early as 2007, the power of a 10 Gbps optical module was only about 1 W.

From 40G and 100G to today's 400G and 800G, and on to the 1.6T optical modules of the future, power consumption has soared like a rocket, approaching 30 W per module. Bear in mind that a switch holds more than one optical module. When fully loaded, it often has dozens of them (with 48 modules, that is 48 × 30 = 1440 W).

Generally speaking, optical modules account for more than 40% of the power consumption of the whole switch. This means that the power consumption of the whole switch is likely to exceed 3000 W.
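The arithmetic behind these two paragraphs can be sketched in a few lines of Python. The function names are made up for illustration, and treating the optics share as exactly 40% is a simplifying assumption (the text says "more than 40%").

```python
MODULES_PER_SWITCH = 48    # fully loaded example from the text
WATTS_PER_MODULE = 30.0    # upper-end module power from the text
OPTICS_SHARE = 0.40        # optics as a fraction of switch power (assumed exact)


def optics_power(modules: int, watts_each: float) -> float:
    """Total power drawn by all optical modules, in watts."""
    return modules * watts_each


def implied_switch_power(optics_watts: float, optics_share: float) -> float:
    """Whole-switch power implied by the optics power and its share."""
    if not 0 < optics_share <= 1:
        raise ValueError("optics_share must be in (0, 1]")
    return optics_watts / optics_share


optics_watts = optics_power(MODULES_PER_SWITCH, WATTS_PER_MODULE)
total_watts = implied_switch_power(optics_watts, OPTICS_SHARE)

print(f"Optical modules: {optics_watts:.0f} W")
print(f"Implied whole switch at a {OPTICS_SHARE:.0%} share: {total_watts:.0f} W")
```

Note that a larger optics share implies a smaller total: at a 48% share, the same 1440 W of optics would correspond to a 3000 W switch, so "likely to exceed 3000 W" fits a share somewhere between 40% and 48%.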

A data center has far more than one switch. The total power consumption behind this is daunting.

Besides switching chips and optical modules, network equipment has another "big power consumer" that you may not be familiar with: SerDes.

SerDes is short for SERializer/DESerializer. In network equipment it is an important component, mainly responsible for connecting the optical modules and the network switching chip.

Put simply, it converts the parallel data from the switching chip into serial data for transmission. Then, at the receiving end, the serial data is converted back into parallel data.
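The parallel-to-serial round trip can be illustrated with a toy Python model. This is purely conceptual: a real SerDes is high-speed mixed-signal hardware with line coding, clock recovery, and equalization, none of which is modeled here, and the function names and the 8-bit word width are assumptions made for the example.

```python
from typing import Iterable, Iterator, List


def serialize(words: Iterable[int], width: int) -> Iterator[int]:
    """Turn parallel words of `width` bits into a serial bit stream (MSB first)."""
    for word in words:
        if not 0 <= word < (1 << width):
            raise ValueError(f"word {word} does not fit in {width} bits")
        for shift in range(width - 1, -1, -1):
            yield (word >> shift) & 1


def deserialize(bits: Iterable[int], width: int) -> List[int]:
    """Regroup a serial bit stream into parallel words of `width` bits."""
    words: List[int] = []
    current = 0
    count = 0
    for bit in bits:
        current = (current << 1) | (bit & 1)
        count += 1
        if count == width:
            words.append(current)
            current = 0
            count = 0
    if count:
        raise ValueError("bit stream length is not a multiple of the word width")
    return words


parallel_in = [0b1010_0101, 0xFF, 0x00, 0x3C]    # 8-bit parallel words
serial = list(serialize(parallel_in, width=8))   # one bit at a time on the "wire"
parallel_out = deserialize(serial, width=8)

print(len(serial), "bits on the serial link")
print("round trip intact:", parallel_out == parallel_in)
```

The point of the sketch is only the shape of the job: many slow parallel wires on the chip side become one very fast serial lane, which is why the lane rate has to climb as the chip gets faster.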

As mentioned earlier, the capabilities of network switching chips are constantly improving. Therefore, the rate of SerDes must also be increased accordingly in order to meet the requirements of data transmission.

The increase in the rate of SerDes naturally leads to an increase in power consumption.

In the 102.4Tbps era, the SerDes rate needs to reach 224G, and the chip SerDes (ASIC SerDes) power consumption is expected to reach 300W.

It should be noted that the rate and transmission distance of SerDes are limited by PCB materials and processes and cannot be increased indefinitely. In other words, as the SerDes rate and power consumption go up, the PCB copper traces can no longer carry the signal as far. Only by shortening the transmission distance can signal quality be guaranteed.

It's a bit like the shot put: the heavier the shot (the higher the SerDes rate), the shorter the distance you can throw it.

Specifically, when the SerDes rate reaches 224G, it can only support a transmission distance of 5 to 6 inches at most.

This means that, on the premise that there is no technological breakthrough in SerDes, the distance between the network switching chip and the optical module must be shortened.

To sum up, switching chips, optical modules, and SerDes are the three "mountains" of power consumption weighing on network equipment.

According to data from equipment manufacturers, over the past 12 years the network switching bandwidth of the data center has increased 80-fold, while power consumption has increased 25-fold.
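These two growth figures also back up the earlier point that power per bit keeps falling. Here is a rough Python sketch, assuming the 25-fold figure refers to total power consumption over the same period (the text does not spell this out) and using a made-up function name.

```python
BANDWIDTH_GROWTH = 80.0   # switching bandwidth multiple over 12 years (from the text)
POWER_GROWTH = 25.0       # power multiple, read here as total power (assumption)


def energy_per_bit_ratio(power_growth: float, bandwidth_growth: float) -> float:
    """Energy per bit now, relative to the start (power growth / bandwidth growth)."""
    if bandwidth_growth <= 0:
        raise ValueError("bandwidth_growth must be positive")
    return power_growth / bandwidth_growth


ratio = energy_per_bit_ratio(POWER_GROWTH, BANDWIDTH_GROWTH)

print(f"Energy per bit is {ratio:.1%} of its starting value")
print(f"That is roughly a {1 / ratio:.1f}x gain in energy efficiency")
```

Under that reading, each bit costs less than a third of what it used to, yet total power still rises because traffic grows even faster.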

Source of information: 2020 Cisco and/or its affiliates. All rights reserved. Cisco Public

In this case, the proportion of power consumption of network equipment in the data center continues to rise.

Figure: Energy consumption of network equipment (red)

Data source: Facebook-OIF CPO Webinar 2020

█The power consumption challenge of heat dissipation

Earlier, Xiaozao Jun introduced the power consumption challenges of network equipment in detail. Next, let's look at heat dissipation.

In fact, compared with the rising power consumption of network equipment, the power consumed by heat dissipation is the real heavyweight.

According to statistics, the proportion of switching equipment in the total energy consumption of a typical data center is only about 4%, which is less than 1/10 of that of servers.

But what about cooling? According to CCID statistics, about 43% of the energy consumption of China's data centers in 2019 was used for cooling IT equipment, roughly on par with the 45% consumed by the IT equipment itself.

Even now that the country has put forward strict requirements on PUE, at level-3 energy efficiency (PUE = 1.5, the limit value for data centers), heat dissipation still accounts for nearly 40%.

Traditional heat dissipation methods (air cooling/air conditioning) can no longer meet the needs of today's high-density data centers. This is where liquid cooling technology comes in.

Liquid cooling is a new technology that uses liquid as the coolant to carry heat away from heat-generating components. Introducing liquid cooling can reduce the heat dissipation energy consumption of the data center by nearly 90%, and the overall energy consumption of the data center by nearly 36%.

This is a very substantial saving: roughly one-third of the electricity is saved outright.
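The 90% and 36% figures are linked by the share of total energy that goes to cooling. Below is a minimal Python sketch, assuming a cooling share of 40% (the "nearly 40%" quoted earlier) and assuming that everything other than cooling stays unchanged; the function name is made up for illustration.

```python
COOLING_SHARE = 0.40       # cooling as a fraction of total energy (assumed, "nearly 40%")
COOLING_REDUCTION = 0.90   # cut in cooling energy from liquid cooling (from the text)


def overall_saving(cooling_share: float, cooling_reduction: float) -> float:
    """Fraction of total energy saved when only the cooling energy is reduced."""
    if not (0 <= cooling_share <= 1 and 0 <= cooling_reduction <= 1):
        raise ValueError("both arguments must be fractions between 0 and 1")
    return cooling_share * cooling_reduction


saving = overall_saving(COOLING_SHARE, COOLING_REDUCTION)
print(f"Overall energy saving: {saving:.0%}")
```

In other words, a 90% cut applied to a 40% slice of the bill works out to the roughly one-third saving described above.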

In addition to stronger heat dissipation and energy saving, liquid cooling also has significant advantages in terms of noise, site selection (not affected by the environment and climate), and construction cost (allowing high-density layout of cabinets and reducing the floor space of the computer room).

As a result, more and more data centers are adopting liquid cooling. Some liquid-cooled data centers can even bring their PUE down to about 1.1, close to the theoretical limit of 1.

Does liquid cooling mean that the entire device is completely submerged in liquid?

Not necessarily.

Liquid cooling schemes generally come in two types: immersion and cold plate.

The immersion type, also called the direct type, immerses all the high-heat components of the main equipment in coolant to dissipate heat.

The cold plate type, also known as the indirect type, attaches the main heat-generating components to a metal plate; coolant then flows through the plate and carries the heat away. Nowadays, many DIY computers are built with cold plates.

Liquid cooling for servers is already a very mature technology. And since liquid cooling is going to be used anyway, it would of course be better for the servers and the network equipment to share it; otherwise two cooling systems would be required.

So here is the question: can our network equipment be liquid-cooled?

█NPO/CPO makes its debut

Ta-da! After so much buildup, our protagonist is finally about to make its debut.

To reduce the power consumption and heat dissipation of network equipment as much as possible, many manufacturers in the industry, under the guidance of the OIF (Optical Internetworking Forum), jointly launched NPO/CPO technology.

In November 2021, the domestic equipment manufacturer Ruijie Networks released the world's first 25.6T NPO cold plate liquid-cooled switch. In March 2022, it released a 51.2T NPO cold plate liquid-cooled switch (concept machine).

Figure: NPO cold plate liquid-cooled switch

NPO stands for Near Packaged Optics, and CPO stands for Co-Packaged Optics.

Put simply, NPO/CPO is a technology that "packages" the network switching chip together with the optical engine (optical module).

The traditional connection method is called pluggable: the optical engine is a pluggable optical module. The optical fiber is plugged into the optical module, and the signal is then sent to the network switching chip (ASIC) through the SerDes channel.

CPO assembles the switching chip and the optical engine together on the same socket, so that the chip and the module are co-packaged.

NPO decouples the optical engine from the switching chip and assembles both on the same PCB substrate.

As you can see, CPO is the ultimate form and NPO is the transitional stage. NPO is easier to implement and more open.

The purpose of this integration ("packaging") is very clear: to shorten the distance between the switching chip and the optical engine (to within 5 to 7 cm), so that high-speed electrical signals can travel between the two with high quality and meet the system's bit error rate (BER) requirements.

Figure: Shortening the distance ensures high-quality transmission of high-speed signals

After integration, a higher density of high-speed ports can also be realized, increasing the bandwidth density of the whole machine.

In addition, integration makes components more concentrated, and is also conducive to the introduction of cold plate liquid cooling.

Figure: Inside the NPO switch (after removing the cold plate). The distance between the switching chip and the optical engine is greatly shortened.

Behind NPO/CPO technology is the very popular silicon photonics technology.

Silicon photonics is a silicon-based large-scale optoelectronic integration technology that uses photons and electrons as information carriers. Simply put, it integrates a variety of optical devices on a silicon substrate to form an integrated "optical" circuit: a miniature optical system.

The fundamental reason silicon photonics is so popular is that microelectronics is gradually approaching its performance limits. Traditional "electrical chips" are increasingly struggling in terms of bandwidth, power consumption, and latency. The industry has therefore turned to a new track: the "(silicon) optical chip".

█Progress of NPO/CPO switches

NPO/CPO technology is currently a hot research direction for the major manufacturers. NPO in particular has the most open ecosystem and a more mature industrial chain, and delivers cost and power benefits the soonest, so it will reach deployment faster.

The 25.6T Silicon Photonics NPO cold-plate liquid-cooled switch of Ruijie Networks was mentioned earlier.

This NPO switch is based on a 25.6T switching chip with 112G SerDes and is 1RU in height. Its front panel provides 64 connectors supporting 400G optical interfaces. It consists of 16 1.6T (4×400G DR4) NPO modules and supports 8 ELS/RLS (external laser source) modules.

In terms of heat dissipation, a cold plate cooling method with a non-conductive coolant is used.

The 51.2T Silicon Photonics NPO cold plate liquid-cooled switch has the same height, and its NPO modules have been upgraded from 1.6T to 3.2T. The front panel supports 64 800G connectors, each of which can be split into two 400G ports for forward compatibility. The number of external laser source modules has increased to 16.

Figure: 51.2T NPO cold plate liquid-cooled switch
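As a sanity check on the two specifications, the numbers can be cross-checked with a short Python sketch. The `NpoSwitch` class is made up for illustration. The module count of 16 for the 51.2T model is not stated in the text; it is inferred here from 51.2T / 3.2T and should be treated as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NpoSwitch:
    name: str
    chip_gbps: int        # switching chip capacity
    npo_modules: int      # number of NPO modules
    module_gbps: int      # capacity per NPO module
    connectors: int       # front-panel connectors
    connector_gbps: int   # capacity per connector

    def module_total(self) -> int:
        return self.npo_modules * self.module_gbps

    def panel_total(self) -> int:
        return self.connectors * self.connector_gbps

    def is_balanced(self) -> bool:
        """True when the modules and the front panel both match the chip capacity."""
        return self.module_total() == self.panel_total() == self.chip_gbps


switches = [
    NpoSwitch("25.6T NPO", 25_600, npo_modules=16, module_gbps=1_600,
              connectors=64, connector_gbps=400),
    # Module count for the 51.2T model is an assumption (51.2T / 3.2T = 16).
    NpoSwitch("51.2T NPO", 51_200, npo_modules=16, module_gbps=3_200,
              connectors=64, connector_gbps=800),
]

for sw in switches:
    print(sw.name, sw.module_total(), sw.panel_total(), sw.is_balanced())

# Each 800G connector can be split into two 400G ports.
ports_400g = switches[1].connectors * (switches[1].connector_gbps // 400)
print("51.2T switch expressed as 400G ports:", ports_400g)
```

In both generations the NPO modules and the front panel add up to exactly the chip capacity, which suggests that all of the switching bandwidth is exposed on the front panel.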

In actual networking, 51.2T NPO switches (commercial release expected at the end of 2023 at the earliest) can be used in 100G/200G access networks as access and aggregation devices to achieve high-speed interconnection.

It is worth mentioning that developing NPO/CPO technology and products is no simple matter; it tests a company's overall R&D strength.

Ruijie Networks' NPO/CPO product launch is the result of its continuous investment in R&D and innovation, and also reflects its technological leadership in this field.

Ruijie Networks began to focus on silicon photonics technology in 2019 and formally established an R&D and product team in June 2020. As a member of OIF/COBO, it has regularly taken part in the working groups' global meetings and in the discussion and formulation of relevant standards.

Figure: OIF working group global meeting

In silicon photonics, Ruijie Networks is at the global forefront, and its future looks promising.

█Epilogue

Well, after so much introduction, I believe everyone has already understood what NPO/CPO is.

These two technologies are undoubtedly the direction in which data center network equipment is heading. In the current wave of digitalization, our pursuit of computing power and network communication capability is endless. While pursuing performance, we must also work to keep power consumption in check. After all, the path we want to take is one of sustainable development.

We hope that silicon photonics technology, represented by NPO/CPO, can speed up its deployment and contribute to green, low-carbon information infrastructure.

In the future, what kind of technological innovation will silicon photonics technology bring? Let us wait and see!

——End of the full text——

Origin blog.csdn.net/qq_38987057/article/details/127437798