PUE optimization practice of OPPO's self-built data center under low load rate

01

Background

Several factors reduced overall server demand in 2023. According to a report released by the research firm TrendForce on May 17, the four major CSPs successively cut their purchases, and OEMs such as Dell and HPE lowered their annual shipment forecasts between February and April, with year-on-year decreases of 15% and 12% respectively. Coupled with the international situation and economic factors, the outlook for server demand in 2023 is weak, and this year's global server shipment forecast has been revised down again to 13.835 million units, a year-on-year decrease of 2.85%.
Data center construction, however, is constrained by delivery cycles and is usually carried out ahead of demand, so the supply of racks now exceeds the demand for servers. The current mainstream cooling architecture is centralized chillers plus terminal precision air conditioners. In this context, data centers have to run at low load, which leads to poor energy efficiency across the electromechanical system as a whole.
In the past, supply fell short of demand, racking was fast, and the load rate could be raised quickly. The industry therefore generally did not optimize for load rates below 25%. Now that server supply is constrained, data centers will run at low load for much longer, and as companies turn to refined operations, this operating range needs to be optimized.

02

What Is Data Center Energy Efficiency

A data center (DC for short) is a facility that provides the operating environment for centralized electronic information equipment. It includes the main equipment rooms, auxiliary areas, support areas, and administrative areas, and it is an important base for computing power infrastructure. Within it, the electromechanical system, whose main job is to supply IT equipment with stable, high-quality power and cooling, accounts for most of the non-IT energy consumption.

Data center energy efficiency is currently measured by the core indicator PUE (Power Usage Effectiveness), calculated as the ratio of the total power consumption of the data center to the power consumption of its IT equipment. The larger the share of electricity that goes to IT equipment, the closer PUE is to 1 and the more energy-efficient the data center. The figure below shows the global PUE trend from the 2021 global data center survey of the Uptime Institute, an authoritative industry organization. As of 2022, the overall average PUE was around 1.55.
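The definition above can be written as a short calculation. This is a minimal sketch: the function names and the meter readings are hypothetical and are not taken from OPPO's monitoring system.

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy divided by IT energy."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    if total_facility_kwh < it_kwh:
        raise ValueError("total facility energy cannot be less than IT energy")
    return total_facility_kwh / it_kwh


def it_share(pue_value: float) -> float:
    """Fraction of total energy that reaches IT equipment, which is 1 / PUE."""
    return 1.0 / pue_value


# Hypothetical meter readings for one day, in kWh.
example_pue = pue(total_facility_kwh=1550.0, it_kwh=1000.0)
print(f"PUE = {example_pue:.2f}, IT share = {it_share(example_pue):.1%}")
```

With a PUE of 1.55, a little under two thirds of the electricity reaches the IT equipment; at a PUE of 2.0, only half does.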


03

PUE Optimization Practice at OPPO's Self-Built, Self-Operated Data Center

Building A of the OPPO Marina Bay Data Center consists of a north module and a south module, each connected to 10 MW of power capacity. According to estimates, before optimization, at the end of 2022, the daily average PUE was about 1.9 to 2.0 with an overall rack occupancy rate below 10%. The outdoor temperature on the day measured was 21-29℃ and the relative humidity was 60%-95%. According to calculations, the PUE was expected to climb even higher.

After optimization, as of August 2023, the weekly average PUE of the south module had been brought down to about 1.4 and that of the north module to below 1.4, with the IT load rate still below 20%. Outdoor temperatures are relatively high during this period, which is the hot season, so the annual average PUE over the whole year is expected to be lower.

The project team distilled its hands-on optimization experience into a systematic framework, producing documents such as the "Energy Efficiency Management Standards" and the "OPPO Data Center Technical Optimization Standardization System". The front-line operation and maintenance team was deeply involved throughout, which improved their understanding of how the electromechanical system runs and of the principles behind energy saving. This helps the framework take hold in practice and supports day-to-day cost reduction and high availability.

3.1 Report production and baseline establishment

To optimize a large system, it must first be measured in several ways so that its operating status can be observed and so that the results of each change can be verified. The first step in energy efficiency optimization is therefore to produce energy efficiency reports.

Once the reports are in place, technicians assess the overall status and confirm the baseline. They then take into account the data center's design targets and the expected pace of future racking to set an objective and reasonable cost-reduction target for the year. A detailed analysis of the main power-consuming equipment in the data center yields the list of equipment points that need to be monitored.

In practice, three reports are used: a monthly report (daily granularity), a weekly report (four-hour granularity), and a daily report (hourly granularity). The monthly report summarizes the month's operation and shows whether anything is abnormal. The weekly report is used to locate and verify the abnormal equipment and the period in which the abnormality occurred. The daily report is used to analyze the cause of the abnormality in depth and to solve the problem in light of on-site conditions.
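The three report granularities can be derived from the same stream of meter samples. The sketch below is illustrative only: the sample format, the function names, the baseline value, and the tolerance are assumptions, not OPPO's actual reporting pipeline.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_start(ts: datetime, hours: int) -> datetime:
    """Floor a timestamp to the start of its bucket (hours must divide 24)."""
    return ts.replace(hour=ts.hour - ts.hour % hours, minute=0, second=0, microsecond=0)


def pue_report(samples, hours):
    """Aggregate (timestamp, total_kwh, it_kwh) samples into per-bucket PUE.

    hours=1 matches the daily report, hours=4 the weekly report and
    hours=24 the monthly report granularity described above.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for ts, total_kwh, it_kwh in samples:
        acc = totals[bucket_start(ts, hours)]
        acc[0] += total_kwh
        acc[1] += it_kwh
    return {start: total / it for start, (total, it) in sorted(totals.items()) if it > 0}


def flag_abnormal(report, baseline_pue, tolerance=0.05):
    """Return bucket starts whose PUE exceeds the baseline by more than tolerance."""
    return [start for start, value in report.items() if value > baseline_pue + tolerance]


# Hypothetical hourly meter samples for one day: (timestamp, total kWh, IT kWh).
day = datetime(2023, 8, 1)
samples = [(day + timedelta(hours=h), 1400.0, 1000.0) for h in range(24)]
samples[13] = (day + timedelta(hours=13), 1900.0, 1000.0)  # injected anomaly

four_hourly = pue_report(samples, hours=4)
print(flag_abnormal(four_hourly, baseline_pue=1.4))
```

Running the same data with `hours=1` narrows the flagged window from a four-hour block down to the single hour, which mirrors how the weekly report locates a problem and the daily report is used to analyze it.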

3.2 Key technology: HVAC

The overall approach to cost reduction for HVAC is to move from simple to complex, and from single points to the whole system. First, each independent device is checked and optimized on its own, for simple single-point savings. This is done by evaluating and shutting down unnecessary devices and by adjusting the control thresholds of individual devices without compromising safety. The measures actually implemented at OPPO's own data center are as follows:

a) Adjust the differential-pressure control of the primary-side circulating pumps for the liquid-cooled cluster, lowering the pump frequency from 50 Hz to 37 Hz; this cuts power draw by XX kW and saves XX yuan in electricity costs per year.
b) Implement the air conditioner optimization plan: switch off XX in-row air conditioners and XX room-level air conditioners, cutting terminal air conditioner power by XX kW and saving XX yuan in electricity costs.
c) Optimize the operating strategy of the constant-humidity units: switch off XX units, cutting terminal power by XX kW and saving XX yuan in electricity costs.
d) Enclose the hot and cold aisles and purchase new blanking panels to increase the supply/return temperature difference for both air and water, reducing transmission and distribution power on the air side and the water side.
e) Clean the fan filters on the air side and the Y-strainers on the water side to reduce pressure loss in transmission and distribution.
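
Why a modest frequency reduction in item a) pays off so well can be estimated with the pump affinity laws, under which shaft power scales with the cube of speed. The sketch below is an idealized estimate: it ignores static head, motor and drive efficiency changes, and the pump rating used is hypothetical, so real savings will differ.

```python
def affinity_power_ratio(new_hz: float, rated_hz: float = 50.0) -> float:
    """Ideal shaft-power ratio when pump speed changes: P2 / P1 = (n2 / n1) ** 3."""
    if new_hz <= 0 or rated_hz <= 0:
        raise ValueError("frequencies must be positive")
    return (new_hz / rated_hz) ** 3


def annual_saving_kwh(power_at_rated_kw: float, new_hz: float,
                      rated_hz: float = 50.0, hours_per_year: float = 8760.0) -> float:
    """Idealized yearly energy saving from running at new_hz instead of rated_hz."""
    ratio = affinity_power_ratio(new_hz, rated_hz)
    return power_at_rated_kw * (1.0 - ratio) * hours_per_year


ratio = affinity_power_ratio(37.0)
print(f"Power at 37 Hz is about {ratio:.1%} of the power at 50 Hz")

# Hypothetical pump that draws 10 kW at 50 Hz and runs all year.
print(f"Estimated saving: {annual_saving_kwh(10.0, 37.0):,.0f} kWh per year")
```

Under these assumptions, dropping from 50 Hz to 37 Hz leaves roughly 40% of the original pump power. The same cubic relationship applies to the fans discussed later in this section.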

Next comes comprehensive optimization of the water system, which takes into account the control interlocks between multiple pieces of water-side equipment so that energy efficiency is optimal at the system level:

f) Under pre-cooling conditions, the plate heat exchanger suffered from a short-circuit (bypass flow) problem. It was fixed by changing the measuring point used by the control logic and the switching threshold of the pre-cooling mode. This measure is expected to save no less than XX million in electricity per year.

g) The cooling water pumps used to ramp up automatically to full frequency whenever the chillers were not running. After the logic was changed, the power of a single pump dropped from XX kW to about XX kW on average, saving XX kW across the north and south modules.

h) Using the chilled-water storage tanks to store cooling during the early stage of operation brings three benefits. First, it avoids chiller surge at low load. Second, it avoids long periods of low-load, inefficient operation of the whole HVAC system. Third, under time-of-use tariffs it lowers the unit price of electricity by shifting load from peak to valley hours. Over this period, the measure is expected to save XX million.
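
The tariff benefit in item h) can be illustrated with a simple daily cost comparison. Everything in this sketch is an assumption made for illustration: the tariff levels and time bands, the constant plant load, and the simplification that chillers run only in valley hours with no storage losses and no pump energy during discharge.

```python
def daily_cost(hourly_kwh, hourly_price):
    """Electricity cost for one day given 24 hourly energies and 24 hourly prices."""
    if len(hourly_kwh) != 24 or len(hourly_price) != 24:
        raise ValueError("expected 24 hourly values")
    return sum(e * p for e, p in zip(hourly_kwh, hourly_price))


# Hypothetical time-of-use tariff in yuan/kWh: valley 00:00-08:00,
# peak 10:00-12:00 and 14:00-19:00, flat rate otherwise.
VALLEY, FLAT, PEAK = 0.3, 0.7, 1.1
PEAK_HOURS = (10, 11, 14, 15, 16, 17, 18)
price = [VALLEY if h < 8 else PEAK if h in PEAK_HOURS else FLAT for h in range(24)]

# Baseline: the chiller plant draws a constant 100 kWh every hour.
baseline = [100.0] * 24

# Storage strategy (idealized): run the chillers only during valley hours to
# charge the tank, then serve the load from the tank for the rest of the day.
with_storage = [300.0 if h < 8 else 0.0 for h in range(24)]

# Same total energy in both cases; only the timing differs.
assert sum(baseline) == sum(with_storage)

print(f"baseline: {daily_cost(baseline, price):.0f} yuan/day")
print(f"with storage: {daily_cost(with_storage, price):.0f} yuan/day")
```

The sketch only captures the price effect. The other two benefits, avoiding surge and avoiding inefficient low-load operation, come from running the chillers at a higher load for fewer hours and are not modeled here.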

Joint air-water tuning: optimizing the overall energy efficiency of the HVAC system by considering the water system and the air system together

i) At low load the water side has surplus cooling capacity, so the chilled-water valves of the terminal air conditioners are opened further. This reduces air-side fan power without increasing the water-side load or the IT load.

– The air conditioners in the private rooms of the north module save XX kW, an estimated XX yuan in electricity costs per year.

– The air conditioner settings were adjusted in the carrier access rooms, the extra-low-voltage equipment rooms, and the test equipment rooms of the north and south modules. Temperature and humidity show no obvious change, and XX kW is saved.

– The parameters of the precision air conditioners in the first-floor and second-floor power distribution rooms of the north and south modules were adjusted, saving XX kW, which is XX million over a full year and XX million over the rest of this year.

– The operating strategy of the A/B-path power distribution rooms on the third and fourth floors of the north and south modules was changed from cold standby at high speed to hot standby at low speed. Temperature and humidity show no obvious change, XX kW is saved, and electricity costs fall by XX yuan over a full year and by XX million yuan over the rest of this year.

3.3 Key technology: electrical

The overall approach to cost reduction for the electrical system:

Energy efficiency optimization of the electrical system has two main aspects: reducing losses and reducing fluctuations. Reducing losses lowers the amount of electricity consumed, and can be achieved by adjusting the UPS operating mode or switching off non-essential equipment. Reducing fluctuations lowers the data center's monthly basic electricity charge, and can be achieved through load balancing.

a) Turn off the diesel generator heaters, cutting auxiliary system power by XX kW and saving XX yuan in electricity costs per year.

b) Complete load balancing between the A and B paths, saving XX yuan per month in basic electricity charges, an estimated XX yuan per year.

c) Optimize the UPS operating strategy: load the intelligent paralleling program onto the UPSs to reduce the number of units running in parallel, which raises the UPS load rate and efficiency, cuts losses by XX kW, and saves XX yuan in electricity costs over a full year.

d) Balance the load on the UPSs that serve mechanical loads by switching the ATSs, raising their load rate and reducing UPS losses.

e) Enable ECO mode on the UPSs that power the precision air conditioners on the third and fourth floors, reducing UPS losses by XX kW; this is expected to save XX yuan in electricity costs over a full year and XX yuan over the rest of this year.

f) Shut down all SVG and active harmonic filter cabinets on the third and fourth floors. The north module saves XX kW, a cost reduction of XX yuan over a full year and XX yuan this year. The south module saves XX kW, a cost reduction of XX million yuan over a full year and XX million yuan this year.
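
The reasoning behind item c) is that a UPS running at a very low load rate is less efficient, so fewer units in parallel means a higher load rate per unit. The sketch below shows one way to pick the unit count; the capacities, the 80% load-rate ceiling, and the one-unit redundancy margin are assumptions for illustration and do not describe the actual intelligent paralleling program.

```python
import math


def units_needed(load_kw: float, unit_capacity_kw: float,
                 max_load_rate: float = 0.8, redundant_units: int = 1) -> int:
    """Fewest parallel UPS units that keep the load rate at or below
    max_load_rate, plus the requested number of redundant units."""
    if load_kw < 0 or unit_capacity_kw <= 0 or not 0 < max_load_rate <= 1:
        raise ValueError("invalid input")
    base = max(1, math.ceil(load_kw / (unit_capacity_kw * max_load_rate)))
    return base + redundant_units


def load_rate(load_kw: float, unit_capacity_kw: float, units: int) -> float:
    """Load rate per unit when the load is shared evenly across the running units."""
    return load_kw / (unit_capacity_kw * units)


# Hypothetical system: four 500 kW units currently online, carrying 300 kW.
load_kw, capacity_kw, online = 300.0, 500.0, 4
target = units_needed(load_kw, capacity_kw)

print(f"before: {online} units at {load_rate(load_kw, capacity_kw, online):.0%} load")
print(f"after:  {target} units at {load_rate(load_kw, capacity_kw, target):.0%} load")
```

In this example the load rate per unit doubles from 15% to 30% while one redundant unit stays online. Whether that is acceptable depends on the site's availability requirements, which is the trade-off the summary returns to.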

3.4 Key technologies: others

a) Implement a lighting management strategy, reducing lighting power from XX kW to XX kW and saving XX yuan in electricity costs per year.

b) Put the rainwater recycling system into service to reduce municipal water consumption.

c) Seal the main entrance of Building A and the cable trays inside the equipment rooms to reduce humidity in the data center and the power consumption of the constant-humidity units.


04

Summary

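In recent years the economic situation has been tight and every company is paying unusually close attention to costs. In the foreseeable future, the extensive, land-grab style of growth will end and an era of refined operations will begin, and cloud resources are no exception. Data centers are the foundation of the cloud, and energy accounts for the bulk of their operating costs. This project is only an initial attempt. We will keep optimizing data center operation in pursuit of the best achievable energy efficiency, which is very meaningful against the backdrop of cost reduction and efficiency improvement.

Cost-reduction projects like this are never accomplished overnight. They require the staff involved to spot optimization opportunities in their daily work and to optimize each one as it is found. Only through this steady accumulation can a qualitative breakthrough be achieved. The knowledge and capability built up along the way also become a technical advantage that is hard for others to replicate, and they can be reused and extended in future projects.

For new data centers, a modular, distributed architecture is worth considering when choosing the technical architecture. Capacity can then be switched on as IT equipment is racked, which to some extent avoids oversized plant serving a small load (a big horse pulling a small cart) when the IT load is low.

From another angle, however, system availability and economy are always negatively correlated. Energy efficiency optimization has to balance the two: saving a little on electricity must not lead to a service interruption, which would be penny wise and pound foolish.

The above is OPPO's first practice of PUE optimization in a data center under low load. We welcome exchanges with experts across the industry, in the hope of contributing useful technical experience to energy saving and consumption reduction for the industry as a whole.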

About the author
Alan Kong (孔庆一)
OPPO IDC engineer, LEED AP BD+C

Mainly engaged in the technical operation of IDC infrastructure

About AndesBrain

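AndesBrain (安第斯智能云)
OPPO AndesBrain is a pan-terminal intelligent cloud serving individuals, families, and developers, committed to "making terminals smarter". AndesBrain provides device-cloud collaborative data storage and intelligent computing services, and is the "digital-intelligence brain" for the integration of all things.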


This article was shared from the WeChat official account 安第斯智能云 (OPPO_tech).
Origin my.oschina.net/u/4273516/blog/10104647