Paper from ByteDance's infrastructure orchestration and scheduling team accepted at SoCC 2023, a top conference in cloud computing

SoCC 2023 will be held in Santa Cruz, California, USA, from October 30 to November 1, 2023. The research of ByteDance's infrastructure orchestration and scheduling team has been accepted by SoCC 2023, and the team has been invited to present it on site.

SoCC stands for the Annual ACM Symposium on Cloud Computing. It is one of the top conferences in the field of cloud computing, and the only top ACM conference sponsored by both SIGMOD and SIGOPS. It showcases the state of the art in cloud computing across academia, industry, and open source communities. SoCC was established as cloud computing rose to prominence, and this year marks its 14th edition. Every year the conference attracts submissions from the world's leading research institutions and well-known large companies, and it sets a high bar for the novelty, completeness, and effectiveness of the systems presented. This year, the paper acceptance rate is only 30%.

Gödel: Unified Large-Scale Resource Management and Scheduling at ByteDance

Gödel is a unified scheduling system for online and offline workloads, developed in-house by ByteDance's infrastructure orchestration and scheduling team for large-scale cloud-native infrastructure management.

Over the past few years, ByteDance's business lines have needed ever more computing resources. As data centers keep expanding and resource demands become more differentiated, using the native Kubernetes scheduler to host diverse online and offline workloads in a unified way, and to operate resources in a unified way, has raised a series of challenges.

The Gödel scheduling system was created in this context. Compared with the native Kubernetes scheduler, it can schedule online, offline, and machine learning workloads together in the same cluster, and it offers high throughput (up to 10X), high elasticity (sub-minute resource transfer), and high resource utilization (up to 60%). These properties better meet the deployment needs of ByteDance's businesses, such as co-location and resource pooling. While meeting the SLA requirements of each type of workload, Gödel provides a common platform for the unified operation of compute cluster resources. This improves resource utilization and task elasticity in ByteDance's data centers, which in turn reduces costs and increases efficiency.

The Gödel paper and its on-site presentation will be officially unveiled at SoCC 2023 at the end of October. The ByteDance infrastructure team will also publish an article explaining the paper at that time, so please stay tuned.

