ByteDance's Cloud Native Cost Optimization Open Source Project Katalyst | Community Programming Challenge Launched!

Introduction to Katalyst

GitHub Repo: https://github.com/kubewharf/katalyst-core

Katalyst is ByteDance's open source cost optimization system. It aims to solve the problem of poor resource utilization in cloud native scenarios and provides solutions for resource management and cost optimization.

Katalyst was officially open sourced in March this year. After the initial v0.1.0 release, the v0.2.0 iteration delivered several core capabilities. On August 8, Katalyst released v0.3.0, whose core features include enhanced KCNR API capabilities, enhanced framework scalability, and enhanced co-location capabilities. For details, see the Katalyst GitHub repository.

Activity background

As an open source project that ByteDance's cloud native team continues to invest in, Katalyst recognizes the long-term value of open source and values feedback and participation from the open source community. It also encourages college students to take part in real open source projects early on, experience how an open source community operates, and improve their skills. In the GLCC programming summer camp that Katalyst previously took part in, the topics it published attracted many college students. During the project, mentors and students actively discussed feasible designs, and the mentors helped the students contribute to development.

Because of the topic limits of the previous event, only one student ended up taking part in the project. To encourage more college students interested in cloud native to join the community, and to bring more outside voices and fresh contributors into it, we plan to reuse this model of collaborating with college students during project version iterations. It gives college students a path and guidance for joining the open source community, while helping the community collect more feedback and requirements. With this in mind, we are launching the Katalyst open source community [Programming Challenge] in September. We will publish issues based on the capabilities planned for upcoming community releases, invite college students to take part in designing and developing them, and reward students who complete their tasks.

Topic introduction

Topic 1: Support OOM priority as a QoS enhancement

GitHub issue: https://github.com/kubewharf/katalyst-core/issues/216

Please add the following capabilities to Katalyst:

  • Users can specify the OOM priority as a QoS enhancement.

  • Implement OOM priority with oom_score_adj.

Topic description:

Currently, Kubernetes configures different oom_score_adj values for different QoS classes. However, the order in which containers are OOM-killed also depends on other factors, such as each container's memory usage.

In co-location scenarios, when the cluster's memory resources become scarce, it is important to strictly ensure that web services are OOM-killed later than batch jobs.
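To make the oom_score_adj mechanism more concrete, here is a minimal Go sketch of one possible way to map a user-specified OOM priority onto oom_score_adj. The priority range (0 to 100), the function names, and the example band of -500 to 1000 are hypothetical assumptions, not Katalyst's API; only the /proc/<pid>/oom_score_adj file and its range of -1000 to 1000 come from the Linux kernel. Because the kernel adds oom_score_adj (scaled by total memory) to a memory-usage-based score, giving batch jobs a much larger value than web services makes them far more likely to be selected first.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Bounds of /proc/<pid>/oom_score_adj as defined by the Linux kernel.
const (
	oomScoreAdjMin = -1000
	oomScoreAdjMax = 1000
)

// Hypothetical OOM priority range (an assumption, not Katalyst's API):
// a higher priority means the workload should be OOM-killed later.
const (
	minOOMPriority = 0   // e.g. batch jobs
	maxOOMPriority = 100 // e.g. latency-sensitive web services
)

// clamp limits v to the closed interval [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// oomScoreAdjForPriority linearly maps a priority onto the band [lo, hi]
// (lo < hi): the highest priority gets lo (killed last) and the lowest
// priority gets hi (killed first). The result is clamped to kernel bounds.
func oomScoreAdjForPriority(priority, lo, hi int) int {
	p := clamp(priority, minOOMPriority, maxOOMPriority)
	adj := hi - (hi-lo)*(p-minOOMPriority)/(maxOOMPriority-minOOMPriority)
	return clamp(adj, oomScoreAdjMin, oomScoreAdjMax)
}

// writeOOMScoreAdj applies the value to a process. This typically needs
// elevated privileges (e.g. CAP_SYS_RESOURCE to lower the value).
func writeOOMScoreAdj(pid, adj int) error {
	path := fmt.Sprintf("/proc/%d/oom_score_adj", pid)
	return os.WriteFile(path, []byte(strconv.Itoa(adj)), 0o644)
}

func main() {
	// The band [-500, 1000] is an arbitrary example.
	for _, p := range []int{minOOMPriority, 50, maxOOMPriority} {
		fmt.Printf("priority=%d -> oom_score_adj=%d\n", p, oomScoreAdjForPriority(p, -500, 1000))
	}
}
```

Running it prints oom_score_adj values of 1000, 250, and -500 for priorities 0, 50, and 100. A linear band is only one design choice; a real implementation might instead reserve fixed, non-overlapping bands per workload class so the ordering between batch and web pods cannot be blurred by intermediate priorities.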

Topic 2: Support NUMA-granularity reporting for reclaimed resources

GitHub issue: https://github.com/kubewharf/katalyst-core/issues/217

Please add the following capabilities to Katalyst:

Enhance the resource reporting mechanism to support reporting of reclaimed resources at the granularity of NUMA nodes.

Topic description:

Currently, reclaimed resources are reported at node granularity. However, on NUMA architectures, this approach can lead to suboptimal scheduling results and potential pod evictions caused by NUMA-level interference.
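The following Go sketch illustrates why node-level reporting can mislead the scheduler. The NUMAReclaimed type and its helper functions are hypothetical and are not Katalyst's KCNR schema; the numbers are made up for illustration. Two NUMA nodes each have 3000m of reclaimed CPU, so the node-level total (6000m) admits a 4000m request that no single NUMA node can actually satisfy.

```go
package main

import "fmt"

const gib = int64(1) << 30

// NUMAReclaimed is a hypothetical per-NUMA record of reclaimed
// (co-location) resources; it is not Katalyst's actual CRD schema.
type NUMAReclaimed struct {
	NUMAID      int
	MilliCPU    int64
	MemoryBytes int64
}

// nodeTotal sums per-NUMA records into the single node-level figure
// that node-granularity reporting exposes.
func nodeTotal(numas []NUMAReclaimed) (milliCPU, memBytes int64) {
	for _, n := range numas {
		milliCPU += n.MilliCPU
		memBytes += n.MemoryBytes
	}
	return milliCPU, memBytes
}

// numaFits returns the IDs of NUMA nodes that can each satisfy the whole
// request on their own, i.e. without spanning NUMA boundaries.
func numaFits(numas []NUMAReclaimed, reqMilliCPU, reqMem int64) []int {
	var ids []int
	for _, n := range numas {
		if n.MilliCPU >= reqMilliCPU && n.MemoryBytes >= reqMem {
			ids = append(ids, n.NUMAID)
		}
	}
	return ids
}

func main() {
	numas := []NUMAReclaimed{
		{NUMAID: 0, MilliCPU: 3000, MemoryBytes: 8 * gib},
		{NUMAID: 1, MilliCPU: 3000, MemoryBytes: 8 * gib},
	}
	cpu, mem := nodeTotal(numas)
	reqCPU, reqMem := int64(4000), 4*gib
	fmt.Printf("node-level: %dm CPU, %d GiB\n", cpu, mem/gib)
	// The node-level view says the request fits...
	fmt.Println("fits node-level view:", cpu >= reqCPU && mem >= reqMem)
	// ...but no individual NUMA node can host it.
	fmt.Println("NUMA nodes that fit alone:", numaFits(numas, reqCPU, reqMem))
}
```

The program reports that the request fits the node-level view (true) while the list of NUMA nodes that can host it alone is empty, which is the mismatch that NUMA-granularity reporting is meant to expose to the scheduler.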

Topic 3: Support inter-pod affinity and anti-affinity at NUMA level

GitHub issue: https://github.com/kubewharf/katalyst-core/issues/220

Please add the following capabilities to Katalyst:

Support inter-pod affinity and anti-affinity at NUMA level in Kubernetes.

Topic description:

Currently, Kubernetes supports inter-pod affinity and anti-affinity at the node level. However, there is a growing need for extending this support to the NUMA level.

For example, in a TensorFlow training job, pods that consume a lot of memory bandwidth, such as workers, can degrade the performance of other pods on the same NUMA node, such as parameter servers. Placing these pods on different NUMA nodes can mitigate this interference.
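As a rough illustration of what NUMA-level inter-pod affinity could mean, the Go sketch below filters candidate NUMA nodes using simple key=value label terms. NUMAState, LabelTerm, and filterNUMA are hypothetical names; real Kubernetes affinity uses full label selectors and topology keys, which this sketch deliberately simplifies away. In the example, a parameter server with anti-affinity to role=worker is steered to the NUMA node that hosts no worker.

```go
package main

import "fmt"

// NUMAState is a hypothetical view of the labels of pods already
// placed on one NUMA node.
type NUMAState struct {
	ID        int
	PodLabels []map[string]string
}

// LabelTerm is a simplified, hypothetical stand-in for a label selector:
// it matches pods whose label Key has exactly the given Value.
type LabelTerm struct {
	Key, Value string
}

// matches reports whether a pod's labels satisfy the term.
func (t LabelTerm) matches(labels map[string]string) bool {
	v, ok := labels[t.Key]
	return ok && v == t.Value
}

// hasMatchingPod reports whether any pod on the NUMA node matches the term.
func hasMatchingPod(n NUMAState, t LabelTerm) bool {
	for _, labels := range n.PodLabels {
		if t.matches(labels) {
			return true
		}
	}
	return false
}

// allowed checks one NUMA node: every affinity term must match at least
// one pod on it, and no anti-affinity term may match any pod on it.
func allowed(n NUMAState, affinity, antiAffinity []LabelTerm) bool {
	for _, t := range affinity {
		if !hasMatchingPod(n, t) {
			return false
		}
	}
	for _, t := range antiAffinity {
		if hasMatchingPod(n, t) {
			return false
		}
	}
	return true
}

// filterNUMA returns the IDs of NUMA nodes that satisfy all terms.
func filterNUMA(numas []NUMAState, affinity, antiAffinity []LabelTerm) []int {
	var ids []int
	for _, n := range numas {
		if allowed(n, affinity, antiAffinity) {
			ids = append(ids, n.ID)
		}
	}
	return ids
}

func main() {
	numas := []NUMAState{
		{ID: 0, PodLabels: []map[string]string{{"app": "tf-train", "role": "worker"}}},
		{ID: 1}, // no pods yet
	}
	// A parameter server avoids NUMA nodes that host workers.
	ps := filterNUMA(numas, nil, []LabelTerm{{Key: "role", Value: "worker"}})
	fmt.Println("parameter server candidates:", ps)
	// A pod with affinity to the training job's pods.
	aff := filterNUMA(numas, []LabelTerm{{Key: "app", Value: "tf-train"}}, nil)
	fmt.Println("affinity to app=tf-train:", aff)
}
```

Here the parameter server's only candidate is NUMA node 1, while a pod with affinity to app=tf-train can only go to NUMA node 0. A production design would also need to handle preferred (soft) terms and the interaction with NUMA-aware resource allocation, which this sketch ignores.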

Expected gains

  1. Experience a real open source project, become familiar with how open source communities operate, and gain practical development experience
  2. Take part in community meetings, talk with open source enthusiasts, and keep up with community developments
  3. One-on-one tutoring and face-to-face Q&A with project mentors
  4. Outstanding contributors who complete their projects will also receive a community incentive bonus of 5,000 yuan (as a JD.com gift card of equivalent value)

Participation requirements

  1. College students aged 18 or older
  2. A passion for open source culture and willingness to work within the open source collaboration model

*Non-college students who are interested in these topics are also welcome to join the community and help build and develop them!

How to participate

  1. Choose one of the following GitHub issues:

    1. https://github.com/kubewharf/katalyst-core/issues/216
    2. https://github.com/kubewharf/katalyst-core/issues/217
    3. https://github.com/kubewharf/katalyst-core/issues/220
  2. Send your resume and topic proposal to the contact below:

    1. Contact person: Mr. Tang
    2. Email: [email protected]
  3. Once you are selected, the project mentor will contact you to discuss the specific development tasks, and you can then start development.

  4. After completing the task, write an article about your experience of participating in an open source project and publish it on a third-party community (InfoQ, CSDN, Zhihu, OSChina, etc.) or a campus blog.

Timeline

  • Registration: September 1 - September 14
  • Selection notification: September 15
  • Development: September 16 - October 30
  • Article publication deadline: before November 10
  • Announcement of outstanding topics and students: November 10 - November 15

If you have any questions, please contact the ByteDance Cloud Native Assistant and include [name + company/school + title] in your message.


Source: my.oschina.net/u/6210722/blog/10106821