Our open source "recharging journey": an exclusive interview with two new Apache Flink Committers

This article comes from an exclusive interview with Fang Yong and Hu Weihua of the ByteDance streaming computing team. Their contributions to the Apache Flink community mainly cover the Runtime Coordinator, Streaming Warehouse, and related features. They were officially invited to become Apache Flink Committers in July 2023.

Open source has become a widely discussed topic in software development. More and more companies and developers recognize its importance and are actively embracing and contributing to it. In 2017, the ByteDance streaming computing team began adopting Apache Flink as its streaming computing engine. Since then, it has steadily increased its attention to and investment in the open source community.

In the past two months, two team members, Fang Yong and Hu Weihua, have been invited to become Apache Flink Committers. In this exclusive interview, the two new Committers talk about their journey into open source.

My path to open source participation

Apache Flink is a high-performance distributed computing framework. It has become the de facto standard for stream processing and has greatly advanced the development of streaming data processing. For the two new Committers, Flink was a star Apache project that could not be ignored.

Flink has a very active community. Users' questions are answered quickly, usually within a day, which makes for a friendly user experience. Community members are also highly professional, and this keeps Flink technically advanced. Beyond stream processing, Flink has expanded into a wide range of scenarios. Flink-based stream-batch unification, OLAP, and Streaming Warehouse have all been put into production at ByteDance.

Hu Weihua:

Flink is a very powerful and flexible computing engine, and ByteDance uses it to support many business scenarios. I am a Flink Runtime R&D engineer. The more I learned about the project, the more I appreciated how advanced its design is, and the more I wanted to give back to the community. I did this in two ways. I subscribed to the community mailing lists and actively answered other developers' questions. I also worked on Flink's scheduling and resource management, and gradually contributed some of our internal optimization experience back to the community.

While participating in the community, I have mainly contributed in the following ways:

  1. Actively answering users' questions to help them better understand and use Flink;

  2. Contributing code in Flink scheduling and resource management to improve scheduling performance and reduce maintenance costs.

As I continued to participate in the community, I was honored to be invited to become an Apache Flink Committer in August this year.

My work on Apache Flink currently focuses on the Runtime Coordinator. ByteDance still has some customized development in this area internally, and we will actively contribute it back to the community. For future feature development, we will make building it in the community and contributing it upstream our top priority.

Fang Yong:

When you start contributing to the Flink community, the biggest challenge is finding an issue that suits you. At first, I closely followed the dev mailing list. Whenever an email about a new issue arrived, I immediately checked whether it was a simpler problem I could learn from or solve. I would then @-mention a PMC member or Committer, who would soon help assign it to me. Sometimes I also browsed the community's Jira to find issues I thought I could solve and added them to my list. After submitting a PR, I kept @-mentioning people who could help review it. Sometimes, while I was still waiting for CI results, I would find the next day that a PMC member had already reviewed it. More than once, that reviewer turned out to be a community heavyweight like Till. I did not know the community leaders at the time, but they were all very friendly and welcoming to newcomers. As I gained experience in the community and understood Flink more deeply, I found more areas that could be optimized. I began submitting issues and even FLIPs (Flink Improvement Proposals) to exchange ideas with more like-minded people.

In the Flink community, my main focus is driving support for Streaming Warehouse features, including JDBC and Gateway access, and implementing Flink OLAP features. Beyond development, I have also put a lot of energy into discussions and Q&A on the community mailing lists. This lets me exchange ideas with people of different nationalities, companies, and backgrounds. Learning how users at other companies use Flink also gives us inspiration for our own subsequent work.

Besides Apache Flink, I am currently participating in the Apache Paimon community. We encourage everyone on the team to participate in Apache open source communities and to contribute fixes for issues solved internally. Our team is also working with the Paimon community on major Streaming Warehouse features. These include data lineage management, backtracking and revision of streaming computations, and consistency for stream-batch unified ETL.

Participating in open source is also a "recharging journey"

Hu Weihua:

I have always believed that participating in open source communities benefits individuals, teams, companies, and the community itself. Individuals can improve their technical skills and broaden their approach to problems through discussions with other outstanding members. Teams are pushed to innovate instead of working behind closed doors. A team like ours at ByteDance, which builds on the Flink engine, especially needs to be deeply involved in the community. Companies can strengthen their brand image and technical reputation. And the more people participate, the better it is for the community's development and for solving users' problems.

Take my own experience as an example. While optimizing performance for large jobs, we first adopted a batch deployment solution that made major changes to how Flink tasks are deployed. However, after several in-depth discussions with other community members, we shifted the optimization toward adding caching on the TaskManager side. This still achieved the optimization goal and greatly reduced the changes needed to the original process. The experience showed me how the community works and how powerful it can be.

Participating in open source has helped me grow a great deal, both in technical skills and in the breadth of my thinking. On the technical side, I learned a lot from experienced Committers and PMC members and grew through repeated discussions and rigorous code reviews. As for my thinking, answering questions from community users exposed me to many more business scenarios and broadened my perspective.

Fang Yong:

Participating in open source is also "recharging" for me. It gives me energy in four main ways:

  1. I gained a deeper understanding of how open source communities operate. I also became more familiar with how to encourage other team members to participate and how to bring internal features into the community;

  2. Through community exchanges, I got to know more people in related fields. This made it easier to communicate with them and to keep up with the industry's current progress and technical direction;

  3. I learn sooner about the technical directions and core features the community is driving, along with the background and design choices behind them. This helps us plan and evolve our systems within the company;

  4. When discussing technical solutions, implementations, and related issues, the community pays more attention to soundness and extensibility than internal teams usually do. These practices can be carried back into the team and promote its technical growth.

Tips from “pioneers”

Hu Weihua:

My advice is to be bold, be careful, and participate actively.

Being bold means having the courage to express your opinions in the community. People in the community will listen to them, and when your ideas are discussed and adopted, you will feel a real sense of accomplishment. Being careful matters because most community members participate in their spare time and communicate mainly asynchronously. A complete and clear message can therefore greatly reduce communication costs.

Active participation is not limited to submitting code. Community discussions and user Q&A are also good opportunities for growth.

Fang Yong:

I have two main pieces of advice: stay passionate and keep investing.

The open source community is very open. You can get involved in any issue that interests you or that you understand well. Offer your opinions or practical experience to help solve related problems or shape feature designs.

At the same time, participating in the community is a long-term commitment. You cannot expect significant returns after a month or two. It takes continuous investment, whether in mailing list discussions or in working on issues. If you stick to these two points, you will help build a better community, and I believe you will also make great progress in both technical skills and influence.


From August 18 to 20, 2023, ApacheCon Asia will be held at the Park Plaza Hotel in Beijing. Li Benchao, an Apache Flink Committer from the ByteDance streaming computing team, will deliver the keynote "Is Open Source Contribution Difficult?" and share his experience and takeaways from contributing to open source. Stay tuned.

ByteDance's streaming computing team is responsible for streaming computing scenarios inside ByteDance. It supports many core businesses, including the machine learning platform, recommendation, data warehousing, search, advertising, streaming media, and security and risk control. The team mainly tackles the challenges of very large single jobs (tens of millions of QPS) and very large clusters (tens of thousands of machines). It has deeply optimized Flink in areas such as SQL, State & Checkpoint, and Runtime.

In 2022, the team's computing engine product, "Streaming Computing Flink Edition", was launched on Volcano Engine, officially offering cloud computing capabilities to external customers.


Origin blog.csdn.net/weixin_46399686/article/details/132628386