Cloud Native Characters | Pulsar's Zhai Jia: Community Trust Matters Most

Cloud native is everywhere. "Cloud Native Characters" is a series of original interviews launched by CSDN. We pay attention to every technologist and company working in cloud native, looking at small details to see the bigger picture: the value and trends of cloud native.

Editor | Song Hui

Produced by | CSDN Cloud Computing
Header image | Paid download from Visual China

In this issue, we interviewed Zhai Jia, a PMC member of the Apache top-level project Pulsar and co-founder and CTO of StreamNative, to hear about his experience with open source and cloud native entrepreneurship. We also wish all our readers a happy Chinese New Year.

About the technologist

CSDN: Could you first introduce your technical background?

Zhai Jia: Hello, everyone. I'm Zhai Jia. I am currently the co-founder and CTO of StreamNative. I am also a PMC member of two Apache Software Foundation top-level projects, Apache BookKeeper and Apache Pulsar.

Throughout my career I have stayed with the same technical direction. As a graduate student at the Institute of Computing Technology of the Chinese Academy of Sciences, I mainly worked on storage and file system development. After graduation I joined EMC, where I continued to work on the design and development of file systems and distributed systems. While at EMC, I was fortunate to come across and use Apache BookKeeper.

BookKeeper is a distributed storage engine built on a write-ahead log (WAL) that offers low latency, high performance and strong consistency. The EMC project I was responsible for ran in a container environment and used BookKeeper as the foundation of stream storage, which it exposed to computing engines such as Flink. BookKeeper is a simple abstraction over a write-ahead log, yet the wide range of scenarios it can serve made a deep impression on me.
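To make the write-ahead log idea concrete, here is a minimal single-machine sketch in Python. It is not BookKeeper's API and omits replication entirely; the class `WriteAheadLog` and the helper `read_entries` are hypothetical names chosen for illustration. The only point it shows is that each entry is appended and flushed to disk before it is acknowledged, so the log can be replayed after a restart.

```python
import json
import os
import tempfile


def read_entries(path):
    """Replay the log from disk, oldest entry first."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


class WriteAheadLog:
    """Toy append-only log (illustrative only, not BookKeeper's interface)."""

    def __init__(self, path):
        self.path = path
        # Recover the next entry id from whatever is already on disk.
        self._next_id = len(read_entries(path))
        # Append mode: existing entries are never rewritten.
        self._file = open(path, "a", encoding="utf-8")

    def append(self, payload):
        """Persist one entry, then return its id as the acknowledgement."""
        entry_id = self._next_id
        self._file.write(json.dumps({"id": entry_id, "payload": payload}) + "\n")
        self._file.flush()
        os.fsync(self._file.fileno())  # durable before we acknowledge
        self._next_id += 1
        return entry_id

    def close(self):
        self._file.close()


def demo():
    """Write two entries, 'restart', write a third, then replay everything."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "wal.log")
        log = WriteAheadLog(path)
        log.append("a")
        log.append("b")
        log.close()

        reopened = WriteAheadLog(path)  # simulates a process restart
        third_id = reopened.append("c")
        reopened.close()
        return third_id, [entry["payload"] for entry in read_entries(path)]
```

Calling `demo()` writes to a temporary directory and returns the id of the last entry together with the replayed payloads. A real system such as BookKeeper additionally replicates entries across several storage nodes, which this sketch does not attempt.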

Therefore, after leaving EMC, I continued to work on the design and development of Apache Pulsar, which is built on BookKeeper. Through continued participation in and contribution to the two open source projects, I gradually became a member of the project management committees of both Apache BookKeeper and Apache Pulsar. In recent years, in addition to focusing on technology, I have also been doing open source evangelism and promotion in the community, hoping to help more developers understand the advantages and features of BookKeeper and Pulsar.

CSDN: Which technology areas and trends have you been following recently?

Zhai Jia: In addition to the open source community, I have long been focused on stream storage, stream computing, unified batch and stream processing, and cloud native.

About Pulsar and cloud native

CSDN: What do you think of the development of cloud native in China?

Zhai Jia: The concept of cloud native first emerged in 2014. In 2015, the Linux Foundation established the CNCF to promote the development of cloud native. The evolution of computing, from mainframes to minicomputers and then to distributed clusters and cloud computing, has been a process of steadily improving computing efficiency and reducing the difficulty of application development and operations. What cloud native brings is the pooling of computing resources, an ecosystem built around those resources, and their standardization. More and more people now recognize cloud native as the future direction of cloud computing.

China is the world's largest single market, with advanced big data and Internet scenarios and demands. This is a driving force for the development of cloud native technology, and it also brings huge opportunities for domestic cloud native technology to develop and flourish.

CSDN: What role does Pulsar play in the cloud native field?

Zhai Jia: Apache Pulsar was open sourced by Yahoo in 2016 and graduated as a top-level project of the Apache Software Foundation in September 2018. As the number of community users grows, Pulsar's features and ecosystem are being continuously enriched and improved, and Pulsar is now in a period of rapid growth.

Ever since it was designed in 2012, Apache Pulsar has taken a forward-looking, cloud native approach: an architecture that separates storage from computing and is layered and segment-based. This greatly eases the scaling and operations difficulties that users run into with messaging systems. It also uses a storage system designed specifically for messages and streams, which provides reliable read and write service quality and consistency guarantees for critical scenarios. This is why we say Apache Pulsar is a cloud native distributed messaging platform.

Pulsar is the messaging platform within the cloud native ecosystem. The Pulsar community is now doing a lot of work to make it easier for users to take advantage of cloud native and to use Pulsar as the foundation of messaging services in cloud environments.

CSDN: How do you balance the community version and the commercial version? What do you think is the biggest difficulty?

Zhai Jia: The open source community is the foundation of StreamNative's development, so all of our work related to Pulsar is open source, and there is no difference between the open source community version and the commercial version. When a new version of Apache Pulsar is released, StreamNative merges the team's new contributions (pull requests) back into the Apache Pulsar project. StreamNative and community users use the same code. As a maintainer of the Apache Pulsar community, StreamNative provides enterprises with professional hosting and operations services based on Apache Pulsar in public cloud, private cloud and hybrid cloud scenarios.

As for difficulties, we face different challenges at different stages. In the community's initial stage, when Pulsar was first open sourced, it had already been running stably at large scale inside Yahoo! for a long time, and it had many strengths in areas such as architecture and the quality of its data read and write service. We needed to let people know about these strengths and recognize the pain points Pulsar can solve, for example the horizontal scalability of the system in traditional message queue scenarios and the complexity of operations in streaming scenarios. At that stage the community was going from 0 to 1: Pulsar's features and foundation were already in place, and what was needed was community promotion.

At the current stage, Pulsar has attracted many heavyweight enterprise and team users and solved many of the pain points they encountered in their business scenarios. This has had a demonstration effect and drawn more and more users into the community. It has also brought Apache Pulsar more application scenarios and feature requirements. At this stage, while continuing community promotion, we must also consider how to distill the common needs of community users in order to keep enriching Pulsar's features and scenarios, and how to keep the community developing in a healthy way.

CSDN: Pulsar has users both in China and around the world. What cases and stories can you share?

Zhai Jia: This is something we are happy to share. With the joint efforts of StreamNative and the community, Apache Pulsar is being adopted in more and more scenarios, and we are seeing more and more heavyweight cases.

As the earliest Apache Pulsar adoption in China, Zhaopin gave us great encouragement. Zhaopin faced RabbitMQ scalability problems internally. It wanted a messaging service to act as its internal message bus, one that could meet its needs for both scalability and data read and write service quality. However, after investigating many projects and products, the team found none that met its needs.

They first investigated Apache BookKeeper and found that it offered good scalability and data read and write service quality, so they planned to implement messaging features on top of BookKeeper. Then they discovered Pulsar by chance and felt it was exactly what they had wanted to build; it felt like meeting a friend they wished they had met sooner. We strongly felt the Zhaopin team's enthusiasm for Apache Pulsar, and we were also excited that Pulsar could solve users' pain points. We hit it off immediately, and Apache Pulsar's journey at Zhaopin began. Although RabbitMQ was carrying Zhaopin's online systems at the time, the entire migration from RabbitMQ to Pulsar, from canary release to going live to completely replacing RabbitMQ, was particularly fast.

For the two messaging scenarios, the typical online business case is Tencent's billing platform, and the typical offline data analysis case is the short video application BIGO:

Tencent's billing platform has requirements for system scalability as well as stringent requirements for data service quality. Using Apache Pulsar, it processes an average of 10 billion+ transaction requests and consumes 10T+ of data per day. It carries Tencent Group's daily revenue, which is in the billions, and the total number of custodial accounts exceeds 30 billion.

The BIGO case addresses the cluster operations pain points that are common in streaming scenarios. BIGO takes advantage of Apache Pulsar's good integration with the big data ecosystem to build a real-time recommendation and analysis system. This supports rapid business development and reduces the cost and difficulty of operating the original Kafka cluster, especially the labor cost of scaling the cluster out and in.

In addition, many companies have deployed and used Pulsar for a long time. These pioneering users, such as Yahoo! Japan and Splunk abroad and EMQ and China Telecom in China, are all long-time Pulsar users. Pulsar helped Splunk reduce costs by 1.5-2x, latency by 5-50x, and operating costs by 2-3x. In Yahoo!'s deployment, Pulsar supports the same business volume with higher data service quality while consuming only half the actual hardware resource cost of Kafka. These companies not only keep expanding their internal use of Pulsar, but also continue to contribute new features to Pulsar and share their experience with the community.

At the end of November this year, we hosted the first Apache Pulsar Asia Summit. The summit featured many scenarios from the community, covering industries such as the Internet of Things, securities trading and fintech, telecom billing, live streaming, online education, instant retail and logistics, e-commerce, and artificial intelligence. The range of scenarios keeps growing richer and is showing explosive growth.

Apache Pulsar's cloud native architecture, dedicated message storage engine, geo-replication, multi-tenancy and many other enterprise-grade features have attracted more and more users to adopt it. In working with community users, we found that they share similar experiences: under the pressure of rapid business growth, their existing systems face many technical pain points, including system scalability and data service quality. Reworking the existing system consumes a lot of manpower and energy but yields very little benefit. In the process of moving to Pulsar, they increasingly come to appreciate its advantages and discover new capabilities and value that go beyond what they originally planned for.

About open source

CSDN: Pulsar is now a very well-known project. What did it gain during its incubation at the Apache Software Foundation?

Zhai Jia: Pulsar was open sourced by Yahoo in 2016, later donated to the Apache Software Foundation to enter the incubator, and graduated as a top-level project in 2018. During incubation, one of the main goals was to follow the mature mechanisms and processes of the Apache Software Foundation so that Pulsar could better practice the "Apache Way" at both the project and community levels, laying the foundation for a fast-growing, healthy and diverse project and community. The brand influence of the Apache Software Foundation has also helped Pulsar.

During incubation at the Apache Software Foundation, a new project is mainly mentored in the "Apache Way", and works on integration with Apache infrastructure, compliance with the Apache license, and so on. The "Apache Way" mainly includes the following principles: earned authority, community of peers, open communication, consensus decision-making, project autonomy, independence, and community over code. Among them, "Community Over Code" is probably the one most widely cited by the Apache community. For an Apache Software Foundation project, a healthy community has a higher priority than high-quality code, because a strong community can fix problems at the code level. This is why Pulsar strives to build a continuously active community.

CSDN: As a next-generation cloud native messaging and streaming platform, what advantages and specific support does Pulsar offer for cloud native?

Zhai Jia: Let me talk about Pulsar's advantages and support in cloud native from four aspects.

In terms of resource pooling, Apache Pulsar can support serving clusters and storage clusters with thousands of nodes, thanks to its separation of storage and computing and its architecture of peer nodes. The ability to run a single large cluster, combined with Pulsar's native multi-tenant management, allows administrators to use Pulsar as a large-scale pool of messaging service resources that users consume on demand.
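One visible form of this multi-tenancy is the topic name itself, which in Pulsar takes the form `persistent://tenant/namespace/topic`. The following Python sketch parses such names and groups topics by tenant. The helpers `parse_topic` and `topics_by_tenant` and the example tenant names are hypothetical and not part of any Pulsar client library; the sketch only illustrates how one shared cluster can be carved up per tenant.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(name):
    """Split a fully qualified topic name into its four parts."""
    domain, separator, rest = name.partition("://")
    if not separator or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported topic domain in {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


def topics_by_tenant(names):
    """Group local topic names by the tenant that owns them."""
    grouped = {}
    for name in names:
        parsed = parse_topic(name)
        grouped.setdefault(parsed.tenant, []).append(parsed.topic)
    return grouped


# Hypothetical tenants sharing one cluster.
EXAMPLE_TOPICS = [
    "persistent://billing/prod/transactions",
    "persistent://recommend/prod/clicks",
    "persistent://billing/test/refunds",
]
```

With the example list, `topics_by_tenant(EXAMPLE_TOPICS)` groups `transactions` and `refunds` under the `billing` tenant and `clicks` under `recommend`.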

In terms of building an ecosystem around resources, Pulsar implements a physical segment-based storage model for topics on top of logical partitions, which lends itself naturally to tiering at the storage layer. In this way, storage resources on and off the cloud are well integrated, and the tiered storage architecture is used to build a unified storage foundation for batch and stream computing.
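The segment and tiering idea described above can be sketched with a toy in-memory model. This is an illustration under simplifying assumptions, not Pulsar's implementation: the class `SegmentedTopic`, its methods, and the "hot"/"cold" tier labels are all hypothetical. A topic's data is cut into fixed-size segments, sealed segments can be moved to a cheaper tier, and a reader still sees one continuous stream.

```python
class SegmentedTopic:
    """Toy model of a topic stored as segments across two storage tiers."""

    def __init__(self, segment_size):
        self.segment_size = segment_size
        self.segments = [[]]     # the last segment is the open one
        self.offloaded = set()   # indexes of segments moved to the cold tier

    def append(self, message):
        # Roll over to a new segment once the open one is full.
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])
        self.segments[-1].append(message)

    def offload_sealed(self):
        """Mark every segment except the open one as offloaded."""
        self.offloaded.update(range(len(self.segments) - 1))
        return len(self.offloaded)

    def read_all(self):
        """Read the whole stream in order, tagging each message with its tier."""
        result = []
        for index, segment in enumerate(self.segments):
            tier = "cold" if index in self.offloaded else "hot"
            result.extend((tier, message) for message in segment)
        return result


def demo():
    topic = SegmentedTopic(segment_size=2)
    for number in range(5):
        topic.append(number)
    offloaded_count = topic.offload_sealed()
    return offloaded_count, topic.read_all()
```

In `demo()`, five messages with a segment size of two produce two sealed segments and one open segment; after offloading, the first four messages are read from the cold tier and the last from the hot tier, in their original order.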

In terms of lightweight computing, the design of Pulsar Functions matches the concept of serverless architecture. With the help of resource scheduling and management tools on the cloud, it can provide users with convenient and direct lightweight function computing services.
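As a small illustration of how lightweight such a function can be, Pulsar Functions accepts a plain Python function named `process` that takes one message and returns the value to publish. The sketch below follows that shape; the transformation itself (appending an exclamation mark) is only an example, and deployment details such as input and output topics are configured outside the code and are not shown here.

```python
def process(input):
    """Receive one message and return the value to publish downstream."""
    # `input` shadows the builtin on purpose to match the conventional
    # parameter name used for this function style.
    return "{}!".format(input)
```

Because the function is ordinary Python with no framework imports, it can be exercised locally, for example `process("hello")`, before being deployed.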

In addition, Pulsar provides a variety of connectors that make it easy to connect to other big data ecosystems, so that users can more conveniently use Pulsar as a basic messaging service on the cloud.

CSDN: How do you keep the Pulsar community active? Are there any good practices you can share?

Zhai Jia: Keeping the Pulsar community active and growing in a healthy way is a long-term undertaking, with different tasks and problems to solve at each stage. At present, interaction within the Pulsar community is quite active, and a self-sustaining community of mutual help has formed. For an active community, the most fundamental thing is that the project's features genuinely bring convenience to community users, create value, and solve their pain points. Otherwise, no matter how many resources are invested, the result will only be hollow, false prosperity.

In terms of methods, the company must first recognize how important its relationship with the community is and how the two complement each other. Only when the company is fully committed to the community can it earn the community's trust. With that trust, users will embrace the community, keep contributing, and propose improvements, so that the company can continue to grow together with the community.

In terms of technology, Pulsar has always maintained rapid version updates, continues to build an integrated ecosystem with other open source systems, and develops new features with the help of community contributors. In terms of community, in addition to StreamNative team members, many more community members who love Pulsar take part. Many of them have experience developing with Pulsar in production and rich experience in deployment and application, and they are happy to discuss and solve problems with others.

StreamNative organizes various activities to promote communication and growth in the community: offline meetups before the epidemic, online summits during the epidemic, community developer meetings, and so on. We recently launched the "Community Ambassador Program" for the Apache Pulsar community, aiming to find more community members who are willing to contribute and to build a better Pulsar community. In terms of documentation, we have been collecting representative user cases and following up on them to provide references for community members. We have also been supplementing and improving the Pulsar technical documentation and encouraging more contributors to help build it, so that users have documentation to rely on when operating Pulsar and solving problems.

About entrepreneurship

CSDN: Why did you choose to start a business?

Zhai Jia: I share the same ideals and beliefs as most technologists: I believe in the power and value of technology. Even after the passionate era of "mass entrepreneurship and innovation" has passed, my teammates and I still choose to believe that "technology changes the world".

The founding team members of StreamNative have been involved in the Pulsar and BookKeeper projects since their inception, accumulating nearly ten years of experience. As Pulsar and BookKeeper were honed and iterated over a long period in production at Yahoo and Twitter, we personally witnessed the development, construction, and operation of a Pulsar storage cluster of more than 3,000 nodes. We clearly see Pulsar's advantages in architecture and features, as well as how well it fits the "cloud native" direction.

In the adoption cases across many customer scenarios, we have seen developers recognize Pulsar's architecture and product. This has also convinced us that Pulsar's original goals can be realized and that it can solve users' various pain points in messaging scenarios.

In addition, the open source commercialization model has continued to mature in recent years. Commercial companies have emerged behind open source projects such as Spark, MongoDB, Elasticsearch, and TiDB. We believe that Pulsar and StreamNative have the same opportunity.

CSDN: How many people does StreamNative currently have? What proportion are in R&D? What is the technical culture of the R&D team?

Zhai Jia: We have two teams, one in China and one in the United States, and more and more outstanding people are joining us. The team currently has more than 35 people, of whom about 80% are engineers. Compared with more mature and stable teams, we are still in a stage of rapid development.

The company uses OKRs as its measurement standard and supports flexible and remote work for all employees. Our colleagues in China are distributed across Beijing, Shanghai, Hangzhou, Tianjin and other places. Everyone uses online collaboration tools such as Slack, GitHub, Zoom and Google for daily communication and collaboration; we encourage asynchronous communication and pursue the highest efficiency in communication and problem solving. The team members all have rich experience in software development and open source contribution, and they are quite self-driven. Everyone strongly identifies with the open source spirit, agile culture and a results orientation. The R&D team has a flat, typical engineering culture.

CSDN: What are the plans for Pulsar and StreamNative in 2021?

Zhai Jia: We are full of expectations for 2021. In terms of Apache Pulsar community and ecology:

  • We plan to release the major Pulsar 2.8.0 version in the first quarter of 2021, which will bring more major features; together with community contributors, we will also continue to add features that strengthen integration with other open source big data ecosystems;

  • We will continue to expand the regions where the annual Pulsar community summit is held around the world. As the epidemic situation improves, we will also organize offline Pulsar events in more cities in China and actively interact with other open source communities;

  • We will continue to build a healthy, diverse and vibrant Pulsar community and strengthen interaction within it, using the ambassador program, community developer meetings and other formats to boost the community's vitality and let more contributors take part in building Pulsar in various ways;

  • We will invest more energy in building and improving the Pulsar Chinese community, and continue to improve and strengthen Pulsar documentation in multiple languages;

  • As more and more users pay attention to Pulsar, we will provide more workshops and training sessions to guide everyone on their Pulsar journey.

As for StreamNative, we will continue to improve StreamNative's cloud products, support deploying Pulsar on more cloud service providers in China and abroad as well as in private cloud and hybrid cloud environments, optimize the product experience, and remain fully committed to the community in order to promote the development of the Pulsar community. In addition, as the number of paying customers in Europe and the United States increases, the team is also growing. We are recruiting all kinds of talent, and we welcome more like-minded people to join us.


Origin blog.csdn.net/FL63Zv9Zou86950w/article/details/113667179