Guo Wei: How to become an open source hero

On June 1, 2023, Apache SeaTunnel, the first Chinese-led open source data integration tool, officially announced that it had graduated from the Apache Software Foundation Incubator and become a top-level project. After 18 months of incubation, the project has come to fruition: it has more than 200 community contributors and 245,000 lines of code, and is used by thousands of companies. It is hard to imagine that when Guo Wei first took it over, its repository had been taken down and its contributors had scattered.

 

A hero steps in: SeaTunnel is reborn from the ashes

SeaTunnel, formerly known as Waterdrop, was created by LeTV in 2017 and open sourced on GitHub in the same year. It is a big data integration and processing platform. At that time, domestic data engines were proliferating, but few projects addressed seamless integration and high-speed synchronization between data sources, so Waterdrop stood out. Unfortunately, this prominence brought trouble: another party registered the Waterdrop name as a trademark, and its legal department sent a lawyer's letter to the project's initiators and to GitHub.

Because the name of the open source project had not been registered as a trademark, and trademarks in China follow the first-to-file principle (whoever applies first gets the mark), Waterdrop suddenly found itself unable to defend its name and at a disadvantage on GitHub. After receiving the lawyer's letter, GitHub took down the entire Waterdrop repository. All code, PRs, and issues became inaccessible, and Waterdrop's founding team faced litigation as well. With no other options, the team asked around the open source circle for help and, by chance, met Guo Wei.

Guo Wei, known as "Guo Daxia" ("Hero Guo"), loves to help people in the open source circle. When the founding team came to him and he saw such a promising project in such trouble, he could not bear to stand by and watch, so he took the project over. While finding a lawyer to resolve the legal dispute, Guo Wei used his own connections to reach the GitHub administrators at Microsoft and explain the situation, so that the project could be unblocked.

In 2021, the dispute finally came to an end, and Waterdrop was renamed SeaTunnel and continued to operate. But Guo Daxia was not at ease. After all, the team had only three people and was already struggling to maintain the community; how could it also take care of legal compliance? If the same thing happened again, it would be no joke. Guo Wei began serving as SeaTunnel's mentor, guiding the project step by step, and committed to getting it incubated at the Apache Software Foundation. On the one hand, the foundation is professional and has dedicated staff for legal affairs, far better than what a grass-roots team can manage. On the other hand, SeaTunnel could take the place of the retired Apache Sqoop in moving data between data sources.

Finally, with the help of many mentors, SeaTunnel was approved by a vote of the Apache Software Foundation on December 9, 2021, and entered the Apache Incubator, becoming the foundation's first data integration platform project born in China. Its goal is "connecting all sources and synchronizing like flying".

Today, SeaTunnel has graduated from the Apache Software Foundation Incubator and become a top-level project. It has many corporate users and developers around the world and has long since left its early troubles behind. Guo Daxia, for his part, gave a light wave and hurried on to the next problem.

 

Even Daxia was once an open source "xiaocai" (rookie)

The SeaTunnel team turned to Guo Wei in the first place because he had successfully run several open source communities and was already well known in the industry. His involvement with open source goes back to 2010, when he was a lurker in the Hadoop community. As a newcomer, he watched experts discuss technical issues and come up with "cool" solutions. "In the open source community, you can see many brand-new projects and new technologies. You can constantly learn new things and stay at the forefront of the technology circle. No other channel can replace this. What is in books is too outdated, and what is on the Internet is too mixed. Only in the open source community can you get a pure view of new technologies and of what the open source circle is paying attention to."

Of course, an extrovert like Guo Wei was never going to sit on the sidelines forever. He quickly integrated into the community, often took part in offline meetups, and contributed a lot of documentation. Even so, in the open source community of that time, Guo Wei was still a nobody. After joining Lenovo, he kept up his commitment to open source and brought it into the company. In the Lenovo COC Core Technology Architecture Committee, Guo Wei, as head of the global big data platform, acted as an open source evangelist and promoted the adoption of open source technology. Many colleagues were won over to open source by his advocacy.

But at that time, open source evangelism inside a company was hard. First, open source was not as popular as it is now. Many people knew very little about it, and their only impression was that it was "free." Second, companies accustomed to commercial software tend to stick with their original choices. Commercial software costs money, but someone provides after-sales service and takes responsibility. Open source software is free but carries risk: if something goes wrong, who will fix it? For large global companies in particular, open source may also carry legal risks in individual countries. Even a decision that saves money is not an easy one to make.

Guo Wei said frankly that promoting open source within a large enterprise means taking on a lot of responsibility. To put it bluntly, you have to shoulder the risk yourself at the beginning before a few key business users will try it. Only when they are happy with it can you put their doubts to rest and talk about the next step of the rollout. At that time, in order to promote Hadoop and Spark while complying with the laws and regulations of various countries, Guo Wei had to hold meetings with colleagues all over the world. The meetings ran from six o'clock in the morning until two or three o'clock at night, and he explained to everyone over and over again what the project was, how to use it, what results it delivered, what to do when problems came up, whether it was compliant, and why it should be used. After a running battle with the business departments, "crossing five passes and slaying six generals", a US department finally took the lead in adopting Spark. Later, thanks to its good reputation, Spark was gradually extended to other countries and other departments.

"Every user in our open source community is precious and has worked hard, especially those who promote brand-new open source projects inside their companies. Every one of them is a warrior. Promoting a new technology within an enterprise not only takes a great deal of work; they also stake their own positions on evangelizing for and protecting the community. So when we look at the open source community, we cannot only see contributors, committers, and PMC members. We also have to see the ordinary users in our community, and their hardships and difficulties." Guo Wei said, "In fact, everyone who uses open source is a contributor to the community. They have contributed a great deal; it is just not reflected in the code."

 

From open source user to owner: it's not easy being a hero

In 2016, Guo Wei joined Analysys as CTO (Chief Technology Officer). At that time, the company was building a user behavior analysis product that relied mainly on a modified version of Presto adapted to its scenario. One day, while browsing the Internet, Guo Wei came across a new project that looked somewhat similar to his own product. He tested it and found that it was 10 times faster than his own product. Guo Wei was stunned.

This project was ClickHouse, a column-oriented database management system (DBMS) open sourced by Russia's Yandex in 2016. It is mainly used for online analytical processing (OLAP) queries and can generate analytical reports in real time using SQL queries.

Guo Wei considers himself an early adopter in the data technology circle and always keeps an eye on new research, yet even he had never heard of this project, so the chance that others knew about it was even lower. How could he resist sharing something this good? So Guo Wei contacted Ivan, ClickHouse's global community leader, and offered to help run the community in China. ClickHouse agreed.

However, every beginning is hard, and building a new open source community from zero to one is harder still. No one knows who you are, and no one wants to use your project. Guo Wei interviewed early users at Kuaishou and Sina and set up a community group, but this first group took a year and a half to come together. The offline community was even smaller: only 11 people came to the first ClickHouse meetup.

Since this was a personal hobby, Guo Wei had to handle all of ClickHouse's community operations himself. Setting up groups, verifying members, answering questions, and giving guidance were all done after work and on weekends. Eleven o'clock every night was Guo Wei's ClickHouse support time. In the early days, he had to go into each group and teach everyone how to install, configure, and use ClickHouse. On weekends, he also had to seek out key users, chat with them, have dinner with them, bring them together, and invite them to offline events.

"Doing open source is not just about giving talks at conferences. Behind an open source evangelist's moments in the spotlight are countless trivial daily chores. Running a community well is very tedious. For example, if someone posts an advertisement in the group, you have to kick them out. If people start quarreling in the group, how do you keep the peace? If someone throws rotten eggs at the community, how do you judge whether the open source project is really at fault? And if the project is at fault, how do you accept the criticism with an open mind? All of this is part of maintaining a community. Only by accumulating bit by bit over many years can you really run a community well." Guo Wei said, "Look at Craig, the former chairman of the Apache board of directors. A top figure like him, now over 70, still works as secretary at the foundation, creating Apache accounts for everyone. Then you will know how trivial community operations are. It is the same everywhere."

Fortunately, Guo Wei was not fighting alone on this road. As the number of ClickHouse users grew, so did the community team. When the number of WeChat groups reached 10, Guo Wei began to recruit volunteers to help handle group affairs. Offline meetups started out in a company meeting room with only ten or twenty people. Later they grew to 200 to 300 people offline and more than 1,000 online. Ordinary venues could no longer hold them, so Guo Wei went around asking friends to lend venues, then paid out of his own pocket to fly there and organize the events. For one meetup in Shanghai, more than 300 people signed up, but no venue could be found. Jin Hai, then head of big data at Qutoutiao, asked his company to help provide a hotel venue with a stage and a large screen, just like an open source conference. There was also Liu Wencheng, then at China Literature Group, who served as ClickHouse's "Little C" and helped answer all kinds of questions. With the help of these contributors, the ClickHouse Chinese community was finally able to hold proper meetups.

Three years later, in 2019, ClickHouse exploded in popularity. To this day, ClickHouse remains the OLAP community with the most users, and Chinese users are the most numerous in the whole community. Enterprise users such as Toutiao and Alibaba have also joined. At that year's meetup, the community invited Alexey Milovidov of Russia's Yandex, founder of the ClickHouse open source community. He said: "The explosive growth of ClickHouse users in China (users increased fourfold in one quarter) started with William (Guo Wei)'s promotion in China."

 

The success is not mine alone, and there have been more failures

Guo Wei was very happy to be recognized by the founder of ClickHouse. However, he still feels that ClickHouse has reached its current level not so much because of him as a promoter, but because of the excellence of the product itself and the support of Chinese open source partners. "In data and big data, China accepts and adopts open source faster than anywhere else in the world, faster than the United States. This is thanks to the pace of China's Internet development and to adoption by a large number of Internet companies. The ceiling for open source business may not be as high as in the United States, but competition in China moves faster. China can often accept a new technology quickly, roll it out quickly, and iterate quickly. In addition, China has a large base of developers and users, so it has unique advantages in building open source from scratch."

Looking back, Guo Wei and his friends have gained a great deal over the past four or five years. The friends who worked on the code with him in the community have seen their salaries increase four to five times. One of the volunteers, "Little C" Liu Wencheng, was recruited by Tencent and moved from a small company to WeChat to maintain ClickHouse. "All for one, and one for all. The contributions you make in the community can be seen by everyone. If your technical level is recognized by everyone, you will naturally get more opportunities than others. I think this is the charm of the open source community. Everyone is equal here, and gold shines quickly. This can also be seen as a reward for community contributors. But the reward is not monetary; it is other people's recognition of you, and your influence." Guo Wei said.

Of course, not all open source projects are as lucky as ClickHouse. Heroes also suffer setbacks, and more of the open source projects he has run have failed than succeeded. Many open source projects that he personally operated for two or three years gathered only a dozen or so stars. How could it be easy to succeed when you start an open source project yourself? "It's okay to make mistakes. The more mistakes you make, the more experience you accumulate. You see, the reason I am successful at building products now is that my experience of failure is at work. The same goes for other things." Guo Wei spread his hands. "Every success has its own particular background and needs at the time, so the experience of success is not important, and the experience of failure matters more. It can show you how to avoid mistakes. So behind every success there may be 99 failures, but everyone only sees the one success at the end."

After countless failures, Guo Wei has also developed his own way of judging projects. "I think the most important thing in building an open source community is to pin down the positioning of the product: what problem does it solve, and what kind of technical framework does it use? If you really believe in the future of the community, just get in and grow with it." Guo Wei said, "It doesn't matter if the product has bugs. No community is perfect. ClickHouse also had all kinds of problems at the beginning, but as long as the overall architecture is set, the remaining details can be iterated and improved on that foundation. What ClickHouse actually solved at the time was the problem of wide tables and log queries, that's all. Then it put the latest technology of the day, vectorized computation, directly into the engine, and it was ten times faster than the Presto I had been using. It solved this one problem and solved it best, so it could develop well in the community."

After identifying the product idea, the underlying logic, and the founding team, the only thing left is to persevere. "I introduced ClickHouse to China when it was first open sourced in 2016. It was unknown at that time, and it didn't become popular until 2019. The years in between were all about hanging on. You have to believe in your vision and keep going. Don't give up halfway. Sometimes whether an open source community succeeds in the end depends on whether you persist long enough." Guo Wei said, "When the community really grows and its influence is large enough, every one of its members will benefit."

 

Multiple identities, balancing between open source and business

In April 2022, Ted Liu (Liu Tiandong) suddenly came to inform Guo Wei: "We are nominating you to be an Apache Software Foundation (ASF) Member. Please write up your materials!" In this way, Guo Wei became a member of the Apache Software Foundation. "When I received this honor, I was very happy. I felt that this was everyone's recognition of me. At the same time, I felt that I had a heavier responsibility and more motivation to look after and maintain every Apache project."

In 2023, Guo Wei took on another identity: CEO of Beluga Open Source. Very few people are a foundation member and the leader of a commercial company at the same time. Does Guo Wei feel there is a conflict? When making decisions, should open source or commercialization come first? If open source features and commercial features come into conflict, wouldn't that put the hero in a difficult spot?

However, Guo Wei is very calm about this. He believes that open source and commercialization do not conflict and even complement each other. If an open source project wants long-term stability and sustainable development, commercialization is probably inevitable. If no commercial company supports the core developers and contributors and meets the needs of heavy users, then over time, core contributors who are powered purely by passion may not be able to keep going.

"Companies like Beluga Open Source, which commercializes Apache SeaTunnel and DolphinScheduler, are not the opposite of open source but its promoters." Guo Wei said, "A business can better preserve the character of the open source project and the jobs of its core contributors, allowing them to keep working on open source. Likewise, for heavy users, when the open source project cannot fully meet their needs, or when they need someone to help them promote it inside their company, a commercial entity can step in and work alongside them. The evangelist then has an easier time, instead of having to debate a roomful of skeptics alone as I did back then, crossing five passes and slaying six generals."

However, the reason why open source projects are difficult to commercialize is precisely because the code is made public. How to choose between business and open source? Which ones should be open source and which ones shouldn’t? When encountering a conflict, how should you make a decision?

Guo Wei smiled and shared a clever "cake-cutting method": "First of all, from the perspective of product positioning, you have to separate your main open source user group from your main non-open-source user group. If a team is technically strong and still has the time and budget to tinker, it uses open source. If time is short and the staffing budget is tight, the commercial version is less hassle. So the two have different users, and your open source software and commercial software are positioned differently. Once you understand this, you won't have so many problems."

By convention, the latest features go into the open source version, while relatively stable and industry-specific features usually go into the commercial version, and the two sides exchange requirements from time to time. What Guo Wei has to do is manage the timing and rhythm of placing features on each side. "As for which features go into the commercial version and which into the open source version, it is a question of where to make the cut: cut too little, and your commercial version has no value; cut too much, and it hurts the community. How to get it right is an art rather than a technique; it can be sensed but not put into words (laughs)."

Generally speaking, Guo Wei is very optimistic about China's open source business environment. After all, China is highly receptive to open source. Although everyone is still exploring the path from open source community to commercial company and commercial product, at least the new generation of decision-makers Guo Wei has met are different from those of the past: they understand that open source brings a company's technology into line with international standards and with the latest technology in the world. Both traditional companies and Internet companies are gradually trying out open-source-native commercial software.

"The atmosphere and overall pace of China's open source business are awakening." As for Chinese open source going global, Guo Wei also sees great hope: "After all, China has such fertile soil. Especially in big data, there is so much data, so many terminals and scenarios, such demands on performance... Projects forged in that competition must be among the best in the world, and once they are combined with overseas business scenarios, they will definitely sell well."

 

Open source veteran, brewing the next community

In the open source circle, Guo Daxia also has his own idol: "Craig has set an example for me. He is over 70 years old and still insists on contributing to open source. I feel that I can keep doing open source until I am 70 too. He is my role model, haha."

Keeping at it for a lifetime may not be Guo Wei's open source ideal alone, but he, at least, has persisted until now.

Now, as an open source veteran, Guo Wei is paying attention to the next hot topic: large models, especially open source large models. "I think every piece of software in the future will be transformed and rebuilt with large models and related AI technologies. If we incubate more projects, they may well be related to large models." Guo Wei said, "If you only train large models, then only a few companies at home and abroad can afford it. But for companies upstream and downstream in the large model ecosystem, there are still many opportunities. So my focus will be on what can promote the adoption of large models, lower the barrier to training and using them, and make them truly run in practice, especially the relationship between large models and data."

Guo Daxia packed his bags and rushed to the next challenge.

I wonder what kind of story he will encounter next?

 
 
[Tracing the Source] In every conversation, we trace the stories behind open source and get to know those geeky, free-spirited, and persistent open source people.

[Tracing the Source] is an interview column about open source people launched by OSCHINA.

Tracing the source means going back to the origin and exploring open source questions. As the old poem has it: ask the canal how its water stays so clear, and the answer is that living water keeps flowing from its source. Every open source participant is the most vivid source of the open source wave, and all their stories together build the open source world we see today.

In the decades since open source first appeared, hacker groups working for open source have suffered from indifference and rejection from the mainstream of society. Even though the current software industry has shouted the slogan "Embrace open source", problems still exist.

We don’t know how many obstacles open source contributors, open source evangelists, and everyone involved in open source will face, but what gives us confidence is that more people are devoting themselves to the cause of open source.

Therefore, OSCHINA hopes to reach into the developer community, find everyone who actively participates in open source and has ideas about it, get to know them and their open source stories, and glimpse through those stories how the open source cause develops.

[Tracing the Source] series of articles:

01  Shisi: Become an open source evangelist

02  Wei Jianfan: A “foreign master” in the open source circle

03  "Tool Man" Zhao Shengyu: Bachelor's degree in Qingbei, resigned from Alibaba to work on open source and went to Tongji to study for a Ph.D.

04  Wu Sheng: For me, social interaction is the most important thing in open source

05  Wukong Liu Qi: Technical flaws will not be removed quickly, the open source community code will speak for itself

06  Jiang Ning: Taking programmers to the open source "Utopia"

The [Tracing the Source] column is collecting stories of open source people. If you think you or someone around you has made a unique contribution to open source, please leave a comment and let us hear the story.


Origin my.oschina.net/u/6852546/blog/10320168