How Engineers Treat Open Source

The author is an engineer who has worked on open source at a well-known technology company for more than 20 years. I have personally experienced or witnessed many excellent practices by engineers in dealing with open source software, and I have also seen many bad cases.

Overview

An engineer doing technical work inside a technology company is there to support and achieve the company's business goals by technical means. In practice, this means actively or passively using and maintaining a large amount of open source software. According to statistics, an engineer doing R&D or operations work inside an enterprise comes into contact with thousands of open source packages every year. An engineer whose main development language is Java or JavaScript comes into contact with even more, on the order of 10,000 or even 100,000. (Data source: "2020 State of the Software Supply Chain", published by Sonatype)

So how should open source software be chosen? With so many projects available, choosing one that is worth investing in requires weighing both personal needs and business needs.

How should it be customized and maintained over the long term once it has been chosen? This is also a big problem. Developing software inside an enterprise differs from developing software as an individual: the cost of maintaining a software system is far greater than the cost of developing it. How to customize and modify open source software with the long term in mind, and how to carry out the follow-up maintenance efficiently and at low cost, are questions on which the industry has many good experiences and also many less successful cases that have become lessons.

Finally, back to the individual. Engineers grow through continuous learning and practice. How to use open source to improve one's abilities, broaden one's horizons, and build one's technical reputation and influence in the industry is also very important for engineers themselves.

This article covers these questions in three parts:

  1. How Engineers Choose Open Source Software
  2. How Engineers Customize and Maintain Open Source Software
  3. How Engineers Can Leverage Open Source for Personal Growth

1. How to choose open source software

First of all, it is necessary to be clear about one's attitude towards open source software. At this stage it is impossible to do without open source software, yet using it carries various risks, including open source compliance, security, and efficiency issues. In one sentence: when using open source software inside an enterprise, follow the enterprise's internal regulations on open source software, including how it is introduced and how it is maintained, so that it is used efficiently, safely, and compliantly.

Back to the question of how to choose a specific piece of open source software. The following dimensions can serve as a reference.

  1. According to your needs
  2. According to technology development trends
  3. According to the stage of the software adoption cycle
  4. According to the maturity of the open source software
  5. According to the quality indicators of the project
  6. According to the governance model of the project

1.1 Choose open source software according to your needs

To choose open source software, first clarify the requirement, that is, what the software is being chosen for. Is it for personal learning, for meeting the needs of ToB customers, or for developing internal services? The orientation of the choice is completely different for these three purposes. (Note: the latter two scenarios must first consider the enterprise's open source compliance requirements; see Chapter 3.)

Start with choosing open source software for personal learning. Here it depends on what the specific purpose of the learning is. Do you want to learn a popular technology to round out your technical knowledge and broaden your horizons? Do you want to see how an open source project implements something, as a reference for an internal project? Or do you want to make targeted technical preparations for your next job? Different purposes lead to different choices. For the first, choose whichever technology is most popular and fills a gap in what you know. For the second, the usual approach is a targeted selection of well-known or innovative open source software in the relevant technical field: a certain feature is exactly what you need, or your current project implements it poorly, and you want to see how others have done it. For the last, prepare according to the position and technology stack requirements of the next job, and choose according to the level those requirements demand. Note that when choosing open source software for personal needs, you generally need to write a small project to practice on, such as a demo program or a test service. Because there is no follow-up long-term maintenance to consider, you can experiment according to your own ideas and development habits. You do not need to follow the company's internal development process and quality requirements, nor consider the stability of the software or the maturity of its community. You can simply study and refer to the code to your heart's content.

Now the next requirement: the software developed with the chosen open source software will be provided to customers, often delivered in the form of a private cloud. Choosing for this need requires balancing the customer's needs against the long-term needs of the company's own technical or product planning. Entering the customer's IDC environment as a private cloud requires integration with the upstream and downstream systems of the customer's development and operating environment, so the customer's needs come into play. Some customers have specific requirements for open source software, such as requiring HDFS in a specific version. A customer may name a piece of software and a version because they are more familiar with that version, or because other software and hardware suppliers have previously provided it. The purpose is to ease integration and subsequent use and maintenance. If the requirement is in line with the long-term development needs of the enterprise's project or product, it can simply be met. If Party A is in a very strong position and there is no alternative but to meet the requirement, then choose the software and version the customer specifies. However, if the requirement conflicts with the long-term development needs of your own project or product, and the specific software or version is negotiable, then negotiate with the customer to settle on a mutually acceptable open source software and version. The customer must be satisfied and willing to pay, your own delivery costs must stay controllable, and the long-term development needs of your own project or product must still be met.
For example, suppose the customer runs an old version of Java, but the software the enterprise delivers to ToB customers requires a newer version. You then need to negotiate with the customer. Either the customer switches to the version the company wants, and you help upgrade the customer's existing systems, or you lower the Java version requirement of your own software, which may mean modifying some of your own code and some of the components it depends on. In this scenario there are many possible choices under objective constraints, and they need to be worked out with the customer and with your own product managers and architects.

Finally, consider the scenario of meeting the needs of internal services, that is, the services built with open source software serve internal business or end users. This is common in the Internet service systems of the major domestic Internet companies and in apps on mobile phones. Here the teams developing and maintaining the project have much greater autonomy, which is completely different from ToB delivery. When choosing open source software in this case, you must comprehensively consider development and maintenance costs, as well as the stage of the business that uses the service.

(1) If the service is for an innovative business, that business is generally a trial-and-error one that must be adjusted at any time according to changes in the market and the current state of execution. The project may well be cancelled after three months. In this case a "rough and fast" development approach is more appropriate. You do not need to think much about the maintainability and scalability of the system. Just use the technology stack the R&D team is most familiar with, on top of the mature, proven platforms provided by underlying technology support teams such as the infrastructure team. The most important thing is to build the system as soon as possible and then iterate quickly with the product. Keep the learning and development costs of the existing R&D and operations teams as low as possible, and do not worry much about maintenance costs: the system has to be put together quickly, the goal is to verify product requirements and the business model, and time matters most. If you find a market opportunity, follow up quickly. After gaining a foothold, you can expand in a time-saving but resource-intensive way (commonly known as "stacking machines"), or rewrite the system in the style of "changing engines while flying the plane". That is more cost-effective. For startups or startup-like projects, speed trumps everything.

(2) However, if the system or service built with open source software needs long-term maintenance, for example because it serves a mature business in the company, or because it replaces an existing product to overcome the shortcomings of a mature internal platform, then, on the premise that business needs are met, the maintainability of the system becomes the most important thing. When choosing the open source software, ask whether it is mature and stable, whether it is friendly to secondary development, and whether it is cost-effective to operate, that is, whether it saves machines and bandwidth. In this case, the cost of developing the system may account for less than 1/10 of the cost of its entire life cycle. Therefore, on the premise that the requirements are met, focus on maintainability.

1.2 Choose open source software according to technology development trends

As shown in the figure above, the research and development of modern software and services is a continuously running, iterative cycle. It starts with market analysis, moves to the ideation stage, then to the coding stage, and finally to the launch stage, where the application is deployed and takes effect. After launch, analysis continues based on the data fed back. For an enterprise in a fiercely competitive industry, obviously, the faster this iteration, the better. The enterprise also needs the ability to scale quickly, flexibly, and at low cost: if the product direction is right, the system must expand quickly to absorb the rapid growth in traffic. Within the same industry, if enterprise A can iterate on its products and strategies faster and at lower cost, it clearly has a competitive advantage over enterprise B, which iterates more slowly and at higher cost.

There is now a very large amount of open source software, with many projects in almost every category. For a specific need, how should one choose? One suggestion is to choose based on technology trends. The current way of iterating on computer systems is Agile + Scale. Open source software that supports rapid iteration and makes low-cost elastic scaling easy is therefore worth long-term investment. In addition, people learning and using a new piece of open source software want its learning threshold to be as low as possible. The internals of a popular open source project may be as complicated as they need to be, but the project must be friendly to its users. Otherwise, however innovative it is, if it is hard to use then only geeks will learn and master it, and the chasm facing any innovation will be hard to cross.

For example, after Docker appeared it became popular all over the world extremely quickly, and many engineers fell in love with it. Docker added new features to the traditional container system, including packaging an application and its underlying dependency libraries into a container image. Container images are versioned and can be stored and distributed at scale through a centralized image registry. Docker first of all solved the long-standing problem, which had plagued engineers, of standardizing development, testing, and production environments, and so supports rapid iteration by developers. At the same time, images are distributed from a unified image registry, and the underlying layer uses lightweight virtual machine or container technology that starts very quickly, so a system built on Docker is easy to scale elastically. And because the application is encapsulated in an image, it can logically be better abstracted and reused according to the design principles of the Domain Model. Such a technology is clearly worth learning and mastering for every engineer who develops computer systems, because it brings great convenience. By contrast, before Docker appeared, although Control Group (cgroup for short) + Namespace technology already existed and had been merged into the Linux kernel, and Google's Borg-related papers had been published long before, it was not easy for ordinary R&D teams to master containers and deploy container systems at scale inside a company. As I recall, after the Borg paper appeared, only BAT-level Internet companies in China had small, elite R&D teams developing and using container management systems.
For example, a team at Baidu was in charge of R&D for the Matrix system, a team at Ali was in charge of R&D for the Pouch system, and Tencent also had a small team doing container system research. But apart from those few teams, most engineers did not use containers much, because they were relatively hard to learn. Docker fits the technical trend towards agility and elastic scaling very well and is very user-friendly. As a result, it was quickly taken up by many engineers as soon as it came out and became the de facto standard in the market.

Open source software that follows the trend in this way is worth choosing and investing in.

Another example is Spark. Spark solved the relatively low performance caused by frequent I/O in MapReduce's distributed computing process, and at the same time greatly improved ease of use, so it displaced MapReduce from its mainstream position in distributed computing.

1.3 Choose according to the different stages of the open source software adoption cycle

As a product of intellectual activity, software has a life cycle, which is generally represented by the technology adoption curve.

Open source software is also software, and it follows the same law of technology adoption, as shown below:

A piece of open source software generally goes through five stages from its creation to its demise: the innovation period (Innovators, accounting for 2.5%), the early adoption period (Early Adopters, 13.5%), then, after crossing the chasm, the early majority period (Early Majority, 34%), the late majority period (Late Majority, 34%), and finally the decline period (Laggards, 16%). The vast majority of innovative open source projects die without crossing the chasm, that is, without making it from the early adopter stage to the early majority stage. Therefore, when choosing an open source project that must be used and maintained for a long time, it is more rational to choose one that is in the early majority or late majority stage.
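
The stage boundaries follow directly from the percentages above. The following is only a minimal sketch: the function names and the idea of feeding in an estimated adoption percentage are assumptions made for illustration, and in practice the stage of a project is a matter of judgment rather than a measured number.

```python
# Share of adopters in each stage, taken from the adoption curve described above.
STAGES = [
    ("Innovators", 2.5),
    ("Early Adopters", 13.5),
    ("Early Majority", 34.0),
    ("Late Majority", 34.0),
    ("Laggards", 16.0),
]


def adoption_stage(adopted_percent: float) -> str:
    """Map an estimated adoption percentage (0-100) to a stage name.

    The percentage is a hypothetical input: how much of the potential
    user base is estimated to have adopted the technology so far.
    """
    if not 0 <= adopted_percent <= 100:
        raise ValueError("adopted_percent must be between 0 and 100")
    upper = 0.0
    for name, share in STAGES:
        upper += share  # cumulative upper bound: 2.5, 16, 50, 84, 100
        if adopted_percent <= upper:
            return name
    return STAGES[-1][0]


def suitable_for_long_term_use(adopted_percent: float) -> bool:
    """Apply the rule of thumb above: prefer the early or late majority stage."""
    return adoption_stage(adopted_percent) in ("Early Majority", "Late Majority")
```

Under these assumptions, a technology estimated at 10% adoption is still with the early adopters and has not yet crossed the chasm, while one at 30% is in the early majority.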

Of course, if you just want to learn something new, you can look at open source projects in the innovator stage or the early adopter stage.

Be careful not to pick projects in decline (Laggards), whether for a long-lived R&D system or for personal learning. For example, as of 2022 there is no need to choose projects such as Mesos or Docker Swarm. Both have been in decline since Kubernetes became the de facto standard for container scheduling, and the companies behind them have abandoned them. There is no reason to invest more energy in developing and maintaining them at this stage, unless Party A insists very strongly and is willing to pay for it.

Readers may ask: where can I see these technology adoption curves?

InfoQ, Gartner, and Thoughtworks each update and publish their technology adoption curves every year. You can look them up online and then combine them with your own industry experience to form a judgment.

For example: https://con.infoq.cn/conference/technology-selection?tab=bigdata

Here you can see InfoQ's judgment on various popular technologies in the big data field in 2022.

As can be seen from the figure above, open source software such as Hudi, ClickHouse, and Delta Lake is still in the innovator stage, that is, it is still rarely adopted in the industry. Readers who want to learn new projects can focus on these. However, such software is not suitable for mature application scenarios that require long-term maintenance.

Note that the technology adoption curves of these well-known technology media are updated every year. When making references, don't forget to pay attention to the time of publication.

1.4 Choose open source software according to the maturity of open source software

Another consideration is the maturity of the open source software itself. Maturity can be evaluated along dimensions such as whether the software is released regularly, whether it is maintained by multiple parties (so that even if one company changes strategy and stops maintaining it, other companies continue to support it over the long term), and whether the documentation is reasonably complete.

The open source community has many maturity models for measuring open source projects, among which the project maturity model of the Apache Software Foundation is relatively well known.

You can refer to here:  https://community.apache.org/apache-way/apache-project-maturity-model.html

The project maturity model developed by the Apache Software Foundation evaluates an open source project along seven dimensions:

  • Code
  • Licenses and Copyright
  • Releases
  • Quality
  • Community
  • Consensus Building
  • Independence

Each dimension has several check items. For example, Independence has two: whether the project is independent of the influence of any single company or organization, and whether contributors act in the community as themselves rather than as representatives of a company or organization.

Apache top-level projects are judged comprehensively along these dimensions at the graduation stage. Only projects that meet the standard in every respect are allowed to graduate from the Apache incubator and become top-level projects. This is also why I personally prefer Apache top-level projects.

In addition, the criticality score from the OpenSSF (see https://github.com/ossf/criticality_score ) is also a good reference indicator. It uses signals such as a project's number of community contributors, commit frequency, release frequency, and number of dependents to judge how important a piece of open source software is in the open source ecosystem. I won't go into details here; interested readers can refer to its documentation. Personally, I think it is a direction worth watching, but the score is still at an early stage and still far from ideal.
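
To give a feel for how such a score combines signals, the project's README describes it as a weighted average of log-scaled signals, each capped by a threshold. The sketch below follows that general shape only. The signal values, weights, and thresholds are made-up numbers for illustration, not the project's actual configuration, and the sketch assumes non-negative weights.

```python
import math


def criticality_score(signals):
    """Combine signals into a score between 0 and 1.

    signals: list of (value, weight, threshold) tuples. All three numbers
    are hypothetical inputs chosen by the caller for this sketch.
    """
    total_weight = sum(weight for _, weight, _ in signals)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = 0.0
    for value, weight, threshold in signals:
        if value < 0 or threshold <= 0:
            raise ValueError("values must be >= 0 and thresholds > 0")
        # Log scaling dampens very large values; the threshold caps each term at 1.
        weighted_sum += weight * math.log(1 + value) / math.log(1 + max(value, threshold))
    return weighted_sum / total_weight


# Illustrative inputs only: (value, weight, threshold).
example_signals = [
    (200, 2.0, 5000),   # contributors
    (30, 1.0, 1000),    # commits per week
    (12, 0.5, 26),      # releases in the past year
    (4000, 2.0, 500000) # dependents
]
example_score = criticality_score(example_signals)
```

The log scaling means that going from 10 to 100 contributors moves the score much more than going from 4,010 to 4,100, which matches the intuition that the first few maintainers matter most.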

1.5 Select according to the quality indicators of the project

Obviously, the code quality of some open source software is better than that of others, and sometimes the choice should be made on the basis of project quality.

At this time, we need to look at some indicators that have been widely proven to be more effective in the industry.

Among them, MTTU is an indicator recommended by Sonatype, a well-known provider of software supply chain tooling, which discusses it in its well-known annual supply chain report. See https://www.sonatype.com/resources/state-of-the-software-supply-chain-2021

MTTU (Mean Time to Update): the average time an open source project takes to update the versions of the libraries it depends on. For example, suppose open source software A depends on open source library B, the current version of A is 1.0, and it depends on version 1.1 of B. One day B is upgraded from 1.1 to 1.2, and some time later A releases a new version 1.1 that upgrades its dependency on B from 1.1 to 1.2. The interval between the release of B 1.2 and the release of A 1.1 is called the Time to Update. It reflects the ability of A's R&D team to keep its dependencies in step with the release cycles of the libraries it depends on. Mean Time to Update is the average of these intervals for the software. The lower the value, the better the quality: it indicates that the maintainers upgrade their dependencies quickly and promptly pick up fixes for security vulnerabilities in those dependencies.
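
The definition above can be written down in a few lines. This is a minimal sketch of the arithmetic only, not Sonatype's actual methodology; the function names and the release dates in the example are invented for illustration.

```python
from datetime import date
from statistics import mean


def time_to_update(dependency_released: date, adopted_in_release: date) -> int:
    """Days between a dependency's new release and the project release that adopts it."""
    return (adopted_in_release - dependency_released).days


def mean_time_to_update(updates):
    """Average Time to Update over (dependency_released, adopted_in_release) date pairs."""
    return mean(time_to_update(dep, adopted) for dep, adopted in updates)


# Hypothetical history for software A:
# B 1.2 came out on 2021-03-01 and A 1.1 adopted it on 2021-03-31;
# a second dependency was released on 2021-06-01 and adopted on 2021-06-11.
history = [
    (date(2021, 3, 1), date(2021, 3, 31)),
    (date(2021, 6, 1), date(2021, 6, 11)),
]
mttu_days = mean_time_to_update(history)
```

With these invented dates the two intervals are 30 and 10 days, so the MTTU of A is 20 days.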

According to Sonatype's statistics, the MTTU of open source software in the industry is getting shorter and shorter. The average MTTU of Java open source software on Maven Central was 371 days in 2011, 302 days in 2014, 158 days in 2018, and 28 days in 2021. As open source libraries are updated more frequently, the software that uses them also updates its dependencies faster. Compared with 10 years earlier, MTTU has shrunk to less than 1/10 of its original value.

Of course, MTTU is only an indirect measure of project quality. Whether serious high-risk security vulnerabilities have been disclosed in the project's history, and whether they were fixed quickly, are also important dimensions for evaluating the quality of an open source project.

The security departments of some major companies continuously evaluate the security of open source software. Software that has frequently had high-risk security vulnerabilities that were not fixed in time is classified as unsafe, placed on an internal open source software blacklist, announced internally, and banned from use by business R&D teams. Where old services cannot be moved off such software because of limited R&D manpower, they need to be migrated to a relatively closed network environment to reduce the possible losses. In such cases it is obviously necessary to abide by the company's security regulations and stop using open source software on the blacklist.

1.6 Choose according to the governance model of the open source community behind the software

A further dimension is the governance model of the open source project's community, which matters for projects that require long-term development and maintenance.

The community governance model (Governance Model) mainly refers to how the project or community makes decisions and who makes them. Specifically: can everyone contribute, or only a few people? Are decisions made by voting or by authority? Are plans and discussions visible?

There are three common governance models for open source communities and open source projects:

  1. Single-company dominance: the design, development, and release of the software are controlled by a single company, and no external contributions are accepted. The development plan and release plan are not made public, nor are the related discussions. The source code is only published when a version is released. An example is Google's Android system.
  2. Dictator-led (there is a term for this, "Benevolent Dictatorship"): one person controls the direction of the project. This person has strong influence and leadership and is generally the project's founder. For example, the Linux kernel is led by Linus Torvalds, and Python was previously led by Guido van Rossum.
  3. Board-led: a group of people forms the project's board to decide its major issues. For example, Apache Software Foundation projects are governed by each project's PMC, and decision-making at the CNCF is the responsibility of the CNCF Governing Board (many technical decisions are delegated to the Technical Oversight Committee under it).

In my personal opinion and experience, the selection priorities based on the governance of the community behind the software are as follows:

  1. Give priority to graduated Apache projects (their intellectual property is clear, and at least three parties maintain them over the long term)
  2. The next best choice is key projects of the Linux Foundation and other open source foundations (the Linux Foundation has strong operational capabilities, and each key project is usually backed by one or more large companies)
  3. Be cautious about company-led open source projects (a company's open source strategy may be adjusted at any time, and it may well stop supporting the project; Facebook, for example, has abandoned many projects it started)
  4. Try not to choose personal open source projects (they are run more casually and the risk is particularly high, although some that are already well known and have settled into long-term maintenance are exceptions, such as Vue.js, maintained by the well-known open source author Evan You).

This is my personal recommended order of priority when choosing among similar open source projects. It represents only my own view, and discussion is welcome.

2. How to customize and maintain

Once open source software has been introduced into the enterprise for development and long-term maintenance, the question of how to customize and maintain it arises. First of all, it must be clear that open source software needs to be customized after it is introduced into the enterprise, for several reasons:

  1. Open source software is usually built for general scenarios: it has many situations to consider and must support many kinds of usage. Inside the enterprise, however, it often only needs to serve specific scenarios. Optimizing for those scenarios, for example by trimming the feature set, removing features irrelevant to the scenario, and tuning performance and parameters, often yields much better performance, such as handling more traffic and saving machine costs. The effect can be striking, and this is a common form of customization.
  2. Once inside the enterprise, open source software is developed and operated for a long time, and it must meet the enterprise's internal service operation and maintenance specifications. For example, a service going online needs complete logs and monitoring, a service health check interface, and fault-tolerance mechanisms such as traffic scheduling. All of these require custom modifications.
  3. Open source software also needs to connect to upstream and downstream systems inside the enterprise. For example, if the software relies on underlying distributed storage and distributed computing systems for its basic functions, it must be connected to the enterprise's existing storage or computing systems. The enterprise's underlying virtual machine or container scheduling systems have often been modified and optimized as well, so the integration needs adapting too. All of this requires custom modifications.
  4. Customization for special scenarios. Using the software in enterprise scenarios often runs into specific problems and bugs, which require bug fixes and new features.

2.1 How to customize and modify open source software?

Here the author suggests several basic principles: do not touch the core code of the open source software; use its existing plug-in mechanisms wherever possible, or else make changes at its periphery; and upgrade regularly to the stable version from the open source community.

Many open source projects are designed from the start with extension mechanisms that let later developers extend functionality and add features. For example, the well-known open source projects Visual Studio Code and Firefox both provide an extension mechanism. Many developers build plug-ins for their own needs and submit them to the officially supported plug-in marketplace. After installing the main program, ordinary users can browse the marketplace and find and install the plug-ins they need. Kubernetes likewise provides extension mechanisms in many places. For example, the core scheduler supports custom schedulers for implementing tailored scheduling strategies, and the storage and network layers provide many plug-in mechanisms. Most commendable is the CRD (Custom Resource Definition) mechanism, which lets developers define new resource types and reuse Kubernetes' mature declarative API and scheduling mechanism for convenient operation and maintenance. Therefore, try to add features through the project's existing plug-in or extension mechanisms.
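
The general idea of an extension mechanism can be shown with a toy example. This is a generic sketch and not the API of Visual Studio Code, Firefox, or Kubernetes: the Host class, its hook registry, and the add_audit_tag plug-in are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List

Hook = Callable[[dict], dict]


class Host:
    """Stands in for the upstream open source project. Its code is never edited."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Hook]] = {}

    def register(self, event: str, hook: Hook) -> None:
        """Extension point: attach a plug-in function to a named event."""
        self._hooks.setdefault(event, []).append(hook)

    def handle(self, event: str, request: dict) -> dict:
        """Run every registered plug-in for the event, in registration order."""
        for hook in self._hooks.get(event, []):
            request = hook(request)
        return request


def add_audit_tag(request: dict) -> dict:
    """Enterprise-specific customization, kept entirely outside the host's code."""
    return {**request, "audited": True}


host = Host()
host.register("request", add_audit_tag)
result = host.handle("request", {"path": "/"})
```

Because the customization lives in add_audit_tag rather than in Host, replacing Host with a newer upstream version only requires that the registration interface stays compatible.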

Some modifications are not a good fit for the extension mechanism, or the software offers no usable extension mechanism at all. In that case, try to modify the periphery of the source code rather than its core. Open source software keeps iterating as its community moves forward, and the community will keep delivering more and better features. If the core code has been modified, upgrading to a newer open source version becomes very painful: a large number of internal patches have to be merged and then tested, and the upgrade cost becomes so high that the internal version can no longer keep up with the community's mainline. Eventually some of the core engineers resign or transfer, no one can maintain the modifications, and maintenance and upgrades of the whole system fail. The system is then abandoned or rebuilt from scratch, at great labor cost. The author has worked in large Internet companies for many years and has seen too many such projects: the modifications were genuinely necessary at the time, but because they changed the core code, upgrading to a newer community version became too expensive, and in the end no one could maintain the system.

For example, the author saw two technical teams at a large company each maintaining Redis clusters, both on Redis 2.x at the time. Because Redis 2.x had little cluster functionality and did not support large-scale business well, both teams modified it. Team A made its modifications at the periphery: it wrapped a layer around Redis that handled traffic scheduling, failover, and similar functions. Team B was bolder and modified the Redis core directly, adding cluster-related code there, and in some local test scenarios its version even performed better. In the short term, both teams met the needs of their business lines. But the Redis community kept iterating and adding more and better features. When Redis 3.x was released, both teams wanted to upgrade, because the business teams using Redis wanted 3.x as well. The upgrade costs, however, were clearly different. Team A quickly ported its functions to 3.x and upgraded. Team B had changed so much of the core that porting and testing were too expensive, so the upgrade of its 2.x service kept being postponed. After the community released 4.x and team B's core engineer left, no one could maintain team B's Redis cluster or meet customers' requests for newer versions. The team had to start over and build a new cluster directly from community version 4.x, and migrating the existing systems took a long time and imposed considerable cost on its customers.
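To make the idea of modifying at the periphery more concrete, here is a minimal sketch of a wrapper layer that adds failover on top of unmodified servers. It is only an illustration, not team A's actual design: `InMemoryBackend`, `BackendDown`, and `FailoverClient` are hypothetical names, and the in-memory backend stands in for a real, untouched cache server.

```python
class BackendDown(Exception):
    """Raised by a backend that cannot serve the request."""


class InMemoryBackend:
    """Hypothetical stand-in for one unmodified cache server."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}

    def set(self, key, value):
        if not self.healthy:
            raise BackendDown(self.name)
        self.data[key] = value

    def get(self, key):
        if not self.healthy:
            raise BackendDown(self.name)
        return self.data.get(key)


class FailoverClient:
    """Peripheral layer: failover logic lives here, not in the server code."""

    def __init__(self, backends):
        self.backends = list(backends)

    def _call(self, method, *args):
        last_error = None
        for backend in self.backends:
            try:
                return getattr(backend, method)(*args)
            except BackendDown as err:
                last_error = err  # this backend is down, try the next one
        raise RuntimeError("all backends are down") from last_error

    def set(self, key, value):
        return self._call("set", key, value)

    def get(self, key):
        return self._call("get", key)
```

Because the wrapper only relies on the backend's public `get` and `set` operations, the server behind it can be upgraded without porting any core changes.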

Therefore, it is recommended that modifications to the source code of open source software be kept as Local Patches, which are easy to maintain, upgrade, manage, and count. In this mode, the build script of the internal project unpacks a specific source package of the open source software, applies the Local Patches one by one with the patch command, and then compiles and tests the result. The alternative, applying the patches once and committing the modified source directly into the business code base, saves a few minutes in the CI stage but makes later maintenance, upgrades, and management considerably more troublesome.
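The Local Patch build step described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive build script: the function names are hypothetical, the patches are assumed to be `*.patch` files whose file names sort in the order they should be applied, the tarball is assumed to contain a single top-level source directory, and the standard `patch` tool is assumed to be installed.

```python
import subprocess
import tarfile
from pathlib import Path


def list_local_patches(patch_dir):
    """Return the *.patch files in patch_dir, sorted by file name."""
    return sorted(Path(patch_dir).glob("*.patch"))


def patch_command(patch_file):
    """Build the argument list that applies one patch with the patch tool."""
    return ["patch", "-p1", "-i", str(Path(patch_file).resolve())]


def apply_local_patches(tarball, patch_dir, work_dir):
    """Unpack the upstream tarball, then apply every Local Patch in order."""
    work_dir = Path(work_dir)
    with tarfile.open(tarball) as archive:
        archive.extractall(work_dir)
    # Assumption: the tarball contains a single top-level source directory.
    source_root = next(p for p in work_dir.iterdir() if p.is_dir())
    for patch_file in list_local_patches(patch_dir):
        # check=True stops the build at the first patch that fails to apply.
        subprocess.run(patch_command(patch_file), cwd=source_root, check=True)
    return source_root
```

Keeping the patches as separate, ordered files means that an upgrade only requires swapping the tarball and re-evaluating each patch, and the number of remaining patches is easy to count.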

2.2 Give back to the community: upstream your patches to the open source community and reduce maintenance costs

After an engineer adds a feature or bug fix to a particular version of an open source project inside the enterprise, the change generally lives in the code base as a Local Patch. The author recommends that, once the business problem is solved, engineers try to submit these Local Patches to the project's upstream open source community, that is, to upstream them.

Upstreaming has the following advantages:

  • Better code

Features, and especially bug fixes, added to open source software inside an enterprise are often written as a "hack" under time pressure. To solve an online problem quickly, the fix may be placed somewhere that is not quite right, its logic may have holes, and it may not handle unusual conditions well. If the Local Patch is taken back to the project's open source community and discussed in depth with the community's senior engineers (the module reviewers or module owners), it can be improved based on their feedback, and the result is better code.

  • Lower maintenance cost

Every time you upgrade to a newer version of the open source software, each internally held Local Patch has to be evaluated, and some have to be merged and tested again. Naturally, the fewer Local Patches there are, the better. The best outcome is for the community to include these patches in its new release: the more patches are accepted upstream, the fewer have to be evaluated, merged, and tested inside the enterprise, and the cheaper the upgrade. As the author recalls, each Fedora release carries many local patches for the kernel and other components, and Red Hat engineers continuously contribute those patches to the upstream projects, which keeps the number of local patches in Fedora relatively low and the cost of version upgrades under control.

  • A stronger team technical brand and employer brand, easier recruitment, and prouder engineers

Contributing code to the upstream community by upstreaming Local Patches earns a better reputation in that community. It shows that the company is not just a consumer of open source software but a contributor as well.

It also builds a strong technical brand for the team, showing that the company has not only a good business but also a strong technical team, which helps external recruitment.

Upstreaming also improves the pride and satisfaction of the team's engineers.

For example, when Xiaomi was using Apache HBase extensively, the R&D engineers responsible for it followed the upstream strategy resolutely: they kept contributing patches verified inside Xiaomi back to the HBase community, and discussed and developed features together with other members of the HBase community. The influence of Xiaomi engineers in the HBase community grew, and several of them became Committers and PMC members. Eventually Xiaomi engineer Zhang Duo became the PMC Chair of the project. Xiaomi's technical brand in big data and cloud computing owes a great deal to the R&D team behind this work.

3. How to use open source for personal growth

An engineer's growth is closely tied to daily work and daily study. Here are some suggestions on how to use open source software to grow faster and to realize your career or technical ideals.

3.1 Openness and sharing, vision and mentality

Only by standing on the shoulders of giants can we see further. The open source world contains all kinds of software, aimed at all kinds of scenarios and solving all kinds of problems. So keep an open mind: before starting any technical work, first look at how others have done it. The world is large, and the overwhelming majority of problems an engineer runs into have been encountered by someone else before. How did they solve them? What can we learn from their experience? In particular, look at other people's open source projects: read their design documents to see how they think, and read their source code to see how they implement it. If you are interested, you can go further and talk to them directly. First, this helps you avoid detours, unnecessary repeated work, and pitfalls others have already fallen into. Second, it spares you from reinventing the wheel, so your limited time can go to more valuable work. Do not be the frog at the bottom of the well who thinks he is number one in the world; look around the industry and the open source community more. New graduates who have just joined large companies need to pay special attention to this.

A sharing mindset is needed as well. It is best to share what you have learned so that others can refer to it and draw on your experience and lessons, and everyone improves together.

3.2 Recommended steps and methods for learning open source software, and the Feynman learning method

There are many ways to learn open source software. Depending on your purpose and your own situation (how familiar you are with the field and how well you already understand related open source projects), you need to pick the method that suits you best.

The author recommends the following approach for engineers learning a new open source project:

  1. Get hands-on as soon as possible: run through the project's Quick Start and introductory Tutorial to understand its main scenarios and key features.
  2. Then read the documentation, paying attention to the main architecture diagram, to understand the overall architecture and build a big-picture mental map of the system.
  3. Finally, dig into the details that matter for your own application scenario, including the documentation and code for those specific parts.

For example, to learn Kubernetes, first go to its official website and quickly run through the tutorial there (https://kubernetes.io/docs/tutorials/kubernetes-basics/create-cluster/cluster-interactive/) to learn how to create Pods, how to access them, how to update them, how to route traffic, and so on. Then study the architecture diagram to understand the declarative design principle, the roles of core components such as kube-apiserver, kube-scheduler, kube-controller-manager, and kubelet, and how these components interact. Finally, based on the needs of your own business scenario, decide which parts deserve deeper study. For example, if you need to add your own storage backend, read the relevant code and refer to how other vendors' storage backends are implemented.

Starting by reading the source code is not recommended; without a map it is aimless and inefficient. Moreover, many open source projects are now very large and iterate very quickly. Understanding all of the code is impossible in terms of personal energy, and it is unnecessary anyway.

Note that learning must be combined with application, that is, with hands-on practice. As the old poem says: "What is learned on paper always feels shallow; to truly understand a thing you must practice it yourself." The ancients did not deceive us, and this holds especially for engineers. If you want a deeper understanding of a new technology, or even plan to switch technical direction or career track, you must practice more. Use the open source software, or write a demo program and run it in a test environment, ideally one that solves a practical problem around you. Do not be the person who thinks everything looks simple but gets stuck as soon as it actually has to run. You can use newly learned technologies in innovation activities at your company, such as hackathons, or write a small tool, get it running, and let it solve a small practical problem. For example, to practice Python, write a crawler that fetches data from a weather forecast website every day, then add a simple query that returns the current forecast. Learn by using, and apply what you learn.
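As a starting point for the weather exercise mentioned above, a practice script might look something like the sketch below. Everything specific in it is an assumption made for illustration: the URL is a placeholder rather than a real service, and the JSON layout (a `days` list with `date`, `text`, `low`, and `high` fields) is invented, so both must be adapted to whatever site you actually use and are permitted to query.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint; replace with a real service you are allowed to query.
FORECAST_URL = "https://weather.example.com/api/forecast?city=beijing"


def fetch_forecast(url=FORECAST_URL):
    """Download the raw forecast document (performs a network request)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def parse_forecast(raw_json):
    """Turn the assumed JSON payload into a {date: summary} dictionary."""
    payload = json.loads(raw_json)
    return {
        day["date"]: f'{day["text"]}, {day["low"]} to {day["high"]} C'
        for day in payload["days"]
    }


def lookup(forecast, date):
    """Simple query: the forecast for one date, or a fallback message."""
    return forecast.get(date, "no forecast available")
```

Separating the download from the parsing keeps the parsing and query logic testable without network access, which is a useful habit even in a practice project.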

Another very useful approach is the Feynman learning method. It is widely considered one of the most effective learning methods, and it has held up in the author's own practice. The steps are simple; the author has condensed them into the following three.

  1. Learn a technology.
  2. Explain it to a non-expert until they understand it.
  3. If the listener does not understand, go back to step 1.

With this method, you have truly mastered a technology only when you can explain its usage and structure yourself in a way that ordinary engineers understand.

The Feynman learning method is named after Richard Feynman, a renowned theoretical physicist, one of the founders of quantum electrodynamics, and often called the father of nanotechnology. He received the Nobel Prize in Physics in 1965 for his contributions to quantum electrodynamics. Although the steps are simple, reducing a complex technology to an explanation that ordinary engineers can follow requires a deep understanding of the technology, as well as analogies and associations that simplify its specialized terms and concepts. Generally, if you can do this, you have reached a solid entry level in the technology and can go on to more in-depth study.

Taking a well-known industry exam or certification is another good approach. For example, when an engineer who is new to cloud native passes the CKA (Certified Kubernetes Administrator) exam, the certification shows that he has reached a certain level and has built a comprehensive understanding of common Kubernetes operations and the system architecture.

3.3 Integrate into the open source community and build a lifelong personal reputation

Finally, participating in and actively contributing to the open source community earns an engineer a lifelong reputation and lifelong friends, which greatly benefits long-term career development. Engineers are encouraged to choose open source projects and communities they are interested in and to keep growing there through discussion and contribution. Even if an engineer later stops being active in a project and community because of work or other reasons, his contributions remain recognized. The Apache Software Foundation has a well-known motto, "Merit never expires" (see http://theapacheway.com/merit-never-expires/), which means that the recognition engineers earn by contributing to Apache Software Foundation projects and communities never lapses. Once a committer, always a committer.

Collaborating in the open source community is also a way for engineers to socialize. Meeting lifelong friends there and working and communicating with them does a great deal for an engineer's growth. Many well-known figures in open source are very friendly in their communities, especially toward newcomers and relatively junior engineers who show a strong desire to contribute, and are willing to teach them hands-on. With help and guidance from these experienced engineers, newcomers grow very quickly, without the ceiling imposed by a company, department, or work project. In other words, newcomers who bring an open mind and a desire to contribute can keep communicating with and learning from senior engineers in the projects and communities they care about, and their technical ability develops rapidly.

In addition, few engineers today can count on lifetime employment at one company. An engineer works somewhere for a period of time, and then, through changes that are sometimes chosen and sometimes not, the position or the employer changes. The recognition earned by contributing to the open source community, and the personal brand and technical reputation built there, stay with the individual and do not depend on the fortunes of any company. The author knows many people who have long been active in open source: their careers have changed a great deal, but their recognition and brand in the community have remained. This is also a good way for many engineers to break through career bottlenecks and the limits of a single employer's platform.

Contributing to the open source community over the long term benefits both others and yourself. Every engineer with ideas and the will to act is encouraged to find open source projects and communities he likes, invest in them, and become part of them.

3.4 How to contribute to the open source community

In open source communities, especially those built on meritocracy (such as Apache Software Foundation projects), the more you contribute, the more recognition you get. But for a newcomer, contributing is often not as simple as raising a hand. You need to understand the community's rules first and follow them before you can gradually integrate.

1. What to contribute?

Before contributing, understand that contributions to the open source community are not limited to code. Writing code for features or bug fixes is a contribution, and so is improving documentation and test cases, reporting usage problems, or writing blog posts that introduce and recommend the project. All of these are widely recognized in the open source community.

Many technical experts began contributing to open source by submitting test reports. For example, Blake Ross, the youngest architect in the Mozilla community (he became one of its top technical decision-makers at 17 and founded the Firefox project with another architect), first joined the Mozilla community as an intern and started with testing.

"Scratch your own itch!" This popular saying in the open source community means that contributions should start from solving your own problems: you run into a problem in real work, try to solve it, and then contribute the solution back in a form the community accepts. Typically a bug or problem affects your actual usage, or you want a new function for your company's scenario, or you simply want to learn a new technology. Contributions driven by your own needs tend to last. By contrast, joining community-run campaigns for small rewards is just a bit of fun for engineers, and contributions made that way tend not to last.

So a newcomer to an open source community can start with simple problems that solve his own needs. The simplest example: read the getting-started document, follow its steps one by one, and see whether they work. If they do not, report a bug; or, if you find that extra steps are needed to get through, submit a patch to the document that describes those steps. The community welcomes this kind of contribution too.

Some communities label simple bugs as "Good First Issues". Contributors can pick these issues to get familiar with the contribution process and integrate into the community.

2. Understand the existing community situation and respect the practices and habits of the community

The first step in contributing to an open source community is getting to know the community.

You can learn the basics of an open source community from its website, mailing lists, wiki, the documents in its GitHub repository, and other materials.

Read the project's key documents (typically CONTRIBUTING.md) to learn its contribution process and recommended practices.

Note that every open source community has its own conventions. For example, each has its own issue tracker (some use GitHub Issues, some Bugzilla, some Jira), and the process and requirements for submitting patches differ as well.

For example, the long-established Apache HTTP Server project has the following requirements for contributors:

  1. Patches must conform to the project's code style.
  2. There are also code quality requirements, such as thread safety.
  3. Patches must be made against the current development version, 2.5.x.
  4. Patches must be in unified diff format, generated with diff -u file-old.c file.c.
  5. Patches are submitted at bz.apache.org/bugzilla; adding the keyword "PatchAvailable" is recommended.
  6. You can also send an email to the mailing list for discussion; the subject line must start with [PATCH].
Note that this is not the Fork/Pull Request model popular on GitHub but the older Bugzilla plus diff patch model. Please respect their working habits and use the model they require. (To be honest, when the author contributed to the Mozilla community some 20 years ago, it also worked with Bugzilla and diff patches. More than 20 years later, the working model of the Apache HTTP Server project has changed little. The working method does not get in the way of contributing, though; you simply need to learn it and get used to it.)

Some open source communities offer a gamified contribution process that lets developers get to know the project and the contribution process through a series of simple beginner tasks. This is friendlier to newcomers and has been carefully designed by the community's managers. So, as a contributor, do not let their effort go to waste: work through the tasks you need, and get familiar with the tasks and processes that matter to you.

3. Be polite and respectful, and respect the diversity of the community

The open source community is full of diversity.

Most senior engineers in open source communities are very friendly to newcomers. They patiently help newcomers get familiar with the documentation, the contribution process, and so on. In daily communication, whether on mailing lists, in IRC or Slack channels, or in issue comments, they are pleasant to deal with and easy to communicate and collaborate with.

But be aware that a few people are less pleasant to deal with. If you run into them, avoid direct conflict. Rather than pushing back hard, ask more senior engineers in the community for help. You cannot change anyone and you cannot please everyone; just do the work that needs doing.

4. How to quickly find the Module Owner responsible for code review and complete the contribution

Sometimes you follow the community's documented contribution process, whether filing an issue or reporting a bug, and find that the module owner is slow to respond. A few techniques help here.

You can join their IRC or Slack channel, find the corresponding module owner, and have a polite and constructive dialogue with the module owner.

Build a good relationship with them and gradually earn their trust through practical contributions.

Note that open source communities run on trust. Gaining the trust of the module owner greatly helps your future work.

5. Steps to follow when submitting a large patch

Some engineers complain: "I submitted a very good feature to the XX open source community. It was tested and verified in my company's internal environment, it works well, and it performs well. But when I submitted the code upstream, the community did not value the feature; instead they picked my patch apart and found all kinds of problems. It is too much trouble and too exhausting, so I have stopped contributing."

Put yourself in the maintainers' position: if a stranger submits a large patch to your project, code review is hard simply because the patch is large. The contributor says the patch is useful, implements a powerful function, and has been verified, but is it reliable? Will the contributor stay in the community for the long term and fix, in a timely way, the problems the code causes? These are all open questions. So before basic trust has been established (that is, before several small patches have been submitted and accepted), getting a large patch accepted is hard work.

In addition, the engineer submitting the patch often does not know the history of the community. The function may have been discussed long ago, with the conclusion that it is not needed or belongs elsewhere. So do not be blindly confident in your patch; first discuss the scenario and the problem with the engineers in the community.

The author suggests contributing a large patch as follows:

  1. If you judge the patch to be relatively large, first discuss the problem in the community so that the community acknowledges it, and collect any history the community has on it.
  2. If the community acknowledges the problem and agrees it should be fixed now, go on to discuss solutions.
  3. Once the problem and the approach are agreed and a brief design is done, discuss the concrete code patch.
  4. The patch must comply with community standards (code style, component call conventions, testing standards, documentation standards, and so on).
  5. Be mentally prepared: the patch may need several rounds of revision before it is finally merged. A large patch may have to be split into several small patches that are submitted and merged in batches. Compromise where necessary.

Contributing a large patch that implements an important function takes many steps and a long time, but once it is done, the recognition from the community is often the basis for becoming a higher-level contributor. For the individual contributor, the satisfaction and sense of accomplishment are considerable as well.

6. Be careful not to do the following things

  1. Propose an idea and hope that others will complete it.

This is especially true when you have just joined a community: you suggest that the community should do something, but instead of doing it yourself you hope others will. Such suggestions are usually ignored. As the saying goes: "There are many people who, the moment they 'step off the carriage', start making pronouncements, criticizing this and condemning that. In fact, ten out of ten such people will fail." People like this are not welcome in the community.

The recommended approach is to raise the problem, offer a constructive solution at the same time, take part in the work yourself, and invite others in the community to join.

2. Too eager and impatient, ignoring community practices.

It is better to go slowly and be patient, especially before the community has come to trust you as a newcomer. The author once met an engineer new to an open source community who had strong technical skills but only wanted to get his patch merged quickly. When communicating with the module owner he was polite, but his responses to the owner's improvement suggestions were perfunctory. After several rounds of this, he had lost his reputation in the community, his bug fixes and new features progressed very slowly, and he eventually left the project disappointed.

  3. Crossing a red line (that is, behavior prohibited by the community's code of conduct)

Almost every mature open source community has its own Code of Conduct, usually displayed prominently on the community website or in the code repository.

The code of conduct lists behaviors that are not welcome in the community, including discrimination and offensive conduct related to gender, race, religion, and so on.

Take care to avoid these behaviors. Some of them may not be treated as a big problem in Chinese open source communities, but they are not necessarily trivial in the international community.

7. Pay attention to compliance when contributing to the upstream community from within an enterprise

Contributing to the upstream community from within an enterprise means disclosing results of the company's internal R&D, so it must comply with the company's internal rules for managing open source contributions.

Companies' rules differ. For example, Google encourages engineers to contribute to the open source community but requires them to contribute using their google.com email addresses. Contributions of fewer than 100 lines do not need internal approval, provided the project does not use a license that Google prohibits (such as AGPL, Public Domain, or CC-BY-NC-*). There are some other hard conditions as well; see the Google OSPO page https://opensource.google/documentation/reference/patching . In China, Baidu also encourages engineers to contribute to the open source community. Regardless of its size, a patch must go through an internal electronic approval process, be approved by the department's technical director, and be filed with Baidu's open source office (OSPO), which provides data for the office's statistics and for rewarding engineers' contributions.

When contributing upstream from within an enterprise, you will often find that the community requires engineers to sign a CLA (Contributor License Agreement) or follow the DCO (Developer Certificate of Origin). CLAs come in two forms: the ICLA (Individual Contributor License Agreement), signed by an individual, and the CCLA (Corporate Contributor License Agreement), signed on behalf of the whole enterprise. In many communities, once the enterprise has signed a CCLA its engineers do not need to sign an ICLA separately, although some communities still require both, so check the project's rules. Where a CLA is required, patches are not accepted until it is signed. In essence, the CLA grants the community a license to use the contributor's contributions. Follow your company's internal regulations here: the CLA terms may need to be reviewed by the company's legal department. Fortunately, some well-known organizations use a unified CLA; projects of the Apache Software Foundation share one CLA text, and projects of the CNCF are similar. Once legal has confirmed such a CLA, there is no problem signing it for any of those projects. If a CLA has not been confirmed by legal, consult the company's legal department first to avoid signing terms that are unfavorable to the company.
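In projects that use the DCO, the certification is usually expressed as a `Signed-off-by:` trailer line in each commit message, which `git commit -s` adds automatically. The sketch below shows one way a team might check a commit message before sending a patch upstream. It is a hypothetical helper, not part of any official tooling: the function names and the idea of checking for a company email domain are assumptions, and the address pattern is deliberately simple.

```python
import re

# A DCO sign-off is a trailer line of the form "Signed-off-by: Name <email>".
SIGN_OFF = re.compile(
    r"^Signed-off-by: (?P<name>.+) <(?P<email>[^<>\s]+@[^<>\s]+)>$",
    re.MULTILINE,
)


def sign_offs(commit_message):
    """Return (name, email) pairs for every sign-off line in the message."""
    return [
        (match.group("name"), match.group("email"))
        for match in SIGN_OFF.finditer(commit_message)
    ]


def has_company_sign_off(commit_message, company_domain):
    """True if at least one sign-off uses an address in company_domain."""
    suffix = "@" + company_domain.lower()
    return any(
        email.lower().endswith(suffix)
        for _, email in sign_offs(commit_message)
    )
```

A check like this could run in an internal pre-submit step so that a missing sign-off, or a sign-off from a personal address where company policy requires the work address, is caught before the patch reaches the community.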

Summary

This article is relatively long, and it condenses a lot of my thoughts and experiences.

I have always believed that engineers are a very pragmatic and hardworking group of people who deeply believe that "we can use code to change the world". I have also always believed that openness, collaboration, and pragmatism are among the best traits of contemporary engineers.

Learning, working, and sharing in the open source world is one of the best ways for engineers to change the world.

Origin blog.csdn.net/2301_77700816/article/details/131782500