How engineers treat open source

The author is an engineer who has worked on open source at a well-known technology company for more than 20 years. I have personally experienced or witnessed many excellent practices by engineers dealing with open source software, and I have also seen many bad cases. I am writing down some of that experience here for engineers' reference, in the hope that it helps them grow.

Overview

An engineer doing technical work inside a technology company is there to use technical means to support and achieve the business goals the company cares about. In practice this means actively or passively using and maintaining a large amount of open source software. According to statistics, each engineer comes into contact with thousands of open source packages every year while doing R&D and operations work inside an enterprise, and for some the number reaches the tens of thousands or even the hundreds of thousands. (Data source: "2020 State of the Software Supply Chain" published by Sonatype)

So how should open source software be chosen? With so many open source projects available, choosing one that suits both personal and business needs and is worth investing in requires weighing several factors together.

How should the software be customized and maintained over the long term once it has been chosen? This is also a big problem. Developing software inside an enterprise differs from developing it as an individual: the cost of maintaining a software system is far greater than the cost of developing it. After choosing open source software, the question is how to customize and modify it with the long term in mind, and how to carry out the subsequent maintenance, so that the work stays efficient and costs stay low. The industry has many good practices to draw on, and also many less successful cases that serve as lessons.

Finally, coming back to the individual: an engineer grows through continuous learning and practice. How to use open source to improve your abilities, broaden your horizons, and build your technical reputation and industry influence also matters a great deal to engineers themselves.

This article covers these questions in three parts:

  1. How engineers choose open source software
  2. How engineers customize and maintain open source software
  3. How engineers can use open source for personal growth

1. How to choose open source software

First of all, the attitude towards open source software needs to be clear. At this stage it is impossible to do without open source software, yet using it carries various risks, including open source compliance, security, and efficiency issues. In one sentence: using open source software within an enterprise means complying with the enterprise's internal regulations on open source software, including how to introduce and maintain it, so that its use is efficient, safe, and compliant.

Returning to the question of how to choose a specific piece of open source software, the following dimensions can serve as a reference.

  1. According to demand
  2. According to technology trends
  3. According to the different stages of the software adoption cycle
  4. According to the maturity of open source software
  5. According to the quality indicators of the project
  6. According to the governance model of the project

1.1 Choose open source software according to your needs

When choosing open source software, first clarify the requirement, that is, the purpose of choosing it. Is it for personal learning, for meeting the needs of ToB customers, or for developing internal services? The orientation of the choice is completely different under these three purposes. (Note: the latter two scenarios must first consider the enterprise's open source compliance requirements, see Chapter 3)

Start with choosing open source software for personal learning. Here the specific purpose of the learning matters. Do you want to learn a popular technology to round out your technical knowledge and broaden your technical vision? Do you want to study how an open source project implements something, as a reference for developing an internal project? Or do you want to make targeted technical preparations for your next job? Different purposes lead to different choices. For the first, simply choose whatever technology is most popular. For the second, the choice is usually a well-known or innovative open source project in the relevant field: it has a feature I currently need, or it does something my current project implements poorly, and I want to see how others implement it. For the last, prepare according to the position and technology stack requirements of the next job, and choose according to the entry threshold that stack sets. Note that learning open source software for personal purposes generally means writing a small project to practice, such as a demo program or a test service. Because there is no long-term maintenance to consider, you can follow your own ideas and development habits in these exercises. You do not need to follow the enterprise's internal development process and quality requirements, nor consider the stability of the open source software or the maturity of its community. You only need to study and refer to the code as much as you like.

Next, consider the second requirement: the software built on the chosen open source software is to be delivered to customers, often as a private cloud. Choosing open source software for this purpose requires a balance between the needs of the customer and the needs of the company's own technical planning or long-term product planning. Software that enters the customer's IDC environment as a private cloud has to integrate with the upstream and downstream systems of the customer's development and operations environment. Some customers therefore have specific requirements, such as requiring HDFS at a specific version. A customer may name a specific package and version because it is already familiar with that version, or because other software and hardware suppliers delivered that package and version earlier; the purpose is to make integration and subsequent maintenance easier. If the requirement is in line with the long-term development needs of the enterprise's own project or product, it can simply be met. If the customer (Party A) is in a very strong position and there is no alternative, choose the software and version the customer specifies. However, if the requirement conflicts with the long-term development needs of the enterprise's own project or product, and the specific project or version is negotiable, then negotiate with the customer to reach a result acceptable to both sides, that is, an agreed open source package and version. The result has to satisfy the customer enough to pay the bill, keep the enterprise's own delivery costs under control, and still meet the long-term development needs of its own project or product.
For example, suppose a customer uses an older version of Java, but the software the enterprise delivers to ToB customers requires a newer one. The enterprise then needs to negotiate with the customer: either the customer switches to the version the enterprise wants, possibly with the enterprise helping to upgrade the customer's existing systems, or the enterprise lowers the Java version requirement of its own software, which may mean modifying some of its own code and some of the components the software depends on. In a scenario like this there are several options under objective constraints, and they need to be negotiated with the customer and with the enterprise's own product managers and architects.

Finally, consider the scenario of meeting internal service needs, that is, the services built with open source software serve internal businesses or end users. This is common in the Internet services of the major domestic Internet companies and in all kinds of mobile apps. Here the developers and maintainers of the project have much greater autonomy, which is completely different from ToB delivery. When choosing open source software in this scenario, weigh the development and maintenance costs together with the stage of the business that will use the service.

(1) If the service is for an innovative business: an innovative business is generally a trial-and-error business. It has to be adjusted at any time according to market conditions and its current state of implementation, and it may well be cancelled. In this case a "rough and fast" development approach is more appropriate. Do not think too much about the maintainability and scalability of the system; just use the technology stack the R&D team knows best, on top of the mature, proven base platform provided by an underlying support team such as the infrastructure team. The most important thing is to build the system as soon as possible and then iterate quickly with the product. At this stage, reduce the learning and development cost for the existing R&D and operations team as much as possible, and do not worry much about maintenance cost, because the system has to be put together quickly and by brute force. Verifying the product requirements and the business model matters most, and time is the most important resource. If you find a market opportunity, follow up quickly. Once the business has gained a firm foothold, expanding with a time-saving but resource-intensive method (commonly known as "stacking machines") or rewriting the system by "changing engines in flight" are both relatively cheap. For a business or project at the start-up stage, speed trumps everything.

(2) However, if the system or service built with the chosen open source software requires long-term maintenance, for example because it serves a mature business in the company, or because it upgrades a mature internal platform to address its shortcomings and replaces the original product, then, provided that business needs are met, the maintainability of the system becomes the most important consideration. Important factors then include whether the open source software is mature and stable; whether it is friendly to secondary development; whether its operating cost is economical, that is, whether it saves machines and bandwidth; whether routine operations can be completed automatically and losslessly; and whether changes are easy to upstream to the open source community. In this case, the cost of developing a system may be less than 1/10 of the cost of its entire life cycle. Therefore, provided that the requirements are met, the focus is on maintainability.

1.2 Choose open source software according to technology development trends

As shown in the figure above, the development of modern software or services is a continuous, iterative cycle. It starts with market analysis, moves to the ideation stage, then to the coding stage, and finally to the launch stage, where the application is deployed and validated; analysis then continues based on the data fed back after launch. For an enterprise in a highly competitive industry, the faster this cycle turns the better. The enterprise also needs the ability to scale elastically, quickly, and at low cost. If the product direction is right, expand the system quickly to absorb the rapid growth in traffic; if the product direction is wrong, shrink capacity quickly, free the hardware and people involved, and invest them in a new trial-and-error direction. Between two enterprises in the same industry, if enterprise A can iterate its products and strategies faster and at lower cost than enterprise B, enterprise A clearly has a competitive advantage.

There is a very large amount of open source software now, with many projects in almost every category. How do you choose for a specific need? One suggestion is to choose based on technology trends. The current way of iterating computer systems is Agile plus Scale. Open source software that supports rapid iteration of a system and makes low-cost elastic scaling easy is therefore worth long-term investment. In addition, people learning a new piece of open source software want the learning threshold to be as low as possible. A popular open source project may be as complex as it likes internally, but it must be friendly to its users. Otherwise, however innovative it is, only geeks will learn and master it, and the chasm in front of the innovation will be hard to cross.

For example, after Docker appeared it swept the world extremely quickly, and many engineers fell in love with it. The reason is that Docker added new features to the traditional container system, including packaging an application and the libraries it depends on into a container image. Images are versioned and can be stored and distributed at scale through a centralized image repository. Docker first solved the long-standing problem of standardizing development, testing, and production environments, and so supports developers in iterating quickly. At the same time, images are distributed from a unified repository, and the underlying layer uses a lightweight virtual machine or container technology that starts very quickly, so a system built on Docker can scale elastically with ease. And because the application is encapsulated in an image, it can be abstracted and reused more cleanly according to the design principles of the Domain Model. Such a technology is clearly worth learning and mastering for every engineer who develops computer systems, because it brings great convenience. By contrast, before Docker appeared, Control Group (cgroup) and Namespace technology had long existed and had been merged into the Linux kernel, and Google's Borg-related papers had long been published, yet it was not easy for an ordinary R&D team to harness containers and deploy a container system at scale within a company. As I recall, after the Borg paper appeared, only BAT-level Internet companies in China had small, elite R&D teams developing and using container management systems.
For example, Baidu had a team responsible for developing the Matrix system, Ali had a team responsible for developing the Punch system, and Tencent also had a small team working on container systems. Apart from those few teams, however, most engineers did not use containers widely because they were relatively hard to learn. Docker fits the trend towards agility and elastic scaling very well and is very easy to use, so many engineers adopted it as soon as it came out, and it became the de facto standard in the market.

Open source software that follows the trend in this way is worth choosing and investing in.

Another example is Spark. Spark solved the low performance that MapReduce suffered because of frequent IO operations during distributed computation, and it went on to gain mainstream status in the field of distributed computing.

1.3 Selection based on different stages of the open source software adoption cycle

As a product of intellectual activities, software has its life cycle, which is generally represented by the technology adoption curve of software.

Open source software is also a kind of software, and it also follows the rules of software technology adoption. As shown below:

An open source project generally goes through five stages from its creation to its demise: the innovation period (Innovators, accounting for 2.5%), the early adoption period (Early Adopters, 13.5%), then, after crossing the chasm, the early majority period (Early Majority, 34%), the late majority period (Late Majority, 34%), and finally the decline period (Laggards, 16%). Most innovative open source projects die because they fail to cross the chasm, that is, to move from the early adoption stage to the early majority stage. Therefore, if you are choosing an open source project that will be used and maintained for a long time, it is more rational to choose one in the early majority or late majority stage.
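The stage boundaries can be made concrete with a small sketch. The segment shares are the ones quoted in the text; the function names and the idea of feeding in an estimated cumulative adoption share are assumptions made for illustration, not part of any published model.

```python
# Share of each adopter segment, in percent, as quoted in the text.
SEGMENTS = [
    ("Innovators", 2.5),
    ("Early Adopters", 13.5),
    ("Early Majority", 34.0),
    ("Late Majority", 34.0),
    ("Laggards", 16.0),
]


def adoption_stage(cumulative_share: float) -> str:
    """Map an estimated cumulative adoption share (0-100) to a stage name."""
    if not 0 <= cumulative_share <= 100:
        raise ValueError("cumulative_share must be between 0 and 100")
    upper = 0.0
    for name, share in SEGMENTS:
        upper += share  # running upper bound: 2.5, 16, 50, 84, 100
        if cumulative_share <= upper:
            return name
    return SEGMENTS[-1][0]


def suitable_for_long_term_use(cumulative_share: float) -> bool:
    """Hypothetical rule of thumb from the text: prefer the two majority stages."""
    return adoption_stage(cumulative_share) in ("Early Majority", "Late Majority")
```

In this sketch the chasm sits at the 16% mark, the boundary between Early Adopters and Early Majority. In practice the cumulative share of a project is hard to measure, so the published adoption charts are a more practical input than a number.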

Of course, if you are an individual who just wants to learn something new, you can look at open source projects in the innovator or early adopter stage.

Note that, whether for a system that needs long-term development or for personal learning, you should not look at projects that are in decline (Laggards). For example, at this stage, which is 2022, there is no need to choose projects such as Mesos and Docker Swarm. Both have been in decline since Kubernetes became the de facto standard for container scheduling, and both have been abandoned by their parent companies. Investing more effort in developing and maintaining them now makes sense only when a very strong customer (Party A) insists on it and the engineers have no choice but to follow the money.

Readers may ask: where can these technology adoption curves be found?

InfoQ, Gartner, and Thoughtworks each update and publish their technology adoption curves every year. You can search the Internet for their respective charts and then combine them with some industry experience to form your own judgment.

For example  https://con.infoq.cn/conference/technology-selection?tab=bigdata

From here, you can see InfoQ's judgment on various popular technologies in the BigData field in 2022. 

As the figure above shows, open source software such as Hudi, ClickHouse, and Delta Lake is still in the innovator stage, that is, it has not yet been widely adopted in the industry. Readers who want to learn new projects can focus on these. However, such software is not suitable for mature application scenarios that require long-term maintenance.

Note that these well-known technology media update their adoption curves every year, so check the publication date when referring to them.

1.4 Select open source software according to the maturity of open source software

Another approach is to choose according to the maturity of the open source software itself. A maturity assessment looks at dimensions such as whether the software is released regularly, whether it is maintained by multiple parties (so that even if one company changes its strategy and stops maintaining it, other companies continue to support it over the long term), and whether its documentation is reasonably complete.

The open source community has many maturity models for measuring open source projects, among which the project maturity model of the Apache Software Foundation is relatively well known.

You can refer to here:  https://community.apache.org/apache-way/apache-project-maturity-model.html

The open source project maturity model developed by the Apache Software Foundation evaluates an open source project along seven dimensions:

  • Code
  • License and Copyright
  • Release
  • Quality
  • Community
  • Consensus Building
  • Independence

Each dimension has several check items. For example, Independence has two. One is whether the project is independent of the influence of any company or organization. The other is whether contributors act in the community as themselves rather than as representatives of a company or organization.

Top Level projects of the Apache Foundation are assessed comprehensively along these dimensions at the graduation stage. Only projects that meet the standard in every respect are allowed to graduate from the Apache Foundation's incubator and become Top Level projects. This is also why I personally prefer Apache top-level projects.

In addition, the criticality score of the OpenSSF project (see  https://github.com/ossf/criticality_score ) is also a good reference indicator. It uses the number of community contributors, commit frequency, release frequency, and the number of dependents of a project to judge how important a piece of open source software is in the open source ecosystem. I will not go into details here; interested readers can refer to its documentation. I personally think it is a direction worth following, but the score is still at an early stage and far from ideal.
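To give a feel for how such a score combines several signals, here is a minimal sketch. As I understand the project's README, the score is a weighted mean of log-scaled signals, each capped by a threshold; the signal values, weights, and thresholds in `EXAMPLE_SIGNALS` are invented for illustration and are not the project's defaults.

```python
import math
from typing import Iterable, Tuple

# One signal is (value S_i, weight alpha_i, threshold T_i).
Signal = Tuple[float, float, float]


def criticality_score(signals: Iterable[Signal]) -> float:
    """Weighted mean of log(1 + S_i) / log(1 + max(S_i, T_i))."""
    signals = list(signals)
    total_weight = sum(weight for _, weight, _ in signals)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted = 0.0
    for value, weight, threshold in signals:
        if value < 0 or threshold <= 0:
            raise ValueError("values must be >= 0 and thresholds > 0")
        # Log scaling: a signal at or above its threshold contributes its full weight.
        weighted += weight * (math.log(1 + value) / math.log(1 + max(value, threshold)))
    return weighted / total_weight


# Hypothetical inputs: contributors, commits per week, releases per year.
EXAMPLE_SIGNALS = [(40, 2.0, 5000), (12, 1.0, 1000), (6, 0.5, 26)]
```

With non-negative weights the result lies between 0 and 1, and a project whose signals all reach their thresholds scores 1. The log scaling means that going from 10 to 100 contributors counts for far more than going from 4,010 to 4,100.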

1.5 Selection based on project quality indicators

Obviously, the code quality of some open source software is better than the quality of other open source software. Sometimes open source software needs to be selected based on the quality of the project.

At this time, we need to look at some indicators that have been widely proven effective in the industry.

Among them, MTTU is an indicator recommended by Sonatype, a well-known vendor of open source supply chain software, which discusses MTTU in its well-known annual supply chain report. See  https://www.sonatype.com/resources/state-of-the-software-supply-chain-2021

MTTU (Mean Time to Update): the average time an open source project takes to update the versions of the libraries it depends on. For example, suppose open source software A depends on open source library B, the current version of A is 1.0, and it depends on version 1.1 of B. One day B is upgraded from 1.1 to 1.2, and some time later A releases a new version 1.1 that moves its dependency on B from 1.1 to 1.2. The interval between the release of B 1.2 and the release of A 1.1 is called the Time to Update. It reflects the ability of A's development team to keep its dependencies in step with their release cycles. Mean Time to Update is the average of these intervals for the software. The lower the value, the better the quality: it means the maintainers upgrade their dependencies quickly and fix security vulnerabilities introduced by those dependencies in a timely manner.
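The definition can be written down as a short calculation. This is a sketch of the definition as described here, not Sonatype's actual methodology; the function names and the release dates in the usage note are hypothetical.

```python
from datetime import date
from statistics import mean
from typing import Iterable, Tuple


def time_to_update(dependency_released: date, adopted_in_release: date) -> int:
    """Days between a dependency's new release and the release that adopts it."""
    days = (adopted_in_release - dependency_released).days
    if days < 0:
        raise ValueError("a release cannot adopt a dependency version before it exists")
    return days


def mean_time_to_update(updates: Iterable[Tuple[date, date]]) -> float:
    """Average Time to Update over (dependency_released, adopted_in_release) pairs."""
    return mean(time_to_update(released, adopted) for released, adopted in updates)
```

For instance, if B 1.2 were released on 2021-03-01 and A 1.1, the first release of A to depend on it, followed on 2021-03-29, then `time_to_update(date(2021, 3, 1), date(2021, 3, 29))` is 28 days. Averaging such intervals over all dependency upgrades of A gives its MTTU.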

According to Sonatype's statistics, the MTTU of open source software across the industry keeps getting shorter. For Java open source software in the Maven Central repository, the average MTTU was 371 days in 2011, 302 days in 2014, 158 days in 2018, and 28 days in 2021. As open source libraries are updated more frequently, the software that uses them is also updating its dependency versions faster. Compared with 10 years earlier, MTTU has shrunk to less than 1/10 of its original value.

Of course, MTTU is only an indirect measure of project quality. Whether serious, high-risk security vulnerabilities have been found in the project's history, and whether they were fixed quickly, are also important dimensions in evaluating the quality of an open source project.

The security departments of some major companies continuously evaluate the security of open source software. Software that frequently has high-risk security vulnerabilities that are not fixed in time is classified as unsafe and placed on an internal open source software blacklist, which is publicized internally, and every business R&D team is required to stop using it. Where old services really cannot be moved off such software because of limited R&D manpower, they must at least be moved to a relatively closed network environment to reduce the losses the risk could cause. In this situation it is obviously necessary to follow the company's security regulations and stop using open source software on the blacklist.

1.6 Selection based on the governance model of the open source community behind the software

There is one more dimension: the governance model of the community behind the open source project. It matters most for projects that require long-term development and maintenance.

The governance model mainly describes how a project or community makes decisions and who makes them. Concretely: can everyone contribute, or only a few people? Are decisions made by voting or by authority? Are plans and discussions visible?

There are three common governance models for open source communities and open source projects:

  1. Single-company dominance: the design, development, and release of the software are controlled by a single company, and external contributions are not accepted. The development plan and release plan are not disclosed, the related discussions are not open to the public, and the source code is published only when a version is released. Google's Android system is an example.
  2. Dictatorship (the established term is "Benevolent Dictatorship"): one person with strong influence and leadership, generally the project's founder, controls the development of the project. For example, the Linux kernel is led by Linus Torvalds, and Python was previously led by Guido van Rossum.
  3. Board-led: a group of people forming the project's board decides the major issues of the project. For example, an Apache Software Foundation project is governed by the project's PMC, and decisions at the CNCF are the responsibility of the CNCF Board of Directors (many technical decisions are delegated to the Technical Oversight Committee under the CNCF board).

Based on my personal opinions and experience, the order of preference by the governance model of the community behind the software is as follows:

  1. Prefer projects that have graduated at Apache (their intellectual property is clear, and they have been maintained by at least three parties over a long period)
  2. Next, choose key projects of the Linux Foundation and other open source foundations (the Linux Foundation has strong operational capabilities, and each key project is usually backed by one or more large companies)
  3. Be cautious about open source projects led by a single company (a company's open source strategy may change at any time and it may well stop supporting the project; Facebook, for example, has abandoned many of the projects it started)
  4. Try not to choose personal open source projects (personal projects are more casual and the risk is particularly high, although projects that are already well known and have settled into a long-term maintenance model are an exception, such as Vue.js, maintained by the well-known open source author Evan You).

This is the order of preference I personally recommend when selecting among similar open source projects. It represents only my own opinion, and discussion is welcome.

2. How to customize and maintain

Once open source software has been introduced into an enterprise for development and long-term maintenance, the question of how to customize and maintain it arises. First, it should be clear that open source software introduced into an enterprise does need to be customized, for the following reasons:

  1. Open source software is usually built for general use: it has to account for many situations and support many usage scenarios. Once introduced into an enterprise, it often only needs to serve that enterprise's specific scenarios. Optimizing for those scenarios, for example by trimming the feature set, removing features unrelated to the scenario, and tuning performance and parameters, often yields better performance, such as handling more traffic, and the savings in machine cost can be striking. This is a common form of customization.
  2. Open source software that enters an enterprise has to be developed and operated for a long time, and it must meet the enterprise's internal service operation and maintenance specifications. For example, a service going online needs complete logging and monitoring, a health check interface, and fault-tolerance handling such as traffic scheduling. All of these require custom modifications.
  3. Open source software also has to connect to the upstream and downstream systems inside the enterprise. For example, if the software relies on underlying distributed storage and distributed computing systems for basic functions, it has to be connected to the enterprise's existing storage or computing systems. The enterprise's underlying virtual machine system or container scheduling system has often been modified and optimized as well, so the software must be adapted when it is connected. These cases also require custom modifications.
  4. Special scenarios bring their own customization needs. Using the software in the enterprise's application scenarios often runs into specific problems and bugs, which require bug fixes and new features.

2.1 How to customize and modify open source software?

Here the author suggests a few basic principles: do not change the core code of the open source software; use its existing plug-in mechanism wherever possible, or otherwise make changes at the periphery; and upgrade regularly to the stable version released by the open source community.

Many open source projects are designed from the start with extension mechanisms so that later developers can extend functions and add features. For example, two of the best-known open source programs, Visual Studio Code and the Firefox browser, both provide an Extension mechanism. Many developers write plug-ins for their own needs and submit them to the officially supported plug-in marketplace, and ordinary users, after installing the main program, can browse the marketplace to find and install the plug-ins they need. Kubernetes likewise provides extension mechanisms in many places. For example, the core scheduler allows custom schedulers for implementing specialized scheduling strategies, and the storage and network layers offer many plug-in mechanisms. Most commendable is the CRD (Custom Resource Definition) mechanism, which lets developers define new resource types and reuse Kubernetes' mature declarative API and scheduling mechanism for convenient operation and maintenance. Therefore, try to add features through the plug-ins or extension mechanisms the open source project already provides.
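The general shape of such an extension mechanism can be sketched in a few lines. This is a generic illustration, not the API of Visual Studio Code, Firefox, or Kubernetes; the `Core` class, the `request` event, and the `add_audit_tag` hook are all hypothetical names.

```python
from typing import Callable, Dict, List

Hook = Callable[[dict], dict]


class Core:
    """Stands in for the upstream project; in-house code never edits this class."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Hook]] = {}

    def register(self, event: str, hook: Hook) -> None:
        """Extension point: attach a hook to a named event."""
        self._hooks.setdefault(event, []).append(hook)

    def handle(self, event: str, payload: dict) -> dict:
        """Run the payload through every hook registered for the event, in order."""
        result = dict(payload)
        for hook in self._hooks.get(event, []):
            result = hook(result)
        return result


def add_audit_tag(payload: dict) -> dict:
    """Hypothetical enterprise-specific customization, kept outside the core."""
    return {**payload, "audited": True}
```

After `core = Core()` and `core.register("request", add_audit_tag)`, every call to `core.handle("request", ...)` passes through the in-house hook. When the upstream project releases a new version, only the hook has to be retested against it, because the core carries no internal patches.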

Some modifications and customizations are not suited to the extension mechanism, or the software provides no usable extension mechanism at all. In that case, try to make the changes at the periphery of the source code rather than touching the core. Open source software keeps iterating along with its community, and the community keeps delivering more and better features. If the core code has been modified, upgrading to a newer open source version becomes very painful: a large number of internal patches must be merged and retested, so the upgrade cost becomes too high and the internal version can no longer keep pace with the community's mainline. Eventually some core engineers resign or transfer, nobody is able to maintain the modified parts, the system can no longer be maintained or upgraded, and in the end it is abandoned or rebuilt at great labor cost. The author has worked at several large Internet companies over many years and has seen too many such projects: the original modifications to the open source project were genuinely necessary, but because they changed the core code, upgrading to the community's new version cost too much, nobody could maintain the system in the end, and it had to be torn down and rebuilt.

For example, at one large company I saw two technical teams maintaining Redis clusters, both on Redis 2.x at the time. Because Redis 2.x had little cluster functionality and did not support large-scale business well, both teams modified it. Team A made its changes at the periphery: it wrapped a layer on top of Redis to handle traffic scheduling, failover, and similar functions. Team B was more aggressive: it changed the Redis core directly and added the cluster-related code there, and in some local test scenarios its version even performed better. In the short term, both teams met the needs of their business lines. However, the Redis open source community kept iterating and adding more and better features. When Redis 3.x was released, both teams wanted to upgrade, because the business teams using Redis wanted 3.x as well. The upgrade costs were clearly different. Team A quickly migrated its functions to 3.x and upgraded. Team B's changes to the core were so large that the cost of porting and testing was too high, so its service could not move off 2.x. After the community released 4.x and Team B's core engineer resigned, nobody could maintain that Redis cluster or meet customers' demand for newer versions. The team had to start over and rebuild the cluster directly on the community's 4.x release. Migrating the system took a long time and imposed considerable costs on customers.

Therefore, it is recommended to keep modifications to the source code of open source software in the form of Local Patches, which are convenient to maintain and upgrade, and also convenient to manage and count. In this mode, the build script of the internal project generally unpacks a source package of the open source software, applies the Local Patches one by one with the patch command, and then compiles and tests the result. The alternative, committing the already-patched source directly into the business code base, saves a few minutes in the CI stage but makes subsequent maintenance, upgrades, and management considerably more troublesome.
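The patch-application step of such a build script can be sketched as follows. This is a minimal illustration, not a prescribed layout: the function names, the `*.patch` naming convention, and the directory names are assumptions made for this example, and the unpack and compile steps are omitted. It assumes patches are numbered so that filename order is the intended application order, and it relies only on the standard `-p`, `-d`, and `-i` options of the `patch` command.

```python
import subprocess
import tempfile
from pathlib import Path


def plan_patch_commands(source_dir: Path, patch_dir: Path) -> list[list[str]]:
    """Build one `patch` invocation per Local Patch, in filename order."""
    patches = sorted(patch_dir.glob("*.patch"))
    return [
        # -d: run inside the unpacked tree; -p1: strip the leading path
        # component; -i: read the patch from this file. The patch path is
        # made absolute because -d changes the working directory first.
        ["patch", "-p1", "-d", str(source_dir), "-i", str(p.resolve())]
        for p in patches
    ]


def apply_local_patches(source_dir: Path, patch_dir: Path) -> None:
    """Apply every Local Patch and stop at the first one that fails."""
    for cmd in plan_patch_commands(source_dir, patch_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run against a throwaway directory: print the plan, apply nothing.
    with tempfile.TemporaryDirectory() as tmp:
        patch_dir = Path(tmp) / "local-patches"  # hypothetical directory name
        patch_dir.mkdir()
        for name in ("0002-failover-hook.patch", "0001-fix-timeout.patch"):
            (patch_dir / name).touch()
        for cmd in plan_patch_commands(Path(tmp) / "upstream-src", patch_dir):
            print(" ".join(cmd))
```

Keeping the patches as separate numbered files also makes it easy to count them and to see which ones can be dropped after an upstream release.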

2.2 Upstream: contribute changes back to the upstream open source community to reduce maintenance costs

After an engineer adds a feature or bug fix to a certain version of an open source project within the enterprise, the change generally lives in the code base as a Local Patch. Once the business problem is solved, the author recommends that engineers submit these Local Patches to the upstream open source community that owns the software, completing the Upstream process.

Upstream has the following benefits:

  • Get better code

When features are added to open source software inside an enterprise, and especially when bugs are fixed, time pressure often leads to a "hack": to solve the problem quickly, the fix is not necessarily made in the most reasonable place, the logic of the patch may have holes, and it may not handle every exceptional condition. If the Local Patch is taken back to the project's open source community and discussed in depth with its senior engineers (module reviewers or module owners), their feedback often leads to a more refined patch and therefore better code.

  • Reduce maintenance costs

Every time you upgrade to a newer version of the open source software, the Local Patches kept internally must be evaluated, and some of them must be merged and retested. Naturally, the fewer of these Local Patches the better. The best outcome is for the open source community to include these patches in its new releases. The more patches are accepted upstream, the fewer Local Patches need to be evaluated, merged, and tested inside the enterprise, and the lower the cost of upgrading. As the author recalls, each Fedora release carries many Local Patches for the kernel and other components, and Red Hat engineers continually contribute these patches to the upstream open source projects. This keeps the number of Local Patches inside Fedora relatively low and the cost of version upgrades relatively controllable.

  • Build the team's technology brand and employer brand, make recruiting easier, and increase engineers' pride

Contributing code to the upstream open source community, for example by upstreaming Local Patches, earns a better reputation in that community. It shows these technical communities that the company is not only a consumer of open source software but also a contributor.

At the same time, a strong team technology brand can be established, indicating that the company not only has a relatively good business, but also has a strong technical team, which is convenient for external recruitment.

Contributing upstream also improves the pride and satisfaction of the team's engineers.

For example, when Xiaomi was using the Apache HBase project heavily, the responsible R&D engineers resolutely followed the Upstream strategy. They continually contributed patches verified at Xiaomi back to the HBase community, and discussed and developed these features together with other members of that community. The influence of Xiaomi's engineers in the HBase community kept growing, and Committers and PMC members emerged from the team one after another. Eventually Xiaomi engineer Zhang Duo became the project's PMC Chair. Xiaomi's technology brand in big data and cloud computing derives largely from the R&D team behind this project.

3. How to use open source for personal growth

An engineer's growth is closely tied to daily work and daily learning. Here are some suggestions on how open source software can help engineers grow and realize their professional or technical ideals.

3.1 Openness and sharing, vision and mentality

Only by standing on the shoulders of giants can you see further. The open source world contains all kinds of software, covering all kinds of scenarios and solving all kinds of problems. So keep an open mind: before doing anything technical, look at how others have done it. The world is a big place, and the vast majority of the problems an engineer encounters have already been encountered by someone else. Find out how they solved them and what can be learned from their experience. In particular, look at other people's open source projects: read their design documents to see how they think, and read their source code to see how they implemented it. If you are interested, communicate with them directly. First, this saves many detours, avoids unnecessary duplicated work, and keeps you out of pitfalls others have already fallen into. Second, there is no need to reinvent the wheel, so your limited time can go into more valuable work. Do not be the frog at the bottom of the well that believes it is the best in the world. Look outward at the industry and at the open source world. New graduates joining large companies need to pay special attention to this.

In addition, a shared mentality is also needed. It is best to share what you have learned, so that others can refer to it and learn from experience and lessons, so as to achieve the purpose of common improvement.

3.2 Recommended steps and methods for learning open source software - Feynman learning method

There are various ways to learn open source software. For different learning purposes, you also need to adopt different learning methods that are more suitable for you according to your own situation (that is, your familiarity with the field and your understanding of related open source projects).

The author here recommends a method suitable for engineers to learn a new open source technology project:

  1. Get hands-on as soon as possible. Run the software's Quick Start and introductory Tutorial, and first understand its main scenarios and key features.
  2. Then read the documentation. Pay attention to the main architecture diagram, understand the general architecture of the whole system, and build a mental picture of the overall framework.
  3. Finally, study the relevant details in light of your own application scenario, including the documentation and code for those specific parts.

For example, if you want to learn Kubernetes, first go to its official website and quickly run the tutorial provided there (https://kubernetes.io/docs/tutorials/kubernetes-basics/create-cluster/cluster-interactive/), and learn how to create pods, how to access them, how to update them, how to route traffic, and so on. Then look at its architecture diagram to understand its design principle, which is a declarative API, including the roles of the core components kube-apiserver, kube-scheduler, kube-controller-manager, kubelet, and others, and how these components interact. Finally, based on the needs of your own business scenario, decide which part needs deeper understanding. For example, if you need to add your own storage backend, read the relevant code and refer to how other vendors have implemented theirs.

Starting with the source code is not recommended, because it is confusing and inefficient. Moreover, many open source projects are now very large and iterate very quickly. It is hard for anyone to understand all the code, impossible in terms of personal time and energy, and unnecessary in any case.

Note that learning must be combined with application, that is, hands-on practice. "What is learned from paper always feels shallow; to truly understand something, you must practice it yourself." The ancients did not deceive us, and this applies especially to engineers. If you want a deeper understanding of a new technology, or even plan to switch technical directions and career tracks, you must get your hands dirty: use the open source software, or write a small program, run a demo in an experimental environment, and ideally solve some practical problem around you. Do not fall into the trap of high ambition and weak execution, assuming everything is simple and then finding it extremely hard once you actually try to run it. You can take part in innovation activities at technology companies, such as hackathons, to use newly learned technology, or write a small tool, get it running, and solve a small practical problem. For example, if you want to practice Python, write a crawler that fetches data from a weather forecast website every day and supports a simple query for the current forecast. Learn by doing, and apply what you have learned.

Another very useful approach is the Feynman learning method, which is widely regarded as one of the most effective ways to learn. The steps are simple, and I have reduced them to the following three.

  1. Learn a technology first
  2. Explain it to a non-expert until they understand
  3. If the audience does not understand, go back to step one

With this method, you have really mastered a technology only when you can explain its usage and structure yourself in a way that ordinary engineers understand.

The Feynman learning method is named after Richard Feynman, winner of the Nobel Prize in Physics. He was a well-known theoretical physicist, one of the founders of quantum electrodynamics, and is often called the father of nanotechnology. He received the Nobel Prize in Physics in 1965 for his contributions to quantum electrodynamics. Although the steps of the method are simple, simplifying a complex technology and explaining it in terms ordinary engineers can understand requires deep understanding and mastery of that technology, as well as the use of analogy and association to simplify its specialized terms and concepts. If you can do this, you have generally reached entry level in the technology and can go on to study it in more depth.

Taking well-known industry courses, exams, or certifications is another good approach. For example, when an engineer who is new to cloud native passes the CKA (Certified Kubernetes Administrator) exam, the certification verifies that they have reached a certain level and have built a comprehensive understanding of Kubernetes' common operations and system architecture.

3.3 Integrate into the open source community and gain a lifetime of personal reputation

Finally, engineers who participate in and actively contribute to the open source community gain a lifelong reputation and make lifelong friends, which is very beneficial to their long-term development. Engineers are encouraged to choose open source projects and communities they are interested in, and to keep growing there through exchange and contribution. Even if an engineer later stops being active in a project and community because of work or other reasons, their contributions will always be recognized. The Apache Software Foundation has a well-known motto, "Merit never expires" (see  http://theapacheway.com/merit-never-expires/ ), which means that the recognition engineers earn by contributing to Apache Software Foundation projects and communities never goes out of date. Once a committer, always a committer.

Collaborating in open source communities is also a way for engineers to socialize. Meeting lifelong friends there, and working and communicating with them, is very effective for an engineer's growth. Many leading experts in the open source community are also very friendly, especially toward newcomers and toward engineers who are relatively junior but strongly motivated to contribute, and they are willing to teach them hands-on. With the help and guidance of these senior engineers, newcomers grow very quickly, free of the ceilings imposed by a company, department, or work project. In other words, newcomers who bring an open-minded attitude and a desire to contribute to the projects and communities they care about can keep communicating with and learning from senior engineers there, and their technical ability develops rapidly.

In addition, today's engineers can hardly expect lifetime employment at one company. An engineer works at a company for a period of time, and then, through various voluntary or involuntary changes, the position or the employer changes. However, recognition for contributions in the open source community, together with the personal brand and technical reputation built there, always follows the individual and does not change with the fortunes of any company. I have seen many people who have long been active in the open source community: although they have changed jobs many times, their recognition and brand in the community have remained. This is also a good way for many engineers to break out of involution (excessive internal competition) in their profession and the limits of any single platform.

Contributing to the open source community over the long term benefits both others and yourself. Every engineer with ideas and initiative is encouraged to find open source projects and communities they like, invest in them, and become part of them.

3.4 How to contribute to the open source community

In open source communities, especially those governed by meritocracy (such as Apache Software Foundation projects), the more you contribute, the more recognition you earn. But for a newcomer, contributing is often not as simple as raising your hand. You first need to understand the community's rules and then follow them before you can gradually become part of the community.

1. What do you contribute?

Before contributing, understand that contributions to the open source community are not limited to code. Writing code to add features or fix bugs is a contribution, improving documentation and test cases is a contribution, reporting usage problems is a contribution, and writing blog posts that introduce and recommend the project is a contribution too. All of these are widely recognized in the open source community.

Many technical experts began contributing to the open source community by submitting test reports. One example is Blake Ross, at the time the youngest architect in the Mozilla community (he became one of its highest technical decision-makers at the age of 17 and founded the Firefox project with another architect). He first joined the Mozilla community as an intern and started with testing.

"Scratch your own itch!" is a very popular phrase in the open source community. It means that contributing to open source starts with solving your own problems: when you encounter a problem in real work, try to solve it, and then contribute the solution to the community in a form the community accepts. Typically, a bug or problem affects your actual use, or you want a new feature to fit your company's scenarios, or you simply want to learn a new technology. Contributions that solve your own needs tend to last. By contrast, joining community activities only for small rewards or just for fun rarely leads to long-term contribution.

Therefore, a newcomer to an open source community can start with simple problems and with their own needs. The simplest example: read the getting-started document for beginners and follow its steps one by one to see whether they work. If they do not work, report a bug. If you had to add some extra steps to get through, submit a patch to the getting-started document that describes those supplementary steps. The community welcomes this kind of contribution as well.

Some communities set some simple bugs as "Good First Issues". Contributors can choose these issues to contribute, to familiarize themselves with the contribution process, and to integrate into the community.

2. Understand existing community conditions and respect community practices and habits

The first step in contributing to an open source community is understanding the community.

You can learn basic information about an open source community through its website, mailing list, wiki, and the documents in its GitHub code repository.

Learn about the contribution process and recommended approach for these projects by reviewing the key documentation (Contributing.md).

Note that each open source community has its own conventions. For example, each has its own issue management system (some use GitHub issues, some use Bugzilla, and some use Jira), and the process and requirements for submitting patches also differ.

For example, the very long-standing Apache HTTP Server project has the following requirements for contributors:

  1. Patches must match the project's code style
  2. There are also code quality requirements, such as thread safety
  3. Patches must be made against the current development version, 2.5.X
  4. Patches are generated with diff -u file-old.c file.c
  5. Patches are submitted at bz.apache.org/bugzilla, and adding the keyword "PatchAvailable" is recommended
  6. Patches can be discussed by email on the mailing list, with [PATCH] in the subject line

Note that the method they use is not the Fork/Pull Request mode popular on GitHub, but the older Bugzilla + diff patch mode. Please respect their working habits and use the mode they require. (To be honest, when I contributed to the Mozilla community 20 years ago, I also used the Bugzilla + diff patch method. More than 20 years later, the working mode of the Apache HTTP Server project has not changed much. However, the way of working does not get in the way of contributing. You just need to become familiar with it and get used to it.)
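For readers who have only used pull requests, the following sketch shows what a unified diff looks like, using Python's standard `difflib` module to produce the same format as `diff -u file-old.c file.c`. It is only an illustration of the format: the function name and the sample C code are made up for this example, and a real submission should be generated with the tooling the project asks for.

```python
import difflib


def make_unified_diff(old_text: str, new_text: str,
                      old_name: str, new_name: str) -> str:
    """Return a unified diff between two versions of a file's text."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff_lines)


if __name__ == "__main__":
    # Hypothetical one-line fix, used only to show the output format.
    old = "int main(void) {\n    return 1;\n}\n"
    new = "int main(void) {\n    return 0;\n}\n"
    print(make_unified_diff(old, new, "file-old.c", "file.c"), end="")
```

Lines starting with "-" are removed from the old file and lines starting with "+" are added in the new one, which is what reviewers read when a patch is attached to a Bugzilla entry.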

Some open source communities provide a gamified contribution process that lets developers get to know the project and the contribution process through a series of simple beginner tasks. This approach is friendlier to newcomers and has been carefully designed by the community's managers. So as a contributor, do not let their careful effort go to waste: work through the tasks you find necessary, and become familiar with the tasks and processes you need to know.

3. Be polite and respectful, and respect the diversity of the community

The open source community is full of diversity.

Most senior engineers in the open source community are very friendly to newcomers. They patiently help newcomers become familiar with the documentation, the contribution process, and so on. They are pleasant in daily communication, whether on mailing lists, in IRC or Slack channels, or in issue comments, and it is easy to communicate and collaborate with them.

But note that some people have a less pleasant attitude. If you run into them, avoid a head-on conflict. Ask more senior engineers in the community for help instead of confronting them directly. You cannot change anyone and you cannot please everyone, so just do the work that needs to be done.

4. How to quickly find the Module Owner responsible for code review and complete the contribution

Sometimes you follow the community's contribution process, whether raising an issue or reporting a bug, and find that the module owner is very slow to respond. There are a few tricks for this situation.

You can join their IRC or Slack channel, find the corresponding module owner, and have a polite and constructive conversation with the module owner.

Build good relationships with them and gradually build their trust through practical contributions.

Note that open source communities operate on trust. Gaining the trust of the person in charge of the module is very beneficial for subsequent work.

5. Submitting a large patch requires attention to the steps

Some engineers complain along these lines: "I submitted a very good feature to an open source community. I had tested and verified it in my company's internal environment, where it worked well and performed well. But when I submitted the code upstream, the community did not value the feature and instead picked all kinds of faults in my patch. Contributing is too much trouble and too tiring."

Put yourself in the maintainers' position. If a stranger submits a large patch to your project, code review is very difficult simply because the patch is large. The contributor may say that the patch is very useful, implements a powerful feature, and has been verified. But is it reliable? Will the contributor stay in the community for the long term? Will they promptly fix the problems caused by the code they submitted? All of these are open questions. So submitting a large patch will be laborious until basic trust has been established, that is, until a few small patches have been submitted and accepted.

In addition, the engineer submitting the patch often does not know the history of the open source community. The feature may have been discussed in the community long ago, and the conclusion may have been that it is not needed or should be done elsewhere. Therefore, rather than being blindly confident in your own patch, first discuss the scenario and the problem with the engineers in the community.

The author recommends the following steps to contribute:

  1. If you judge that the patch is relatively large, first discuss the problem in the community, so that the community recognizes the problem and you learn the history of the community's thinking on it (if any).
  2. If the community recognizes the problem and thinks it should be fixed now, go on to discuss solutions.
  3. After the problem and the approach have been accepted and a short design has been completed, discuss the specific code patch.
  4. The patch must comply with the community's conventions (code style, component call conventions, test conventions, documentation conventions, and so on).
  5. Be mentally prepared: the patch may need to be revised several times before it is finally merged. It may be necessary to split a large patch into several small patches and submit and merge them in batches. Compromise where necessary.

Contributing a large patch that implements an important feature takes many steps and a long time, but once it is done it can earn high recognition from the community, and it is often the basis for becoming a higher-level contributor. For the individual contributor, the sense of satisfaction and achievement is also considerable.

6. Be careful not to do the following

  1. Come up with an Idea and hope someone else will do it.

This applies especially when you have just joined a community: you suggest that the community should do certain things, but you do not do them yourself and hope that others will. Such opinions are usually ignored. "There are many people who, 'the moment they step down from the carriage', start making noise and giving opinions, criticizing this and condemning that. In fact, ten out of ten such people will fail." People like this are not popular with the community.

Ask a question while offering a constructive solution, and if you want to participate, you can invite the rest of the community to come along. This is the recommended practice.

2. Being overly eager, impatient, and ignoring community conventions.

It is better to go slowly and be patient, especially before the community has come to trust the newcomer. The author once met an engineer who had just joined an open source community. He had strong technical skills, but he only wanted to get his patch merged as fast as possible. When communicating with the module owner he was polite, but his responses to the owner's improvement suggestions were perfunctory. After a few rounds of this, the contributor's reputation in the community was completely gone, his bug fixes and new feature work progressed very slowly, and he eventually left the project disappointed.

  3. Crossing red lines (that is, behavior prohibited by the community's code of conduct).

Almost every mature open source community has its own Code of Conduct, which is usually displayed prominently on the community website or in the code repository.

The norm lists a number of actions that are unwelcome in the community, including discrimination and offense based on gender, race, religion, and more.

Be careful to avoid these behaviors. Some behavior that is not considered a big problem in the Chinese open source community may be far from trivial in the international community.

7. Pay attention to compliance when contributing to the upstream community from within an enterprise

Contributing to the upstream community from within an enterprise makes results of the company's internal R&D public, so it must follow the company's internal rules for managing open source contributions.

Every company has different rules for this. For example, Google encourages engineers to contribute to the open source community but requires them to use their google.com email address when doing so. Contributions under 100 lines do not need internal approval, provided the project does not use a license prohibited by Google (such as AGPL, Public Domain, or CC-BY-NC-*). There are some other hard requirements as well; see the Google OSPO website at  https://opensource.google/documentation/reference/patching . In China, Baidu also encourages engineers to contribute to the open source community. There, a patch of any size must go through an internal electronic process, be approved by the department's technical director, and be filed with Baidu's Open Source Program Office (OSPO), which gives the office the data it needs for later statistics and for contribution incentives to engineers.

When contributing to the upstream community from within an enterprise, the community often requires the engineer to sign a CLA (Contributor License Agreement) or follow the DCO (Developer Certificate of Origin). CLAs are further divided into the ICLA (Individual Contributor License Agreement) and the CCLA (Corporate Contributor License Agreement). The ICLA is for individuals and the CCLA covers the entire enterprise. Once the enterprise has signed a CCLA, its engineers generally do not need to sign an ICLA separately when they contribute. Where a community requires a CLA, patches cannot be submitted without signing it. The substance of a CLA is that contributors license their contributions to the community for its use. In this case, follow the company's internal regulations, since the CLA terms may need to be reviewed by the company's legal department. Fortunately, some well-known organizations use a unified CLA across their projects, such as the Apache Software Foundation, and the CNCF's projects are similar. Once legal has confirmed the CLA terms of these well-known projects, they pose no further problem. If a CLA has not been confirmed by the legal department, consult the responsible legal staff in the company to avoid signing a CLA that is unfavorable to the company.
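In communities that use the DCO, the certification is conventionally expressed as a `Signed-off-by:` trailer in each commit message, which `git commit -s` adds automatically. The sketch below shows one way an internal pre-submit check might verify that a commit message carries a sign-off from the expected (for example, company) email address. The function names and the example address are hypothetical, and the exact rules a given community enforces may be stricter than this.

```python
import re

# One trailer per line, in the form: Signed-off-by: Name <email>
SIGNOFF = re.compile(
    r"^Signed-off-by: (?P<name>.+) <(?P<email>[^<>]+)>$", re.MULTILINE
)


def signoff_emails(message: str) -> list[str]:
    """Return the lowercased email of every Signed-off-by trailer."""
    return [m.group("email").lower() for m in SIGNOFF.finditer(message)]


def has_dco_signoff(message: str, author_email: str) -> bool:
    """True if the commit message is signed off by the given address."""
    return author_email.lower() in signoff_emails(message)


if __name__ == "__main__":
    # Hypothetical commit message and address, for illustration only.
    msg = (
        "Fix overflow in request parser\n"
        "\n"
        "Signed-off-by: Jane Doe <jane@example.com>\n"
    )
    print(has_dco_signoff(msg, "jane@example.com"))
```

A check like this only confirms that the trailer is present. Whether the engineer is actually allowed to make the contribution is still decided by the company's internal process described above.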

Summary

This article is relatively long, and condenses much of my experience and many of my reflections.

I have always thought of engineers as a very pragmatic and hard-working group of people: people who deeply believe that "we can use code to change the world", who think "Talk is cheap, show me the code", and who hold that "advancing one step each day like a pawn, no effort is ever wasted". I have always believed that being "open, collaborative, pragmatic" is one of the best characteristics of contemporary engineers.

Learning, working, and sharing in an open source world is one of the best ways for engineers to change the world.


Reposted from: blog.csdn.net/m0_66404702/article/details/127107415