How to evaluate an open source project (1)丨Activity

Editor's note

"How to evaluate an open source project?" has long been a contested question. Simple indicators such as the number of stars can hardly reflect a project's quality accurately. Many platforms and organizations have therefore launched dedicated tools. For example, Gitee's "Gitee Index" evaluates the health of an open source project across multiple dimensions such as code activity, community activity, team health, popularity trend, and influence, and the CHAOSS project under the Linux Foundation is committed to providing quantitative metrics for evaluating the development of open source communities and projects...

Others have tried to systematically collect data from open source communities, covering project popularity, reliability, and activity, in order to judge the quality of an open source project.

Zhao Shengyu, a core member of the X-lab Open Lab, has long researched open source theory and open source behavior data and has run related experiments. This series will continue to be updated, so stay tuned!

The main text below is the first article in the series.

Call for Papers:

The OSCHINA community looks forward to hearing more diverse voices and invites all users to join the discussion. Which open source project is No. 1 in your mind? What do you think is the most important metric for evaluating an open source project?
You are welcome to join the discussion in either of the following ways:
  • Submit an article to the editorial department ([email protected]) stating your opinion and arguments. After review, we will recommend selected articles and authors and send them a small gift.
  • Leave a comment on this article. The commenter whose comment receives the most likes by October 5 will also receive a small gift.
Participation guidelines: stay on topic, back up your views, and elaborate your arguments as fully as possible.

Main text


About the author

Zhao Shengyu
Ph.D. in Computer Science from Tongji University, core member of the X-lab Open Lab, focusing on research into open source theory and open source behavior data.


I had planned to write this blog series after finishing the experiments and getting the results, but the OB affair has been rather messy lately and it reminded me of some of my own experiences, so I am writing this part first. The experimental results will be added later.

Background

I started participating in the open source community in 2015, began working on open source operations at Alibaba in 2018, and have now been studying at X-lab for two years. Throughout this time I have been exploring how to evaluate the health of an open source project more accurately. Here I review the work done so far as a whole and summarize my thinking at each step. I also hope that more friends interested in community metrics will join the discussion. For my contact information, see the About page.

Activity

Since the Apache Roadshow in 2015, I have been involved in the open source community, mostly by attending its annual events. Until I joined Alibaba in 2018, I had not really participated in an open source technical community, nor had I done any open source operations. To me, open source was simply a fun gathering of great people, and my judgment of open source projects was still very vague. GitHub stars were the most intuitive indicator, and a project backed by a foundation or a big tech company was generally not too bad; if both held, it was naturally a good project. This is probably still the basic logic behind most open source operations, because for the public, the simplest and most direct indicators are the most eye-catching.

Later, Xiaolang asked me whether it was possible to know how many projects exist in the open source world, which fields they belong to, and how to judge their quality. This was still 2018, and for most companies the quantification of open source communities had probably not even gotten off the ground.

As I studied GitHub data in depth, I settled on an analysis method based on historical behavior log data. With this method there is no longer any need to call the API for each repository separately; all GitHub repositories and their development activity can be counted on a global scale, which is a great benefit when computing metrics uniformly for thousands of projects. My earliest exploration and sharing of GitHub global logs was therefore at Open Source Summit 2019, where I presented some simple analyses of the 2018 GitHub global logs, done with the paid data service of Google BigQuery.
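A minimal sketch of the log-based approach, assuming event records shaped like those of the public GitHub event stream (a top-level `type`, `actor.login`, `repo.name`, and a `payload`). The mapping from event types to the five behaviors weighted in the activity algorithm (issue comments, opened issues, opened PRs, PR review comments, merged PRs) and labels such as `open_pr` are illustrative assumptions, not the author's code. It counts behaviors per repository and developer in one pass over the log, with no per-repository API calls.

```python
from collections import Counter
from typing import Iterable, Optional


def classify(event: dict) -> Optional[str]:
    """Map one raw event record to a behavior label, or None if it is not counted."""
    etype = event.get("type")
    payload = event.get("payload") or {}
    action = payload.get("action")
    if etype == "IssueCommentEvent":
        return "issue_comment"
    if etype == "IssuesEvent" and action == "opened":
        return "open_issue"
    if etype == "PullRequestReviewCommentEvent":
        return "review_comment"
    if etype == "PullRequestEvent":
        if action == "opened":
            return "open_pr"
        # Assumption: a merge appears as a "closed" PR event with merged == True,
        # credited here to the actor of that event.
        if action == "closed" and (payload.get("pull_request") or {}).get("merged"):
            return "merge_pr"
    # Stars (WatchEvent), forks (ForkEvent) and all other events are not counted.
    return None


def count_behaviors(events: Iterable[dict]) -> Counter:
    """One pass over the log: count behaviors per (repo, developer, behavior)."""
    counts: Counter = Counter()
    for event in events:
        label = classify(event)
        if label is not None:
            counts[(event["repo"]["name"], event["actor"]["login"], label)] += 1
    return counts


# Hypothetical sample records.
log = [
    {"type": "IssuesEvent", "actor": {"login": "alice"},
     "repo": {"name": "org/proj"}, "payload": {"action": "opened"}},
    {"type": "IssueCommentEvent", "actor": {"login": "bob"},
     "repo": {"name": "org/proj"}, "payload": {"action": "created"}},
    {"type": "PullRequestEvent", "actor": {"login": "alice"},
     "repo": {"name": "org/proj"},
     "payload": {"action": "closed", "pull_request": {"merged": True}}},
    {"type": "WatchEvent", "actor": {"login": "carol"},
     "repo": {"name": "org/proj"}, "payload": {"action": "started"}},
]
print(count_behaviors(log))
```

Because the counts are keyed by repository, the same pass serves every project in the log at once.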

It was also in this talk that I proposed a weighted activity algorithm based on GitHub behavior data. It is calculated as follows:

$A_d=\sum_i w_i c_i$

Here, $A_d$ is the activity of a developer, $c_i$ is the number of times the developer triggered behavioral event $i$ (one of the five events below), and $w_i$ is the weight of that event. Based on a simple value judgment, the weights can be set from 1 to 5: 1 point per issue comment, 2 points per opened issue, 3 points per opened PR, 4 points per code review comment on a PR, and 5 points per merged PR. Once each developer's activity is known, the project's activity is aggregated from it. The method given at the time was:

$A_r=\sum_d \sqrt{A_d}$

That is, the activity of the project is the sum of the square roots of all its developers' activity. Taking the square root reduces the impact of extremely active core developers.
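The two formulas can be sketched in Python as follows. This is a minimal illustration using the 2019 weights from the text; the behavior names (`issue_comment`, `open_issue`, and so on) and the input layout, a mapping from developer to behavior counts, are assumptions made for the example.

```python
import math
from typing import Dict, Mapping

# 2019 weights from the text (behavior names are illustrative labels).
WEIGHTS_2019: Dict[str, float] = {
    "issue_comment": 1,   # comment on an issue
    "open_issue": 2,      # open an issue
    "open_pr": 3,         # open a pull request
    "review_comment": 4,  # code review comment on a PR
    "merge_pr": 5,        # PR merged
}


def developer_activity(counts: Mapping[str, int],
                       weights: Mapping[str, float] = WEIGHTS_2019) -> float:
    """A_d = sum_i w_i * c_i. Behaviors without a weight (e.g. stars) count as 0."""
    return sum(weights.get(behavior, 0) * c for behavior, c in counts.items())


def repo_activity(per_developer: Mapping[str, Mapping[str, int]],
                  weights: Mapping[str, float] = WEIGHTS_2019) -> float:
    """A_r = sum over developers d of sqrt(A_d)."""
    return sum(math.sqrt(developer_activity(c, weights))
               for c in per_developer.values())


per_developer = {
    "alice": {"open_pr": 2, "merge_pr": 2},            # 2*3 + 2*5 = 16
    "bob": {"issue_comment": 1, "review_comment": 2},  # 1*1 + 2*4 = 9
}
print(repo_activity(per_developer))  # sqrt(16) + sqrt(9) = 7.0
```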

Reflections

  • 1. An open source program office needs a "North Star" metric. For an open source startup or a specific project team, project health can be judged by monitoring or observing multiple indicators. But an enterprise with more than 2,000 open source projects needs an aggregated indicator to monitor the health of all of them at once; otherwise the labor cost is too high.
  • 2. Stars and forks are one-way actions by developers: they express interest in a project but make no concrete contribution to it, so they are excluded from activity. This also means that star farming has no effect under the activity algorithm.
  • 3. Although contributors were by default defined as code contributors at the time, in practice everyone who takes part in the community, including those who file bugs, join discussions, and review code, contributes to the project. So discussions and similar behaviors are counted along with code contributions.
  • 4. The weights of the five events are admittedly quite subjective; assigning 1 to 5 is a simple and crude approach. In practice, however, this set of weights was agreed upon by the leads of several of Alibaba's most important strategic open source projects. They gave PR review comments a high 4 points, which also encourages everyone to do more asynchronous code review on GitHub.
  • 5. When going from developer activity to project activity, we take the square root of each developer's activity. The value judgment behind this is that a community whose few core developers are very active but whose overall number of participants is small should not score higher than a community whose core developers are less active but which has more contributors. In essence, this dampens the activity of individual developers.
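A toy comparison with made-up numbers illustrates the value judgment in point 5: two communities with the same total developer activity, one driven by a single core developer and one spread across four contributors.

```python
import math


def repo_activity(developer_activities):
    """Sum of square roots of developer activities, as in A_r = sum sqrt(A_d)."""
    return sum(math.sqrt(a) for a in developer_activities)


# Two hypothetical communities with the same total developer activity (100).
concentrated = [100]            # one very active core developer
distributed = [25, 25, 25, 25]  # four moderately active contributors

print(sum(concentrated), sum(distributed))  # 100 100
print(repo_activity(concentrated))          # 10.0
print(repo_activity(distributed))           # 20.0
```

A plain sum would rank the two communities equally; the square root ranks the community with more contributors twice as high.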

Overall, this activity metric is a simple weighted statistical algorithm based on GitHub behavior data. The weights are assigned from expert experience and carry a certain value orientation. The aggregation from developer activity to project activity carries a value orientation as well: it indirectly brings the number of contributors into project activity as an important factor.

This activity metric has remained a very important part of the annual GitHub Insights report to this day, and it has gone through some iterations. For example, in the 2020 report we lowered the weight of a merged PR from 5 points to 2, because opening the PR already earns 3 points and an extra 5 points for the merge was too high. PR weights are also corrected by the number of changed lines: PRs of moderate size receive higher weight, while PRs with very few or very many changes are weighted down accordingly, which is itself a value orientation. We also brought stars and forks into the 2020 report. Although they serve only as indicators of attention, they do help in understanding a project's activity; they were given very low weights of 1 and 2 respectively.
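A sketch of the 2020 weighting under stated assumptions. The weight values come from the text. The text does not give the line-count correction, so `pr_size_factor` and its thresholds (10 and 1000 lines, penalty 0.5) are purely hypothetical placeholders, as is applying the correction to both opened and merged PRs.

```python
from typing import Iterable, Optional, Tuple

# 2020 weights from the text: merged PR lowered from 5 to 2,
# stars and forks added with low weights of 1 and 2.
WEIGHTS_2020 = {
    "issue_comment": 1,
    "open_issue": 2,
    "open_pr": 3,
    "review_comment": 4,
    "merge_pr": 2,
    "star": 1,
    "fork": 2,
}


def pr_size_factor(lines_changed: int, low: int = 10, high: int = 1000,
                   penalty: float = 0.5) -> float:
    """HYPOTHETICAL size correction: moderate PRs keep full weight, very small
    or very large PRs are down-weighted. All thresholds are illustrative."""
    if lines_changed < low or lines_changed > high:
        return penalty
    return 1.0


def developer_activity_2020(events: Iterable[Tuple[str, Optional[int]]]) -> float:
    """Sum weighted events; PR events carry their changed-line count, others None."""
    total = 0.0
    for behavior, lines_changed in events:
        weight = WEIGHTS_2020.get(behavior, 0)
        if behavior in ("open_pr", "merge_pr") and lines_changed is not None:
            weight *= pr_size_factor(lines_changed)
        total += weight
    return total


events = [
    ("open_pr", 200), ("merge_pr", 200),  # moderate PR: 3 + 2
    ("open_pr", 3),                       # tiny PR: 3 * 0.5
    ("star", None), ("issue_comment", None),
]
print(developer_activity_2020(events))  # 8.5
```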

Problems

However, the activity metric also has some obvious problems:

  • 1. It is unclear which behaviors should be included, for example whether stars and forks should count. Under the concept of activity in particular, whether to include behaviors that give nothing back to the project is a delicate question.
  • 2. The weights of different behaviors are assigned by hand. Although they encode some expert experience, their magnitudes are quite subjective, and when projects are compared, even slight differences in weights can shift overall activity.
  • 3. The method lacks a baseline. Because activity here counts behaviors over a period of time, it differs across periods, and the longer the period, the higher the activity. Values from different periods therefore cannot be compared directly, and without a baseline it is hard to give reference thresholds for activity.
  • 4. The method is not linearly additive at the repository level. To bring in the number of contributors, each developer's activity is square-rooted before being summed into repository activity. Because of this non-linear step, repository activity cannot be added up linearly across time intervals, which greatly complicates computations over multiple periods and prevents some intermediate results from being reused.
  • 5. The square root from individual to repository activity is itself an empirical choice. In essence, it applies a correction function that is monotonically increasing but whose first derivative is monotonically decreasing (a positive first derivative and a negative second derivative), simulating diminishing marginal returns. The square root is clearly not the only such function; the logarithm also has this property and is very commonly used in such calculations. I chose the square root for computational efficiency, since in many languages it is faster to compute than the logarithm.
  • 6. Any simple statistical metric invites gaming ("brushing") of the numbers. When we first rolled this metric system out within Alibaba, some developers' activity jumped sharply. The reason was that the leads of the projects concerned had not known how to do reviews on GitHub: before seeing this metric system, they reviewed code synchronously in instant-messaging chats, and after our promotion and education they backfilled many past code reviews in a burst. This is exactly the role we think a good metric system can play: even when the metric was inflated, it had a positive guiding effect. Moreover, we have not found other projects maliciously inflating activity by deliberately adding replies, splitting PRs, and so on; after all, the square root limits how much the gaming of any single account can affect the whole repository. So far, then, gaming under this metric has played the value-guiding role we expected.
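Points 4, 5, and 6 can be made concrete with a few hypothetical numbers. The sketch below shows that square-root aggregation is not additive across time windows (monthly results do not sum to the two-month result), that the same holds for a logarithmic transform, and that quadrupling a single account's activity only doubles its contribution. For the logarithm it uses `math.log1p` (log(1 + x)) so that zero activity maps to zero; that choice is an assumption, not something the text specifies.

```python
import math
from typing import Callable, Sequence


def repo_activity(developer_activities: Sequence[float],
                  f: Callable[[float], float] = math.sqrt) -> float:
    """Aggregate developer activity with a concave transform f (sqrt by default)."""
    return sum(f(a) for a in developer_activities)


# Point 4: one developer's hypothetical activity in two consecutive months.
jan, feb = 16.0, 9.0
monthly_sum = repo_activity([jan]) + repo_activity([feb])  # 4.0 + 3.0
two_months = repo_activity([jan + feb])                     # sqrt(25)
print(monthly_sum, two_months)  # 7.0 5.0: per-month results cannot simply be added

# Point 5: log1p is also increasing and concave, and equally non-additive.
log_monthly = repo_activity([jan], math.log1p) + repo_activity([feb], math.log1p)
log_two_months = repo_activity([jan + feb], math.log1p)
print(log_monthly > log_two_months)  # True

# Point 6: quadrupling one account's activity only doubles its contribution.
honest, inflated = repo_activity([25.0]), repo_activity([100.0])
print(inflated / honest)  # 2.0
```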

Adoption and improvements

We have not deliberately promoted this metric beyond the annual Digital Insights report, but we continue to provide it as a service for Alibaba so that teams can better observe the status of their projects.

For example, on Alibaba's internal big-screen dashboard for open source projects, we separated stars and forks from activity into an independent attention metric: behaviors that contribute to a project go into activity, while behaviors that show attention but give nothing concrete back go into attention. This distinguishes two different modes of growth. We also began trying to judge a community's robustness from the distribution of activity, somewhat like the bus factor: if the top contributors account for a small share of activity and most of it comes from long-tail contributors, the community is more robust. We have also started to offer and experiment with insights based on collaboration networks.
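A minimal sketch of the two dashboard ideas, with assumed details: stars and forks are routed to a separate attention score, and robustness is approximated by the share of activity held by the top-k contributors and by a bus-factor-like count of how few contributors cover half of the activity. The value of `k`, the 50% threshold, the weights in the usage example, and all function names are hypothetical; the text does not specify how the distribution is actually evaluated.

```python
from typing import Mapping, Tuple

ATTENTION_BEHAVIORS = {"star", "fork"}


def split_activity_attention(counts: Mapping[str, int],
                             weights: Mapping[str, float]) -> Tuple[float, float]:
    """Return (activity, attention): stars/forks go to attention, the rest to activity."""
    activity = attention = 0.0
    for behavior, c in counts.items():
        score = weights.get(behavior, 0) * c
        if behavior in ATTENTION_BEHAVIORS:
            attention += score
        else:
            activity += score
    return activity, attention


def head_share(activities: Mapping[str, float], k: int = 1) -> float:
    """Share of total activity held by the top-k contributors (k is hypothetical)."""
    total = sum(activities.values())
    if total == 0:
        return 0.0
    top = sorted(activities.values(), reverse=True)[:k]
    return sum(top) / total


def min_contributors_for(activities: Mapping[str, float],
                         threshold: float = 0.5) -> int:
    """Bus-factor-like count: fewest contributors covering `threshold` of activity."""
    total = sum(activities.values())
    covered, n = 0.0, 0
    for a in sorted(activities.values(), reverse=True):
        if covered >= threshold * total:
            break
        covered += a
        n += 1
    return n


weights = {"open_pr": 3, "star": 1, "fork": 2}  # hypothetical subset of weights
print(split_activity_attention({"open_pr": 2, "star": 10, "fork": 1}, weights))  # (6.0, 12.0)
acts = {"a": 50.0, "b": 20.0, "c": 15.0, "d": 15.0}
print(head_share(acts, k=1))       # 0.5
print(min_contributors_for(acts))  # 1
```

A low head share and a high covering count both point to activity spread across many contributors, which the text associates with a more robust community.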

In the financial open source community, many projects have begun to adopt a similar metric system, and Shanghai Pudong Development Bank has also published a document that adds time-series decay factors to the behavior data above.
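The text does not describe the decay factors in the Shanghai Pudong Development Bank document. As a purely hypothetical illustration of time decay, the sketch below discounts each event weight exponentially by its age, with an assumed half-life of 90 days.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def decayed_activity(events: Iterable[Tuple[date, float]], as_of: date,
                     half_life_days: float = 90.0) -> float:
    """Sum event weights, each multiplied by 0.5 ** (age / half_life).
    HYPOTHETICAL exponential decay; the half-life is an illustrative choice."""
    total = 0.0
    for when, weight in events:
        age = (as_of - when).days
        if age < 0:
            continue  # ignore events after the evaluation date
        total += weight * 0.5 ** (age / half_life_days)
    return total


as_of = date(2021, 4, 1)
events = [(as_of - timedelta(days=90), 4.0),  # one half-life old: counts as 2.0
          (as_of, 3.0)]                        # fresh: counts in full
print(decayed_activity(events, as_of))  # 5.0
```

With decay, recent behavior dominates the score, so a project that was busy long ago but is quiet now scores lower than its raw event count suggests.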

Conclusion

The activity metric still has many problems, especially for continuous large-scale operation and extensibility. However, because it is intuitive, easy to understand, and highly interpretable, it remains a widely used method in our lab and has been applied in many projects. Personally, I still hope for a better metric system and algorithmic framework that makes fuller use of the open source ecosystem and its networks to measure projects more effectively.

Other metrics will be introduced in later articles of this series, so stay tuned.

Original link: http://blog.frankzhao.cn/how_to_measure_open_source_1
