Nine recommended Git / GitHub project data analysis tools


Any important decision should be based on data, and that holds for projects and software development as well. Today, the author recommends some open source Git / GitHub analysis tools for you to study.

1、GitHub API

The first thing to mention is GitHub's official API, which is the best way to get detailed information about a GitHub repository. The API is very easy to use: you can call it with curl or with a library in any language and retrieve all of a repository's information (other public Git hosting platforms and self-hosted GitLab offer similar APIs). However, GitHub limits the number of API requests per hour, so the API alone becomes a bottleneck if you want to analyze large-scale projects.
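As a minimal sketch of the paragraph above, the following Python snippet uses only the standard library to request one repository's metadata from the GitHub REST API and to read the rate-limit headers returned with the response. The helper names (`repo_url`, `summarize_repo`, `rate_limit`, `fetch_repo`) and the choice of fields are illustrative assumptions, not part of any official client.

```python
import json
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def repo_url(owner: str, repo: str) -> str:
    """Build the REST endpoint for one repository."""
    return f"{API_ROOT}/repos/{owner}/{repo}"


def summarize_repo(payload: dict) -> dict:
    """Keep a few fields of interest from the repository JSON."""
    return {
        "name": payload.get("full_name"),
        "stars": payload.get("stargazers_count", 0),
        "forks": payload.get("forks_count", 0),
        "open_issues": payload.get("open_issues_count", 0),
    }


def rate_limit(headers) -> dict:
    """Read the quota headers GitHub attaches to each API response."""
    return {
        "limit": int(headers.get("X-RateLimit-Limit", 0)),
        "remaining": int(headers.get("X-RateLimit-Remaining", 0)),
    }


def fetch_repo(owner: str, repo: str, token: Optional[str] = None):
    """Fetch one repository; a token is optional but raises the hourly quota."""
    request = urllib.request.Request(repo_url(owner, repo))
    request.add_header("Accept", "application/vnd.github+json")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
        return summarize_repo(payload), rate_limit(response.headers)
```

Calling `fetch_repo("owner", "repo")` with a real owner and repository name returns the summary together with the remaining quota, which makes it easy to see how quickly a large analysis would use up the hourly limit.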

 

Through the GitHub API you can get essentially all the information you see when you browse a GitHub repository in a web browser. It exposes only limited information about the internals of the Git repository, though; for complete information you need to clone the repository and query it with git commands.
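To illustrate the clone-then-query route, here is a small sketch that counts commits per author from the output of `git log`. It assumes `git` is on the PATH and that the repository has already been cloned to a local directory; the function names are hypothetical.

```python
import subprocess
from collections import Counter


def count_authors(log_output: str) -> Counter:
    """Count commits per author from one-author-per-line log output."""
    names = (line.strip() for line in log_output.splitlines())
    return Counter(name for name in names if name)


def commits_per_author(repo_dir: str) -> Counter:
    """Run `git log` inside a cloned repository and tally the authors."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--pretty=format:%an"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_authors(result.stdout)
```

For example, `commits_per_author("path/to/clone").most_common(5)` lists the five most active authors without spending any API quota.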

2、GHCrawler

GHCrawler is a robust GitHub API crawler developed by Microsoft. It walks GitHub entities and messages, searching and tracking them. If you want to analyze the activity of an organization or a project, GHCrawler is particularly useful. It is also subject to the GitHub API request limits, but it optimizes API token usage through a token pool and rotation. GHCrawler can be driven from the command line and also provides a web interface.
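The idea of a token pool with rotation can be sketched as follows. This is not GHCrawler's actual implementation, only a simplified illustration: the `Token` and `TokenPool` classes, the starting quota of 5000, and the "pick the token with the most remaining quota" policy are all assumptions made for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class Token:
    value: str
    remaining: int = 5000   # assumed starting quota per token
    reset_at: float = 0.0   # epoch seconds at which the quota refills


class TokenPool:
    """Hand out the usable token with the most remaining quota."""

    def __init__(self, values):
        self.tokens = [Token(v) for v in values]

    def acquire(self, now=None):
        now = time.time() if now is None else now
        # A token is usable if it has quota left or its reset time has passed.
        usable = [t for t in self.tokens if t.remaining > 0 or t.reset_at <= now]
        if not usable:
            raise RuntimeError("all tokens are exhausted")
        return max(usable, key=lambda t: t.remaining)

    def update(self, token, remaining, reset_at):
        """Record the quota reported by the last response for this token."""
        token.remaining = remaining
        token.reset_at = reset_at
```

After each request the caller would pass the quota values from the response headers to `update`, so the next `acquire` rotates to a different token once the current one runs out.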

3、GH Archive

GH Archive is an open source project that records the public GitHub timeline, archives it, and makes it easily accessible for further analysis. GH Archive stores all the GitHub event information it collects as a set of JSON files, which can be downloaded as needed for offline processing.
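As a rough sketch of offline processing, the snippet below builds the URL of one hourly archive and tallies event types from the downloaded bytes. It assumes the hourly files are gzipped, one JSON event per line, at `data.gharchive.org`, as described on the GH Archive site; the function names are hypothetical and the download step is left to the reader.

```python
import gzip
import json
from collections import Counter


def archive_url(year: int, month: int, day: int, hour: int) -> str:
    """URL of one hourly archive; note that the hour is not zero-padded."""
    return f"https://data.gharchive.org/{year:04d}-{month:02d}-{day:02d}-{hour}.json.gz"


def count_event_types(gz_bytes: bytes) -> Counter:
    """Tally the `type` field of every event in a gzipped JSON-lines file."""
    counts = Counter()
    for line in gzip.decompress(gz_bytes).splitlines():
        if line.strip():
            counts[json.loads(line)["type"]] += 1
    return counts
```

Once a file has been downloaded, `count_event_types(open(path, "rb").read())` gives a quick overview of what happened on GitHub during that hour.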

In addition, GH Archive is available as a public dataset on Google BigQuery. The dataset is updated automatically every hour, and SQL-like queries can be run over the entire dataset in a few seconds.
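A query against the BigQuery dataset might look like the sketch below, which only builds the SQL text. The daily table naming (`githubarchive.day.YYYYMMDD`) is an assumption based on the examples GH Archive publishes; running the query requires a Google Cloud project and a BigQuery client, neither of which is shown here, and the function name is hypothetical.

```python
import datetime


def events_by_type_query(day: datetime.date) -> str:
    """Build a query that counts one day's events per event type."""
    # Assumed table layout: one table per day, named by date.
    table = f"githubarchive.day.{day:%Y%m%d}"
    return (
        "SELECT type, COUNT(*) AS events\n"
        f"FROM `{table}`\n"
        "GROUP BY type\n"
        "ORDER BY events DESC"
    )
```

The resulting string can be pasted into the BigQuery console or passed to a client library of your choice.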

4、GHTorrent

Like GH Archive, the GHTorrent project monitors the public GitHub event timeline. For each event, it retrieves the event's contents and dependencies in detail. The resulting JSON is stored in a MongoDB database, and its structure is also extracted into a MySQL database.

GHTorrent and GH Archive are somewhat similar. The difference is that GH Archive aims to provide a more exhaustive set of events, collected hourly, whereas GHTorrent aims to provide event data in a more structured form, which makes it easier to obtain everything related to an event; its data is provided monthly.

5、Kibble

Apache Kibble is a suite of tools for collecting, aggregating, and visualizing activity in software projects. The Kibble architecture consists of a central Kibble server and a number of scanner applications, each designed for a particular type of resource, which compile data and push it to the Kibble server.

Based on this data, you can customize a dashboard made up of many small widgets that display individual data items. In this sense, Kibble is more of a tool for building a web front end that presents project data.

6、CHAOSS

CHAOSS is a Linux Foundation project that creates analytics and metrics to help define the health of open source communities. The CHAOSS project includes a number of tools for mining and computing the data and metrics a project needs:

Augur is a Python library, Flask web application, and REST server that provides metrics on the health and sustainability of open source software development projects.

Cregit focuses on the source code, generating views that visualize how it has changed.

GrimoireLab, from Bitergia, is by far the most mature and ambitious of these tools. It aims to provide an open source platform for automatic and incremental data gathering from almost any tool related to open source development, automatic data enrichment to clean up and extend the gathered data, and data visualization with search and filtering by time range, project, repository, contributor, and more.

7、Sourced

Sourced describes itself as a data platform for the development life cycle. Compared with the previous tools, it focuses more on a project's code than on community collaboration. By using a universal AST, Sourced can query the details of a code base in a language-agnostic way.

Among the projects of the Sourced organization you can find some interesting data analysis tools, including:

go-git: a highly extensible Git implementation written in pure Go.

Hercules: a tool written in Go for analyzing the commit history of an entire repository.

gitbase: a SQL database interface to Git repositories, implemented in Go.

8、Hubble

Hubble visualizes collaboration, usage, and health data for GitHub Enterprise. It aims to help large companies understand how their internal organizations, projects, and contributors are distributed and how they collaborate.

Hubble Enterprise consists of two components. The updater component is a Python script that queries the relevant data from a GitHub Enterprise appliance once a day and stores the results in a Git repository. The docs component is a web application, hosted on GitHub Pages, that visualizes the collected data.

9、Onefetch

Finally, a mention of Onefetch, a very capable command-line tool for visualizing Git project information. It supports 50 languages, and it is worth mentioning because it is written in the emerging Rust language.


Origin www.cnblogs.com/heqingxiaohuo/p/12158363.html