Open source: a buried landmine? This article will help you understand the security risks!

Introduction

At present, instability, uncertainty and insecurity are becoming increasingly prominent in the international situation, and "wars without gunpowder" are being fought in many industries. In information technology, after the Russia-Ukraine conflict broke out last year, Oracle and SAP announced the suspension of all operations in Russia, and GitHub was considering restricting Russian developers' access to open source code repositories. These events make us aware that foreign open source projects are subject to geopolitics. As a result, the security and controllability of information technology, together with open innovation, have received unprecedented attention at the national level.

In the graph database field, most domestic vendors build their products on foreign open source code plus in-house development. Because open source components are used during development, the resulting graph database products can easily violate open source license agreements in commercial use, creating a risk of intellectual property infringement. At the same time, the software supply chain security company Endor Labs released a report stating that "almost all (95%) open source vulnerabilities exist in transitive or indirect dependencies." Open source software may also contain bugs and be prone to security or functional vulnerabilities, and a large amount of an enterprise's sensitive information and data can be leaked along with the shared code. In addition, once the software is deployed, a problem in a single open source component can affect the whole system and incur high maintenance costs.

In this context, we use the open source graph database Neo4j as a case study to analyze its dependence on open source software and the possible impact of that dependence.

Dependency modeling

Neo4j is a popular open source graph database. Internationally renowned companies including Walmart, Cisco and eBay use Neo4j to create business value, and it has a large developer community on GitHub. In this article, we take the neo4j:5.9.0 project on GitHub as an example and analyze its dependencies on open source projects.

Analyzing the project first, we find that it depends heavily on more than two hundred open source packages, and these packages have nested dependencies of their own. We model projects and open source packages as vertices, store their open source licenses as vertex attributes, and model the dependencies between them as edges. The graph model is shown in the figure below.

▲ Graph model
▲ Direct (one-hop) dependencies of the project
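The vertex-and-edge model described above can be sketched in a few lines of Python. This is a minimal sketch, assuming packages are identified by Maven-style group:artifact:version strings and that each edge points from a package to one of its dependencies; the class and method names are hypothetical and are not Galaxybase's data model.

```python
from dataclasses import dataclass, field


@dataclass
class DependencyGraph:
    """Vertices are packages ("group:artifact:version") carrying a license attribute.

    A directed edge u -> v means "u depends on v".
    """
    licenses: dict[str, str] = field(default_factory=dict)
    deps: dict[str, set[str]] = field(default_factory=dict)

    def add_package(self, coord: str, license_id: str) -> None:
        # Register a vertex together with its license attribute.
        self.licenses[coord] = license_id
        self.deps.setdefault(coord, set())

    def add_dependency(self, dependent: str, dependency: str) -> None:
        # Add a directed edge; both endpoints must already be registered vertices.
        if dependent not in self.licenses or dependency not in self.licenses:
            raise KeyError("register both packages before linking them")
        self.deps[dependent].add(dependency)

    def edge_count(self) -> int:
        # Total number of dependency edges in the graph.
        return sum(len(targets) for targets in self.deps.values())
```

Keeping the license as a plain attribute on each vertex mirrors the model in the figure: license questions become attribute lookups, while dependency questions become graph traversals.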

The open source dependencies of the neo4j:5.9.0 project are released under licenses other than Neo4j's own, such as ASF (Apache License), MPL (Mozilla Public License), EPL (Eclipse Public License) and the GNU GPL (General Public License). Components that follow GPL-like licenses may expose a company to intellectual property risks when the software is commercialized, such as copyright infringement, patent infringement and trade secret leakage, leading to legal disputes. Let's look at the risks that may arise from specific dependency chains.
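To decide which dependencies deserve a closer license review, a coarse first pass is to group license identifiers into families. The sketch below assumes licenses are recorded as SPDX identifiers; the lookup table is hand-written and deliberately incomplete, and `classify` and `needs_review` are hypothetical helper names.

```python
from enum import Enum


class LicenseFamily(Enum):
    PERMISSIVE = "permissive"
    WEAK_COPYLEFT = "weak copyleft"
    STRONG_COPYLEFT = "strong copyleft"
    UNKNOWN = "unknown"


# Coarse, hand-written lookup keyed by SPDX identifiers; extend as needed.
LICENSE_FAMILIES = {
    "Apache-2.0": LicenseFamily.PERMISSIVE,
    "MIT": LicenseFamily.PERMISSIVE,
    "BSD-3-Clause": LicenseFamily.PERMISSIVE,
    "MPL-2.0": LicenseFamily.WEAK_COPYLEFT,
    "EPL-1.0": LicenseFamily.WEAK_COPYLEFT,
    "EPL-2.0": LicenseFamily.WEAK_COPYLEFT,
    "LGPL-2.1-only": LicenseFamily.WEAK_COPYLEFT,
    "GPL-2.0-only": LicenseFamily.STRONG_COPYLEFT,
    "GPL-3.0-only": LicenseFamily.STRONG_COPYLEFT,
    "AGPL-3.0-only": LicenseFamily.STRONG_COPYLEFT,
}


def classify(license_id: str) -> LicenseFamily:
    # Unrecognised identifiers are reported as UNKNOWN rather than guessed.
    return LICENSE_FAMILIES.get(license_id, LicenseFamily.UNKNOWN)


def needs_review(licenses: dict[str, str]) -> list[tuple[str, str]]:
    """Return (package, license) pairs that are copyleft or unrecognised, sorted by package."""
    flagged = [
        (pkg, lic)
        for pkg, lic in licenses.items()
        if classify(lic) is not LicenseFamily.PERMISSIVE
    ]
    return sorted(flagged)
```

Unknown identifiers are flagged on purpose: a license the table does not recognise is exactly the kind of component that needs a human to read its terms.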

Dependency analysis - key nodes

In a network, the most important nodes are not necessarily those with the largest weight or highest priority, but those that act as intermediaries between groups. In everyday life, brokers and middlemen play this role and can control the flow of key resources or information in a network. The approximate betweenness centrality algorithm (RA-Brandes, a randomized approximation of Brandes' betweenness centrality algorithm) is often used to measure the importance of vertices in a network, that is, to find such key nodes. In the open source dependency network, we can call this algorithm to find open source packages that act as intermediaries, view their open source licenses, and analyze the impact of these nodes on downstream software. The query is shown in the figure below.

▲ Approximate betweenness centrality algorithm
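The idea behind the approximate algorithm can be illustrated outside Galaxybase. The sketch below is a plain-Python version of sampled Brandes betweenness: it runs Brandes' shortest-path dependency accumulation from a random subset of source vertices and rescales by n / samples. It assumes a directed, unweighted dependency graph stored as an adjacency dict without duplicate edges; it is not Galaxybase's implementation, and the function name and parameters are illustrative.

```python
import random
from collections import deque


def approx_betweenness(graph: dict[str, list[str]], samples: int, seed: int = 0) -> dict[str, float]:
    """Estimate betweenness centrality of a directed, unweighted graph.

    Brandes' single-source accumulation is run from `samples` randomly chosen
    source vertices and the totals are scaled by n / samples. If samples >= n,
    every vertex is a source and the result is exact.
    """
    if samples < 1:
        raise ValueError("samples must be positive")
    vertex_set = set(graph)
    for targets in graph.values():
        vertex_set.update(targets)
    nodes = sorted(vertex_set)
    if not nodes:
        return {}
    rng = random.Random(seed)
    sources = nodes if samples >= len(nodes) else rng.sample(nodes, samples)
    scale = len(nodes) / len(sources)
    score = dict.fromkeys(nodes, 0.0)

    for s in sources:
        # Forward BFS: count shortest paths (sigma) and remember predecessors.
        order = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, []):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate pair dependencies, farthest vertices first.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: c * scale for v, c in score.items()}
```

With `samples` at least the number of vertices, every vertex is used as a source and the scores equal exact Brandes betweenness; smaller samples trade accuracy for speed on large graphs.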

Galaxybase Studio has a built-in graph algorithm engine: it invokes the algorithm through a CALL statement and sorts the results in descending order of algorithm score. After excluding the project node neo4j:5.9.0, we take the top three nodes, which are the open source projects org.junit.jupiter:junit-jupiter:5.9.3, com.vladsch.flexmark:flexmark-util:0.62.2 and jakarta.ws.rs:jakarta.ws.rs-api:2.1.6. First, we find these three nodes in the graph and double-click them to expand them. Second, we use the shortest path algorithm to find their dependency paths to the project neo4j:5.9.0 and display one of the paths on the canvas as an example.
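The two post-processing steps, ranking the scores while skipping the project node and then finding one shortest dependency path, might look like this in plain Python. The helper names and the adjacency-dict representation (package mapped to the packages it depends on) are assumptions for illustration; in Galaxybase Studio these steps are carried out with queries and canvas interaction.

```python
from collections import deque
from typing import Optional


def top_k(scores: dict[str, float], k: int, exclude: set[str]) -> list[str]:
    """Vertices with the highest scores, highest first, skipping `exclude`."""
    ranked = sorted(
        (v for v in scores if v not in exclude),
        key=lambda v: (-scores[v], v),  # ties broken alphabetically for determinism
    )
    return ranked[:k]


def shortest_path(graph: dict[str, list[str]], src: str, dst: str) -> Optional[list[str]]:
    """One shortest directed path from src to dst found by BFS, or None if unreachable."""
    parent: dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            # Walk the parent links back to src, then reverse.
            path = []
            node: Optional[str] = v
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for w in graph.get(v, []):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None
```

Breadth-first search is enough here because dependency edges are unweighted, so the first time BFS reaches the target it has used the fewest hops.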

▲ Display of query results

As the canvas shows, the density of dependent nodes is consistent with the algorithm results. The project neo4j:5.9.0 depends on the three projects above through other open source projects. These three projects are key nodes in the graph, each with a large number of dependent open source components downstream. We select one of them, org.junit.jupiter:junit-jupiter:5.9.3, for analysis. The query shows that it belongs to JUnit, a unit testing framework for Java, and follows the EPL (Eclipse Public License 2.0). The license allows EPL-covered code to be distributed as part of a product under the distributor's own license agreement, provided that agreement complies with the EPL; the EPL-covered code itself, including any modifications to it, must remain under the EPL with its source code made available. If this component became unavailable because of its license or other force majeure factors, the neo4j:5.9.0 project would no longer be able to run its internal unit tests, which would significantly affect the project's stable operation.
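One way to quantify the impact of a key node like this is to list every package that depends on it directly or transitively, since those are the packages affected if it becomes unavailable. The sketch assumes an adjacency dict mapping each package to the packages it depends on; `dependents_of` is a hypothetical helper, not a Galaxybase function.

```python
from collections import deque


def dependents_of(graph: dict[str, list[str]], node: str) -> set[str]:
    """Packages that depend on `node` directly or transitively.

    `graph` maps each package to the packages it depends on, so the
    edges are walked in reverse (from a dependency to its dependents).
    """
    reverse: dict[str, list[str]] = {}
    for pkg, deps in graph.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(pkg)
    affected: set[str] = set()
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for parent in reverse.get(v, []):
            # Skip the starting node itself in case the graph contains a cycle.
            if parent not in affected and parent != node:
                affected.add(parent)
                queue.append(parent)
    return affected
```

The size of the returned set is a simple measure of blast radius: the larger it is, the more of the project has to be reworked if that one component must be replaced.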

Dependency analysis - approximate transit nodes

In an open source dependency network, different packages serve different functions, but when their functions and interfaces are compatible, some packages can replace others. For example, several packages provide logging support, and one can be substituted for another. Likewise, if a package becomes unusable because its open source license changes, an older version released under the previous license can still be used in its place. The closeness centrality algorithm measures the importance of a vertex in a graph by how efficiently it can reach other vertices: the vertex with the highest closeness centrality score has the lowest propagation cost to every other vertex. In the open source dependency network, we can call this algorithm to find open source packages that act as transit points, view their open source licenses, and analyze the dependencies around them. The query results are shown in the figure below.

▲ Closeness centrality algorithm
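A plain-Python sketch of closeness centrality follows. How gapl.ClosenessCentrality treats edge direction is not described here, so this sketch assumes the undirected view of the dependency graph and applies the Wasserman-Faust scaling, which keeps vertices in small disconnected pieces from scoring unrealistically high. Treat it as an illustration of the measure, not a reproduction of the product's scores.

```python
from collections import deque


def closeness_centrality(graph: dict[str, list[str]]) -> dict[str, float]:
    """Closeness centrality on the undirected view of a dependency graph.

    For a vertex v that reaches r other vertices with total BFS distance d,
    the score is (r / d) * (r / (n - 1)) (Wasserman-Faust scaling).
    Isolated vertices score 0.
    """
    # Build an undirected adjacency map from the directed dependency edges.
    adj: dict[str, set[str]] = {}
    for u, targets in graph.items():
        adj.setdefault(u, set())
        for w in targets:
            adj[u].add(w)
            adj.setdefault(w, set()).add(u)
    n = len(adj)
    scores: dict[str, float] = {}
    for v in adj:
        # BFS distances from v to every vertex it can reach.
        dist = {v: 0}
        queue = deque([v])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        reached = len(dist) - 1
        total = sum(dist.values())
        scores[v] = (reached / total) * (reached / (n - 1)) if total > 0 else 0.0
    return scores
```

In a star-shaped graph the hub reaches every other vertex in one hop and scores 1.0, which matches the intuition of a transit node with the lowest propagation cost.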

We call gapl.ClosenessCentrality through a CALL statement, sort the results in descending order of algorithm score, and exclude the project node neo4j:5.9.0. We take one of the top-ranked nodes, org.apache.logging.log4j:log4j-core:2.20.0, to illustrate the results.

First, we pick an open source project at random, for example javax.activation:activation:1.1, and find the dependency path between it and the project neo4j:5.9.0 that passes through the transit node.

▲ The shortest path through the transit node

Second, we remove the transit node and find the shortest dependency path between the open source project javax.activation:activation:1.1 and the project neo4j:5.9.0.

▲ Shortest path after removing the transit node

The canvas shows that after the transit node is removed, the dependency path that previously took three hops now takes four. In real open source projects there are no true transit nodes: every dependency plays an important role in the project, and if one becomes unavailable for reasons outside the project's control, the project often stops working. Working around a prohibited license by routing through nested open source components also increases the cost and instability of project development.
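The two steps above, a shortest path forced through a transit node and a shortest path that avoids it, can be sketched with breadth-first search. The toy graph in the demo is hypothetical, built so that removing the transit node stretches a three-hop path to four hops as on the canvas; the function names are illustrative, not Galaxybase APIs.

```python
from collections import deque
from typing import Optional


def bfs_path(graph: dict[str, list[str]], src: str, dst: str,
             banned: frozenset = frozenset()) -> Optional[list[str]]:
    """Shortest directed path src -> dst that never visits a vertex in `banned`."""
    if src in banned or dst in banned:
        return None
    parent: dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            path = []
            node: Optional[str] = v
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for w in graph.get(v, []):
            if w not in parent and w not in banned:
                parent[w] = v
                queue.append(w)
    return None


def path_via(graph: dict[str, list[str]], src: str, dst: str, transit: str) -> Optional[list[str]]:
    """Shortest path src -> transit -> dst (step one).

    In an acyclic dependency graph the two halves share only `transit`.
    """
    first = bfs_path(graph, src, transit)
    second = bfs_path(graph, transit, dst)
    if first is None or second is None:
        return None
    return first + second[1:]


if __name__ == "__main__":
    # Hypothetical toy graph: "t" plays the transit node, "act" the leaf package.
    g = {"neo4j": ["x", "m1"], "x": ["t"], "t": ["act"],
         "m1": ["m2"], "m2": ["m3"], "m3": ["act"]}
    print(path_via(g, "neo4j", "act", "t"))                    # step one: via the transit node
    print(bfs_path(g, "neo4j", "act", banned=frozenset({"t"})))  # step two: transit node removed
```

Hop count is simply the path length minus one, so comparing the two results reproduces the three-hop versus four-hop observation.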

Conclusion

Using these two graph algorithms, we can see intuitively that Neo4j's open source project neo4j:5.9.0 depends heavily on open source software, which exposes it to serious security, intellectual property and supply chain risks.

Therefore, choosing closed source products is crucial to ensuring the security and trustworthiness of software. Among domestic graph database vendors, Chuanglin Technology has worked for seven years and has developed Galaxybase, the only mature, commercial, closed-source graph platform in China with fully independent intellectual property rights. Galaxybase has passed the "Software Product Source Code Traceability" assessment of the China Software Evaluation Center of the Ministry of Industry and Information Technology, and its core code is 100% self-developed. It can fully meet the demand of various industries for "independently controllable" domestic databases and completely avoid the business interruptions and wasted R&D costs caused by an open source component ceasing maintenance or having serious security vulnerabilities.

In the future, we also hope that more users can get rid of their dependence on open source software to the greatest extent and build independent and controllable firewalls.

Origin blog.csdn.net/qq_41604676/article/details/132598641