Ultipa | Learn a little bit about graph databases in one article

This article includes the following content points:
· Main technical classifications of databases
· What is a graph?
· Graph models
· Graph database vs. relational database
· Graph database vs. other NoSQL databases
· Not all graph databases are the same!

According to a Gartner prediction, "by 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision-making across the enterprise."

Figure: Database classification diagram

The figure above is the author's simple classification of the main technical tracks of databases. It can be read together with my previous article [Ultipa | One article to understand the milestones in the development of databases - Zhihu (zhihu.com)]. Over more than 50 years of development, database technology has shown vigorous vitality, and new databases have repeatedly challenged traditional ones. The main driving factor is the urgent need, in both industry and academia, for new, efficient, flexible, high-dimensional architectures that can cope with the rapid growth of data volume (Volume), the diversity of data types (Variety), the rapid increase in the speed at which data is generated (Velocity), and the growing attention people pay to the value of data (Value). This helps explain why GQL has become the only international database query language standard to follow SQL, which was itself standardized in the 1980s. It also strongly illustrates the influence and importance of graph database technology for the future.

Figure: Database engine types ranked by data complexity

Table 1: Analysis of 5 types of mainstream database products

| Classification | Performance | Scalability | Flexibility | Complexity |
|---|---|---|---|---|
| Key-value store | High | High | High | None |
| Document database | High | Variable | High | Low |
| Column store database | High | Variable | Moderate | Low |
| Graph database | Variable | High | High | High |
| Relational database | Variable | Variable | Low | Moderate |

A graph database is a NoSQL database built on graph theory. It stores both the attributes of entities and the relationships between entities, and it offers simple modeling, strong performance, rich query capabilities, and good scalability.

A graph is composed of vertices and edges, where each edge connects a pair of vertices:

Vertex: also called a node or point, and sometimes an entity.

Edge: a connection between two vertices. In the context of knowledge graphs it is also often called a relation or relationship.

For example, when you look up the relationship between Leonardo da Vinci and the Louvre, you can draw a very simple graph linking people and objects. The "six degrees of separation" theory comes from exactly this kind of graph.

Renaissance (representative figure) - Leonardo da Vinci - (representative work) Mona Lisa - (collection) Louvre Museum - Pyramid glass entrance (architect) I.M. Pei - Francois I (collection) - Mona Lisa (painting) - Renaissance (influence)

Another example is the subway, which many of us use to commute almost every day. If each station is regarded as a vertex and every two adjacent stations are connected by an edge, the network forms a typical graph.
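As a minimal sketch of the subway example, stations can be stored as vertices and the links between adjacent stations as edges in an adjacency list. The station names and the helper functions `add_edge` and `shortest_hops` are made up for illustration and are not taken from any particular graph database.

```python
from collections import defaultdict, deque

# Undirected graph: each station (vertex) maps to the set of adjacent stations.
subway = defaultdict(set)

def add_edge(graph, a, b):
    """Connect two adjacent stations with an undirected edge."""
    graph[a].add(b)
    graph[b].add(a)

# Hypothetical line A-B-C-D plus a branch C-E.
for a, b in [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E")]:
    add_edge(subway, a, b)

def shortest_hops(graph, start, goal):
    """Breadth-first search: number of edges on a shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        station, hops = queue.popleft()
        if station == goal:
            return hops
        for nxt in graph[station]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None

print(shortest_hops(subway, "A", "E"))  # 3 (A -> B -> C -> E)
```

Breadth-first search is used here because every edge counts as one stop, so the first time a station is reached is also the shortest route to it.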

By following our own train of thought, we can extend a graph indefinitely. By connecting node to node, graph data lets us directly model the attributes and relationships of the real world. [For more reading, see the library | What is a graph?]

There are three types of graph models: property graphs, hypergraphs, and triples. The distinction matters because graph data must be stored in a specific graph database before it ends up in concrete data files, and that process naturally raises the question of which model is used to store the graph data. Take Ultipa Graph as an example. Like Neo4j, it uses the property graph model, which is easier to understand and can describe most graph usage scenarios.
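To make the property graph model concrete, here is a small sketch in which both vertices and edges carry a label and a dictionary of properties. The `Vertex` and `Edge` classes, the labels such as `CREATED` and `HELD_IN`, and the `neighbors` helper are assumptions made for illustration; they do not reflect the internal storage format of Ultipa Graph or Neo4j.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    id: str
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)

# Vertices and edges drawn from the Leonardo da Vinci example above.
vertices = {
    "p1": Vertex("p1", "Person", {"name": "Leonardo da Vinci"}),
    "w1": Vertex("w1", "Artwork", {"title": "Mona Lisa"}),
    "m1": Vertex("m1", "Museum", {"name": "Louvre Museum"}),
}
edges = [
    Edge("p1", "w1", "CREATED"),
    Edge("w1", "m1", "HELD_IN", {"type": "collection"}),
]

def neighbors(vertex_id, edge_label):
    """Return vertices reached from vertex_id over outgoing edges with the given label."""
    return [vertices[e.target] for e in edges
            if e.source == vertex_id and e.label == edge_label]

print([v.properties["title"] for v in neighbors("p1", "CREATED")])  # ['Mona Lisa']
```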

Why are the advantages of graph databases becoming more and more prominent? In a traditional relational database, once a query joins multiple tables, the amount of computation grows in proportion to the Cartesian product of the row counts of the tables involved. The more data and the more joined tables there are, the more complex the query becomes and the lower its efficiency, because the database must use foreign keys to search for matching primary-key records in the referenced table. If the relationship is many-to-many, an intermediate table must be added to store the foreign-key correspondence between the two participating tables, which further increases the cost of the join.
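The many-to-many case can be sketched with Python's built-in `sqlite3` module. The `person` and `friendship` tables and their contents are hypothetical; the point is only that each additional hop in the relationship requires another join through the intermediate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE friendship (          -- intermediate table for the many-to-many relationship
    person_id INTEGER REFERENCES person(id),
    friend_id INTEGER REFERENCES person(id)
);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
INSERT INTO friendship VALUES (1, 2), (2, 3), (3, 4);
""")

# Friends of friends of Ann: every extra hop adds one more join
# through the intermediate table.
rows = conn.execute("""
SELECT p2.name
FROM person p0
JOIN friendship f1 ON f1.person_id = p0.id
JOIN friendship f2 ON f2.person_id = f1.friend_id
JOIN person p2     ON p2.id = f2.friend_id
WHERE p0.name = 'Ann'
""").fetchall()

two_hop = [name for (name,) in rows]
print(two_hop)  # ['Cid']
```

Reaching a third or fourth hop would mean adding a further `JOIN friendship` clause each time, which is where the join cost described above accumulates.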

A graph database is very flexible. It can express the relationships among data concisely through vertices and edges, and its computation follows a nearest-neighbor association (query) pattern: a query starts from a vertex and follows edges to its direct neighbors. This keeps computational complexity low and can improve efficiency exponentially. See the figure below.

Figure: Architectural differences between graph databases and relational databases

For example, if you use a relational database and a graph database to run a deep traversal from 2 to 5 hops, the performance gap between them grows exponentially. At 1 hop there may be no essential difference between the two. From 2 hops onward the gap widens exponentially (by more than 10 times), and at greater depths the relational database may no longer return any result at all; that is, the query has exceeded what the machine can compute and has stalled. (Interested readers can read more in: The difference between graph databases and relational databases⁴.)
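A deep traversal of this kind can be sketched as a hop-limited breadth-first expansion over an adjacency list. This is a toy in-memory illustration with a made-up graph and a hypothetical `k_hop` helper. It is not a benchmark and does not show how any particular product implements traversal.

```python
def k_hop(graph, start, k):
    """Return the set of vertices whose shortest distance from start is exactly k edges."""
    visited = {start}
    frontier = {start}
    for _ in range(k):
        # Expand only the direct neighbors of the current frontier.
        frontier = {n for v in frontier for n in graph.get(v, ()) if n not in visited}
        visited |= frontier
    return frontier

# Hypothetical directed graph stored as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": ["f"],
    "e": [],
    "f": [],
}

for depth in range(1, 4):
    print(depth, sorted(k_hop(graph, "a", depth)))
# 1 ['b', 'c']
# 2 ['d', 'e']
# 3 ['f']
```

Each hop only touches the neighbors of the vertices found at the previous hop, so the work depends on the size of the neighborhood actually reached rather than on the total size of the data.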

Judging from current market share, relational databases are still the mainstream, but that position was established at a time when alternatives were lacking. As the number of scenarios they cannot handle keeps growing, the inherent advantages of graph databases will become their weapon for overtaking on the bend.

Table 2: Comparison of mainstream graph databases

| Feature | Neo4j | JanusGraph | Ultipa Graph |
|---|---|---|---|
| Reputation | Highest | High | Average |
| Open source ecosystem | Community edition is open source but with more restrictions; commercial edition is closed source | Open source; compatible with the Apache TinkerPop ecosystem; cloud services mainly provided by AWS and IBM | Closed source; cloud services mainly provided by Ultipa Cloud |
| Graph query language | Cypher | Gremlin | UQL |
| Supported data scale | Community edition: billion level; enterprise edition: over 100 billion | Ten-billion level or above | Over 100 billion |
| Large-scale data write performance | Slow online import | Slower | Fast online import |
| Large-scale data query performance | Fast and relatively stable | Relatively fast | Fast and very stable |
| Functional completeness | Complete | Complete | Complete |
| Data import tool | Supports online CSV import; supports a rich set of formats | Not provided | Ultipa Transporter runs on all platforms, supports a variety of formats, and provides import for sources such as TSV, CSV, MySQL, and BigQuery, as well as CSV export |
| Visual interface | Supported; feature-rich; supports visual data modeling, import, analysis, etc. | Not supported; users need to integrate a third-party interface | Supported; feature-rich; supports 2D and 3D switching; supports visual data modeling, import, analysis, etc. |
| Built-in common graph algorithms | Provides an installable algorithm package with a wealth of basic graph algorithms | Not supported | Provides installable algorithm packages and a rich algorithm library, which can be delivered to users as independent algorithm packages |
| Basic functions (create, delete, query, and update property graphs; plan maintenance; metadata; transactions; caching; query optimization; incremental graph updates; etc.) | Supported | Supported | Supported |
| ACID transactions | Supported | Partially supported, depending on backend storage | Supported |
| Schema constraints | Supported in the commercial edition; also supports schema-free | Supported; also supports schema-free | Supported; also supports schema-free |
| Graph storage type | Supports local storage, distributed storage, and cloud-managed storage | Non-local storage; supports distributed storage | |
| Graph partitioning | Supported | Supported | Supported |
| High availability (HA) | Supported in the commercial edition | Not provided | Supported |

As we know from the above, NoSQL databases have become popular because they can handle the challenges posed by diverse data types and large-scale data sets. But how do they differ from one another? Here is a brief comparison with key-value and document databases.

Document storage is hierarchical, so data can easily be stored as a tree. Because of this, however, it can only express top-down, parent-child relationships, whereas a tree is just one of the shapes a graph database can represent, so a graph is more expressive. In addition, a tree-shaped storage structure embeds the same data redundantly in multiple places, which makes updates harder and makes data consistency difficult to guarantee.
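The redundancy problem can be illustrated with plain Python dictionaries standing in for documents. The article titles, the author record, and the `WRITTEN_BY` edge label are hypothetical; this sketch only contrasts embedded copies with a single shared vertex and does not model any specific document or graph database.

```python
import copy

# Document style: the author record is embedded in every article document.
author = {"name": "Ann", "city": "Beijing"}
articles = [
    {"title": "What is a graph?", "author": copy.deepcopy(author)},
    {"title": "Graph vs. relational", "author": copy.deepcopy(author)},
]

# Updating one embedded copy leaves the other copy stale.
articles[0]["author"]["city"] = "Shanghai"
cities = {a["author"]["city"] for a in articles}
print(len(cities))  # 2 -> the two copies now disagree

# Graph style: one author vertex, with articles pointing to it through edges.
vertices = {"u1": {"name": "Ann", "city": "Beijing"}}
edges = [
    ("What is a graph?", "WRITTEN_BY", "u1"),
    ("Graph vs. relational", "WRITTEN_BY", "u1"),
]

vertices["u1"]["city"] = "Shanghai"  # a single update
graph_cities = {vertices[target]["city"] for _, _, target in edges}
print(len(graph_cities))  # 1 -> every article sees the same value
```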

A key-value database is better suited to applications with few relationships among the data, because it organizes, indexes, and stores data as key-value pairs. When the amount of data is small, this effectively reduces the number of disk reads and writes and delivers high performance. Conversely, once the amount of data grows large, a graph is clearly better at expressing the complex relationships among the data.

Finally, it is important to note that not all graph databases are created equal! Some graph databases offer storage but lack computing capabilities, while others can compute but are very inefficient at data migration. Some are built on NoSQL or MapReduce architectures without being fully and deeply optimized for the characteristics of graph computing, with the result that the more they scale out horizontally, the lower their efficiency. Some vendors blindly load all data into memory, which causes memory usage to surge and leads to frequent OOM (out-of-memory) errors and downtime. The correct implementation path is "distribution + integrated storage and computing + multi-level storage optimization + deep optimization of graph queries". Designing and implementing a truly high-performance distributed graph database involves many knowledge points and challenges; interested readers can refer to How to implement a high-concurrency graph database system?³ [Text/Nezha Emma]

Origin blog.csdn.net/Ultipa/article/details/132584678