Nebula Graph Source Code Interpretation Series | Vol.06 Implementation of Variable-Length Pattern in MATCH

Contents

  • problem analysis
    • Fixed length Pattern
    • Combination of Variable Length Pattern and Variable Length Pattern
  • Implementation plan
    • Expanding one step
    • Expand in multiple steps
    • Saving paths
    • Variable length splicing
  • Summary

As the core of the openCypher language, MATCH lets users express relationships in a graph database conveniently through concise Patterns. The variable-length pattern is a common way to describe paths in a Pattern, and supporting it is the first step in making Nebula compatible with openCypher's MATCH.

As earlier articles in this series showed, a Nebula execution plan is composed of many physical operators, each responsible for one piece of computation logic. The implementation of MATCH also relies on operators covered in those articles, such as GetNeighbors, GetVertices, Join, Project, Filter, and Loop. Unlike the tree-shaped plans of relational databases, Nebula's execution flow is actually a graph that can contain cycles. How to turn a variable-length pattern in MATCH into a Nebula physical plan is the key problem the Planner has to solve. The rest of this article briefly introduces how Nebula approaches the variable-length Pattern problem.

problem analysis

Fixed length Pattern

Fixed-length Patterns are also a commonly used query form in MATCH statements. If a fixed-length pattern is treated as a variable-length pattern whose minimum and maximum step counts are equal, that is, as a special case of the latter, then the implementation of fixed-length and variable-length patterns can be unified, as shown below:

// Fixed-length Pattern
MATCH (v)-[e]-(v2)
// Variable-length Pattern
MATCH (v)-[e*1..1]-(v2)

The difference between the two statements is the type of the variable e: in the fixed-length form e is a single edge, while in the variable-length form e is a list of edges of length 1.

Combination of Variable Length Pattern and Variable Length Pattern

In openCypher's MATCH syntax, Patterns can be combined flexibly to express complex paths. For example, one variable-length pattern can be followed by another:

MATCH (v)-[e*1..3]-(v2)-[ee*2..4]-(v3)

This chaining can continue indefinitely, and very complex paths can be built from different arrangements of variable-length and fixed-length patterns. We therefore need a regular way to generate the plan so that the whole process can be iterated recursively. The following factors need to be considered:

  1. The paths of a later variable-length Pattern depend on all the paths of the preceding variable-length Patterns;
  2. All symbols (or variables) that follow a variable-length Pattern are themselves variable-length results;
  3. Before each step expands outward, the starting vertices need to be deduplicated.

Notice that once we can generate the execution plan for ()-[:like*m..n]-, combinations of patterns can be built up iteratively in a regular way, as shown below:

()-[:like*m..n]- ()-[:like*k..l]- ()
 \____________/   \____________/   \_/
    Pattern1         Pattern2       Pattern3

Implementation plan

Let's analyze ()-[:like*m..n]- and see how it translates into Nebula's physical execution plan. The pattern means expanding m to n steps outward. In Nebula, a one-step outward expansion is done by the GetNeighbors operator. To expand multiple steps, GetNeighbors is called again on the result of the previous expansion, and the vertex and edge data obtained at each step are connected end to end to form a path. Although the user ultimately needs only the paths of m to n steps, execution still has to expand from step 1 through step n, and the paths produced at each step must be saved, either for output or as input to the next step. Finally, only the paths whose length is between m and n steps are kept.
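The expand-save-filter idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Nebula's implementation: the edge list, the `get_neighbors` stand-in, and the `expand` helper are hypothetical names, only outgoing edges are followed, 1 <= m <= n is assumed, and openCypher's edge-uniqueness check is left out.

```python
from collections import defaultdict

# Toy edge list standing in for storage (illustrative data only).
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]

OUT = defaultdict(list)
for src, dst in EDGES:
    OUT[src].append((src, dst))


def get_neighbors(vertices):
    """Hypothetical stand-in for the GetNeighbors operator: out-edges per vertex."""
    return {v: OUT[v] for v in vertices}


def expand(start_vertices, m, n):
    """Expand steps 1..n, save the paths of every step, return those of length m..n."""
    saved = {}                       # step number -> list of paths (tuples of edges)
    frontier = set(start_vertices)   # a set, so start vertices are deduplicated
    previous = None
    for step in range(1, n + 1):
        neighbors = get_neighbors(frontier)
        if previous is None:
            current = [(e,) for v in sorted(frontier) for e in neighbors[v]]
        else:
            # Append each new edge to every saved path that ends at the edge's source.
            current = [p + (e,) for p in previous for e in neighbors[p[-1][1]]]
        saved[step] = current
        previous = current
        frontier = {p[-1][1] for p in current}
    return [p for step in range(m, n + 1) for p in saved[step]]
```

Calling `expand(["a"], 2, 3)` still walks step 1, but only the paths saved for steps 2 and 3 are returned.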

Expanding one step

Let's first look at the plan for expanding one step. Nebula stores a starting vertex together with its outgoing edges, so fetching both requires no cross-partition access. The destination vertex of an edge, however, generally lives in another partition, so its properties have to be fetched separately through the GetVertices interface. In addition, before expanding outward it is best to deduplicate the starting vertices to avoid scanning storage repeatedly. The execution plan for a one-step expansion is therefore as shown in the following figure:

[Figure: execution plan for a one-step expansion]
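As a rough sketch of the one-step plan (deduplicate, GetNeighbors, then GetVertices for the destination vertices), consider the following Python. The partition layout, property names, and the `locate`, `get_neighbors`, `get_vertices`, and `expand_one_step` helpers are all hypothetical stand-ins chosen for illustration; they are not Nebula's storage API.

```python
# Hypothetical layout: each partition keeps a vertex together with its out-edges.
PARTITIONS = {
    0: {"a": {"props": {"name": "Ann"}, "out": ["b", "c"]}},
    1: {"b": {"props": {"name": "Bob"}, "out": ["c"]},
        "c": {"props": {"name": "Cid"}, "out": []}},
}


def locate(vid):
    """Find the stored record of a vertex, whichever partition holds it."""
    for part in PARTITIONS.values():
        if vid in part:
            return part[vid]
    raise KeyError(vid)


def get_neighbors(vids):
    """Stand-in for GetNeighbors: source properties plus out-edges in one read."""
    rows = []
    for v in vids:
        record = locate(v)
        rows.append({"src_props": record["props"],
                     "edges": [(v, dst) for dst in record["out"]]})
    return rows


def get_vertices(vids):
    """Stand-in for GetVertices: properties of the given vertices."""
    return {v: locate(v)["props"] for v in vids}


def expand_one_step(start_vids):
    starts = list(dict.fromkeys(start_vids))      # deduplicate, keeping order
    rows = get_neighbors(starts)
    dst_ids = list(dict.fromkeys(dst for row in rows for _, dst in row["edges"]))
    dst_props = get_vertices(dst_ids)             # destinations usually live elsewhere
    return [(row["src_props"], edge, dst_props[edge[1]])
            for row in rows for edge in row["edges"]]
```

Passing `["a", "a"]` yields the same two rows as passing `["a"]`, because the duplicate start vertex is dropped before storage is touched.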

Expand in multiple steps

Expanding multiple steps essentially repeats the process above. Note, however, that GetNeighbors also returns the properties of the starting vertex, so when the expansion continues to the next step, one GetVertices call can be omitted. The two-step expansion plan becomes:

[Figure: execution plan for a two-step expansion]
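One way to read this saving: the properties of the vertices reached at step k arrive with the GetNeighbors call of step k+1, so only the last frontier still needs GetVertices. The sketch below counts calls to make that visible. The graph data, the `CALLS` counter, and the helper names are assumptions made for this example, not taken from the Nebula source.

```python
# Illustrative graph: vertex id -> properties and out-neighbors.
GRAPH = {
    "a": {"props": {"name": "Ann"}, "out": ["b"]},
    "b": {"props": {"name": "Bob"}, "out": ["c"]},
    "c": {"props": {"name": "Cid"}, "out": []},
}
CALLS = {"GetNeighbors": 0, "GetVertices": 0}   # counts calls to each stand-in


def get_neighbors(vids):
    """Stand-in for GetNeighbors: (source properties, out-edges) per vertex."""
    CALLS["GetNeighbors"] += 1
    return {v: (GRAPH[v]["props"], [(v, d) for d in GRAPH[v]["out"]]) for v in vids}


def get_vertices(vids):
    """Stand-in for GetVertices: properties only."""
    CALLS["GetVertices"] += 1
    return {v: GRAPH[v]["props"] for v in vids}


def expand_steps(start_vids, steps):
    props = {}
    edges_per_step = []
    frontier = set(start_vids)
    for _ in range(steps):
        step_edges = []
        for v, (v_props, out_edges) in get_neighbors(frontier).items():
            props[v] = v_props          # start-vertex properties come with the edges
            step_edges.extend(out_edges)
        edges_per_step.append(step_edges)
        frontier = {dst for _, dst in step_edges}
    props.update(get_vertices(frontier))   # only the final frontier is still missing
    return edges_per_step, props
```

For a two-step expansion from `"a"`, this sketch issues two GetNeighbors calls and a single GetVertices call while still collecting the properties of all three vertices.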

Saving paths

Because the paths from every expansion step may need to be returned at the end, all of them have to be saved during the expansion above. Paths from two consecutive steps can be connected with the join operator. Also, because e in ()-[e:like*m..n]- represents a column of data (lists of edges), the paths from each expansion step must be merged into one result set with a union. The execution plan further evolves into:

[Figure: execution plan with join and union added]
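The join and union described above can be illustrated with plain Python lists. This is a minimal sketch: the text only says that a join operator is used, and the hash join written here, the sample edges, and the names `join_paths` and `union_all` are assumptions picked for illustration.

```python
def join_paths(paths, edges):
    """Inner join: extend each path with every edge that starts where the path ends."""
    by_src = {}
    for edge in edges:
        by_src.setdefault(edge[0], []).append(edge)   # build side, keyed on edge source
    return [path + [edge] for path in paths for edge in by_src.get(path[-1][1], [])]


def union_all(*tables):
    """Concatenate the per-step path tables into one column of edge lists."""
    return [row for table in tables for row in table]


# Edges returned by two consecutive expansion steps (illustrative data).
step1_edges = [("a", "b"), ("a", "c")]
step2_edges = [("b", "c"), ("c", "d")]

paths1 = [[e] for e in step1_edges]          # one-step paths
paths2 = join_paths(paths1, step2_edges)     # two-step paths
e_column = union_all(paths1, paths2)         # every value that e can take
```

Each entry of `e_column` is one list of edges, so paths of length 1 and length 2 sit side by side in the same column.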

Variable length splicing

The process above produces the physical plan for the pattern ()-[e:like*m..n]-. When several such patterns are spliced together, the process is simply iterated. Before moving on to the next pattern, however, the result of the plan above must be filtered: we expect only the results of steps m to n, whereas the data set contains all results from step 1 to step n. A simple filter on path length is enough. After splicing variable-length patterns, the plan becomes:

[Figure: execution plan after splicing variable-length patterns]
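To make the filter-then-iterate step concrete, here is a small Python sketch that evaluates something like ()-[e*m..n]-()-[ee*k..l]-() on a toy graph. The adjacency data and the helpers `expand_upto`, `variable_length`, and `splice` are hypothetical; only outgoing edges are followed, and the check that e and ee share no edge, which openCypher requires, is omitted for brevity.

```python
# Toy adjacency list (illustrative data only).
OUT = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}


def expand_upto(starts, n):
    """All edge lists of length 1..n that begin at a (deduplicated) start vertex."""
    paths = [[(s, d)] for s in sorted(set(starts)) for d in OUT[s]]
    result = list(paths)
    for _ in range(n - 1):
        paths = [p + [(p[-1][1], d)] for p in paths for d in OUT[p[-1][1]]]
        result.extend(paths)
    return result


def variable_length(starts, m, n):
    """Filter on path length so that only steps m..n survive."""
    return [p for p in expand_upto(starts, n) if m <= len(p) <= n]


def splice(starts, m, n, k, l):
    """Evaluate the first pattern, then start the second from its end vertices."""
    first = variable_length(starts, m, n)
    next_starts = {e[-1][1] for e in first}      # deduplicated end vertices
    second = variable_length(next_starts, k, l)
    # Join on: last vertex of e == first vertex of ee.
    return [(e, ee) for e in first for ee in second if ee[0][0] == e[-1][1]]
```

For example, `splice(["a"], 1, 2, 1, 1)` pairs each one- or two-step path from `"a"` with the single-step paths that continue from its end vertex.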

Through the step-by-step decomposition above, we finally arrive at the execution plan expected for the original MATCH statement. Clearly, converting a complex pattern into calls to the underlying expansion interface takes considerable effort. The plan above can of course be optimized, for example by wrapping the multi-step expansion in the Loop operator and reusing the one-step expansion sub-plan; those details are not covered here. Interested readers can refer to the Nebula source code.

Summary

The process above shows how the execution plan of a MATCH statement with a variable-length Pattern is generated. At this point you may well wonder: why do such basic path expansions produce such complex execution plans in Nebula? In Neo4j, a handful of operators do the same work, so why does it turn into a cumbersome DAG here?

The fundamental reason is that Nebula's operators sit close to the underlying interfaces and lack semantic abstractions for higher-level graph operations. When operator granularity is too fine, upper-layer work such as optimization has to deal with too many details. The execution operators will be reorganized further in the future to gradually improve MATCH functionality and performance.

