Nebula Graph Source Code Interpretation Series | Vol.02 Detailed Validator

Overall structure  

Nebula Graph Query Engine is mainly divided into four modules, namely Parser, Validator, Optimizer and Executor.  

The Parser performs the lexical and syntactic analysis of the statement and generates an abstract syntax tree (AST). The Validator converts the AST into an execution plan, the Optimizer optimizes that plan, and the Executor performs the actual computation on the data.

This article focuses on how the Validator is implemented.

Directory Structure

The Validator code is implemented in the src/validator and src/planner directories.

The src/validator directory mainly contains the Validator implementations for the various clauses, such as OrderByValidator, LimitValidator, GoValidator, and so on.

validator/
├── ACLValidator.h
├── AdminJobValidator.h
├── AdminValidator.h
├── AssignmentValidator.h
├── BalanceValidator.h
├── DownloadValidator.h
├── ExplainValidator.h
├── FetchEdgesValidator.h
├── FetchVerticesValidator.h
├── FindPathValidator.h
├── GetSubgraphValidator.h
├── GoValidator.h
├── GroupByValidator.h
├── IngestValidator.h
├── LimitValidator.h
├── LookupValidator.h
├── MaintainValidator.h
├── MatchValidator.h
├── MutateValidator.h
├── OrderByValidator.h
├── PipeValidator.h
├── ReportError.h
├── SequentialValidator.h
├── SetValidator.h
├── TraversalValidator.h
├── UseValidator.h
├── Validator.h
└── YieldValidator.h 

The src/planner/plan directory defines all the PlanNode data structures used to generate the final execution plan. For example, when a query statement contains an aggregate function, an Aggregate node is generated in the execution plan. The Aggregate class specifies all the information required to compute the aggregate function, including the grouping columns and the aggregate function expressions. The Aggregate class is defined in Query.h. Nebula defines more than one hundred PlanNodes; the node kinds (PlanNode::kind) are defined in PlanNode.h and will not be elaborated here.

planner/plan/
├── Admin.cpp          
├── Admin.h             // administration related  nodes
├── Algo.cpp
├── Algo.h              // graph algorithm related nodes
├── ExecutionPlan.cpp
├── ExecutionPlan.h     // explain and profile nodes
├── Logic.cpp
├── Logic.h             // nodes introduced by the implementation layer
├── Maintain.cpp
├── Maintain.h          // schema related nodes
├── Mutate.cpp
├── Mutate.h            // DML related nodes
├── PlanNode.cpp
├── PlanNode.h          // plan node base classes
├── Query.cpp
├── Query.h             // DQL related nodes
└── Scan.h              // index related nodes
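
To make the Aggregate example above more concrete, here is a minimal sketch of what a plan node that carries grouping columns and aggregate expressions could look like. It is not the definition from Query.h: the class layout, the member names (groupKeys, groupItems) and the use of plain strings for expressions are assumptions made for illustration only.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a plan node base class (hypothetical layout).
class PlanNode {
public:
    enum class Kind { kStart, kAggregate, kSort };

    PlanNode(Kind kind, const PlanNode* dep) : kind_(kind), dep_(dep) {}
    virtual ~PlanNode() = default;

    Kind kind() const { return kind_; }
    const PlanNode* dep() const { return dep_; }

private:
    Kind kind_;
    const PlanNode* dep_;  // the node whose output this node consumes
};

// Expressions are kept as plain strings to keep the sketch short.
class Aggregate final : public PlanNode {
public:
    Aggregate(const PlanNode* input,
              std::vector<std::string> groupKeys,
              std::vector<std::string> groupItems)
        : PlanNode(Kind::kAggregate, input),
          groupKeys_(std::move(groupKeys)),
          groupItems_(std::move(groupItems)) {
        // An aggregate node without any aggregate expression makes no sense.
        assert(!groupItems_.empty());
    }

    const std::vector<std::string>& groupKeys() const { return groupKeys_; }
    const std::vector<std::string>& groupItems() const { return groupItems_; }

private:
    std::vector<std::string> groupKeys_;   // grouping columns
    std::vector<std::string> groupItems_;  // aggregate function expressions
};
```

The point of the sketch is that the node itself stores everything needed to run the aggregation, so no later stage has to go back to the AST.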

The src/planner directory also defines the planner implementations for nGQL and MATCH statements, which generate the execution plans for those statements.

Source code analysis

The validator entry function is Validator::validate(Sentence*, QueryContext*), which is responsible for converting the abstract syntax tree generated by the parser into an execution plan. The QueryContext stores the root node of the final execution plan. The function code is as follows:

Status Validator::validate(Sentence* sentence, QueryContext* qctx) {
    DCHECK(sentence != nullptr);
    DCHECK(qctx != nullptr);

    // Check if space chosen from session. if chosen, add it to context.
    auto session = qctx->rctx()->session();
    if (session->space().id > kInvalidSpaceID) {
        auto spaceInfo = session->space();
        qctx->vctx()->switchToSpace(std::move(spaceInfo));
    }

    auto validator = makeValidator(sentence, qctx);
    NG_RETURN_IF_ERROR(validator->validate());

    auto root = validator->root();
    if (!root) {
        return Status::SemanticError("Get null plan from sequential validator");
    }
    qctx->plan()->setRoot(root);
    return Status::OK();
} 

The function first obtains the space information of the current session and saves it in the ValidateContext, and then calls the Validator::makeValidator() and Validator::validate() functions.

Validator::makeValidator() generates the validator for a clause. It first generates a SequentialValidator, which is the entry point of validation: every statement first generates a SequentialValidator.

SequentialValidator::validateImpl() calls Validator::makeValidator() to generate the validator for each corresponding clause. The function code is as follows:
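
A factory of this kind can be sketched as a switch over the sentence kind. The code below is an illustration only: the reduced Sentence type, the four kinds and the name() method are assumptions, and the real makeValidator() also takes a QueryContext and handles many more sentence kinds.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, reduced sentence type with only four kinds.
struct Sentence {
    enum class Kind { kSequential, kPipe, kGo, kOrderBy };
    Kind kind;
};

class Validator {
public:
    virtual ~Validator() = default;
    virtual std::string name() const = 0;
};

// One small validator class per sentence kind.
struct SequentialValidator : Validator {
    std::string name() const override { return "SequentialValidator"; }
};
struct PipeValidator : Validator {
    std::string name() const override { return "PipeValidator"; }
};
struct GoValidator : Validator {
    std::string name() const override { return "GoValidator"; }
};
struct OrderByValidator : Validator {
    std::string name() const override { return "OrderByValidator"; }
};

// Factory: pick the validator class that matches the sentence kind.
std::unique_ptr<Validator> makeValidator(const Sentence* sentence) {
    assert(sentence != nullptr);
    switch (sentence->kind) {
        case Sentence::Kind::kSequential:
            return std::make_unique<SequentialValidator>();
        case Sentence::Kind::kPipe:
            return std::make_unique<PipeValidator>();
        case Sentence::Kind::kGo:
            return std::make_unique<GoValidator>();
        case Sentence::Kind::kOrderBy:
            return std::make_unique<OrderByValidator>();
    }
    return nullptr;  // not reached for the kinds listed above
}
```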

Status SequentialValidator::validateImpl() {
    Status status;
    if (sentence_->kind() != Sentence::Kind::kSequential) {
        return Status::SemanticError(
                "Sequential validator validates a SequentialSentences, but %ld is given.",
                static_cast<int64_t>(sentence_->kind()));
    }
    auto seqSentence = static_cast<SequentialSentences*>(sentence_);
    auto sentences = seqSentence->sentences();

    seqAstCtx_->startNode = StartNode::make(seqAstCtx_->qctx);
    for (auto* sentence : sentences) {
        auto validator = makeValidator(sentence, qctx_);
        NG_RETURN_IF_ERROR(validator->validate());
        seqAstCtx_->validators.emplace_back(std::move(validator));
    }

    return Status::OK();
}

Likewise, PipeValidator, AssignmentValidator, and SetValidator generate validators for the corresponding clauses.

Validator::validate() is responsible for generating the execution plan. The function code is as follows:

Status Validator::validate() {
    auto vidType = space_.spaceDesc.vid_type_ref().value().type_ref().value();
    vidType_ = SchemaUtil::propTypeToValueType(vidType);

    NG_RETURN_IF_ERROR(validateImpl());

    // Check for duplicate reference column names in pipe or var statement
    NG_RETURN_IF_ERROR(checkDuplicateColName());

    // Execute after validateImpl because need field from it
    if (FLAGS_enable_authorize) {
        NG_RETURN_IF_ERROR(checkPermission());
    }

    NG_RETURN_IF_ERROR(toPlan());

    return Status::OK();
}

The function first reads the vid type from the space information and then calls Validator::validateImpl() to verify the clause. validateImpl() is a pure virtual function of the Validator class, so polymorphism dispatches the call to the validateImpl() implementation of the concrete clause validator. After that, the function checks for duplicate column names and, if authorization is enabled, for user permissions. Finally, it calls Validator::toPlan() to generate the execution plan of the clause; the sub-execution plans are connected through MatchPlanner::connectSegments() or Validator::appendPlan().
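
The pattern described here, a fixed validate() skeleton that delegates to virtual hooks, can be sketched as follows. This is a simplified illustration rather than Nebula's code: ToyLimitValidator, the bool return values and the trace() helper are hypothetical, and the real function returns a Status and performs the additional checks shown above.

```cpp
#include <cassert>
#include <string>
#include <vector>

class Validator {
public:
    virtual ~Validator() = default;

    // Fixed skeleton: check the clause first, build its plan only on success.
    bool validate() {
        if (!validateImpl()) {
            return false;
        }
        toPlan();
        return true;
    }

    // Records which hooks ran, so the call order can be inspected.
    const std::vector<std::string>& trace() const { return trace_; }

protected:
    virtual bool validateImpl() = 0;  // clause-specific checks
    virtual void toPlan() = 0;        // clause-specific plan generation
    std::vector<std::string> trace_;
};

// Hypothetical clause validator: accepts only non-negative limits.
class ToyLimitValidator final : public Validator {
public:
    explicit ToyLimitValidator(long count) : count_(count) {}

protected:
    bool validateImpl() override {
        trace_.push_back("validateImpl");
        return count_ >= 0;
    }

    void toPlan() override {
        assert(count_ >= 0);  // only reached after a successful check
        trace_.push_back("toPlan");
    }

private:
    long count_;
};
```

Because the skeleton lives in the base class, every clause validator gets the same ordering guarantee: toPlan() never runs for a clause that failed its checks.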

Example

Let's take an nGQL statement as an example to walk through the above process in detail.

Statement:

GO 3 STEPS FROM "vid" OVER edge 
WHERE $$.tag.prop > 30 
YIELD edge._dst AS dst 
| ORDER BY $-.dst

This nGQL statement mainly goes through three stages in the validator phase:

make clause validator

First, Validator::makeValidator() generates a SequentialValidator. A PipeValidator is then generated in the SequentialValidator::validateImpl() function, and the PipeValidator in turn makes the validators for its left and right clauses, namely GoValidator and OrderByValidator.
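
For the example statement, the resulting validator hierarchy can be pictured as a small tree walk. The sketch below is hypothetical: it models a sentence as a kind plus child sentences and only lists the validator names in the order a top-down construction would create them, which is an assumption about the order rather than something read from the source.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sentence tree: a sequence holds statements, a pipe holds left and right.
struct Sentence {
    enum class Kind { kSequential, kPipe, kGo, kOrderBy };
    Kind kind;
    std::vector<Sentence> children;
};

std::string validatorName(Sentence::Kind kind) {
    switch (kind) {
        case Sentence::Kind::kSequential: return "SequentialValidator";
        case Sentence::Kind::kPipe:       return "PipeValidator";
        case Sentence::Kind::kGo:         return "GoValidator";
        case Sentence::Kind::kOrderBy:    return "OrderByValidator";
    }
    return "UnknownValidator";
}

// Pre-order walk: a validator is created for a sentence before its children.
void collectValidators(const Sentence& sentence, std::vector<std::string>* out) {
    assert(out != nullptr);
    out->push_back(validatorName(sentence.kind));
    for (const Sentence& child : sentence.children) {
        collectValidators(child, out);
    }
}
```

Feeding it a sequence that contains one pipe with a Go sentence on the left and an OrderBy sentence on the right yields SequentialValidator, PipeValidator, GoValidator, OrderByValidator.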

clause check

The clause validation phase checks the Go and OrderBy clauses respectively.

Taking the Go statement as an example, the validator first checks for semantic errors, such as improper use of aggregate functions or mismatched expression types, and then checks the internal clauses in turn. During verification, the intermediate results are saved in the GoContext, which GoPlanner later uses as the basis for generating the execution plan. For example, validateWhere() saves the filter condition expression, which is used later to generate the Filter execution plan node.

    NG_RETURN_IF_ERROR(validateStep(goSentence->stepClause(), goCtx_->steps));  // validate the step clause
    NG_RETURN_IF_ERROR(validateStarts(goSentence->fromClause(), goCtx_->from)); // validate the from clause
    NG_RETURN_IF_ERROR(validateOver(goSentence->overClause(), goCtx_->over));   // validate the over clause
    NG_RETURN_IF_ERROR(validateWhere(goSentence->whereClause()));               // validate the where clause
    NG_RETURN_IF_ERROR(validateYield(goSentence->yieldClause()));               // validate the yield clause

plan generation

The sub-execution plan of the Go statement is generated by the GoPlanner::transform(AstContext*) function. The code is as follows:

StatusOr<SubPlan> GoPlanner::transform(AstContext* astCtx) {
    goCtx_ = static_cast<GoContext *>(astCtx);
    auto qctx = goCtx_->qctx;
    goCtx_->joinInput = goCtx_->from.fromType != FromType::kInstantExpr;
    goCtx_->joinDst = !goCtx_->exprProps.dstTagProps().empty();

    SubPlan startPlan = QueryUtil::buildStart(qctx, goCtx_->from, goCtx_->vidsVar);

    auto& steps = goCtx_->steps;
    if (steps.isMToN()) {
        return mToNStepsPlan(startPlan);
    }

    if (steps.steps() == 0) {
        auto* pt = PassThroughNode::make(qctx, nullptr);
        pt->setColNames(std::move(goCtx_->colNames));
        SubPlan subPlan;
        subPlan.root = subPlan.tail = pt;
        return subPlan;
    }

    if (steps.steps() == 1) {
        return oneStepPlan(startPlan);
    }
    return nStepsPlan(startPlan);
}

The function first calls QueryUtil::buildStart() to construct the start node, and then generates the plan in one of four ways depending on the step specification: an M-to-N step range, zero steps, one step, or N steps. The statement in this example uses the nStepsPlan strategy.
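
The four-way choice can be isolated into a small function that mirrors the order of the checks in transform(). StepRange and choosePlan() below are hypothetical helpers written for this illustration; they return the name of the strategy instead of building a plan.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the step clause: either "N STEPS" or "M TO N STEPS".
class StepRange {
public:
    explicit StepRange(unsigned n) : m_(n), n_(n), isMToN_(false) {}
    StepRange(unsigned m, unsigned n) : m_(m), n_(n), isMToN_(true) {
        assert(m <= n);  // a range must not be inverted
    }

    bool isMToN() const { return isMToN_; }
    unsigned mSteps() const { return m_; }
    unsigned steps() const { return n_; }

private:
    unsigned m_;
    unsigned n_;
    bool isMToN_;
};

// Same order of checks as in transform(): range first, then 0, 1 and N steps.
std::string choosePlan(const StepRange& steps) {
    if (steps.isMToN()) {
        return "mToNStepsPlan";
    }
    if (steps.steps() == 0) {
        return "PassThroughNode";
    }
    if (steps.steps() == 1) {
        return "oneStepPlan";
    }
    return "nStepsPlan";
}
```

For the example statement, GO 3 STEPS corresponds to StepRange(3), which selects nStepsPlan.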

The code of the GoPlanner::nStepsPlan() function is as follows:

SubPlan GoPlanner::nStepsPlan(SubPlan& startVidPlan) {
    auto qctx = goCtx_->qctx;

    auto* start = StartNode::make(qctx);
    auto* gn = GetNeighbors::make(qctx, start, goCtx_->space.id);
    gn->setSrc(goCtx_->from.src);
    gn->setEdgeProps(buildEdgeProps(true));
    gn->setInputVar(goCtx_->vidsVar);

    auto* getDst = QueryUtil::extractDstFromGN(qctx, gn, goCtx_->vidsVar);

    PlanNode* loopBody = getDst;
    PlanNode* loopDep = nullptr;
    if (goCtx_->joinInput) {
        auto* joinLeft = extractVidFromRuntimeInput(startVidPlan.root);
        auto* joinRight = extractSrcDstFromGN(getDst, gn->outputVar());
        loopBody = trackStartVid(joinLeft, joinRight);
        loopDep = joinLeft;
    }

    auto* condition = loopCondition(goCtx_->steps.steps() - 1, gn->outputVar());
    auto* loop = Loop::make(qctx, loopDep, loopBody, condition);

    auto* root = lastStep(loop, loopBody == getDst ? nullptr : loopBody);
    SubPlan subPlan;
    subPlan.root = root;
    subPlan.tail = startVidPlan.tail == nullptr ? loop : startVidPlan.tail;

    return subPlan;
}

The sub-execution plan generated by the Go statement is as follows:

Start -> GetNeighbors -> Project -> Dedup -> Loop -> GetNeighbors -> Project -> GetVertices -> Project -> LeftJoin -> Filter -> Project

The function of the Go statement is to expand the graph. GetNeighbors is the most important node in the execution plan: at runtime, the GetNeighbors operator accesses the storage service and, given the starting vertices and the specified edge type, fetches the ids of the destination vertices reached by a one-step expansion. Multi-step expansion is implemented by the Loop node. The nodes between Start and Loop form the loop sub-plan, which is executed repeatedly while the loop condition is met; the last expansion step is implemented outside the Loop. The Project node extracts the destination vertex ids of the current expansion, and the Dedup node deduplicates them so that they can serve as the starting vertices of the next expansion. The GetVertices node fetches the tag properties of the destination vertices, Filter applies the filter condition, and LeftJoin combines the results of GetNeighbors and GetVertices.
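
The control flow described above (N-1 expansions inside a loop, deduplication between steps, and a final expansion outside the loop) can be imitated on an in-memory adjacency list. This is a conceptual sketch only: the Graph alias, expandOnce() and goNSteps() are hypothetical, and nothing here touches the storage service, properties, filters or joins.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy graph: vertex id -> destination ids of its outgoing edges.
using Graph = std::map<std::string, std::vector<std::string>>;

// One-step expansion, playing the role of GetNeighbors + Project + Dedup:
// the std::set removes duplicate destination ids.
std::set<std::string> expandOnce(const Graph& graph, const std::set<std::string>& starts) {
    std::set<std::string> dsts;
    for (const std::string& src : starts) {
        auto it = graph.find(src);
        if (it == graph.end()) {
            continue;  // vertex without outgoing edges
        }
        dsts.insert(it->second.begin(), it->second.end());
    }
    return dsts;
}

// N-step expansion: the first N-1 steps run in the loop, the last one after it.
std::set<std::string> goNSteps(const Graph& graph, const std::string& start, int steps) {
    assert(steps >= 1);
    std::set<std::string> frontier{start};
    for (int i = 0; i < steps - 1; ++i) {
        frontier = expandOnce(graph, frontier);
    }
    return expandOnce(graph, frontier);
}
```

Keeping the last step outside the loop matches the plan shape: only the final expansion needs the extra work (properties, filter, projection), while the earlier steps only need destination ids.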

The function of the OrderBy statement is to sort the data; its sub-execution plan generates a Sort node.

After the left and right clause plans are generated, the PipeValidator::toPlan() function will call Validator::appendPlan() to connect the left and right subplans and get the final execution plan. The complete execution plan is as follows:

Start -> GetNeighbors -> Project -> Dedup -> Loop -> GetNeighbors -> Project -> GetVertices -> Project -> LeftJoin -> Filter -> Project -> Sort -> DataCollect 
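
How two sub-plans are joined for a pipe can be sketched with a root/tail pair per sub-plan. The sketch assumes that connecting means making the tail of the right sub-plan depend on the root of the left one; the struct layouts and the executionOrder() helper are hypothetical and much simpler than the real PlanNode and SubPlan types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy plan node with a single dependency.
struct PlanNode {
    std::string name;
    PlanNode* dep;  // the node that must run before this one, or nullptr
};

struct SubPlan {
    PlanNode* root;  // last node to run; produces the sub-plan's result
    PlanNode* tail;  // first node to run
};

// Pipe semantics: the right sub-plan consumes the output of the left one.
SubPlan appendPlan(const SubPlan& left, const SubPlan& right) {
    assert(left.root != nullptr && left.tail != nullptr);
    assert(right.root != nullptr && right.tail != nullptr);
    right.tail->dep = left.root;
    return SubPlan{right.root, left.tail};
}

// Walk the dependency chain from the root and list names in execution order.
std::vector<std::string> executionOrder(const SubPlan& plan) {
    std::vector<std::string> names;
    for (const PlanNode* node = plan.root; node != nullptr; node = node->dep) {
        names.insert(names.begin(), node->name);
    }
    return names;
}
```

With a left sub-plan ending in Project and a right sub-plan consisting of Sort, the combined plan has Sort as its root and runs Project before Sort, which matches the tail of the complete plan shown above.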

This concludes the introduction to the Validator.

Forum related questions

Q: How can I find the parser/GraphParser.hpp file?

A: This file is generated during compilation; it exists once the project has been compiled.

That is all for this article.

