ClickHouse and its friends (4): Pipeline processor and scheduler

Original source: https://bohutang.me/2020/06/11/clickhouse-and-friends-processor/

This article covers two pieces of ClickHouse core technology: the Processor and the DAG (directed acyclic graph) Scheduler.

ClickHouse did not invent these concepts; interested readers can also look at Materialize's timely-dataflow. Brother Hu has also written a prototype in Golang.

What differs is the implementation details: it is the sophisticated design of these modules that gives ClickHouse its overall high performance.

Pipeline issues

In a traditional database system, a query is processed roughly in the order of parse, plan, and then execute.

In the Plan stage, a pipeline-assembly step is often added, where each transformer represents one data-processing operation and the transformers are chained one after another.

All the transformers are arranged into a pipeline and handed over to the executor for serial execution: each transformer processes its input data set and passes the output on, all the way down to the sinker. The advantage of this model is simplicity; the disadvantage is poor performance, because it cannot exploit CPU parallelism. This is the classic volcano-style model. It is good enough for latency-sensitive OLTP, but far from enough for compute-intensive OLAP; leaving the CPU below 100% is a crime!
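To make the contrast concrete, here is a minimal volcano-style sketch in plain C++ (illustrative only, not ClickHouse code): every operator pulls one row at a time from its child via next(), so the whole chain runs strictly serially on a single core.

#include <iostream>
#include <memory>
#include <optional>
#include <vector>

/// A volcano-style operator: each next() call pulls one row from the child,
/// processes it, and hands it upward.
struct Operator
{
    virtual ~Operator() = default;
    virtual std::optional<int> next() = 0;
};

struct Source : Operator
{
    std::vector<int> rows{0, 1, 2, 3, 4};
    size_t pos = 0;

    std::optional<int> next() override
    {
        if (pos == rows.size())
            return std::nullopt;
        return rows[pos++];
    }
};

struct AddOne : Operator
{
    std::unique_ptr<Operator> child;

    explicit AddOne(std::unique_ptr<Operator> child_) : child(std::move(child_)) {}

    std::optional<int> next() override
    {
        auto row = child->next();
        if (!row)
            return std::nullopt;
        return *row + 1; /// one row at a time, strictly serial
    }
};

int main()
{
    AddOne root(std::make_unique<Source>());
    while (auto row = root.next()) /// the executor just keeps pulling from the root
        std::cout << *row << '\n';
}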

Back to the pipeline above: if transformer1 and transformer2 do not depend on each other, they can in principle be processed in parallel.

This raises some deeper questions:

  1. How to achieve flexible orchestration of transformers?

  2. How to achieve data synchronization between transformers?

  3. How to implement parallel scheduling between transformers?

Processor and DAG Scheduler

1. Transformer Orchestration

ClickHouse implements a series of basic transformer modules, see src/Processors/Transforms, such as:

  • FilterTransform -- WHERE condition filtering

  • SortingTransform -- ORDER BY sorting

  • LimitByTransform -- LIMIT clipping

When we execute:

SELECT * FROM t1 WHERE id=1 ORDER BY time DESC LIMIT 10

ClickHouse's QueryPipeline arranges and assembles them roughly like this:

QueryPipeline::addSimpleTransform(Source)
QueryPipeline::addSimpleTransform(FilterTransform)
QueryPipeline::addSimpleTransform(SortingTransform)
QueryPipeline::addSimpleTransform(LimitByTransform)
QueryPipeline::addSimpleTransform(Sinker)

This achieves the arrangement of Transformer, but how to synchronize data during execution?

2. Transformer data synchronization

When QueryPipeline performs this transformer orchestration, it also needs to build the lower-level DAG connections between ports:

connect(Source.OutPort, FilterTransform.InPort)
connect(FilterTransform.OutPort, SortingTransform.InPort)
connect(SortingTransform.OutPort, LimitByTransform.InPort)
connect(LimitByTransform.OutPort, Sinker.InPort)

This establishes the data-flow relationships: the OutPort of one transformer is connected to the InPort of another, much like water pipes in the real world, and a single transformer can have three or even more ports.
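As a rough mental model only (a simplified sketch, not ClickHouse's actual Port code in src/Processors/Port.h), connect() can be thought of as verifying that both sides agree on the block header and then pointing the two ports at one shared state, which the producer writes and the consumer reads:

#include <memory>

/// Illustrative sketch of port connection; the real implementation also
/// checks that the OutPort and InPort headers have the same structure.
struct Chunk; /// stand-in for the data unit flowing through the pipe

struct PortState
{
    std::shared_ptr<Chunk> data; /// the chunk currently "inside the pipe"
    bool finished = false;
};

struct OutputPortSketch { std::shared_ptr<PortState> state; };
struct InputPortSketch  { std::shared_ptr<PortState> state; };

void connectSketch(OutputPortSketch & out, InputPortSketch & in)
{
    auto state = std::make_shared<PortState>();
    out.state = state; /// the producer pushes chunks into the shared state
    in.state = state;  /// the consumer pulls chunks from the same state
}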

3. Transformer execution scheduling

Now that the pipeline is assembled, how can the water in the pipeline be treated and flowed under pressure?

ClickHouse defines a set of processor statuses, and the scheduler drives execution according to these statuses.

    enum class Status
    {
        NeedData,       // waiting for data to arrive at the input port
        PortFull,       // the output port is blocked, downstream has not consumed yet
        Finished,       // finished, exit
        Ready,          // switch to the work() method for synchronous processing
        Async,          // switch to the schedule() method for asynchronous processing
        Wait,           // waiting for the asynchronous processing to finish
        ExpandPipeline, // the pipeline needs to expand (split into more processors)
    };

When the source generates data, its status is set to PortFull, meaning the data is waiting to flow into the InPort of another transformer. The scheduler then calls prepare() on the FilterTransform (whose status is NeedData) so that it can pull the data in; once the data has arrived, its status becomes Ready, and it waits for the scheduler to call its work() method to do the actual filtering. Each processor exposes its status, and the scheduler observes these statuses, drives the state transitions, and keeps scheduling until everything reaches the Finished state.

Worth mentioning is the ExpandPipeline status: depending on the transformer's implementation, it allows one transformer to split itself into more transformers that execute in parallel, achieving a fan-out effect.
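To make the state machine concrete, here is a heavily simplified, single-threaded sketch of what the scheduling loop does conceptually; the real PipelineExecutor builds a graph of processors, only revisits the ones whose ports changed, and runs work() on a thread pool:

/// Conceptual scheduling loop (illustrative only), driving the Status machine above.
bool progress = true;
while (progress)
{
    progress = false;
    for (auto & processor : processors)
    {
        switch (processor->prepare()) /// cheap port negotiation, no heavy work
        {
            case Status::Ready:
                processor->work();    /// CPU-heavy processing happens here
                progress = true;
                break;
            case Status::NeedData:    /// upstream has not produced yet
            case Status::PortFull:    /// downstream has not consumed yet
            case Status::Finished:
                break;
            default:                  /// Async / Wait / ExpandPipeline omitted in this sketch
                break;
        }
    }
}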

Example

To get a deeper understanding of ClickHouse's processor and scheduler mechanisms, let's build a bare-bones example that is equivalent to:

SELECT number + 1 FROM t1;

It consists of three processors:

  1. A MySource that produces the numbers {0,1,2,3,4}

  2. A MyAddTransformer that adds 1 to each number

  3. A MySink that writes the result to stdout

1. MySource

class MySource : public ISource
{
public:
    String getName() const override { return "MySource"; }

    MySource(UInt64 end_)
        : ISource(Block({ColumnWithTypeAndName{ColumnUInt64::create(), std::make_shared<DataTypeUInt64>(), "number"}})), end(end_)
    {
    }

private:
    UInt64 end;
    bool done = false;

    Chunk generate() override
    {
        /// An empty chunk tells the framework that this source is exhausted.
        if (done)
        {
            return Chunk();
        }

        /// Produce all numbers in [0, end) as a single chunk.
        MutableColumns columns;
        columns.emplace_back(ColumnUInt64::create());
        for (auto i = 0U; i < end; i++)
            columns[0]->insert(i);

        done = true;
        return Chunk(std::move(columns), end);
    }
};
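Note that ISource already implements prepare() for us: a source only needs to override generate(), returning one Chunk per call and an empty Chunk once there is nothing left to produce.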

2. MyAddTransformer

class MyAddTransformer : public IProcessor
{
public:
    String getName() const override { return "MyAddTransformer"; }

    MyAddTransformer()
        : IProcessor(
            {Block({ColumnWithTypeAndName{ColumnUInt64::create(), std::make_shared<DataTypeUInt64>(), "number"}})},
            {Block({ColumnWithTypeAndName{ColumnUInt64::create(), std::make_shared<DataTypeUInt64>(), "number"}})})
        , input(inputs.front())
        , output(outputs.front())
    {
    }

    Status prepare() override
    {
        /// Downstream no longer needs data: close our input and finish.
        if (output.isFinished())
        {
            input.close();
            return Status::Finished;
        }

        /// Downstream has not consumed the previous chunk yet.
        if (!output.canPush())
        {
            input.setNotNeeded();
            return Status::PortFull;
        }

        /// Push the chunk produced by the last work() call.
        if (has_process_data)
        {
            output.push(std::move(current_chunk));
            has_process_data = false;
        }

        /// Upstream is exhausted: propagate the finish downstream.
        if (input.isFinished())
        {
            output.finish();
            return Status::Finished;
        }

        /// No data available yet: ask upstream for more.
        if (!input.hasData())
        {
            input.setNeeded();
            return Status::NeedData;
        }

        /// Data is available: pull it and ask the scheduler to call work().
        current_chunk = input.pull(false);
        return Status::Ready;
    }

    void work() override
    {
        /// Add 1 to every value of the "number" column.
        auto num_rows = current_chunk.getNumRows();
        auto result_columns = current_chunk.cloneEmptyColumns();
        auto columns = current_chunk.detachColumns();
        for (auto i = 0U; i < num_rows; i++)
        {
            auto val = columns[0]->getUInt(i);
            result_columns[0]->insert(val + 1);
        }
        current_chunk.setColumns(std::move(result_columns), num_rows);
        has_process_data = true;
    }

    InputPort & getInputPort() { return input; }
    OutputPort & getOutputPort() { return output; }

protected:
    bool has_input = false;
    bool has_process_data = false;
    Chunk current_chunk;
    InputPort & input;
    OutputPort & output;
};
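Note the division of labour here: prepare() only negotiates the port states and is expected to return quickly, while work() is where the actual CPU-heavy processing happens. Keeping prepare() cheap is what allows the scheduler to poll many processors without wasting time.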

3. MySink

class MySink : public ISink
{
public:
    String getName() const override { return "MySinker"; }

    MySink() : ISink(Block({ColumnWithTypeAndName{ColumnUInt64::create(), std::make_shared<DataTypeUInt64>(), "number"}})) { }

private:
    WriteBufferFromFileDescriptor out{STDOUT_FILENO};
    FormatSettings settings;

    /// Write every row of the chunk to stdout as: prefix-<value>
    void consume(Chunk chunk) override
    {
        size_t rows = chunk.getNumRows();
        size_t columns = chunk.getNumColumns();

        for (size_t row_num = 0; row_num < rows; ++row_num)
        {
            writeString("prefix-", out);
            for (size_t column_num = 0; column_num < columns; ++column_num)
            {
                if (column_num != 0)
                    writeChar('\t', out);
                getPort()
                    .getHeader()
                    .getByPosition(column_num)
                    .type->serializeAsText(*chunk.getColumns()[column_num], row_num, out, settings);
            }
            writeChar('\n', out);
        }

        out.next();
    }
};

4. DAG Scheduler

int main(int, char **)
{
    auto source0 = std::make_shared<MySource>(5);
    auto add0 = std::make_shared<MyAddTransformer>();
    auto sinker0 = std::make_shared<MySink>();

    /// Connect the ports to form the DAG: source0 -> add0 -> sinker0.
    connect(source0->getPort(), add0->getInputPort());
    connect(add0->getOutputPort(), sinker0->getPort());

    /// Hand all processors to the executor and run it with a single thread.
    std::vector<ProcessorPtr> processors = {source0, add0, sinker0};
    PipelineExecutor executor(processors);
    executor.execute(1);
}
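With the source producing {0,1,2,3,4} and the transformer adding 1, running this program should print something like:

prefix-1
prefix-2
prefix-3
prefix-4
prefix-5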

Summary

From a developer's point of view, this is still fairly complex: the state transitions have to be driven by the developer. However, upstream has done a lot of groundwork, such as ISource to wrap a source, ISink to wrap a sink, and a basic ISimpleTransform for ordinary transforms. With those, using processors at the upper layer becomes much easier, and you can assemble whatever pipeline you want like building blocks.
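For example, assuming ISimpleTransform exposes a constructor taking the input header, the output header and a skip-empty-chunks flag, plus a transform(Chunk &) method to override (the exact signatures may differ between ClickHouse versions), the adder above could shrink to roughly this sketch, with all of the prepare() port juggling handled by the base class:

class MyAddSimpleTransform : public ISimpleTransform
{
public:
    String getName() const override { return "MyAddSimpleTransform"; }

    MyAddSimpleTransform()
        : ISimpleTransform(
            Block({ColumnWithTypeAndName{ColumnUInt64::create(), std::make_shared<DataTypeUInt64>(), "number"}}),
            Block({ColumnWithTypeAndName{ColumnUInt64::create(), std::make_shared<DataTypeUInt64>(), "number"}}),
            false)
    {
    }

protected:
    void transform(Chunk & chunk) override
    {
        /// Same logic as MyAddTransformer::work(), without any manual port handling.
        auto num_rows = chunk.getNumRows();
        auto result_columns = chunk.cloneEmptyColumns();
        auto columns = chunk.detachColumns();
        for (auto i = 0U; i < num_rows; i++)
            result_columns[0]->insert(columns[0]->getUInt(i) + 1);
        chunk.setColumns(std::move(result_columns), num_rows);
    }
};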

ClickHouse's unit of data between transformers is the Chunk. A transformer processes the Chunk flowing in from the upstream OutPort and outputs the result toward the downstream InPort; the processors, connected into a graph, work in parallel so that the CPU is kept as busy as possible.

After a SQL statement is parsed into an AST, ClickHouse builds a QueryPlan from the AST, builds a QueryPipeline from the QueryPlan, and finally the executor schedules the processors and runs them. Newer versions of ClickHouse enable QueryPipeline by default, and this code is still iterating rapidly.


The full text is over.

Enjoy ClickHouse :)
