[ClickHouse source code] Writing process of materialized view

This article walks through the source code of the ClickHouse materialized view write path in detail, based on version v22.8.14.53-lts.

StorageMaterializedView

First look at the constructor of the materialized view:

StorageMaterializedView::StorageMaterializedView(
    const StorageID & table_id_,
    ContextPtr local_context,
    const ASTCreateQuery & query,
    const ColumnsDescription & columns_,
    bool attach_,
    const String & comment)
    : IStorage(table_id_), WithMutableContext(local_context->getGlobalContext())
{
    StorageInMemoryMetadata storage_metadata;
    storage_metadata.setColumns(columns_);

    ......

    if (!has_inner_table)
    {
        target_table_id = query.to_table_id;
    }
    else if (attach_)
    {
        /// If there is an ATTACH request, then the internal table must already be created.
        target_table_id = StorageID(getStorageID().database_name, generateInnerTableName(getStorageID()), query.to_inner_uuid);
    }
    else
    {
        /// We will create a query to create an internal table.
        auto create_context = Context::createCopy(local_context);
        auto manual_create_query = std::make_shared<ASTCreateQuery>();
        manual_create_query->setDatabase(getStorageID().database_name);
        manual_create_query->setTable(generateInnerTableName(getStorageID()));
        manual_create_query->uuid = query.to_inner_uuid;

        auto new_columns_list = std::make_shared<ASTColumns>();
        new_columns_list->set(new_columns_list->columns, query.columns_list->columns->ptr());

        manual_create_query->set(manual_create_query->columns_list, new_columns_list);
        manual_create_query->set(manual_create_query->storage, query.storage->ptr());

        InterpreterCreateQuery create_interpreter(manual_create_query, create_context);
        create_interpreter.setInternal(true);
        create_interpreter.execute();

        target_table_id = DatabaseCatalog::instance().getTable({manual_create_query->getDatabase(), manual_create_query->getTable()}, getContext())->getStorageID();
    }
}

From the code above we can see that materialized views support several creation syntaxes, which broadly fall into three categories:

  1. When the destination table is specified:

    create table src(id Int32) Engine=Memory();
    create table dest(id Int32) Engine=Memory();
    
    create materialized view mv to dest as select * from src;
    

    When using this form, target_table_id is set to the table_id of the dest table.

  2. When the destination table is not specified:

    create table src(id Int32) Engine=Memory();
    
    create materialized view mv Engine=Memory() as select * from src;
    

    When using this form, a destination table name starting with .inner. is first generated from the materialized view's storage ID, for example .inner.5ef4ec2c-efb1-4918-bf6c-59de2edb54cf; the destination table is then created with a random uuid, and its table_id is used as target_table_id.

  3. The third case involves no CREATE syntax at all: it is the ATTACH path, taken when ClickHouse starts up or when a detached materialized view is attached again (see the sketch below).
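
A minimal sketch of the ATTACH path, reusing the mv from the examples above:

    detach table mv;
    -- the inner table (or the explicit TO table) still exists at this point,
    -- so on attach the constructor only resolves target_table_id
    attach table mv;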

StorageMaterializedView::read

void StorageMaterializedView::read(
    QueryPlan & query_plan,
    const Names & column_names,
    const StorageSnapshotPtr & storage_snapshot,
    SelectQueryInfo & query_info,
    ContextPtr local_context,
    QueryProcessingStage::Enum processed_stage,
    const size_t max_block_size,
    const size_t num_streams)
{
    /// Get the target table instance
    auto storage = getTargetTable();
    auto lock = storage->lockForShare(local_context->getCurrentQueryId(), local_context->getSettingsRef().lock_acquire_timeout);
    auto target_metadata_snapshot = storage->getInMemoryMetadataPtr();
    auto target_storage_snapshot = storage->getStorageSnapshot(target_metadata_snapshot, local_context);

    if (query_info.order_optimizer)
        query_info.input_order_info = query_info.order_optimizer->getInputOrder(target_metadata_snapshot, local_context);

    storage->read(query_plan, column_names, target_storage_snapshot, query_info, local_context, processed_stage, max_block_size, num_streams);

    if (query_plan.isInitialized())
    {
        /// Get the block structure of the materialized view's stream
        auto mv_header = getHeaderForProcessingStage(column_names, storage_snapshot, query_info, local_context, processed_stage);
        /// Get the block structure of the columns required by the query
        auto target_header = query_plan.getCurrentDataStream().header;

        /// Remove from the queried columns those that do not exist in the mv
        removeNonCommonColumns(mv_header, target_header);

        /// With the Distributed table engine, at a given processing stage the header may not
        /// contain all of the materialized view's columns, e.g. with GROUP BY,
        /// so remove from mv_header the columns the query does not need
        removeNonCommonColumns(target_header, mv_header);

        /// If the mv_header and target_header obtained from the query have different structures,
        /// a conversion is performed by adding an expression step to the pipeline,
        /// e.g. Decimal(38, 6) -> Decimal(16, 6), or aggregations such as sum
        if (!blocksHaveEqualStructure(mv_header, target_header))
        {
            auto converting_actions = ActionsDAG::makeConvertingActions(target_header.getColumnsWithTypeAndName(),
                                                                        mv_header.getColumnsWithTypeAndName(),
                                                                        ActionsDAG::MatchColumnsMode::Name);
            auto converting_step = std::make_unique<ExpressionStep>(query_plan.getCurrentDataStream(), converting_actions);
            converting_step->setStepDescription("Convert target table structure to MaterializedView structure");
            query_plan.addStep(std::move(converting_step));
        }

        query_plan.addStorageHolder(storage);
        query_plan.addTableLock(std::move(lock));
    }
}

As the code above shows, the materialized view is only a logical description: the data is stored in the destination table, and it is the destination table that is actually read. The query process also involves converting blocks between stages and evaluating expressions.
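
A quick way to observe this (a sketch based on the second creation form above; the actual inner table name depends on the database engine and will differ on your server):

    create table src(id Int32) Engine=Memory();
    create materialized view mv Engine=Memory() as select * from src;
    insert into src values (1), (2);

    -- both statements read the same data: the first goes through
    -- StorageMaterializedView::read, which forwards to the inner table
    select * from mv;
    select * from `.inner.mv`;  -- illustrative name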

StorageMaterializedView::write

SinkToStoragePtr StorageMaterializedView::write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr local_context)
{
    auto storage = getTargetTable();
    auto lock = storage->lockForShare(local_context->getCurrentQueryId(), local_context->getSettingsRef().lock_acquire_timeout);

    auto metadata_snapshot = storage->getInMemoryMetadataPtr();
    auto sink = storage->write(query, metadata_snapshot, local_context);

    sink->addTableLock(lock);
    return sink;
}

Writes likewise store the data in the destination table: write() simply acquires the target table's sink and returns it.
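
This also means a materialized view can be inserted into directly, with the rows landing in the destination table. A sketch using the TO form from earlier:

    create table src(id Int32) Engine=Memory();
    create table dest(id Int32) Engine=Memory();
    create materialized view mv to dest as select * from src;

    -- StorageMaterializedView::write returns dest's sink, so the row is stored in dest
    insert into mv values (42);
    select * from dest;  -- returns 42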


We all know that when data is written to the source table, the materialized view is triggered to write the data to the destination table. Let's look at how this is implemented. SQL statements are executed through IInterpreter and its InterpreterXxx implementations, which I won't go into here; an INSERT is ultimately handled by InterpreterInsertQuery, so we start tracing from InterpreterInsertQuery::execute().

InterpreterInsertQuery::execute()

BlockIO InterpreterInsertQuery::execute()
{
    ......
    std::vector<Chain> out_chains;
    if (!distributed_pipeline || query.watch)
    {
        size_t out_streams_size = 1;
        ......
        for (size_t i = 0; i < out_streams_size; ++i)
        {
            auto out = buildChainImpl(table, metadata_snapshot, query_sample_block, nullptr, nullptr);
            out_chains.emplace_back(std::move(out));
        }
    }
    ......
}

execute() builds the output chains through buildChainImpl(). buildChainImpl() checks whether the current table has dependent materialized views, and if so calls buildPushingToViewsChain().

buildPushingToViewsChain()

This function is very long; only the parts relevant to this article are shown here.

Chain buildPushingToViewsChain(
    const StoragePtr & storage,
    const StorageMetadataPtr & metadata_snapshot,
    ContextPtr context,
    const ASTPtr & query_ptr,
    bool no_destination,
    ThreadStatusesHolderPtr thread_status_holder,
    std::atomic_uint64_t * elapsed_counter_ms,
    const Block & live_view_header)
{
    ......
    
    auto table_id = storage->getStorageID();
    auto views = DatabaseCatalog::instance().getDependentViews(table_id);

    ......

    std::vector<Chain> chains;

    for (const auto & view_id : views)
    {
        auto view = DatabaseCatalog::instance().tryGetTable(view_id, context);
        
        ......

        if (auto * materialized_view = dynamic_cast<StorageMaterializedView *>(view.get()))
        {
            ......
            
            StoragePtr inner_table = materialized_view->getTargetTable();
            auto inner_table_id = inner_table->getStorageID();
            auto inner_metadata_snapshot = inner_table->getInMemoryMetadataPtr();
            query = view_metadata_snapshot->getSelectQuery().inner_query;
            target_name = inner_table_id.getFullTableName();

            Block header;

            /// Get list of columns we get from select query.
            if (select_context->getSettingsRef().allow_experimental_analyzer)
                header = InterpreterSelectQueryAnalyzer::getSampleBlock(query, select_context);
            else
                header = InterpreterSelectQuery(query, select_context, SelectQueryOptions().analyze()).getSampleBlock();

            /// Insert only columns returned by select.
            Names insert_columns;
            const auto & inner_table_columns = inner_metadata_snapshot->getColumns();
            for (const auto & column : header)
            {
                /// But skip columns which storage doesn't have.
                if (inner_table_columns.hasPhysical(column.name))
                    insert_columns.emplace_back(column.name);
            }

            InterpreterInsertQuery interpreter(nullptr, insert_context, false, false, false);
            out = interpreter.buildChain(inner_table, inner_metadata_snapshot, insert_columns, thread_status_holder, view_counter_ms);
            out.addStorageHolder(view);
            out.addStorageHolder(inner_table);
        }
        else if (auto * live_view = dynamic_cast<StorageLiveView *>(view.get()))
        {
            runtime_stats->type = QueryViewsLogElement::ViewType::LIVE;
            query = live_view->getInnerQuery(); // Used only to log in system.query_views_log
            out = buildPushingToViewsChain(
                view, view_metadata_snapshot, insert_context, ASTPtr(), true, thread_status_holder, view_counter_ms, storage_header);
        }
        else if (auto * window_view = dynamic_cast<StorageWindowView *>(view.get()))
        {
            runtime_stats->type = QueryViewsLogElement::ViewType::WINDOW;
            query = window_view->getMergeableQuery(); // Used only to log in system.query_views_log
            out = buildPushingToViewsChain(
                view, view_metadata_snapshot, insert_context, ASTPtr(), true, thread_status_holder, view_counter_ms);
        }
        else
            out = buildPushingToViewsChain(
                view, view_metadata_snapshot, insert_context, ASTPtr(), false, thread_status_holder, view_counter_ms);

        ......
}

buildPushingToViewsChain() checks whether the current table has dependent views. From the branches we can see that there are three view types: materialized views, live views, and window views; the final else means the dependent view is treated like an ordinary table. The construction is recursive: on the first call the current table is an ordinary source table, and for each dependent materialized view the if branch calls InterpreterInsertQuery::buildChain(), which in turn goes through buildChainImpl() and, when the view's target table has dependent views of its own, back into buildPushingToViewsChain().
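
This recursion is what makes cascaded views work. A sketch: an insert into src pushes to mv1, and because mv1's target table mid has its own dependent view mv2, the chain is extended again for mid:

    create table src(id Int32) Engine=Memory();
    create table mid(id Int32) Engine=Memory();
    create table dest(id Int32) Engine=Memory();

    create materialized view mv1 to mid as select * from src;
    create materialized view mv2 to dest as select * from mid;

    -- one insert, three writes: src, then mid (via mv1), then dest (via mv2)
    insert into src values (1);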

buildChainImpl

buildChain() is implemented by calling buildChainImpl():

Chain InterpreterInsertQuery::buildChainImpl(
    const StoragePtr & table,
    const StorageMetadataPtr & metadata_snapshot,
    const Block & query_sample_block,
    ThreadStatusesHolderPtr thread_status_holder,
    std::atomic_uint64_t * elapsed_counter_ms)
{
    ......
    /// We create a pipeline of several streams, into which we will write data.
    Chain out;

    /// Keep a reference to the context to make sure it stays alive until the chain is executed and destroyed
    out.addInterpreterContext(context_ptr);

    /// NOTE: we explicitly ignore bound materialized views when inserting into Kafka Storage.
    ///       Otherwise we'll get duplicates when MV reads same rows again from Kafka.
    if (table->noPushingToViews() && !no_destination)  // noPushingToViews() disables pushing to materialized views when inserting into Kafka storage
    {
        auto sink = table->write(query_ptr, metadata_snapshot, context_ptr);
        sink->setRuntimeData(thread_status, elapsed_counter_ms);
        out.addSource(std::move(sink));
    }
    else  // build the pushing-to-views chain for materialized views, the key part!
    {
        out = buildPushingToViewsChain(table, metadata_snapshot, context_ptr, query_ptr, no_destination, thread_status_holder, elapsed_counter_ms);
    }

    ......
}

buildChainImpl() performs different operations depending on whether the current table (or view) has dependent views or a target table. This is also how cascading views (a view built on top of another view's target table) are handled: the corresponding chain nodes are constructed recursively and linked together.
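
The noPushingToViews() branch corresponds to the NOTE about Kafka in the code above: rows produced into a Kafka table must reach the views only through Kafka consumption, not through the insert itself, otherwise they would arrive twice. A sketch of that scenario (broker, topic and table names are illustrative):

    create table queue(id Int32) Engine=Kafka('broker:9092', 'topic', 'group1', 'JSONEachRow');
    create table dest(id Int32) Engine=Memory();
    create materialized view consumer to dest as select * from queue;

    -- this insert produces messages to Kafka; bound views are ignored here,
    -- because consumer will read the same rows back from the topic anyway
    insert into queue values (1);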


Summary

To summarize: when data is written to the source table, multiple output chains are built, one per materialized view, and each chain operates only on the block of data currently being written, without touching the data already in the source table. Writing to the source table and to the destination tables happens in a single pipeline, which must complete before the insert is reported successful. The pipeline can be processed in parallel, which speeds up writes.
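
For example, whether views are pushed to sequentially or concurrently can be controlled per query; a sketch, assuming the parallel_view_processing setting (which enables concurrent pushes to the attached views):

    -- push to all attached materialized views concurrently
    insert into src settings parallel_view_processing = 1 values (1);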


Welcome to add WeChat: xiedeyantu to discuss technical issues.

Reprinted from: blog.csdn.net/weixin_39992480/article/details/129291354