ClickHouse and Its Friends (2): MySQL Protocol and the Read Call Stack

Original source: https://bohutang.me/2020/06/07/clickhouse-and-friends-mysql-protocol-read-stack/

For an OLAP DBMS, two aspects matter most:

  • How do users get in easily? This is the entry side.

    • Besides its native client, ClickHouse also offers access over MySQL, PostgreSQL, gRPC, and HTTP.

  • How is external data attached easily? This is the data-source side.

    • Besides its own engines, ClickHouse can also mount external data sources such as MySQL and Kafka.

In this way, communicating both inward and outward, with many friends and many paths, ClickHouse achieves orchestration at the "data" level.

Today we talk about the MySQL protocol on the entry side, the first good friend of ClickHouse in this series. Users can connect to ClickHouse directly with the MySQL client or a MySQL driver to read and write data.

This article takes a MySQL query request and follows the call stack to understand the whole read path inside ClickHouse.

How is it implemented?

The entry file is MySQLHandler.cpp.

Handshake protocol

  1. MySQLClient connects, and MySQLHandler sends it a Greeting (handshake) packet

  2. MySQLClient replies with a Handshake-Response packet carrying its credentials

  3. MySQLHandler authenticates the credentials

  4. MySQLHandler returns the authentication result

The MySQL protocol itself is implemented in Core/MySQLProtocol.h.
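
The four handshake steps above can be pictured as a tiny request/response state machine. Below is a minimal, self-contained sketch of that exchange; it is not ClickHouse code, every type and function name (Greeting, HandshakeResponse, authenticate, ...) is hypothetical, and the real wire format in Core/MySQLProtocol.h is far richer.

#include <cstdint>
#include <iostream>
#include <string>

// Hypothetical, simplified packets -- not the real MySQL wire format.
struct Greeting          { std::string server_version; uint32_t connection_id; std::string auth_plugin; };
struct HandshakeResponse { std::string username; std::string auth_response; std::string database; };
struct AuthResult        { bool ok; std::string message; };

// Steps 1-2: the server greets the client as soon as it connects.
Greeting makeGreeting()
{
    return {"8.0.0-ClickHouse", 42, "mysql_native_password"};
}

// Steps 3-4: the server checks the client's credentials and replies OK or error.
AuthResult authenticate(const HandshakeResponse & response)
{
    if (response.username == "default" && response.auth_response.empty())
        return {true, "OK"};
    return {false, "Access denied"};
}

int main()
{
    Greeting greeting = makeGreeting();
    std::cout << "server greeting: " << greeting.server_version << "\n";

    // The client parses the greeting and sends back its handshake response.
    HandshakeResponse response{"default", "", "system"};

    AuthResult result = authenticate(response);
    std::cout << (result.ok ? "authenticated" : result.message) << "\n";
    return 0;
}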

Query request

Once authentication succeeds, normal data interaction can begin.

  1. MySQLClient sends a request:

mysql> SELECT * FROM system.numbers LIMIT 5;

  2. MySQLHandler processes it; the call stack is:

MySQLHandler::comQuery -> executeQuery -> pipeline->execute -> MySQLOutputFormat::consume

  3. MySQLClient receives the result

In step 2, executeQuery (executeQuery.cpp) is the crucial piece: it is the gateway between every front-end server and the ClickHouse kernel. Its first argument is a buffer holding the SQL text, and its second argument is the buffer the result set should be written to (ultimately the client socket).
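
That contract can be imitated in a few lines: the handler passes in the SQL text and a buffer wired to the client socket, and the kernel streams the result into the latter. A minimal sketch under those assumptions, using toy stand-ins for ReadBuffer/WriteBuffer rather than the real ClickHouse classes:

#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for DB::ReadBuffer / DB::WriteBuffer:
// the input carries the SQL text, the output is wired to the client socket.
using ReadBuffer  = std::istringstream;
using WriteBuffer = std::ostringstream;

// A toy executeQuery: read the query text, "execute" it, stream the result out.
// The real DB::executeQuery(ReadBuffer &, WriteBuffer &, ...) also takes the
// session Context and a callback, as the call stacks below show.
void executeQuery(ReadBuffer & in, WriteBuffer & out)
{
    std::string sql;
    std::getline(in, sql);                 // take the query text from the input buffer
    out << "result of [" << sql << "]\n";  // pretend we executed it and write the result
}

int main()
{
    // A front-end handler (e.g. the MySQL handler's comQuery) does roughly this:
    ReadBuffer  payload("SELECT * FROM system.numbers LIMIT 5");
    WriteBuffer to_client;                 // in reality: the socket's write buffer

    executeQuery(payload, to_client);
    std::cout << to_client.str();
    return 0;
}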

Call stack analysis

SELECT * FROM system.numbers LIMIT 5

1. Get the data source

StorageSystemNumbers data source:

DB::StorageSystemNumbers::read(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::SelectQueryInfo const&, DB::Context const&, DB::QueryProcessingStage::Enum, unsigned long, unsigned int) StorageSystemNumbers.cpp:135
DB::ReadFromStorageStep::ReadFromStorageStep(std::__1::shared_ptr<DB::RWLockImpl::LockHolderImpl>, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>&, DB::SelectQueryOptions, 
DB::InterpreterSelectQuery::executeFetchColumns(DB::QueryProcessingStage::Enum, DB::QueryPlan&, std::__1::shared_ptr<DB::PrewhereInfo> const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) memory:3028
DB::InterpreterSelectQuery::executeFetchColumns(DB::QueryProcessingStage::Enum, DB::QueryPlan&, std::__1::shared_ptr<DB::PrewhereInfo> const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) InterpreterSelectQuery.cpp:1361
DB::InterpreterSelectQuery::executeImpl(DB::QueryPlan&, std::__1::shared_ptr<DB::IBlockInputStream> const&, std::__1::optional<DB::Pipe>) InterpreterSelectQuery.cpp:791
DB::InterpreterSelectQuery::buildQueryPlan(DB::QueryPlan&) InterpreterSelectQuery.cpp:472
DB::InterpreterSelectWithUnionQuery::buildQueryPlan(DB::QueryPlan&) InterpreterSelectWithUnionQuery.cpp:183
DB::InterpreterSelectWithUnionQuery::execute() InterpreterSelectWithUnionQuery.cpp:198
DB::executeQueryImpl(const char *, const char *, DB::Context &, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer *) executeQuery.cpp:385
DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, 
DB::MySQLHandler::comQuery(DB::ReadBuffer&) MySQLHandler.cpp:307
DB::MySQLHandler::run() MySQLHandler.cpp:141

The most important frame here is ReadFromStorageStep, which obtains the data-source pipes from whatever storage backs the table:

Pipes pipes = storage->read(required_columns, metadata_snapshot, query_info, *context, processing_stage, max_block_size, max_streams);
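
Conceptually, the pipe that StorageSystemNumbers::read() hands back is just a generator of sequential numbers cut into blocks of at most max_block_size rows. A simplified, self-contained sketch of that idea (NumbersSource here is hypothetical, not the real implementation):

#include <cstdint>
#include <iostream>
#include <vector>

// A block is one batch of column values, in the spirit of a ClickHouse Block/Chunk.
using Block = std::vector<uint64_t>;

// Hypothetical stand-in for the source behind StorageSystemNumbers::read():
// each call emits the next max_block_size sequential numbers.
class NumbersSource
{
public:
    explicit NumbersSource(size_t max_block_size) : block_size(max_block_size) {}

    Block next()
    {
        Block block;
        block.reserve(block_size);
        for (size_t i = 0; i < block_size; ++i)
            block.push_back(counter++);
        return block;
    }

private:
    size_t block_size;
    uint64_t counter = 0;
};

int main()
{
    NumbersSource source(/*max_block_size=*/3);
    for (uint64_t n : source.next())
        std::cout << n << ' ';       // prints: 0 1 2
    std::cout << '\n';
    return 0;
}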

2. Pipeline structure

DB::LimitTransform::LimitTransform(DB::Block const&, unsigned long, unsigned long, unsigned long, bool, bool, std::__1::vector<DB::SortColumnDescription, std::__1::allocator<DB::SortColumnDescription> >) LimitTransform.cpp:21
DB::LimitStep::transformPipeline(DB::QueryPipeline&) memory:2214
DB::LimitStep::transformPipeline(DB::QueryPipeline&) memory:2299
DB::LimitStep::transformPipeline(DB::QueryPipeline&) memory:3570
DB::LimitStep::transformPipeline(DB::QueryPipeline&) memory:4400
DB::LimitStep::transformPipeline(DB::QueryPipeline&) LimitStep.cpp:33
DB::ITransformingStep::updatePipeline(std::__1::vector<std::__1::unique_ptr<DB::QueryPipeline, std::__1::default_delete<DB::QueryPipeline> >, std::__1::allocator<std::__1::unique_ptr<DB::QueryPipeline, std::__1::default_delete<DB::QueryPipeline> > > >) ITransformingStep.cpp:21
DB::QueryPlan::buildQueryPipeline() QueryPlan.cpp:154
DB::InterpreterSelectWithUnionQuery::execute() InterpreterSelectWithUnionQuery.cpp:200
DB::executeQueryImpl(const char *, const char *, DB::Context &, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer *) executeQuery.cpp:385
DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) executeQuery.cpp:722
DB::MySQLHandler::comQuery(DB::ReadBuffer&) MySQLHandler.cpp:307
DB::MySQLHandler::run() MySQLHandler.cpp:141
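
The stack shows the LIMIT 5 clause materialized as a LimitTransform placed between the source and the output. Its essential job is easy to state: forward rows until the limit is reached, truncate the block that crosses the boundary, then report that it is finished. A stripped-down sketch of that behavior (not the real LimitTransform):

#include <cstdint>
#include <iostream>
#include <vector>

using Chunk = std::vector<uint64_t>;

// Simplified version of what LimitTransform does: forward at most `limit`
// rows in total, truncating the chunk that crosses the boundary.
class LimitTransform
{
public:
    explicit LimitTransform(size_t limit_) : limit(limit_) {}

    // Returns the (possibly truncated) chunk; an empty chunk means "nothing more to pass".
    Chunk transform(Chunk chunk)
    {
        if (rows_passed >= limit)
            return {};
        if (rows_passed + chunk.size() > limit)
            chunk.resize(limit - rows_passed);
        rows_passed += chunk.size();
        return chunk;
    }

    bool finished() const { return rows_passed >= limit; }

private:
    size_t limit;
    size_t rows_passed = 0;
};

int main()
{
    LimitTransform limit(5);
    Chunk upstream = {0, 1, 2, 3, 4, 5, 6, 7};   // a block coming from the source
    for (uint64_t n : limit.transform(upstream))
        std::cout << n << ' ';                   // prints: 0 1 2 3 4
    std::cout << '\n';
    return 0;
}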

3. Pipeline execution

DB::LimitTransform::prepare(std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > const&, std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > const&) LimitTransform.cpp:67
DB::PipelineExecutor::prepareProcessor(unsigned long, unsigned long, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, std::__1::unique_lock<std::__1::mutex>) PipelineExecutor.cpp:291
DB::PipelineExecutor::tryAddProcessorToStackIfUpdated(DB::PipelineExecutor::Edge&, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, unsigned long) PipelineExecutor.cpp:264
DB::PipelineExecutor::prepareProcessor(unsigned long, unsigned long, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, std::__1::unique_lock<std::__1::mutex>) PipelineExecutor.cpp:373
DB::PipelineExecutor::tryAddProcessorToStackIfUpdated(DB::PipelineExecutor::Edge&, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, unsigned long) PipelineExecutor.cpp:264
DB::PipelineExecutor::prepareProcessor(unsigned long, unsigned long, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, std::__1::unique_lock<std::__1::mutex>) PipelineExecutor.cpp:373
DB::PipelineExecutor::tryAddProcessorToStackIfUpdated(DB::PipelineExecutor::Edge&, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, unsigned long) PipelineExecutor.cpp:264
DB::PipelineExecutor::prepareProcessor(unsigned long, unsigned long, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, std::__1::unique_lock<std::__1::mutex>) PipelineExecutor.cpp:373
DB::PipelineExecutor::tryAddProcessorToStackIfUpdated(DB::PipelineExecutor::Edge&, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, unsigned long) PipelineExecutor.cpp:264
DB::PipelineExecutor::prepareProcessor(unsigned long, unsigned long, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, std::__1::unique_lock<std::__1::mutex>) PipelineExecutor.cpp:373
DB::PipelineExecutor::tryAddProcessorToStackIfUpdated(DB::PipelineExecutor::Edge&, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, unsigned long) PipelineExecutor.cpp:264
DB::PipelineExecutor::prepareProcessor(unsigned long, unsigned long, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, std::__1::unique_lock<std::__1::mutex>) PipelineExecutor.cpp:373
DB::PipelineExecutor::initializeExecution(unsigned long) PipelineExecutor.cpp:747
DB::PipelineExecutor::executeImpl(unsigned long) PipelineExecutor.cpp:764
DB::PipelineExecutor::execute(unsigned long) PipelineExecutor.cpp:479
DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) executeQuery.cpp:833
DB::MySQLHandler::comQuery(DB::ReadBuffer&) MySQLHandler.cpp:307
DB::MySQLHandler::run() MySQLHandler.cpp:141
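
PipelineExecutor drives the processors by repeatedly asking each one what it wants to do next (prepare, cheap and non-blocking) and then letting it do the actual work (work), as the prepareProcessor frames above hint. The loop below is a drastically simplified single-threaded version of that idea; the real IProcessor interface has more states (NeedData, PortFull, Async, ...) and exchanges data through ports.

#include <iostream>
#include <memory>
#include <vector>

// Greatly simplified processor interface in the spirit of IProcessor:
// prepare() only decides what to do next, work() does the computation.
struct IProcessor
{
    enum class Status { Ready, Finished };
    virtual ~IProcessor() = default;
    virtual Status prepare() = 0;
    virtual void work() = 0;
};

// A toy processor that has three units of work left, then reports Finished.
struct CountingProcessor : IProcessor
{
    int remaining = 3;
    Status prepare() override { return remaining > 0 ? Status::Ready : Status::Finished; }
    void work() override { std::cout << "work, remaining = " << remaining-- << '\n'; }
};

// A single-threaded executor loop: keep calling prepare()/work()
// until no processor in the pipeline has anything left to do.
void execute(std::vector<std::unique_ptr<IProcessor>> & processors)
{
    bool progress = true;
    while (progress)
    {
        progress = false;
        for (auto & processor : processors)
            if (processor->prepare() == IProcessor::Status::Ready)
            {
                processor->work();
                progress = true;
            }
    }
}

int main()
{
    std::vector<std::unique_ptr<IProcessor>> pipeline;
    pipeline.emplace_back(std::make_unique<CountingProcessor>());
    execute(pipeline);
    return 0;
}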

4. Output and sending

DB::MySQLOutputFormat::consume(DB::Chunk) MySQLOutputFormat.cpp:53
DB::IOutputFormat::work() IOutputFormat.cpp:62
DB::executeJob(DB::IProcessor *) PipelineExecutor.cpp:155
operator() PipelineExecutor.cpp:172
DB::PipelineExecutor::executeStepImpl(unsigned long, unsigned long, std::__1::atomic<bool>*) PipelineExecutor.cpp:630
DB::PipelineExecutor::executeSingleThread(unsigned long, unsigned long) PipelineExecutor.cpp:546
DB::PipelineExecutor::executeImpl(unsigned long) PipelineExecutor.cpp:812
DB::PipelineExecutor::execute(unsigned long) PipelineExecutor.cpp:479
DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) executeQuery.cpp:800
DB::MySQLHandler::comQuery(DB::ReadBuffer&) MySQLHandler.cpp:311
DB::MySQLHandler::run() MySQLHandler.cpp:141
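
MySQLOutputFormat::consume is the sink end of the pipeline: every chunk it receives is serialized into MySQL result-set rows and pushed into the write buffer that feeds the client socket. A hypothetical, much-simplified sketch of that role (plain text output instead of real MySQL packets):

#include <cstdint>
#include <iostream>
#include <sstream>
#include <vector>

using Chunk = std::vector<uint64_t>;

// Hypothetical stand-in for MySQLOutputFormat: turn each chunk of values
// into text rows and append them to the buffer that feeds the client socket.
class MySQLishOutputFormat
{
public:
    explicit MySQLishOutputFormat(std::ostringstream & out_) : out(out_) {}

    void consume(const Chunk & chunk)
    {
        for (uint64_t value : chunk)
            out << "| " << value << " |\n";   // the real code writes wire-format packets
    }

private:
    std::ostringstream & out;
};

int main()
{
    std::ostringstream to_client;             // in reality: the socket's write buffer
    MySQLishOutputFormat output(to_client);
    output.consume({0, 1, 2, 3, 4});
    std::cout << to_client.str();
    return 0;
}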

Summary

ClickHouse's modularity is quite clear: like Lego bricks, the pieces can be freely combined. When we execute:

SELECT * FROM system.numbers LIMIT 5

First, the kernel parses the SQL statement into an AST and uses it to obtain the data source: pipeline.Add(Source). Next, a QueryPlan is built from the AST, and the corresponding transforms are created from that plan: pipeline.Add(LimitTransform). Then the output sink is added as the destination for the data: pipeline.Add(OutputSink). Finally the pipeline is executed, and each transform starts to work (see the sketch below).
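
Putting the pieces together in one self-contained toy program: a source like system.numbers, a limit transform, and an output sink, assembled and executed in the order the paragraph above describes. All class names here are illustrative, not ClickHouse's real ones.

#include <cstdint>
#include <iostream>
#include <vector>

using Chunk = std::vector<uint64_t>;

// Source: sequential numbers produced in blocks (like system.numbers).
struct NumbersSource
{
    uint64_t counter = 0;
    Chunk read(size_t block_size)
    {
        Chunk chunk;
        for (size_t i = 0; i < block_size; ++i)
            chunk.push_back(counter++);
        return chunk;
    }
};

// Transform: pass through at most `limit` rows in total (like LimitTransform).
struct LimitTransform
{
    size_t limit;
    size_t passed = 0;
    Chunk transform(Chunk chunk)
    {
        if (passed + chunk.size() > limit)
            chunk.resize(limit - passed);
        passed += chunk.size();
        return chunk;
    }
    bool finished() const { return passed >= limit; }
};

// Sink: serialize rows to the client (like MySQLOutputFormat).
struct OutputSink
{
    void consume(const Chunk & chunk)
    {
        for (uint64_t value : chunk)
            std::cout << value << '\n';
    }
};

int main()
{
    // pipeline.Add(Source); pipeline.Add(LimitTransform); pipeline.Add(OutputSink)
    NumbersSource source;
    LimitTransform limit{5};
    OutputSink sink;

    // "Execute the pipeline": pull from the source until the limit is reached.
    while (!limit.finished())
        sink.consume(limit.transform(source.read(/*block_size=*/3)));
    return 0;                                 // prints 0 through 4, one per line
}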

ClickHouse's transform scheduling system is called the processor framework, and it is an important module that largely determines performance; for details, see Pipeline Processor and Scheduler. ClickHouse is a luxury sports car with a manual transmission: it is yours to drive, so go make waves!

Links in this article

  • https://github.com/ClickHouse/ClickHouse/blob/master/src/Server/MySQLHandler.cpp

  • https://github.com/ClickHouse/ClickHouse/blob/master/src/Core/MySQLProtocol.h

  • https://bohutang.me/2020/06/11/clickhouse-and-friends-processor/

That's all for this article.

Enjoy ClickHouse :)
