ClickHouse y sus amigos (2) Protocolo MySQL y pila de llamadas de lectura

Fuente original: https://bohutang.me/2020/06/07/clickhouse-and-friends-mysql-protocol-read-stack/

Como DBMS OLAP, hay dos aspectos importantes:

¿Cómo pueden los usuarios entrar fácilmente? Este es el punto de entrada
- Además de su propio cliente, ClickHouse también proporciona métodos de acceso como MySQL / PG / GRPC / HTTP
Cómo colgar fácilmente los datos, esta es la fuente de datos
- Además de su propio motor, ClickHouse también puede montar fuentes de datos externas como MySQL / Kafka

De esta manera, comunicación interna y externa, múltiples amigos y múltiples caminos, con el fin de lograr capacidades de orquestación a nivel de "datos".

Hoy hablamos del protocolo MySQL en el lado de la entrada, y también es el primer buen amigo de ClickHouse en esta serie.Los usuarios pueden vincular directamente a ClickHouse a través del cliente MySQL o Driver relacionado para realizar operaciones como lectura y escritura de datos.

Este artículo utiliza la solicitud de consulta MySQL y toma prestada la pila de llamadas para comprender todo el proceso de lectura de datos de ClickHouse.

¿Como alcanzar?

El archivo de entrada está en: MySQLHandler.cpp

Acuerdo de apretón de manos

MySQLClient envía un mensaje de datos de saludo a MySQLHandler
MySQLHandler responde con un mensaje de saludo-respuesta
MySQLClient envía un mensaje de autenticación
MySQLHandler autentica el mensaje de autenticación y devuelve el resultado de la autenticación

El protocolo MySQL está implementado en: Core / MySQLProtocol.h

Solicitud de consulta

Una vez pasada la autenticación, se puede llevar a cabo la interacción normal de datos.

Cuando MySQLClient envía una solicitud:

mysql> SELECT * FROM system.numbers LIMIT 5;

La pila de llamadas de MySQLHandler:

->MySQLHandler::comQuery -> executeQuery -> pipeline->execute -> MySQLOutputFormat::consume

MySQLClient recibe el resultado

En el paso 2, executeQuery (executeQuery.cpp) es muy importante. Es el puerto de acceso para todos los servidores front-end y el kernel de ClickHouse. El primer parámetro es el texto SQL ('seleccionar 1'), y el segundo parámetro es donde se debe enviar el conjunto de resultados (socket net).

Análisis de pila de llamadas

SELECT * FROM system.numbers LIMIT 5

1. Obtenga la fuente de datos

Fuente de datos StorageSystemNumbers:

DB::StorageSystemNumbers::read(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::SelectQueryInfo const&, DB::Context const&, DB::QueryProcessingStage::Enum, unsigned long, unsigned int) StorageSystemNumbers.cpp:135
DB::ReadFromStorageStep::ReadFromStorageStep(std::__1::shared_ptr<DB::RWLockImpl::LockHolderImpl>, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>&, DB::SelectQueryOptions, 
DB::InterpreterSelectQuery::executeFetchColumns(DB::QueryProcessingStage::Enum, DB::QueryPlan&, std::__1::shared_ptr<DB::PrewhereInfo> const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) memory:3028
DB::InterpreterSelectQuery::executeFetchColumns(DB::QueryProcessingStage::Enum, DB::QueryPlan&, std::__1::shared_ptr<DB::PrewhereInfo> const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) InterpreterSelectQuery.cpp:1361
DB::InterpreterSelectQuery::executeImpl(DB::QueryPlan&, std::__1::shared_ptr<DB::IBlockInputStream> const&, std::__1::optional<DB::Pipe>) InterpreterSelectQuery.cpp:791
DB::InterpreterSelectQuery::buildQueryPlan(DB::QueryPlan&) InterpreterSelectQuery.cpp:472
DB::InterpreterSelectWithUnionQuery::buildQueryPlan(DB::QueryPlan&) InterpreterSelectWithUnionQuery.cpp:183
DB::InterpreterSelectWithUnionQuery::execute() InterpreterSelectWithUnionQuery.cpp:198
DB::executeQueryImpl(const char *, const char *, DB::Context &, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer *) executeQuery.cpp:385
DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, 
DB::MySQLHandler::comQuery(DB::ReadBuffer&) MySQLHandler.cpp:307
DB::MySQLHandler::run() MySQLHandler.cpp:141

Lo principal aquí es la función ReadFromStorageStep, que obtiene la tubería de la fuente de datos de un almacenamiento diferente:

Pipes pipes = storage->read(required_columns, metadata_snapshot, query_info, *context, processing_stage, max_block_size, max_streams);

2. Estructura del oleoducto

DB::LimitTransform::LimitTransform(DB::Block const&, unsigned long, unsigned long, unsigned long, bool, bool, std::__1::vector<DB::SortColumnDescription, std::__1::allocator<DB::SortColumnDescription> >) LimitTransform.cpp:21
DB::LimitStep::transformPipeline(DB::QueryPipeline&) memory:2214
DB::LimitStep::transformPipeline(DB::QueryPipeline&) memory:2299
DB::LimitStep::transformPipeline(DB::QueryPipeline&) memory:3570
DB::LimitStep::transformPipeline(DB::QueryPipeline&) memory:4400
DB::LimitStep::transformPipeline(DB::QueryPipeline&) LimitStep.cpp:33
DB::ITransformingStep::updatePipeline(std::__1::vector<std::__1::unique_ptr<DB::QueryPipeline, std::__1::default_delete<DB::QueryPipeline> >, std::__1::allocator<std::__1::unique_ptr<DB::QueryPipeline, std::__1::default_delete<DB::QueryPipeline> > > >) ITransformingStep.cpp:21
DB::QueryPlan::buildQueryPipeline() QueryPlan.cpp:154
DB::InterpreterSelectWithUnionQuery::execute() InterpreterSelectWithUnionQuery.cpp:200
DB::executeQueryImpl(const char *, const char *, DB::Context &, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer *) executeQuery.cpp:385
DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) executeQuery.cpp:722
DB::MySQLHandler::comQuery(DB::ReadBuffer&) MySQLHandler.cpp:307
DB::MySQLHandler::run() MySQLHandler.cpp:141

3. Ejecución del oleoducto

DB::LimitTransform::prepare(std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > const&, std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > const&) LimitTransform.cpp:67
DB::PipelineExecutor::prepareProcessor(unsigned long, unsigned long, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, std::__1::unique_lock<std::__1::mutex>) PipelineExecutor.cpp:291
DB::PipelineExecutor::tryAddProcessorToStackIfUpdated(DB::PipelineExecutor::Edge&, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, unsigned long) PipelineExecutor.cpp:264
DB::PipelineExecutor::prepareProcessor(unsigned long, unsigned long, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, std::__1::unique_lock<std::__1::mutex>) PipelineExecutor.cpp:373
DB::PipelineExecutor::tryAddProcessorToStackIfUpdated(DB::PipelineExecutor::Edge&, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, unsigned long) PipelineExecutor.cpp:264
DB::PipelineExecutor::prepareProcessor(unsigned long, unsigned long, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, std::__1::unique_lock<std::__1::mutex>) PipelineExecutor.cpp:373
DB::PipelineExecutor::tryAddProcessorToStackIfUpdated(DB::PipelineExecutor::Edge&, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, unsigned long) PipelineExecutor.cpp:264
DB::PipelineExecutor::prepareProcessor(unsigned long, unsigned long, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, std::__1::unique_lock<std::__1::mutex>) PipelineExecutor.cpp:373
DB::PipelineExecutor::tryAddProcessorToStackIfUpdated(DB::PipelineExecutor::Edge&, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, unsigned long) PipelineExecutor.cpp:264
DB::PipelineExecutor::prepareProcessor(unsigned long, unsigned long, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, std::__1::unique_lock<std::__1::mutex>) PipelineExecutor.cpp:373
DB::PipelineExecutor::tryAddProcessorToStackIfUpdated(DB::PipelineExecutor::Edge&, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, unsigned long) PipelineExecutor.cpp:264
DB::PipelineExecutor::prepareProcessor(unsigned long, unsigned long, std::__1::queue<DB::PipelineExecutor::ExecutionState*, std::__1::deque<DB::PipelineExecutor::ExecutionState*, std::__1::allocator<DB::PipelineExecutor::ExecutionState*> > >&, std::__1::unique_lock<std::__1::mutex>) PipelineExecutor.cpp:373
DB::PipelineExecutor::initializeExecution(unsigned long) PipelineExecutor.cpp:747
DB::PipelineExecutor::executeImpl(unsigned long) PipelineExecutor.cpp:764
DB::PipelineExecutor::execute(unsigned long) PipelineExecutor.cpp:479
DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) executeQuery.cpp:833
DB::MySQLHandler::comQuery(DB::ReadBuffer&) MySQLHandler.cpp:307
DB::MySQLHandler::run() MySQLHandler.cpp:141

4. La salida ejecuta el envío

DB::MySQLOutputFormat::consume(DB::Chunk) MySQLOutputFormat.cpp:53
DB::IOutputFormat::work() IOutputFormat.cpp:62
DB::executeJob(DB::IProcessor *) PipelineExecutor.cpp:155
operator() PipelineExecutor.cpp:172
DB::PipelineExecutor::executeStepImpl(unsigned long, unsigned long, std::__1::atomic<bool>*) PipelineExecutor.cpp:630
DB::PipelineExecutor::executeSingleThread(unsigned long, unsigned long) PipelineExecutor.cpp:546
DB::PipelineExecutor::executeImpl(unsigned long) PipelineExecutor.cpp:812
DB::PipelineExecutor::execute(unsigned long) PipelineExecutor.cpp:479
DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) executeQuery.cpp:800
DB::MySQLHandler::comQuery(DB::ReadBuffer&) MySQLHandler.cpp:311
DB::MySQLHandler::run() MySQLHandler.cpp:141

para resumir

La modularidad de ClickHouse es relativamente clara. Al igual que los bloques de Lego, se puede ensamblar y ensamblar. Cuando ejecutamos:

SELECT * FROM system.numbers LIMIT 5

Primero, el kernel analiza la declaración SQL para generar un AST y luego obtiene la fuente de datos Source, pipeline.Add (Source) de acuerdo con el AST. En segundo lugar, el QueryPlan se genera en función de la información de AST, y la transformación correspondiente se genera en función del QueryPlan, pipeline.Add (LimitTransform). Luego agregue Output Sink como el objeto de envío de datos, pipeline.Add (OutputSink). La canalización se ejecuta y cada transformador comienza a funcionar.

El sistema de programación Transformer de ClickHouse se llama Procesador, que también es un módulo importante que determina el rendimiento. Para obtener más detalles, consulte Procesador y programador de canalizaciones. ClickHouse es un automóvil deportivo de lujo con transmisión manual, ¡gratuito, Tsunami!

Enlace en texto

https://github.com/ClickHouse/ClickHouse/blob/master/src/Server/MySQLHandler.cpp
https://github.com/ClickHouse/ClickHouse/blob/master/src/Core/MySQLProtocol.h
https://bohutang.me/2020/06/11/clickhouse-and-friends-processor/

El texto completo ha terminado.

Disfruta ClickHouse :)

La clase "MySQL Core Optimization" de Teacher Ye se ha actualizado a MySQL 8.0, escanee el código para comenzar el viaje de la práctica de MySQL 8.0