ClickHouse source code reading notes (1): the main flow

The entry point, main, is in dbms/programs/main.cpp:

int main(int argc_, char ** argv_)
{
    ...

    /// Print a basic help if nothing was matched
    // Decide which function to run based on the arguments passed at startup.
    // For the server, this is mainEntryClickHouseServer.
    MainFunc main_func = printHelp;

    for (auto & application : clickhouse_applications)
    {
        if (isClickhouseApp(application.first, argv))
        {
            main_func = application.second;
            break;
        }
    }

    // For the server, this calls mainEntryClickHouseServer in dbms/programs/server/Server.cpp
    return main_func(static_cast<int>(argv.size()), argv.data());
}
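
The lookup above is a plain name-to-function dispatch table. As a rough, self-contained sketch of the same idea (not the ClickHouse code itself): the entry points below are hypothetical stand-ins that only return a marker value, and the matching rule in isApp is a simplifying assumption, since the real isClickhouseApp is not shown in this article.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Signature shared by every entry point, mirroring MainFunc in the excerpt.
using MainFunc = int (*)(int, char **);

// Hypothetical stand-ins for the real entry points; each returns a marker value.
int printHelp(int, char **) { return -1; }
int mainEntryServer(int, char **) { return 10; }
int mainEntryClient(int, char **) { return 20; }

// Name-to-function table, analogous in spirit to clickhouse_applications.
const std::vector<std::pair<std::string, MainFunc>> applications = {
    {"server", mainEntryServer},
    {"client", mainEntryClient},
};

// Simplified matching rule (an assumption): the first argument names the application.
bool isApp(const std::string & name, const std::vector<char *> & argv)
{
    return argv.size() > 1 && name == argv[1];
}

// Pick the entry point, falling back to printHelp when nothing matches.
MainFunc selectMainFunc(const std::vector<char *> & argv)
{
    assert(!argv.empty()); // argv[0] is expected to hold the program name
    MainFunc main_func = printHelp;
    for (const auto & application : applications)
    {
        if (isApp(application.first, argv))
        {
            main_func = application.second;
            break;
        }
    }
    return main_func;
}
```

Calling selectMainFunc with the arguments {"clickhouse", "server"} yields mainEntryServer, while an unknown or missing application name leaves the printHelp default in place.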

dbms/programs/server/Server.cpp provides three types of interfaces. The comment in the source code describes them as follows:

/** Server provides three interfaces:
  * 1. HTTP - simple interface for any applications.
  * 2. TCP - interface for native clickhouse-client and for server to server internal communications.
  *    More rich and efficient, but less compatible
  *     - data is transferred by columns;
  *     - data is transferred compressed;
  *    Allows to get more information in response.
  * 3. Interserver HTTP - for replication.
  */

The entry function in dbms/programs/server/Server.cpp parses the configuration parameters, initializes the server, and starts listening on the service ports.

int mainEntryClickHouseServer(int argc, char ** argv)
{
    DB::Server app;
    try
    {
        return app.run(argc, argv); // Call run here.
    }
    catch (...)
    {
        std::cerr << DB::getCurrentExceptionMessage(true) << "\n";
        auto code = DB::getCurrentExceptionCode();
        return code ? code : 1;
    }
}
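
The catch (...) block above turns any escaping exception into a non-zero process exit code. A minimal sketch of that pattern follows, assuming a hypothetical CodedError type and helper names (currentExceptionCode, runGuarded) that are not part of ClickHouse; the real DB::getCurrentExceptionCode may work differently.

```cpp
#include <cassert>
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical exception type that carries a numeric error code.
struct CodedError : std::runtime_error
{
    int code;
    CodedError(const std::string & msg, int code_) : std::runtime_error(msg), code(code_) {}
};

// Inspect the exception currently being handled by rethrowing it.
// Must only be called from inside a catch block.
int currentExceptionCode()
{
    assert(std::current_exception() != nullptr);
    try
    {
        throw; // rethrow the in-flight exception so it can be matched by type
    }
    catch (const CodedError & e)
    {
        return e.code;
    }
    catch (...)
    {
        return 0; // no code available for this exception type
    }
}

// Run the application and map any exception to a non-zero exit code.
template <typename App>
int runGuarded(App && app)
{
    try
    {
        return app();
    }
    catch (...)
    {
        std::cerr << "unhandled exception\n";
        int code = currentExceptionCode();
        return code ? code : 1; // never report success after a failure
    }
}
```

The `code ? code : 1` fallback matters because an exception without a usable code would otherwise produce exit status 0, which a calling shell would read as success.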

ClickHouse uses the Poco network library to process network requests. The processing logic for each client connection is in the run() method in dbms/programs/server/TCPHandler.cpp.

void TCPHandler::run()
{
    try
    {
        runImpl(); // Call the runImpl function here.

        LOG_INFO(log, "Done processing connection.");
    }
    catch (Poco::Exception & e)
    {
        /// Timeout - not an error.
        if (!strcmp(e.what(), "Timeout"))
        {
            LOG_DEBUG(log, "Poco::Exception. Code: " << ErrorCodes::POCO_EXCEPTION << ", e.code() = " << e.code()
                << ", e.displayText() = " << e.displayText() << ", e.what() = " << e.what());
        }
        else
            throw;
    }
}

In TCPHandler::runImpl(), once the handshake, context initialization, exception handling, and other code are stripped away, the main logic is as follows:

void TCPHandler::runImpl()
{

    receivePacket(); // Receive the request

    executeQuery(state.query, *query_context, false, state.stage, may_have_embedded_data); // Process the request

    /// Does the request require receive data from client?
    if (state.need_receive_data_for_insert)
        processInsertQuery(connection_settings); // Receive the data to insert from the client
    else if (state.need_receive_data_for_input)
    {
        /// It is special case for input(), all works for reading data from client will be done in callbacks.
        /// state.io.in is NullAndDoCopyBlockInputStream so read it once.
        state.io.in->read();
        state.io.onFinish();
    }
    else if (state.io.pipeline.initialized())
        processOrdinaryQueryWithProcessors(query_context->getSettingsRef().max_threads); // Responsible for returning the result to the client
    else
        processOrdinaryQuery(); // Responsible for returning the result to the client

}
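
The if/else chain in runImpl() reads as a priority list: INSERT data first, then the input() special case, then the processors pipeline, and finally the ordinary path. As a small illustration of that ordering only, the sketch below reduces the state to three flags; QueryFlags, ProcessingPath and choosePath are hypothetical names, not ClickHouse types.

```cpp
// Which processing path a query takes. Names are illustrative only.
enum class ProcessingPath { Insert, Input, Processors, Ordinary };

// A reduced stand-in for the fields of `state` that runImpl() checks.
struct QueryFlags
{
    bool need_receive_data_for_insert = false;
    bool need_receive_data_for_input = false;
    bool pipeline_initialized = false;
};

// Mirror the branch order of the excerpt: earlier checks win over later ones.
ProcessingPath choosePath(const QueryFlags & flags)
{
    if (flags.need_receive_data_for_insert)
        return ProcessingPath::Insert;
    if (flags.need_receive_data_for_input)
        return ProcessingPath::Input;
    if (flags.pipeline_initialized)
        return ProcessingPath::Processors;
    return ProcessingPath::Ordinary;
}
```

Because the checks run in order, a query that both needs INSERT data and has an initialized pipeline still takes the insert path.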

Next, let's look at how executeQuery processes the request. In dbms/src/Interpreters/executeQuery.cpp, the main logic is as follows:

BlockIO executeQuery(
    const String & query,
    Context & context,
    bool internal,
    QueryProcessingStage::Enum stage,
    bool may_have_embedded_data,
    bool allow_processors)
{

    std::tie(ast, streams) = executeQueryImpl(query.data(), query.data() + query.size(), context,
        internal, stage, !may_have_embedded_data, nullptr, allow_processors); // Call executeQueryImpl here

}

Next, look at the main processing logic of executeQueryImpl:

static std::tuple<ASTPtr, BlockIO> executeQueryImpl(
    const char * begin,
    const char * end,
    Context & context,
    bool internal,
    QueryProcessingStage::Enum stage,
    bool has_query_tail,
    ReadBuffer * istr,
    bool allow_processors)
{

    ast = parseQuery(parser, begin, end, "", max_query_size, settings.max_parser_depth); // Parse the query statement

    if (use_processors) // Use the pipeline
        pipeline = interpreter->executeWithProcessors();
    else // Do not use the pipeline
        res = interpreter->execute(); // Call the execute function that matches the interpreter's type

}
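
The parse-then-execute flow above relies on each query type having its own interpreter with its own execute(). The following toy sketch shows that dispatch through a virtual function. Every name in it (ToyAst, ToyInterpreter, makeInterpreter, and so on) is made up for illustration, and the keyword-prefix "parser" is a deliberate simplification rather than how ClickHouse parses SQL.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

enum class QueryKind { Select, Insert };

// Toy AST: only records which kind of statement was parsed.
struct ToyAst
{
    QueryKind kind;
};

// Toy parser (an assumption): classify the query by its leading keyword.
ToyAst parseToyQuery(const std::string & query)
{
    if (query.rfind("SELECT", 0) == 0)
        return {QueryKind::Select};
    if (query.rfind("INSERT", 0) == 0)
        return {QueryKind::Insert};
    throw std::invalid_argument("unsupported query: " + query);
}

// Common interface: every interpreter knows how to execute its own query type.
struct ToyInterpreter
{
    virtual ~ToyInterpreter() = default;
    virtual std::string execute() = 0;
};

struct ToySelectInterpreter : ToyInterpreter
{
    std::string execute() override { return "select executed"; }
};

struct ToyInsertInterpreter : ToyInterpreter
{
    std::string execute() override { return "insert executed"; }
};

// Choose the interpreter that matches the parsed statement.
std::unique_ptr<ToyInterpreter> makeInterpreter(const ToyAst & ast)
{
    switch (ast.kind)
    {
        case QueryKind::Select: return std::make_unique<ToySelectInterpreter>();
        case QueryKind::Insert: return std::make_unique<ToyInsertInterpreter>();
    }
    throw std::logic_error("unknown query kind");
}

// Parse first, then let the virtual call pick the right execute().
std::string executeToyQuery(const std::string & query)
{
    ToyAst ast = parseToyQuery(query);
    std::unique_ptr<ToyInterpreter> interpreter = makeInterpreter(ast);
    assert(interpreter != nullptr);
    return interpreter->execute();
}
```

The caller never branches on the query type after the interpreter is created; the virtual call does the selection, which is the point of the comment on interpreter->execute() in the excerpt.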

The next article will introduce the interpreter. To be continued...

Origin www.cnblogs.com/snake-fly/p/12689092.html