TKE User Story - Zuoyebang's PB-Scale, Low-Cost Log Retrieval Service

Authors

Lu Yalin joined Zuoyebang in 2019 and is responsible for R&D of Zuoyebang's architecture. During his time there he has led the evolution of the cloud-native architecture and driven the adoption of containerization, service governance, a Go microservice framework, and DevOps.

Mo Renpeng joined Zuoyebang in 2020 and is a senior architect there. He has driven the evolution of Zuoyebang's cloud-native architecture and is responsible for the design and implementation of its service governance system, the construction of its service perception system, and the development of the in-house mesh and MQ proxy.

Summary

Logs are the main way we observe services. We rely on them to understand a service's current and historical state, and when errors occur we rely on them to reconstruct the scene and locate the problem. Logs are therefore critical to R&D engineers. At the same time, as microservices have become popular, service deployment has grown increasingly decentralized, so we need a log service to collect, transport, and retrieve logs.

It is for this situation that open-source log services, with ELK as the representative, were born.

Requirement scenarios

In our scenario the peak log write pressure is high (tens of millions of log entries per second); the real-time requirement is strict (logs searchable within 1 s of collection normally, 3 s at peak); and the cost pressure is enormous: logs must be kept for half a year and remain queryable retrospectively, at a scale of hundreds of PB.

Disadvantages of Elasticsearch

The core of the ELK stack is Elasticsearch, which stores and indexes logs and serves queries. Elasticsearch is a search engine: it relies on Lucene's inverted index for retrieval, and its **shard** design splits data across shards to break through the storage and processing limits of a single machine.

Write performance

Elasticsearch must update the inverted index on a log's indexed fields at write time so that the newest logs become searchable. Write performance can be improved with bulk submission, delayed indexing, a lower refresh rate, and so on, but the index always has to be built. Under huge log traffic (20 GB of data, i.e. tens of millions of log entries, per second) the bottleneck is obvious and far from what we need: near-real-time writes.

Operating costs

Elasticsearch must regularly maintain its indexes, data shards, and query caches, which consumes a lot of CPU and memory. Log data lives on machine disks, so storing a large volume of logs for a long time requires enormous disk capacity, and indexing inflates the data further, driving the cost up again.

Poor support for unformatted logs

ELK must parse logs in order to index their fields, so unformatted logs need extra processing logic to fit in. In practice many business logs are not standardized, and it is hard to make them converge on a common format.

Summary: log retrieval is a write-heavy, read-light scenario. Maintaining a huge and complex index in such a scenario is, in our view, poor cost performance. Had we adopted Elasticsearch, we estimate we would have needed a cluster of tens of thousands of cores, still without guaranteed write and retrieval efficiency, and with serious waste of resources.

Log retrieval design

Faced with this, let's look at log retrieval from a different angle and solve the need with a design better suited to the scenario. The new design rests on the following three points:

Log chunking

As before, we still need to collect logs, but when processing them we no longer parse or index the raw text. Instead we partition the logs by metadata such as log time, instance, type, and level. The retrieval system therefore imposes no requirements on log format, and because the (very expensive) parsing and indexing steps are gone, write speed can be pushed to its limit: it depends only on the disk's I/O speed.

In simple terms, we append the logs of a given type produced by a given instance to one file in chronological order and split that file along the time dimension. Different log chunks are scattered across multiple machines (we usually shard the machines storing chunks by dimensions such as service and log type), so the chunks can be processed concurrently on many machines. The design scales horizontally: if one machine's processing power is not enough, we simply add more.
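As a concrete illustration of chunking, here is a minimal Go sketch (names like `ChunkMeta` and the hourly window are assumptions for the example, not the production layout): the writer derives a chunk file from the log's metadata and time window and simply appends the raw line.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// ChunkMeta holds the metadata a log line is partitioned by
// (hypothetical fields for this sketch).
type ChunkMeta struct {
	Service  string
	Instance string
	LogType  string
}

// chunkPath derives the chunk file for a log line: one file per
// service/instance/type per time window, split along the time dimension.
func chunkPath(root string, m ChunkMeta, t time.Time) string {
	window := t.UTC().Format("2006010215") // hourly chunks, for example
	return filepath.Join(root, m.Service, m.Instance, m.LogType, window+".log")
}

// appendLog appends the raw log line to its chunk. There is no parsing
// and no indexing, so the write cost is essentially disk I/O.
func appendLog(root string, m ChunkMeta, t time.Time, line string) error {
	p := chunkPath(root, m, t)
	if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
		return err
	}
	f, err := os.OpenFile(p, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = fmt.Fprintln(f, line)
	return err
}

func main() {
	m := ChunkMeta{Service: "api", Instance: "pod-0", LogType: "error"}
	if err := appendLog("/tmp/chunks", m, time.Now(), `level=error msg="example"`); err != nil {
		fmt.Println("append failed:", err)
	}
}
```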

How do we retrieve the data inside a log chunk? That part is simple: because the raw log text is preserved, a chunk can be searched directly with grep and related commands. grep is the command developers know best, and it is flexible enough to cover their varied retrieval needs. And because we only ever append to a chunk, there is no waiting for an index to take effect: a log line is searchable the moment it is flushed to the chunk, which keeps retrieval results real-time.
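For example, a local search over a set of already-resolved chunk files can be as thin a wrapper around grep as this sketch (assuming grep is available on the storage machines, as the article implies):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// grepChunks runs grep over each resolved chunk file and collects the
// matching lines; in the real service, chunks on different machines
// are searched in parallel.
func grepChunks(pattern string, chunkFiles []string) ([]string, error) {
	var matches []string
	for _, f := range chunkFiles {
		out, err := exec.Command("grep", "-e", pattern, f).Output()
		if err != nil {
			// grep exits with status 1 when nothing matches.
			if ee, ok := err.(*exec.ExitError); ok && ee.ExitCode() == 1 {
				continue
			}
			return nil, err
		}
		trimmed := strings.TrimRight(string(out), "\n")
		if trimmed == "" {
			continue
		}
		matches = append(matches, strings.Split(trimmed, "\n")...)
	}
	return matches, nil
}

func main() {
	lines, err := grepChunks("timeout", []string{"/tmp/chunks/api/pod-0/error/2024010112.log"})
	if err != nil {
		fmt.Println("search failed:", err)
		return
	}
	fmt.Printf("%d matching lines\n", len(lines))
}
```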

Metadata index

Next, let's look at how retrieval works across such a large number of log chunks.

First, when a log chunk is created, we index it by its metadata - service name, log time, instance, log type, and so on - with the chunk's storage location as the value. With this metadata index, when we need to search a certain type of log for a service over a certain time range, we can quickly locate the chunks involved and process them concurrently.

The index structure can be built as needed: put whatever metadata you care about into it, so the relevant chunks can be delineated quickly. Because we index only chunk metadata rather than every log line, the cost is extremely low by comparison, and chunks can be pinpointed fast enough for our purposes.

Log Lifecycle and Data Tiering

Along the time dimension, log data behaves like time-series data: the closer a log is to the present, the more valuable it is and the more likely it is to be queried, so the data shows a clear hot/cold separation. Cold data is not worthless, though: developers sometimes need to go back to logs from several months ago, which means our logs must remain queryable throughout their lifecycle.

In this situation, keeping every chunk in its lifecycle on local disk would put enormous demands on machine capacity. We address this storage requirement with compression and tiering.

Simply put, log chunk storage is divided into three tiers: local storage (disk), remote storage (object storage), and archive storage. Local storage serves real-time and short-term queries (a day or a few hours); remote storage serves queries within a certain period (a week or a few weeks); archive storage serves queries across the rest of the log lifecycle.

Now let's see how a log chunk flows across the storage tiers during its lifecycle. A chunk is first created on local disk and has its log data written to it; once complete, it is retained on local disk for a while (the retention time depends on disk storage pressure), then compressed and uploaded to remote storage (usually the standard storage class of an object store). After a further period, the chunk is migrated to archive storage (usually the archive storage class of an object store).
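A sketch of that settlement step in Go, assuming the klauspost/compress/zstd package for ZSTD (the article names only the algorithm) and a hypothetical `Uploader` interface in place of a concrete object-storage SDK:

```go
package tiering

import (
	"context"
	"io"
	"os"

	"github.com/klauspost/compress/zstd"
)

// Uploader is a stand-in for an object-storage client (S3, COS, ...).
type Uploader interface {
	Put(ctx context.Context, key, localPath string) error
}

// SettleChunk compresses a local chunk with ZSTD and uploads the
// compressed copy to remote storage; the local original can then be
// deleted according to the retention policy.
func SettleChunk(ctx context.Context, up Uploader, chunkPath, remoteKey string) error {
	src, err := os.Open(chunkPath)
	if err != nil {
		return err
	}
	defer src.Close()

	compressed := chunkPath + ".zst"
	dst, err := os.Create(compressed)
	if err != nil {
		return err
	}
	defer dst.Close()

	enc, err := zstd.NewWriter(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(enc, src); err != nil { // stream-compress the chunk
		enc.Close()
		return err
	}
	if err := enc.Close(); err != nil { // flush the ZSTD frame
		return err
	}
	return up.Put(ctx, remoteKey, compressed)
}
```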

What does this storage design buy us? The lower the tier and the more data it holds, the cheaper the storage medium (see the price references below): each tier costs roughly 1/3 of the one above it. The data is also stored compressed, and log data typically compresses at about 10:1. Taken together (1/3 × 1/3 of the unit price, on 1/10 of the data volume, is roughly 1/90), archiving a log costs about **1%** of keeping it on local storage, and the gap is even larger if local storage uses SSDs.

Price reference:

| Storage medium | Reference link |
| --- | --- |
| Local disk | https://buy.cloud.tencent.com/price/cvm?regionId=8&zoneId=800002 |
| Object storage | https://buy.cloud.tencent.com/price/cos |
| Archive storage | https://buy.cloud.tencent.com/price/cos |

How does retrieval work across the tiers? For local storage it is straightforward: the search runs directly against the local disk.

If a search involves chunks on remote storage, the retrieval service downloads the chunks involved to local storage, then decompresses and searches them locally. Thanks to the chunk design, downloads, like searches, can run in parallel across multiple machines. The downloaded copies are cached locally and only deleted after a period of time, so repeated searches over the same chunk within that window are served locally without pulling the data again (searching the same log data several times is quite common in log retrieval).

For archive storage, a restore operation must be issued on the archived chunks before a search request can run. The restore generally takes a few minutes; once it completes, the chunks are back on remote storage, and the flow from there is the same as above. In other words, a developer who wants to search cold data applies in advance to restore the archived chunks, and once the restore finishes can search the logs at hot-data speed.
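Sketched in Go with an assumed `ArchiveClient` interface (real object stores expose equivalents, e.g. S3's RestoreObject; the method names here are assumptions), the restore-then-search flow looks like this:

```go
package archive

import (
	"context"
	"errors"
	"time"
)

// Client is a stand-in for the object store's archive API; the method
// names are assumptions for this sketch.
type Client interface {
	RequestRestore(ctx context.Context, key string) error
	Restored(ctx context.Context, key string) (bool, error)
}

// RestoreChunks requests a restore for every archived chunk, then polls
// until each one is available on remote storage or the context expires.
// Once it returns nil, the normal remote-storage search path applies.
func RestoreChunks(ctx context.Context, c Client, keys []string) error {
	for _, k := range keys {
		if err := c.RequestRestore(ctx, k); err != nil {
			return err
		}
	}
	for _, k := range keys {
		for {
			ok, err := c.Restored(ctx, k)
			if err != nil {
				return err
			}
			if ok {
				break
			}
			select {
			case <-ctx.Done():
				return errors.New("restore timed out: " + k)
			case <-time.After(30 * time.Second): // restores take minutes
			}
		}
	}
	return nil
}
```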

Retrieval Service Architecture

Having covered the design ideas above, let's look at how the log retrieval service built on them is put together.

The log retrieval service is divided into the following modules:

  • GD-Search

The query scheduler. It accepts query requests, parses and optimizes the query statement, obtains the addresses of the log chunks within the query range from the Chunk Index, and finally generates a distributed query plan.

GD-Search itself is stateless; multiple instances can be deployed, and a unified access address is exposed externally through load balancing.

  • Local-Search

The local-storage searcher, responsible for handling the queries that GD-Search assigns against log chunks on local storage.

  • Remote-Search

The remote-storage searcher, responsible for handling the queries that GD-Search assigns against log chunks on remote storage.

Remote-Search pulls the required chunks from remote storage, decompresses them locally, and then runs the query against local storage just like Local-Search. It also updates the chunk's local storage address in the Chunk Index, so that subsequent queries for the same chunk can be routed to local storage.
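A sketch of that pull path, again assuming klauspost/compress/zstd and a hypothetical `Downloader` interface; the `updateIndex` callback stands in for the Chunk Index update:

```go
package remotesearch

import (
	"context"
	"io"
	"os"

	"github.com/klauspost/compress/zstd"
)

// Downloader is a stand-in for the remote object-storage client.
type Downloader interface {
	Get(ctx context.Context, key string) (io.ReadCloser, error)
}

// FetchChunk pulls a compressed chunk from remote storage, decompresses
// it onto local disk, and reports the new local path via updateIndex so
// that later queries for the same chunk are routed to local storage.
func FetchChunk(ctx context.Context, dl Downloader, key, localPath string,
	updateIndex func(key, localPath string) error) error {
	body, err := dl.Get(ctx, key)
	if err != nil {
		return err
	}
	defer body.Close()

	dec, err := zstd.NewReader(body)
	if err != nil {
		return err
	}
	defer dec.Close()

	f, err := os.Create(localPath)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, dec); err != nil { // stream-decompress to disk
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	return updateIndex(key, localPath)
}
```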

  • Log-Manager

The local-storage manager, responsible for maintaining the lifecycle of log chunks on local storage.

Log-Manager periodically scans the chunks on local storage. When a chunk exceeds its local retention period, or disk usage hits its limit, chunks are evicted by policy (compressed with ZSTD and uploaded to remote storage), and the chunk's storage information in the Chunk Index is updated.

  • Log-Ingester

The log ingester, responsible for subscribing to log data from Kafka, splitting it along the time and metadata dimensions, and appending it to the corresponding log chunks. Whenever a new chunk is created, Log-Ingester writes the chunk's metadata into the Chunk Index, ensuring that the newest chunks are searchable in real time.

  • Chunk Index

The log chunk metadata store, responsible for saving the metadata and storage information of log chunks. We currently use Redis as the storage medium: our metadata index is not complicated, so Redis is sufficient for indexing chunks, and its in-memory query speed meets our need to pin down the relevant chunks quickly.
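One plausible Redis layout (a sketch with assumed key naming and the go-redis client, not necessarily the production schema) is a sorted set per service and log type, scored by chunk start time, with the chunk's storage location as the member:

```go
package chunkindex

import (
	"context"
	"fmt"
	"strconv"
	"time"

	"github.com/redis/go-redis/v9"
)

// key groups the chunks of one service and log type; the sorted-set
// score is the chunk's start time, so a time-range lookup is a single
// ZRANGEBYSCORE.
func key(service, logType string) string {
	return fmt.Sprintf("chunks:%s:%s", service, logType)
}

// AddChunk registers a new chunk. The member encodes its storage
// location, e.g. "local:host-3:/data/..." or an object-storage key.
func AddChunk(ctx context.Context, rdb *redis.Client, service, logType, location string, start time.Time) error {
	return rdb.ZAdd(ctx, key(service, logType), redis.Z{
		Score:  float64(start.Unix()),
		Member: location,
	}).Err()
}

// FindChunks returns the locations of chunks whose start time falls in
// [from, to]; these are then searched concurrently.
func FindChunks(ctx context.Context, rdb *redis.Client, service, logType string, from, to time.Time) ([]string, error) {
	return rdb.ZRangeByScore(ctx, key(service, logType), &redis.ZRangeBy{
		Min: strconv.FormatInt(from.Unix(), 10),
		Max: strconv.FormatInt(to.Unix(), 10),
	}).Result()
}
```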

Retrieval strategy

In designing the retrieval strategy, we pursue the fastest possible return of results while keeping oversized query requests out of the system.

We see three typical log retrieval scenarios:

  1. Viewing a service's latest logs.

  2. Viewing the logs of a single request, queried by logid.

  3. Viewing a certain class of logs, such as error logs from accessing MySQL or logs of requests to downstream services.

In most scenarios the user does not need every matching log line; a portion of them is enough to troubleshoot the problem. The user can therefore set a limit on the query, and the retrieval service terminates the request and returns the results to the front end as soon as the number of hits reaches that limit.

In addition, when GD-Search dispatches chunk searches it first estimates the total size of the chunks involved and rejects oversized requests that exceed the cap. (The user can narrow the time range and retry, or make the query statement more selective.)
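The limit behavior can be sketched as a scatter-gather with cancellation (all names here are assumptions, not GD-Search internals): sub-queries stream hits into a channel, and once the limit is reached the remaining work is cancelled.

```go
package gdsearch

import (
	"context"
	"sync"
)

// SearchFn is one sub-query of the distributed plan (a Local-Search or
// Remote-Search call). Implementations must stop when ctx is cancelled,
// including while sending on out.
type SearchFn func(ctx context.Context, out chan<- string)

// RunWithLimit runs all sub-queries concurrently and returns as soon
// as limit matches have arrived, cancelling the remaining work.
func RunWithLimit(ctx context.Context, limit int, subs []SearchFn) []string {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	out := make(chan string)
	var wg sync.WaitGroup
	for _, s := range subs {
		wg.Add(1)
		go func(s SearchFn) {
			defer wg.Done()
			s(ctx, out)
		}(s)
	}
	go func() { wg.Wait(); close(out) }()

	results := make([]string, 0, limit)
	for line := range out {
		results = append(results, line)
		if len(results) >= limit {
			cancel() // enough hits: stop all outstanding sub-queries
			break
		}
	}
	return results
}
```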

Performance at a Glance

The tests use log entries of 1 KB each, about 10,000 log chunks in total, NVMe SSDs for local storage, and S3-protocol standard storage for remote storage.

• Writes

A single core can sustain a write speed of 20,000 entries/s, and each 10,000 entries/s of write throughput uses about 1-2 GB of memory; writes can be scaled out across machines without an upper limit.

• Queries (full-text search)

A query over 1 TB of log data on local storage completes within 3 s.

A query over 1 TB of log data on remote storage takes about 10 s.

Cost advantage

With tens of millions of writes per second and hundreds of PB of storage, a dozen or so physical servers are enough to sustain log writing and querying. Hot data sits on local NVMe disks, warm data in object storage, and the great bulk of the log data in the archive storage service.

Computational comparison

Because no index has to be built, on the order of a thousand cores is enough to sustain writes. And since log retrieval is a write-heavy, read-light workload, the same thousand-core scale can also sustain queries at hundreds of QPS.

At this order of magnitude, ES would need tens of thousands of cores to cope with the write load and query bottlenecks, and even then could not guarantee write and query efficiency.

Storage comparison

The core idea is to use cheaper storage media (archive storage vs. local disk) and to store less data (a 10:1 compression ratio vs. the index bloat of fully indexed logs) while still meeting business requirements. Together these can yield a difference of two orders of magnitude.
