Quickly understand the log overview: 13 log pattern parsing algorithms explained in detail

The Cloud Wisdom AIOps community was initiated by Cloud Wisdom (CloudWise) for operation and maintenance business scenarios. It provides an overall service system of algorithms, computing power, and data sets, as well as a solution exchange community for intelligent operation and maintenance. The community is committed to spreading AIOps technology, aims to solve technical problems in the intelligent operation and maintenance industry together with customers, users, researchers, and developers in various industries, to promote the adoption of AIOps technology in enterprises, and to build a healthy, win-win AIOps developer ecosystem.

Log pattern parsing is an algorithm that turns logs from semi-structured data into structured data, which helps us quickly understand an overview of a large number of logs. In automated log analysis it is often an intermediate step that serves subsequent log anomaly detection tasks. In this technical blackboard report, we will explain log pattern parsing in detail around three questions: what is log pattern parsing, why do log pattern parsing, and how to implement log pattern parsing.

1. What is log pattern parsing:

We can use the following diagram to understand what log pattern parsing does:

First of all, it should be clear that a log is a kind of semi-structured data generated by a specific piece of code. As shown above, the log message 2015-10-18 18:05:29,570 INFO dfs.DataNode$PacketResponder: Received block blk_-562725280853087685 of size 67108864 from /10.251.91.84 is generated by the code LOG.info("Received block " + block + " of size " + block.getNumBytes() + " from " + inAddr);. The purpose of log pattern parsing is to parse the log into the structured form shown in the figure above, that is, to extract the timestamp, level, component, log template, and parameters from the log. The timestamp, level, and component can easily be obtained with simple regular expressions, so log pattern parsing algorithms really focus on extracting log templates and parameters.

What are log templates and parameters? With a little knowledge of code, we know that every log printed by LOG.info("Received block " + block + " of size " + block.getNumBytes() + " from " + inAddr) will contain the text Received block, of size, and from; these pieces of text are called constants. Because the system state differs every time the log is printed, the values of block, block.getNumBytes(), and inAddr may differ from log to log; these pieces of text are called parameters. If we keep the constants in the log and replace the parameters with the special symbol <*>, the resulting text is the template of the log.
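To make the constant/parameter idea concrete, here is a minimal sketch (not from the original article): given two logs printed by the same statement, keep the tokens that match and replace the rest with <*>. The second log line is a hypothetical example invented for illustration.

```python
def extract_template(log_a: str, log_b: str) -> str:
    """Toy template extraction: assumes both logs have the same number of tokens."""
    tokens_a, tokens_b = log_a.split(), log_b.split()
    # Tokens that agree are constants; tokens that differ are treated as parameters.
    template = [a if a == b else "<*>" for a, b in zip(tokens_a, tokens_b)]
    return " ".join(template)

print(extract_template(
    "Received block blk_-562725280853087685 of size 67108864 from /10.251.91.84",
    "Received block blk_-190023982812203 of size 67108864 from /10.251.43.21",
))
# Received block <*> of size 67108864 from <*>
```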

The log pattern parsing process can be understood as a process of working backwards to the log printing code, and also as a process of clustering logs (logs with the same template are considered the same type of log). Different articles use different names for it, such as log template mining, log pattern discovery, log pattern recognition, and log clustering, but they all refer to log pattern parsing.

2. Why do log pattern parsing:

Log pattern parsing is a feature that many log products now offer. Why do we all need it? First of all, log pattern parsing helps us quickly understand the log overview. In today's computer systems the volume of logs is often huge: a system may generate millions of logs in a single day, and inspecting them directly with the human eye is clearly unrealistic. With log pattern parsing, we can compress millions of logs into a few hundred templates, which makes manual inspection feasible (as shown in the figure below).

Second, pattern parsing is often an intermediate step in an automated analysis pipeline, serving subsequent tasks such as anomaly detection, because the results of pattern parsing are easier to analyze. For example, we can derive the periodicity of a pattern by analyzing the print times of the logs belonging to it, and treat points that do not fit the period as anomalies. As another example, we can analyze the order in which patterns appear: if pattern 2 always appears after pattern 1, and at some moment pattern 2 suddenly appears on its own, that can also be judged an anomaly.

3. How to implement log pattern parsing:

This technical blackboard report surveys a total of 13 classic log parsing algorithms, most of which come from the survey "Tools and Benchmarks for Automated Log Parsing". According to their principles, the algorithms are divided into three categories: clustering-based log pattern parsing algorithms, frequent-item-mining-based log pattern parsing algorithms, and heuristic-based log pattern parsing algorithms. The following figure shows the algorithms covered in this report and their classification:

Logs printed by the same code must be similar, which gives us the first idea for pattern parsing: define a text similarity or distance formula, gather logs of the same pattern together with a clustering algorithm, and then extract the log template. These are the clustering-based log pattern parsing algorithms, such as Drain, Spell (LCS can also be regarded as a text similarity), LenMa, LogMine, and SHISO.

Also, when code prints a log, the constants appear in every printed log, while the parameters may vary widely. We can therefore assume that constants appear frequently in the logs and parameters appear infrequently; in other words, high-frequency words are constants and low-frequency words are parameters. This property gives us the second idea for pattern parsing: log pattern parsing algorithms based on frequent item mining, such as SLCT, Logram, and FT-tree.

In addition, some algorithms rely on heuristic assumptions about logs, for example that only logs of the same length can be of the same type, or that logs sharing the same first few words may be of the same type. Such assumptions can also be used for pattern parsing. Some clustering algorithms also contain heuristic assumptions, but clustering-based and heuristic-based log pattern parsing algorithms can be distinguished by whether the algorithm uses a similarity measure.

Overall, these three types of methods can be abstracted into the following process. A few methods involve a pattern fusion step, while most methods only have preprocessing, clustering, and template extraction, with template extraction often fused into the clustering step; some algorithms even obtain templates first and then gather logs with the same template together. This report therefore introduces those two steps together as the clustering step.

1. Preprocessing:

Whether the algorithm is based on clustering, frequent item mining, or heuristics, the logs are tokenized into words before parsing, because a word is the smallest unit that expresses a complete meaning. In addition to tokenization, Drain, DAGDrain, POP, LogMine, and LKE all mention that type identification is needed: special tokens such as IP addresses and times are recognized with regular expressions and then replaced with special characters or removed. These special tokens are obviously parameters, so removing them effectively increases the similarity between logs of the same pattern. AEL proposes identifying key-value pairs and replacing the value with a special field, which follows a similar consideration. Logram proposes that the log header (i.e., the timestamp, level, and component) must be removed before parsing. The LenMa and SHISO algorithms compute similarity on feature vectors extracted from the log rather than on the tokenized word list, so these two algorithms perform an extra feature extraction step during preprocessing.
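A rough preprocessing sketch under the assumptions described above: split the log content into words and replace obvious variable fields with <*> before any similarity computation. The regular expressions here are illustrative only, not the ones used by any specific algorithm.

```python
import re

TYPE_PATTERNS = [
    r"\d+\.\d+\.\d+\.\d+(:\d+)?",   # IP address, optionally with a port
    r"blk_-?\d+",                    # HDFS block id
    r"^-?\d+$",                      # pure number
]

def preprocess(log_content: str) -> list[str]:
    """Tokenize and replace tokens that match a known 'type' with <*>."""
    cleaned = []
    for token in log_content.split():
        if any(re.search(p, token) for p in TYPE_PATTERNS):
            cleaned.append("<*>")
        else:
            cleaned.append(token)
    return cleaned

print(preprocess("Received block blk_-562725280853087685 of size 67108864 from /10.251.91.84"))
# ['Received', 'block', '<*>', 'of', 'size', '<*>', 'from', '<*>']
```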

2. Clustering:

In this section, clustering methods are described according to the categories of parsing algorithms.

(1) Clustering-based log pattern parsing:

a. Similarity

Clustering-based log pattern parsing uses a text similarity or distance formula to decide whether a log belongs to a pattern. When computing text similarity, some algorithms work directly on the word list, such as Drain and DAGDrain, while others first extract features from the word list and then compute the similarity of the features, such as LenMa and SHISO. Some algorithms use text similarity formulas that require inputs of equal length, while others do not.

As can be seen from the table above, most of the similarity formulas used by these algorithms require the inputs to be of equal length. This is because the algorithms assume that logs of the same pattern have the same length, which effectively reduces the number of similarity computations and makes each computation easier, but it also brings certain limitations.

Drain and DAGDrain use the same similarity formula:

In words: compare the two logs word by word from left to right, count the number of positions where the words are identical, and divide by the log length. For example, for the log [Node, 001, is, unconnected] and the log [Node, 002, is, unconnected], the three words Node, is, and unconnected are identical and in the same positions, so the similarity between the two logs is 3/4.
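A sketch of this position-wise similarity (assuming equal-length token lists, as Drain and DAGDrain do):

```python
def seq_similarity(seq1: list[str], seq2: list[str]) -> float:
    """Fraction of positions where the two token lists carry the same word."""
    assert len(seq1) == len(seq2), "this formula only compares logs of equal length"
    matches = sum(1 for w1, w2 in zip(seq1, seq2) if w1 == w2)
    return matches / len(seq1)

print(seq_similarity(["Node", "001", "is", "unconnected"],
                     ["Node", "002", "is", "unconnected"]))   # 0.75 (= 3/4)
```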

The distance formula used by LogMine is:

This distance formula is very similar to Drain's similarity formula, but it allows the two logs to differ in length and allows the score for a word match to be set manually. In words: compare the two logs word by word from left to right until the shorter log ends, count the number of matching words, and divide by the length of the longer log. For example, if k1 = 1, comparing the log [Node, 001, is, unconnected] with the log [Node, 002, is, unconnected, too], the three words Node, is, and unconnected match in the same positions, so the similarity is 3/5. However, this formula also has limitations: if the word positions are slightly offset, logs of the same pattern will still get a very low similarity. Take the log [Node, 001, is, unconnected] and the log [Node, 002, 003, is, unconnected]: although both contain the words Node, is, and unconnected and belong to the same pattern, is and unconnected sit at different positions in the two logs, so the similarity is only 1/5.
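A sketch of this LogMine-style score under the description above: award k1 for each exact word match up to the shorter log's length and divide by the longer length. Only exact word matches are scored here; other refinements of the original formula are omitted.

```python
def logmine_similarity(seq1: list[str], seq2: list[str], k1: float = 1.0) -> float:
    """Position-wise match score normalized by the longer log's length."""
    score = sum(k1 for w1, w2 in zip(seq1, seq2) if w1 == w2)
    return score / max(len(seq1), len(seq2))

print(logmine_similarity(["Node", "001", "is", "unconnected"],
                         ["Node", "002", "is", "unconnected", "too"]))   # 0.6 (= 3/5)
print(logmine_similarity(["Node", "001", "is", "unconnected"],
                         ["Node", "002", "003", "is", "unconnected"]))   # 0.2 (= 1/5)
```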

In Spell, similarity is judged by the LCS (longest common subsequence):

LCS is a classic problem in computer science; readers who are not familiar with it can easily look it up. By judging similarity through the LCS, Spell removes the equal-length restriction, but it also brings an efficiency problem, because exactly solving the LCS problem takes O(mn) time (where m and n are the lengths of seq1 and seq2).
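A standard dynamic-programming LCS sketch with the O(mn) cost noted above. The "more than half the log" matching rule at the end is a rough stand-in for Spell's actual threshold, shown only for illustration.

```python
def lcs_length(seq1: list[str], seq2: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (O(m*n))."""
    m, n = len(seq1), len(seq2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if seq1[i - 1] == seq2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

seq1 = ["Node", "001", "is", "unconnected"]
seq2 = ["Node", "002", "003", "is", "unconnected"]
print(lcs_length(seq1, seq2))                  # 3 (Node, is, unconnected)
print(lcs_length(seq1, seq2) > len(seq1) / 2)  # True -> treat as the same pattern
```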

The distance used by LKE is the edit distance of the text: the minimum number of word-level operations (insertions, deletions, substitutions) required to convert seq1 into seq2. In addition, LKE proposes that the position of the edited word should be taken into account when computing the distance, so the paper uses a weighted edit distance:

Edit distance also removes the equal-length restriction, but time complexity remains a pain point.
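For concreteness, a plain word-level edit distance looks like the sketch below. LKE's additional weighting of each operation by the position of the edited word is omitted here, so this is a simplified version rather than LKE's exact formula.

```python
def edit_distance(seq1: list[str], seq2: list[str]) -> int:
    """Minimum number of word insertions, deletions, and substitutions."""
    m, n = len(seq1), len(seq2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if seq1[i - 1] == seq2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j - 1] + cost)  # substitute / keep
    return dp[m][n]

print(edit_distance(["Node", "001", "is", "unconnected"],
                    ["Node", "002", "003", "is", "unconnected"]))   # 2
```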

When LenMa computes similarity, it first extracts a feature vector from the log message. The feature vector used by LenMa is called the word length vector, and it is extracted very simply: record the length of each word in the log and concatenate the lengths. For example, for the log [Node, 001, is, unconnected] the word length vector is [4, 3, 2, 11]. The similarity is the cosine similarity of the word length vectors:
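A sketch of this LenMa-style similarity: build each log's word length vector and compare the vectors with cosine similarity.

```python
import math

def length_vector(tokens: list[str]) -> list[int]:
    """Word length vector: the character length of each token."""
    return [len(t) for t in tokens]

def cosine_similarity(v1: list[int], v2: list[int]) -> float:
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm

v1 = length_vector(["Node", "001", "is", "unconnected"])   # [4, 3, 2, 11]
v2 = length_vector(["Node", "002", "is", "unconnected"])   # [4, 3, 2, 11]
print(round(cosine_similarity(v1, v2), 6))                 # 1.0
```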

The advantage of extracting a feature vector first and then computing similarity is that vectors can be processed in parallel as matrix operations, so the similarity computation is more efficient; however, extracting features also means losing information.

The similarity formula used by SHISO is:

Here C(W1[i]) and C(W2[i]) are feature vectors generated from the words W1[i] and W2[i]. SHISO considers four character classes: uppercase letters, lowercase letters, digits, and others. The generated vector is 4-dimensional, and each dimension is the count of uppercase letters, lowercase letters, digits, and other characters. For example, for the word Node the feature vector is [1, 3, 0, 0].
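A sketch of this per-word feature vector. The way SHISO combines the word vectors into a log-level similarity is simplified here to an average per-position Euclidean distance, so treat the second function as illustrative only, not SHISO's exact formula.

```python
import math

def char_class_vector(word: str) -> list[int]:
    """[uppercase, lowercase, digit, other] counts for one word."""
    vec = [0, 0, 0, 0]
    for ch in word:
        if ch.isupper():
            vec[0] += 1
        elif ch.islower():
            vec[1] += 1
        elif ch.isdigit():
            vec[2] += 1
        else:
            vec[3] += 1
    return vec

def shiso_like_distance(seq1: list[str], seq2: list[str]) -> float:
    """Average Euclidean distance between the per-word class vectors (simplified)."""
    assert len(seq1) == len(seq2)
    dists = [math.dist(char_class_vector(w1), char_class_vector(w2))
             for w1, w2 in zip(seq1, seq2)]
    return sum(dists) / len(dists)

print(char_class_vector("Node"))   # [1, 3, 0, 0]
```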

b. Clustering logic:

With a similarity formula in hand, clustering can be performed. The simplest clustering logic is to store all clusters and, when a log needs to be parsed, compute its similarity with the center of every cluster one by one (the cluster center may be a word list or a feature vector; it is determined by the first log that enters the cluster and is updated as new logs arrive) and find the cluster with the highest similarity. If that similarity meets the threshold, the log is merged into the cluster and the cluster center is updated; if no cluster meets the threshold, the log becomes the center of a newly created cluster. Both Spell and LenMa use this clustering logic.
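A sketch of this simple online clustering loop. The position-wise similarity and the equal-length check are borrowed from the Drain-style formula above purely as a stand-in for whichever similarity an algorithm actually uses; the threshold value is arbitrary.

```python
def seq_similarity(a: list[str], b: list[str]) -> float:
    return sum(x == y for x, y in zip(a, b)) / len(a)

def online_cluster(logs: list[list[str]], threshold: float = 0.5) -> list[list[str]]:
    clusters: list[list[str]] = []            # each entry is a template (the cluster center)
    for tokens in logs:
        best, best_sim = None, 0.0
        for template in clusters:
            if len(template) != len(tokens):  # equal-length assumption, as discussed above
                continue
            sim = seq_similarity(template, tokens)
            if sim > best_sim:
                best, best_sim = template, sim
        if best is not None and best_sim >= threshold:
            # merge: keep matching words, turn mismatched positions into parameters
            for i, (t, w) in enumerate(zip(best, tokens)):
                if t != w:
                    best[i] = "<*>"
        else:
            clusters.append(list(tokens))     # no match: start a new cluster
    return clusters

print(online_cluster([["Node", "001", "is", "unconnected"],
                      ["Node", "002", "is", "unconnected"],
                      ["Interface", "eth0", "down"]]))
# [['Node', '<*>', 'is', 'unconnected'], ['Interface', 'eth0', 'down']]
```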

This clustering logic is very simple, but it computes the similarity far too many times, so it is inefficient. Therefore, most algorithms first group the logs before computing similarity, and then apply this clustering logic within each group.

For example, the Drain algorithm uses a tree structure to group logs. Drain's grouping strategy has two parts: grouping by log length and grouping by the first few words of the log. Drain's tree depth is configurable, and the depth determines how many leading words are used for grouping. The grouping strategy is shown in the figure: when a log needs to be parsed, the algorithm searches down the tree according to the log length and the first few words until it reaches a leaf node. The clusters of that group are stored under the leaf node; after reaching the leaf node, the similarity is computed, and the cluster center is updated or a new cluster is created according to the result.

The DAGDrain algorithm is similar to Drain and also uses a tree structure for grouping, with two grouping strategies: grouping by log length and grouping by the first or last word of the log. Grouping by log length is easy to understand. Grouping by the first or last word works as follows: depending on whether the first or last word of the log contains digits or special characters, extract the first word or the last word, attach a tag indicating whether it is the first or last word, and use the result as the split_token; logs with the same split_token are placed in the same group.

The AEL algorithm groups logs by log length and by the number of key-value pairs, and then computes similarity within each group for clustering.

To sum up, there are four ways to group logs in advance: by log length, by the first few words, by the first or last word, and by the number of key-value pairs. Each has its shortcomings. Grouping by log length is heavily affected by tokenization: improper tokenization may cause logs of the same pattern to have different lengths, and for patterns whose parameters occupy a variable number of positions, the length will also wrongly place them in different groups, for example the log [Node, 000, is, unconnected] and the log [Node, 001, 002, is, unconnected]. Grouping by the first few words is useful most of the time, but parameters can sometimes appear at the beginning of the log. Grouping by the first or last word is very likely to be ineffective if the log header has not been removed. Grouping by the number of key-value pairs can be affected by improper regular-expression matching.

Besides grouping in advance, hierarchical clustering is another way to improve parsing efficiency, as in SHISO. SHISO is also a tree-structured parsing algorithm: each node of the tree corresponds to a cluster, and the algorithm requires the number of children of each node to be less than a threshold t. The flow is as follows: traverse the child nodes to see whether any cluster's similarity meets the requirement. If one exists, update that cluster center; if none exists and the number of children is below the threshold, insert a new cluster under the node; if none exists and the number of children equals the threshold, find the child with the highest similarity and continue traversing its children, iterating until a cluster whose similarity meets the requirement is found.

LogMine also mentions that hierarchical clustering can be used, but the purpose of its hierarchy is not to improve efficiency; on the contrary, it may reduce parsing efficiency to some extent, but it makes it possible to manually and flexibly choose the parsing level. LogMine also clusters with a tree structure. At the first layer of the tree, logs are clustered with a very small distance threshold, so that the logs are divided finely enough that logs of different patterns are not clustered together; the center of each first-layer cluster is the first log that enters it and is never updated. From the second layer on, the cluster centers of the previous layer are clustered with a larger distance threshold to form clusters of cluster centers, whose centers are the first piece of data to enter the cluster and are updated as new data arrives.

c. Automatic thresholds:

Most clustering-based parsing algorithms require the similarity threshold to be set manually. LKE and DAGDrain provide two ideas for automatic thresholds: LKE proposes obtaining the similarity threshold through k-means clustering, while DAGDrain proposes the following automatic threshold formula:

However, the effectiveness of these automatic thresholds has yet to be verified, so they are not discussed further here.

(2) Log pattern parsing based on frequent item mining:

Log pattern parsing algorithms based on frequent item mining need to count frequencies (mostly word frequencies; Logram counts n-gram frequencies) before clustering.

SLCT is the earliest log pattern parsing algorithm, and its principle is relatively simple. The flow is as follows: if a word W(i) of a log occurs more often than the threshold s, W(i) is considered a frequent word; all frequent words of the log, together with their positions, are extracted as a cluster candidate. For example, in the log [Interface, eth0, down], if Interface and down occur more often than the threshold, the cluster candidate generated by this log is {(Interface, 1), (down, 3)}. Cluster candidates whose count exceeds a threshold formally become clusters.
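A rough SLCT-style sketch under the description above: count how often each (word, position) pair occurs, turn each log's frequent pairs into a cluster candidate, and keep the candidates with enough support. The thresholds and example logs are arbitrary.

```python
from collections import Counter

def slct(logs: list[list[str]], word_threshold: int = 2, cluster_threshold: int = 2):
    # Pass 1: frequency of each word at each position (1-indexed positions).
    word_pos_freq = Counter()
    for tokens in logs:
        for pos, word in enumerate(tokens, start=1):
            word_pos_freq[(word, pos)] += 1

    # Pass 2: each log's frequent (word, position) pairs form a cluster candidate.
    candidates = Counter()
    for tokens in logs:
        candidate = tuple((word, pos) for pos, word in enumerate(tokens, start=1)
                          if word_pos_freq[(word, pos)] >= word_threshold)
        if candidate:
            candidates[candidate] += 1

    # Candidates with enough support become clusters.
    return [c for c, support in candidates.items() if support >= cluster_threshold]

logs = [["Interface", "eth0", "down"],
        ["Interface", "eth1", "down"],
        ["Interface", "eth2", "down"]]
print(slct(logs))   # [(('Interface', 1), ('down', 3))]
```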

Logram innovatively introduces n-grams into the frequent item mining approach. Before parsing, it counts the frequencies of all n-grams in the logs (n = 1, 2, ..., k, where k is a configurable parameter). When parsing, it first obtains the k-grams of the log and selects those whose frequency is below the k-gram threshold as parameter candidates, then decomposes the selected k-grams into (k-1)-grams and selects the (k-1)-gram parameter candidates from them, and so on down to 2-grams. For any word of the log, if all of its 2-grams are among the 2-gram parameter candidates, the word is a parameter; otherwise it is a constant. For example, in the log [Node, 000, is, unconnected], the 2-grams of 000 are Node->000 and 000->is; if both Node->000 and 000->is are among the 2-gram parameter candidates, then 000 is a parameter. The log pattern is obtained by keeping the constants and converting the parameters to the special symbol <*>.
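A much-simplified Logram-style sketch that only uses 2-grams (the real algorithm works from k-grams down to 2-grams): a word is treated as a parameter when every 2-gram it participates in is infrequent. The threshold and toy logs are arbitrary.

```python
from collections import Counter

def logram_parse(logs: list[list[str]], threshold: int = 2) -> list[list[str]]:
    # Count 2-gram frequencies over all logs.
    bigram_freq = Counter()
    for tokens in logs:
        for i in range(len(tokens) - 1):
            bigram_freq[(tokens[i], tokens[i + 1])] += 1

    templates = []
    for tokens in logs:
        template = []
        for i, word in enumerate(tokens):
            grams = []
            if i > 0:
                grams.append((tokens[i - 1], word))
            if i < len(tokens) - 1:
                grams.append((word, tokens[i + 1]))
            # Parameter if every 2-gram containing this word is infrequent.
            is_param = all(bigram_freq[g] < threshold for g in grams)
            template.append("<*>" if is_param else word)
        templates.append(template)
    return templates

logs = [["Node", "000", "is", "unconnected"],
        ["Node", "001", "is", "unconnected"],
        ["Node", "002", "is", "unconnected"]]
print(logram_parse(logs)[0])
# ['<*>', '<*>', 'is', 'unconnected'] -- note that "Node" is misjudged as a parameter
# here because its only neighbouring word is itself a parameter; the threshold
# sensitivity discussed below is a real weakness of this family of algorithms.
```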

Both SLCT and Logram are very efficient algorithms, but they share a problem: the threshold for judging frequent words (or frequent n-grams) is hard to determine. Moreover, in practice some code paths print very few logs while others print very many, which means that the constants in the logs produced by rarely executed code also have low frequencies and are very easy to mistake for parameters. For these reasons, these two algorithms are not recommended.

FT-tree is also a log pattern parsing algorithm based on frequent item mining, and it parses with a novel FT-tree data structure. Before parsing, the algorithm traverses all logs, obtains the occurrence frequency of each word, and sorts the words by frequency from high to low to form a list L. When parsing a log, its words are sorted according to the list L and then inserted into the FT-tree (so that high-frequency words are closer to the root node and low-frequency words are farther from it), until all logs have been inserted, as shown in the figure. So how does FT-tree distinguish constants from parameters?

Consider logs printed by the same code, such as [Node, 000, is, unconnected], [Node, 001, is, unconnected], and [Node, 002, is, unconnected]. After these logs are inserted into the FT-tree, words closer to the root node are more likely to be constants, because their frequency is higher. Moreover, near the root node the tree hardly branches, but beyond a certain node the number of children grows explosively, and that node is the dividing line between constants and parameters. From this thought experiment we naturally arrive at FT-tree's next operation: pruning. After pruning, the nodes remaining in the FT-tree represent constants, and logs containing the same constants are considered by FT-tree to belong to the same pattern.

The FT-tree structure is ingenious, but it can have problems. For example, some parameters may take only a few distinct values, which does not cause a large branch-out in the FT-tree, so they may survive pruning and remain in the tree. Conversely, if several patterns share a common constant, branching may occur beneath that constant, causing its children to be pruned and multiple patterns to be grouped together.

(3) Log pattern parsing based on heuristics:

Only two heuristic-based log pattern parsing algorithms are surveyed in this report, POP and IPLoM, but in fact many heuristic rules are also used in the clustering-based algorithms, such as the grouping strategies of Drain and DAGDrain.

POP uses two heuristic strategies: 1. group by log length; 2. group by word position. The idea of grouping by word position is somewhat similar to FT-tree's idea, and is likewise built on the belief that parameters produce too many branches. The steps are: within a group, count how many distinct words appear at each position, and split on the position with the fewest distinct words that is still greater than 1. In the four logs shown below, position 1 or position 2 can be chosen for splitting, because positions 1 and 2 each have only 2 distinct words while position 3 has 4. This is repeated until both the smallest non-1 count of distinct words and the ratio of distinct words to the number of logs in the group exceed the thresholds; a sketch of a single split step follows.
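A sketch of one "split by word position" step under the description above. The example group is hypothetical and not the one in the article's figure; the real POP also applies the stopping conditions described in the text.

```python
def split_by_position(group: list[list[str]]) -> dict:
    """Split a group of equal-length logs on the position with the fewest (>1) distinct words."""
    length = len(group[0])
    cardinality = [len({tokens[pos] for tokens in group}) for pos in range(length)]
    candidates = [pos for pos, c in enumerate(cardinality) if c > 1]
    if not candidates:
        return {None: group}            # nothing left to split on
    split_pos = min(candidates, key=lambda pos: cardinality[pos])
    subgroups: dict[str, list[list[str]]] = {}
    for tokens in group:
        subgroups.setdefault(tokens[split_pos], []).append(tokens)
    return subgroups

group = [["connection", "from", "10.0.0.1", "closed"],
         ["connection", "from", "10.0.0.2", "refused"],
         ["connection", "from", "10.0.0.3", "closed"],
         ["connection", "from", "10.0.0.4", "refused"]]
# Position 3 has only 2 distinct words, position 2 has 4, so the group splits on position 3.
print(list(split_by_position(group).keys()))   # ['closed', 'refused']
```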

IPLoM uses three heuristic strategies: 1. group by log length; 2. group by word position; 3. group by bijective relationships. The principle behind strategy 3 is again that parameters produce too many branches, and it is not detailed here.

3. Pattern fusion:

Both the DAGDrain and POP algorithms mention that after the log patterns are obtained, the patterns themselves can be clustered and merged again by computing the similarity between patterns. Pattern fusion is a high-leverage operation: the amount of data to cluster is the number of patterns, which is far smaller than the number of logs, so compared with log clustering, pattern fusion takes very little time while still improving the quality of the clustering result.

In DAGDrain, the pattern fusion similarity is:

Here lenNew is the length of the new pattern after fusion, lenExist is the length of the original pattern, and lenLCS is the length of the longest common subsequence between the new pattern and the original pattern.

In POP, the pattern fusion distance formula is the Manhattan distance:

Here N is the number of distinct words appearing in the two texts, and a_i and b_i are the number of occurrences of the i-th word in texts a and b respectively.
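A sketch of this Manhattan distance over word-count vectors: count how many times each word appears in each pattern and sum the absolute differences. The union of words is used as the index set here, which may differ slightly from POP's exact definition.

```python
from collections import Counter

def manhattan_distance(pattern_a: list[str], pattern_b: list[str]) -> int:
    """Sum of absolute differences of word counts between two patterns."""
    count_a, count_b = Counter(pattern_a), Counter(pattern_b)
    words = set(count_a) | set(count_b)
    return sum(abs(count_a[w] - count_b[w]) for w in words)

print(manhattan_distance(["Node", "<*>", "is", "unconnected"],
                         ["Node", "<*>", "is", "unconnected", "too"]))   # 1
```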

4. Small tip:

There are many log parsing algorithms, and from the introduction above we can see that many steps in these algorithms can be understood as grouping, and groups do not affect each other: within a group, we can continue to group by other strategies. So we might as well be bold; many techniques from different algorithms can be combined into new algorithms. For example, could a layer grouped by the number of key-value pairs be added to Drain's tree structure? Could FT-tree's tree be treated as a grouping step, followed by similarity-based clustering within each group? Could SHISO's hierarchical clustering strategy be applied after Drain's grouping, to prevent too many patterns under a single leaf node from hurting efficiency? In any such combination, always pay attention to both effectiveness and efficiency.

Written at the end

In recent years, against the background of the rapid development of the AIOps field, urgent needs for IT tools, platform capabilities, solutions, AI scenarios, and usable data sets have emerged in various industries. Based on this, Cloud Wisdom launched the AIOps community in August 2021, aiming to raise an open-source banner and build an active community of users and developers for customers, users, researchers, and developers in various industries, to jointly contribute to and solve industry problems and promote technological development in this field.

The community has open sourced the data visualization orchestration platform FlyFish, the operation and maintenance management platform OMP, the cloud service management platform Moore, the Hours algorithm, and other products.

Visual Orchestration Platform-FlyFish:

Project introduction: https://www.cloudwise.ai/flyFish.html

Github address: https://github.com/CloudWise-OpenSource/FlyFish

Gitee address: https://gitee.com/CloudWise/fly-fish

Industry case: https://www.bilibili.com/video/BV1z44y1n77Y/

Some large screen cases:

Please get to know us through the links above. Add the assistant (xiaoyuerwie) with the note "FlyFish" to join the developer exchange group and have 1-on-1 exchanges with industry experts!

You can also obtain Cloud Wisdom AIOps materials through the assistant and learn about the latest progress of Cloud Wisdom FlyFish!
