Improving IT Operations Efficiency: An In-Depth Look at JD Cloud's NLP-Based AIOps Practice for O&M Log Anomaly Detection

Authors: Zhang Xianbo, Zhang Jing, Li Dongjiang (JD Technology)

Applying NLP techniques to cluster operations and maintenance logs to quickly discover online business problems from the log perspective

Logs are widely used in the IT industry, and anomaly detection on logs is crucial for identifying the operating status of a system. Traditional approaches to this problem require complex rule-based supervised methods and a large amount of human time. We propose an anomaly detection model for operations and maintenance logs based on natural language processing. To improve the quality of the log template vector, we improved feature extraction: part-of-speech (PoS) tagging and named entity recognition (NER) are used in the model to reduce the reliance on rules. The template vector is modified using a weight vector derived from NER, and analyzing the PoS attribute of each word in the template reduces the cost of manual labeling and contributes to a better weight distribution. To modify the template vector, a method of assigning weights to the log template is introduced, and the final detection based on the modified template vector is carried out by a deep neural network (DNN). Our model was tested for effectiveness on three datasets and compared with two state-of-the-art models; the evaluation results show that our model achieves higher accuracy.

Logs are one of the main ways to record the operating status of IT systems such as operating systems, and they are an important resource for identifying whether a system is in a healthy state. It is therefore very important to perform accurate anomaly detection on logs. Log anomalies generally fall into three types: anomalous individual logs, anomalous log sequences, and anomalous log quantitative relationships. We mainly identify anomalous individual logs, that is, logs containing abnormal information.

In general, anomaly detection on logs consists of three steps: log parsing, feature extraction, and anomaly detection. Templates extracted by parsing tools are text data, which must be converted into numeric data before they can be fed into a model. Feature extraction is therefore necessary to obtain a numerical representation of each template. The industry has proposed a variety of methods for this task. One-hot encoding is one of the earliest and simplest, easily converting a text template into a manipulable numerical representation, but it is an inefficient encoding: it produces sparse vectors that are mostly zeros, wasting storage space, and it ignores the semantic information of the log template. Beyond this convenient encoding, more and more researchers apply natural language processing (NLP) techniques to digitize text, including methods such as bag-of-words and word2vec. Although these methods can convert text data into numerical data, they still have shortcomings for log anomaly detection. Bag-of-words and word2vec can efficiently obtain word vectors that capture the semantic information of templates, but they cannot adjust the importance of each word appearing in a template. In addition, deep neural networks (DNNs) have also been used for template feature extraction.

Our model mainly improves feature extraction by considering both the semantic information of template words and the weight assigned to each token, because different tokens have different importance for the final detection. We utilize two natural language processing techniques, PoS tagging and named entity recognition (NER), to extract template features through the following steps. First, the original log message is parsed into a log template by FT-Tree; the template is then processed by a PoS tool to obtain the PoS attribute of each word, which is used for weight vector calculation. At the same time, the tokens in the template are vectorized by word2vec into an initial template vector, which is further modified using the weight vector. The PoS attributes of important template words help the model better understand the log's meaning. Among the PoS-tagged template words, some matter more than others for recognizing abnormal information. We use NER to find the template words of high importance among those marked with important PoS attributes, and the template words recognized as important by NER receive greater weight. The initial template vector is then multiplied by this weight vector to generate a composite template vector, which is input into the DNN model to obtain the final anomaly detection result. To reduce the human effort spent on log parsing and to prepare for weight calculation, we adopt PoS analysis and mark a PoS attribute for each template word without introducing template extraction rules.

The feature extraction process of parsing templates is an important step in anomaly detection. The main purpose of feature extraction is to convert text format templates into digital vectors. Various template feature extraction methods have been proposed in the industry:

**One-hot encoding:** In DeepLog, each input log template ti from a set of k templates, i ∈ [0, k), is encoded as a one-hot vector. That is, a sparse k-dimensional vector V = [v0, v1, ..., vk-1] is constructed for the log's key information ti, such that vi = 1 and vj = 0 for all j ≠ i, j ∈ [0, k) (see the sketch after this list).

**Natural Language Processing (NLP):** To extract the semantic information of log templates and convert them into high-dimensional vectors, LogRobust uses the off-the-shelf FastText algorithm to extract semantic information from English vocabulary, which can effectively capture the intrinsic relationships between words (i.e., semantic similarity); each word is mapped to a k-dimensional vector. Models using other NLP techniques, such as word2vec and bag-of-words, are also widely used in industry.

**Deep Neural Network (DNN):** Unlike natural language processing (NLP) methods that use fine-grained units such as word2vec or FastText, LogCNN generates log embeddings based on a 29x128 codebook, a trainable layer that is optimized by gradient descent throughout training.

**Template2Vec:** A new approach to efficiently represent words in templates based on synonyms and antonyms. In LogClass, the classic weighting method TF-IDF is improved to TF-ILF: inverse location frequency replaces inverse document frequency to build the template features.
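To make the one-hot scheme from the list above concrete, here is a minimal NumPy sketch. It is an illustration of the encoding only, not DeepLog's actual implementation:

```python
import numpy as np

def one_hot_template(template_index: int, num_templates: int) -> np.ndarray:
    """Encode template t_i from a set of k templates as a sparse k-dim vector:
    v_i = 1 and v_j = 0 for all j != i."""
    v = np.zeros(num_templates)
    v[template_index] = 1.0
    return v

# Template t_2 out of k = 5 known templates -> [0, 0, 1, 0, 0]
print(one_hot_template(2, 5))
```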

A raw log message is semi-structured text, such as this error log collected from an online payment application: HttpUtil-request Connection failed, Read timeout at java.net. It usually consists of two parts: variables and constants (also known as the template). For anomaly detection on individual logs, the goal is to identify whether the template parsed from the original log contains anomalous information. Our model uses PoS analysis together with NER techniques for more accurate and less labor-intensive log anomaly detection. PoS helps filter out template words marked with unnecessary PoS attributes, and NER assigns importance to the template words whose PoS attributes are marked as important. The composite template vector is then obtained by multiplying the template vector by the weight vector.

Our log anomaly detection model consists of six steps: template parsing, PoS analysis, initial vector construction, NER-based weight calculation, composite vector construction, and final detection. The whole detection process is shown in Figure 1.

Step 1: Template Parsing

Raw logs are semi-structured text containing unnecessary information that may confuse or hinder detection. Preprocessing is therefore required to omit variables, such as numbers or symbols, and extract the constants, i.e., the template. Taking the aforementioned log message as an example, the original log HttpUtil-request connection failed, [wx/v1/pay/prepay], Read timeout at java.net can be parsed into the template: HttpUtil request connection failed read timeout. We use FT-Tree, a simple and effective method, to implement log parsing, and we do not introduce complex hand-crafted rules to remove less important tokens such as stop words.
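The model itself uses FT-Tree for parsing; as a rough illustration of what template extraction does, here is a simplified regex-based sketch. The regexes and token handling are our own assumptions for demonstration, not FT-Tree:

```python
import re

def parse_template(raw_log: str) -> str:
    """Simplified template extraction: strip variables (bracketed parameters,
    numbers, punctuation) and keep the constant tokens. Illustration only;
    the model described here uses FT-Tree instead."""
    msg = re.sub(r"\[.*?\]", "", raw_log)   # drop bracketed variables, e.g. [wx/v1/pay/prepay]
    msg = re.sub(r"\d+", "", msg)           # drop numeric variables
    tokens = re.split(r"[^A-Za-z]+", msg)   # split on anything that is not a letter
    return " ".join(t for t in tokens if t)

raw = "HttpUtil-request connection failed, [wx/v1/pay/prepay], Read timeout at java.net"
print(parse_template(raw))
# -> "HttpUtil request connection failed Read timeout at java net"
```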

Step 2: PoS Analysis

After the template parsing in the previous step, only English words, phrases, and some non-natural-language tokens remain in the parsed template, and these template words carry various PoS attributes, such as VB and NN. Based on our observation of a large number of log templates, some PoS attributes are important for the model to understand the meaning conveyed by the template, while others can be ignored. As shown in Figure 3, the word "at" in the parsed template is theoretically unnecessary, and its PoS attribute "IN" is likewise unnecessary: even if the IN-tagged word is removed, we can still judge whether the template is normal. Therefore, after obtaining the PoS vector, we can simplify the template by removing template words with specific PoS attributes. The remaining template words are the ones that help the model understand the template content.
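The article does not name a specific PoS tool; the sketch below uses NLTK's off-the-shelf tagger to illustrate the filtering idea. The set of ignorable PoS tags is an assumption for demonstration (the paper derives it from observing many templates):

```python
import nltk  # requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

# Assumed set of PoS tags treated as uninformative, e.g. prepositions (IN).
IGNORED_POS = {"IN", "DT", "CC", "TO"}

def pos_filter(template: str):
    """Tag each template word with its PoS attribute and drop ignorable ones."""
    tokens = nltk.word_tokenize(template)
    tagged = nltk.pos_tag(tokens)  # e.g. [('failed', 'VBD'), ('at', 'IN'), ...]
    return [(word, tag) for word, tag in tagged if tag not in IGNORED_POS]

print(pos_filter("HttpUtil request connection failed Read timeout at java net"))
```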

Step 3: Initial template vector construction

While the PoS vector is being obtained, the template is also encoded into a numeric vector. To capture the semantic information of the template, the model uses word2vec to construct the initial template vector. This initial vector is later multiplied by the weight vector obtained in the next step to produce a composite, optimized representation of the template.
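A minimal sketch of the initial vector construction using gensim's word2vec. The toy corpus, the vector size, and the choice to stack per-word vectors into an n x d matrix are illustrative assumptions:

```python
from gensim.models import Word2Vec
import numpy as np

# Corpus of parsed templates (token lists); tiny toy corpus for illustration.
templates = [
    ["HttpUtil", "request", "connection", "failed", "read", "timeout"],
    ["HttpUtil", "request", "connection", "succeeded"],
]

# Train word2vec on the template corpus; vector_size is an assumed hyperparameter.
w2v = Word2Vec(sentences=templates, vector_size=128, window=5, min_count=1, seed=42)

def initial_template_vector(tokens):
    """Stack per-word embeddings into an n x d matrix V' (one row per template word)."""
    return np.stack([w2v.wv[t] for t in tokens])

V_init = initial_template_vector(templates[0])
print(V_init.shape)  # (6, 128)
```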

Step 4: Weight Analysis

First, PoS analysis is performed on the template words and meaningless words are eliminated. Of the remaining template words, some are critical, conveying essential information such as server operations and health status; others are less important, such as the object of an action or the warning level. To strengthen the model's learning of the important template words, we construct a weight vector to highlight them. To this end, we employ NER techniques: by feeding the model predefined significant entities, it learns to pick out all template words that should be marked as significant. The process is shown in the figure:

CRF is a tool commonly used for NER, and it is also used in our model to identify the importance of template words. By providing the model with template words marked as important, it learns to identify important template words in unlabeled logs. When a template word is recognized by the CRF, the corresponding position is given a weight value of 2.0. In this way we obtain the weight vector W.
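A sketch of the CRF-based weighting using the third-party sklearn-crfsuite library. The feature set, the IMPORTANT/O label scheme, and the 1.0 default weight are assumptions; the 2.0 weight for recognized words comes from the text above:

```python
import sklearn_crfsuite

def word_features(tokens, i):
    """Minimal per-token features for the CRF; real feature sets would be richer."""
    return {"word.lower()": tokens[i].lower(),
            "is_first": i == 0,
            "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>"}

def featurize(tokens):
    return [word_features(tokens, i) for i in range(len(tokens))]

# Hypothetical training data: template words hand-labeled IMPORTANT / O.
train_tokens = [["connection", "failed", "read", "timeout"]]
train_labels = [["O", "IMPORTANT", "O", "IMPORTANT"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit([featurize(t) for t in train_tokens], train_labels)

def weight_vector(tokens, important_weight=2.0, default_weight=1.0):
    """Words the CRF tags as IMPORTANT get weight 2.0 (as stated); others 1.0 (assumed)."""
    labels = crf.predict([featurize(tokens)])[0]
    return [important_weight if l == "IMPORTANT" else default_weight for l in labels]

print(weight_vector(["connection", "failed", "read", "timeout"]))
```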

Step 5: Composite Vectors

After obtaining the weight vector W, the initial vector V' is multiplied by W to obtain the composite, optimized vector V that represents the template. Important template words receive larger weights, while the other template words receive smaller weights.
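The exact vector shapes are not spelled out in the article; this NumPy sketch assumes an n x d initial matrix V' (one row per template word) and a length-n weight vector W, so the multiplication becomes row-wise scaling:

```python
import numpy as np

# V_init: n x d matrix of word vectors from step 3; W: length-n weight vector from step 4.
V_init = np.random.randn(4, 128)    # placeholder initial template vectors
W = np.array([1.0, 2.0, 1.0, 2.0])  # CRF-recognized words carry weight 2.0

# Each word vector (row) is scaled by its weight to form the composite vector V.
V_composite = V_init * W[:, None]
```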

Step 6: Anomaly Detection

The composite vector V obtained in step 5 is fed into a final fully connected layer for anomaly detection. The output of the fully connected layer is 0 or 1, indicating normal or abnormal, respectively.
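A minimal PyTorch sketch of such a fully connected detection head. The hidden layer size and the pooling of the composite matrix into a single vector are assumptions; the article only specifies a fully connected layer with a 0/1 output:

```python
import torch
import torch.nn as nn

class AnomalyClassifier(nn.Module):
    """Fully connected detection head over a pooled composite template vector.
    Hidden size and depth are illustrative, not the paper's exact architecture."""
    def __init__(self, input_dim=128, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # two classes: 0 = normal, 1 = abnormal
        )

    def forward(self, x):
        return self.net(x)

model = AnomalyClassifier()
# Assume the n x d composite matrix V has been pooled (e.g. mean over rows)
# into a single d-dimensional template vector.
template_vec = torch.randn(1, 128)
prediction = model(template_vec).argmax(dim=1)  # 0 = normal, 1 = abnormal
```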

Model evaluation

We experimentally validated the model's improvement on log anomaly detection. Two public datasets, as well as one of our internal datasets, were used to verify the model's practicality. We compared our results with two models proposed in industry for log anomaly detection, DeepLog and LogClass.

The framework of CANet is built with PyTorch, and we chose stochastic gradient descent (SGD) as the optimizer, training for 35 epochs. The learning rate is set to 2e-4. All parameters are trained from scratch.
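A sketch of this training setup with dummy data. The model shape, batch size, and dataset are stand-ins, and the learning rate follows the stated 2e-4 (the source text's "2e4" read as a typo):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy data standing in for composite template vectors and binary labels.
X = torch.randn(256, 128)
y = torch.randint(0, 2, (256,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# Illustrative stand-in for the detection network described above.
model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=2e-4)  # SGD, lr = 2e-4 as stated
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(35):  # 35 training epochs, as stated above
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```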

**(1) Datasets:** We selected two public datasets and one internal dataset for model evaluation. BGL and HDFS are two commonly used public datasets for log analysis.

**HDFS:** Collected from more than 200 Amazon EC2 nodes running Hadoop-based jobs. It consists of 11,175,629 raw log messages, 16,838 of which are marked as "abnormal".

**BGL:** Collected from the BlueGene/L supercomputer system, containing 4,747,963 raw log messages, of which 348,469 are exception logs. Each log message was manually marked as abnormal or normal.

**Dataset A:** A dataset collected inside our company for practical validation. It contains 915,577 raw log messages and 210,172 manually marked exception logs.

**(2) Baseline models:** We compare our model with two of the most advanced models in the industry, DeepLog and LogClass, on the three datasets.

**DeepLog:** A deep neural network model that uses Long Short-Term Memory (LSTM) for detection. DeepLog uses one-hot encoding as its template vectorization method.

**LogClass:** LogClass proposes a new approach, Inverse Location Frequency (ILF), to weight log words in feature construction. This weighting method differs from the existing Inverse Document Frequency (IDF) weighting.

**(3) Model evaluation results:** We evaluated the anomaly detection performance of the two baseline models and our model in terms of Precision, Recall, and F1-score. On the HDFS dataset, our model obtained the highest F1-score, 0.981, and also performed best on Recall; LogClass achieved the best Precision, slightly higher than ours. On the second dataset, BGL, our model performed best in Recall (0.991) and F1-score (0.986), but was slightly lower than LogClass in Precision. On the third dataset, Dataset A, our model achieved the best overall performance, followed by LogClass.

Across all datasets, our model achieved the best F1-score and the highest Recall, which means it introduces less uncertainty and misses fewer anomalies.

• Paper: Natural Language Processing-based Model for Log Anomaly Detection. SEAI.

• IEEE Xplore: https://ieeexplore.ieee.org/abstract/document/9680175

The intelligent text analysis function of the Themis intelligent operation and maintenance platform can be viewed at: http://jdtops.jd.com/
