"DNA" for log collection

Original: Miss Taste (WeChat public account ID: xjjdog). Sharing is welcome; if you reprint this article, please keep the source.

xjjdog has written quite a few articles about log collection, such as the eight listed below. Today's article is mainly about how logs are divided into categories. However powerful a tool is, it only pays off once it is actually put into practice.

[1] There are so many monitoring components, there is always one that suits you
[2] ELKB practical experience, plus a complex set of configuration files as a gift
[3] Prometheus, who once taught humans to use fire, now works hard at alerting
[4] Your wildflowers, my Kibana
[5] A 20,000-word article that gives you instant "call chain" development experience
[6] In this round, SkyWalking won
[7] The obscure instrument package, with amazingly powerful features
[8] Microservices are not everything, just a domain-specific subset

Log collection is a basic component that every company needs, especially companies that have hit their stride. But what should log collection actually collect? And should all of this information be treated equally?

Log categories

When logs are mentioned, back-end logs usually come to mind first. But depending on the purpose and the log level, back-end logs end up in different places and are processed in different ways.

General business logs. In this world, programmers who run production systems at the DEBUG log level are everywhere. There is no need to carefully collect that kind of running-account chatter. In other words, most business logs are useless, and for such data a single place for unified storage is enough.

Retrievable business logs. These logs carry business attributes, for example the message-exchange data generated when your system talks to a third-party payment provider. They are more useful than ordinary business logs, but not important enough to be stored in the database. The usual approach is to collect them into large-capacity storage such as ES.

Collecting the logs into ES and hooking up Kibana is not the end of the job. The information also has to be retrievable, which means the fields must have specific meanings. Plain strings are useless here; the logs need to be converted into structured data such as JSON so that you can search and aggregate by a given condition.

Both ES and MongoDB support this.
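To make the "plain string versus structured data" point concrete, here is a minimal sketch that renders log fields as one JSON object per line without any third-party library. The class name JsonLogLine and its methods are hypothetical and exist only for illustration; a real project would more likely use an existing JSON library or a structured-logging encoder.

```java
import java.util.Map;

// Hypothetical helper (not part of any real library): renders log fields
// as a single-line JSON object so that each field can be indexed separately.
final class JsonLogLine {
    private JsonLogLine() {}

    // Escapes the characters that must not appear raw inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Integers, longs and booleans are written bare; null becomes JSON null;
    // every other value is written as a JSON string.
    static String toJson(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":");
            Object v = e.getValue();
            if (v == null) {
                sb.append("null");
            } else if (v instanceof Integer || v instanceof Long || v instanceof Boolean) {
                sb.append(v);
            } else {
                sb.append('"').append(escape(String.valueOf(v))).append('"');
            }
        }
        return sb.append('}').toString();
    }
}
```

With a LinkedHashMap holding vendorid=5 and storecode="1011", toJson yields a line whose vendorid and storecode fields can each be queried on their own, which is not possible with a free-form message string.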

Retrievable business logs are the focus of the build-out, and they require secondary development and customization of the log output component.

The following is one possible form the API could take.

// Output a log with parameters. With an even number of arguments, they are paired up as key, value.
LogMe.out("title","remark aa", "vendorid", 5, "storecode", "1011", "poscode", "POS1111", "version", "7.0.0.16");

// With an odd number of arguments, they go into the _all field and cannot be searched by content (avoid this case where possible).
LogMe.out("test _all title","remark aa", "vendorid", 5, "storecode", "1011", "poscode", "POS1111", "version");

// Assemble the parameters by hand (recommended when there are many parameters).
Map<String, Object> param = new HashMap<>();
param.put("vendorid", 5);
param.put("madetime", new Date());
param.put("orderno", 21731310830180019L);
LogMe.out("test map","remarkaa", param);

// Error stack plus parameters. Both of the methods above can also take an exception stack.
LogMe.out("error","remark error", new Exception("error"), "vendorid", "5", "storecode", "1011");
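The article does not show how LogMe works internally. The following is only a sketch of how the pairing rule described in the comments could be implemented: an even number of key/value arguments becomes searchable fields, an odd number falls back to a single _all field, and a Map or Throwable argument is picked out separately. The class LogFields and the field names "title", "remark" and "exception" are assumptions, not the real LogMe implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the key/value pairing rule; not the real LogMe code.
final class LogFields {
    private LogFields() {}

    static Map<String, Object> pair(String title, String remark, Object... args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("title", title);
        fields.put("remark", remark);

        // Separate exceptions and pre-built maps from plain key/value arguments.
        List<Object> rest = new ArrayList<>();
        for (Object arg : args) {
            if (arg instanceof Throwable) {
                fields.put("exception", arg.toString());
            } else if (arg instanceof Map) {
                for (Map.Entry<?, ?> e : ((Map<?, ?>) arg).entrySet()) {
                    fields.put(String.valueOf(e.getKey()), e.getValue());
                }
            } else {
                rest.add(arg);
            }
        }

        if (rest.size() % 2 == 0) {
            // Even count: consecutive arguments form key, value pairs.
            for (int i = 0; i < rest.size(); i += 2) {
                fields.put(String.valueOf(rest.get(i)), rest.get(i + 1));
            }
        } else {
            // Odd count: pairing is impossible, so everything goes into _all.
            StringBuilder all = new StringBuilder();
            for (Object o : rest) {
                if (all.length() > 0) all.append(' ');
                all.append(o);
            }
            fields.put("_all", all.toString());
        }
        return fields;
    }
}
```

Falling back to _all instead of throwing keeps a badly formed log call from breaking business code, at the cost of losing per-field search for that entry.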

Exception logs. Exception logs follow yet another flow. For this type of information we want two things: first, business personnel should be able to discover an exception promptly; second, the exception should be available for analysis afterwards. That calls for a trigger-based log processing chain as well as retrieval-style queries with context.
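A trigger-based processing chain can be as small as a list of handlers that fire only when an exception-level entry arrives. The sketch below is an assumption about the shape of such a chain, not a description of any particular product: the class name ExceptionLogChain is hypothetical, and treating the level string "ERROR" as the marker for exception logs is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: handlers that are triggered only by exception logs.
final class ExceptionLogChain {
    private final List<Consumer<String>> handlers = new ArrayList<>();

    // Registers a step of the chain, e.g. "notify on-call" or "index for later search".
    ExceptionLogChain register(Consumer<String> handler) {
        handlers.add(handler);
        return this;
    }

    // Runs every handler for ERROR entries and reports whether the chain fired.
    boolean accept(String level, String line) {
        if (!"ERROR".equals(level)) {
            return false;
        }
        for (Consumer<String> handler : handlers) {
            handler.accept(line);
        }
        return true;
    }
}
```

One handler would typically push a notification so the exception is noticed promptly, while another writes the entry to searchable storage for analysis afterwards.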

APM covers the front end and terminals as well, and can do call-chain tracing and behavior analysis. It generally analyzes a request end to end. There are many such products on the market, both paid and open source.

Further up are the logs of terminals. Terminals include Android, iOS, and other handheld devices. They are similar to the web side, but the tool chain is different.

Behavior logs. When you use certain apps, an option like "Send anonymous usage data to help us improve" is checked by default. This is the most detailed record of behavior: every time the user triggers a click event a log is generated, and these logs are sent to the server for analysis. The volume of this kind of log is generally very large, so it needs dedicated processing and very-large-capacity storage such as a TSDB.
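Because every click produces an entry, a client usually cannot afford one network call per event. The article does not say how the events are shipped; the sketch below assumes a simple size-based batch, with the hypothetical class BehaviorEventBuffer handing full batches to a sender callback that stands in for the real upload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffers click events and sends them in batches.
final class BehaviorEventBuffer {
    private final int batchSize;
    private final Consumer<List<String>> sender;
    private final List<String> pending = new ArrayList<>();

    BehaviorEventBuffer(int batchSize, Consumer<List<String>> sender) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.sender = sender;
    }

    // Records one event and flushes as soon as a full batch is available.
    synchronized void record(String event) {
        pending.add(event);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is pending, e.g. when the app goes to the background.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(pending)); // copy so the sender owns its batch
        pending.clear();
    }
}
```

A production client would also flush on a timer and persist unsent batches, which this sketch leaves out.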

Terminal exception logs. Collecting terminal exception logs is generally a technically demanding job. Besides the exceptions raised while the application runs normally, you also need to capture exception information when the application exits abnormally.
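On the JVM (and on Android, which exposes the same API), one common way to capture an exception that is about to kill a thread is Thread.setDefaultUncaughtExceptionHandler. The sketch below is a minimal illustration under that assumption; the class CrashReporter and the sink callback are hypothetical, and a real client would persist the report locally and upload it on the next start rather than rely on the network while crashing.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.function.Consumer;

// Hypothetical sketch: records uncaught exceptions before the thread dies.
final class CrashReporter {
    private CrashReporter() {}

    // Formats the thread name and the full stack trace as one report string.
    static String describe(Thread thread, Throwable error) {
        StringWriter trace = new StringWriter();
        error.printStackTrace(new PrintWriter(trace, true));
        return "thread=" + thread.getName() + " " + trace;
    }

    // Installs a process-wide handler and chains to any handler already present.
    static void install(Consumer<String> sink) {
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            try {
                sink.accept(describe(thread, error));
            } finally {
                if (previous != null) {
                    previous.uncaughtException(thread, error);
                }
            }
        });
    }
}
```

Chaining to the previous handler matters on platforms where the default handler is what actually terminates the process or shows the crash dialog.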

As you can see, each type of log has its own usage scenario, and the back-end technology stack used to handle it differs as well.
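The categories above can be summarized as a routing table from log category to destination. The sketch below only restates the flows described in this section; the enum LogCategory, the class LogRouter, and the destination names ("unified-archive", "search-index", "alert-chain", "tsdb") are hypothetical labels, not real systems.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical categories, mirroring the ones described in this section.
enum LogCategory { GENERAL_BUSINESS, RETRIEVABLE_BUSINESS, EXCEPTION, BEHAVIOR }

// Hypothetical sketch: maps each category to where its logs should flow.
final class LogRouter {
    private LogRouter() {}

    static List<String> destinationsFor(LogCategory category) {
        switch (category) {
            case GENERAL_BUSINESS:
                // Mostly useless: one place for unified storage is enough.
                return Collections.singletonList("unified-archive");
            case RETRIEVABLE_BUSINESS:
                // Structured fields, searchable in something like ES.
                return Collections.singletonList("search-index");
            case EXCEPTION:
                // Must trigger notification and stay searchable with context.
                return Arrays.asList("alert-chain", "search-index");
            case BEHAVIOR:
                // Very high volume: dedicated large-capacity storage.
                return Collections.singletonList("tsdb");
            default:
                throw new IllegalArgumentException("unknown category: " + category);
        }
    }
}
```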

What is it for?

Once back-end logs are collected, they are mostly used to help developers or operations staff locate problems and cut the time spent analyzing them.

Here we focus on client-side log collection.

Besides unscrupulous apps that secretly use your hardware for mining, there is also big-company software such as Alipay that secretly takes photos and records audio (search for it yourself; I believe it is true). The purpose is to collect user data and do big-data-style analysis. The iPhone itself has similar features. The less scrupulous vendors tick the option by default.

Most users are unaware that their behavior data is being collected, but once a large amount of it is aggregated, vendors find it irresistible. Of course, engineers are only there to implement it. There is no need for a guilty conscience: if the sky falls, the PR department is the cannon fodder that takes the hit.

Unlike APM, which is a tool for analyzing call relationships and improving performance, client-side data collection is more fragmented and its business models are more varied.

User data is precious, so what is collected, and how? Certainly not through questionnaires. Every click, and even the time spent on a page, may become an object of analysis.
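Time spent on a page can be derived from nothing more than an ordered stream of page-enter events: the dwell time of one page is the gap until the next page is entered. The sketch below assumes that event model; the classes PageView and DwellTime are hypothetical, and the last page in a session gets no dwell time because there is no following event to measure against.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event: the user entered a page at a given time.
final class PageView {
    final String page;
    final long enteredAtMillis;

    PageView(String page, long enteredAtMillis) {
        this.page = page;
        this.enteredAtMillis = enteredAtMillis;
    }
}

// Hypothetical sketch: sums dwell time per page from time-ordered page views.
final class DwellTime {
    private DwellTime() {}

    static Map<String, Long> totalMillisPerPage(List<PageView> orderedViews) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (int i = 0; i + 1 < orderedViews.size(); i++) {
            PageView current = orderedViews.get(i);
            PageView next = orderedViews.get(i + 1);
            long dwell = next.enteredAtMillis - current.enteredAtMillis;
            totals.merge(current.page, dwell, Long::sum);
        }
        return totals;
    }
}
```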

Since the user has installed your software, information about the hardware environment is also available, including data captured by calling into the device hardware: private user data, screenshots, audio, video, and so on. So the data collected is diverse.

1. Hardware information

This is most obvious on Android devices. The data is collected to analyze how the app relates to the device model, system version, language, and so on, so that work can be focused on the devices and versions with a high market share. You may also collect information about the device's CPU, memory, graphics card, and so on in order to optimize your product for specific hardware.

2. Software environment

Collecting information about your own software, mainly the software version. By counting how many users have each version installed, you can decide which versions to keep maintaining and which to drop, see which versions have more bugs, and so on. This is the basis for many decisions.
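Counting installs per version is a plain group-and-count over the version strings that clients report. The sketch below assumes the reports have already been reduced to one version string per installation; the class VersionStats is hypothetical. The same shape works for device models or system versions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: counts how many installations report each version.
final class VersionStats {
    private VersionStats() {}

    static Map<String, Long> countByVersion(List<String> reportedVersions) {
        // TreeMap keeps the result sorted by version string for stable output.
        return reportedVersions.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }
}
```

Note that the TreeMap sorts versions as plain strings, so "10.0" sorts before "9.0"; a real report would need a version-aware comparator.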

Collecting information from other software. A more obvious example is cookies. For example, I search for inflatable dolls on Baidu, and when I open Jingdong it recommends sex toys.

3. Function monitoring

Monitor grayscale (canary) features or other important features. For example, when a new idea goes live, it needs to be verified with production data. By analyzing the logs, you can tell which product managers' ideas are really bad.

4. LBS

User location data is very sensitive. LBS is what made Momo successful. For most applications, location data can be used to analyze how popular a product is, and how that differs, across regions, provinces, and countries.

When you release a new product requirement, you should plan the follow-up data tracking so that you can evaluate its effect, including the number of installs and uninstalls during the release. Decisions should be backed by data, not made on a whim.

5. Behavioral data

A few years ago this was the then-popular recommendation feature; now, with the support of deep learning, the analysis is more accurate. The machine silently records your preferences in the background and produces output tailored to them.

End

In summary, designing a reasonably comprehensive logging system still has its challenges. Different types of logs call for different analysis and processing methods, and the data ends up in different places. If you want to know more about this field, refer to the articles listed at the beginning. xjjdog will also try to write up this system from the angle of log specifications.

About the author: Miss Taste (xjjdog), a public account that keeps programmers from taking detours. Focused on infrastructure and Linux. With ten years of architecture experience and daily traffic in the tens of billions, I explore the world of high concurrency with you and offer a different taste. My personal WeChat is xjjdog0; feel free to add me for further discussion.


Origin juejin.im/post/5e93b6c3e51d4546f5790887