Practical experience with ELKB, together with a complete set of configuration files.

Original: Xjjdog (WeChat public account ID: xjjdog). Feel free to share; please keep the attribution when reproducing.

A sword's edge comes from grinding; the plum blossom's fragrance comes from the bitter cold.

A poet, traveling through the south in March, sees peach blossoms opening along the road, is moved, and writes a poem that later generations will recite. Behind the poet's yearning for beautiful things lies the feeling of his own predicament. Well... I can't keep this up.

Fortunately, the first half of the proverb describes a real causal relationship. The second half, though, is nonsense: a typical case of reasoning backward from the result to a cause. You could just as well claim "the smell of dog droppings comes from the chrysanthemum", which would be even more open-minded.

This is the difference between theory and practice. Rely on imagination alone, and you learn that the ideal is plump but the reality is skinny.

This article introduces a common, well-worn ELKB solution, and comes with a polished set of configuration files, to spare you some duplicated effort.

ELKkB

Not long ago, elkb was just called elk. The beats series was developed in recent years to replace flume and other collection components. To make the pipeline smoother and more scalable, a component named kafka is usually added as well. So the overall picture looks like this.
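The original post shows the architecture as a diagram; in text form, the pipeline looks roughly like this:

```text
filebeat  ->  kafka   ->  logstash  ->  elasticsearch  ->  kibana
(collect)    (buffer)     (filter)      (storage)         (display)
```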

A few brief notes on the components.

1) filebeat. A component for collecting logs. It has proven simple to use and less resource-hungry than flume. But its resource usage is not self-regulating; you need to tune some parameters. filebeat still consumes memory and CPU, so be careful.

2) kafka. The popular message queue, serving here as a memory buffer plus a log staging area. With too many topics, kafka develops serious performance problems, so classify the information you collect; better yet, split kafka into separate clusters. kafka is not very CPU-hungry, but plenty of memory and fast disks will significantly improve its performance.

3) logstash. Mainly shapes and filters the data. This component is greedy and resource-intensive; do not colocate it with an application process. On the other hand, it can be treated as a stateless compute node and scaled out whenever needed.

4) elasticsearch. It can store very large volumes of log data. Note that a single index should not grow too large; create daily or monthly indexes depending on your data volume, which also makes deletion easy.
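One way to get daily indexes is a date-suffixed index name in the logstash elasticsearch output; a minimal sketch (the host name is a placeholder):

```conf
output {
  elasticsearch {
    hosts => ["http://es1:9200"]             # placeholder host
    index => "nginx-access-%{+YYYY.MM.dd}"   # one index per day
  }
}
```

Dropping an old day then becomes a single index deletion rather than a costly delete-by-query.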

5) kibana. A display component that integrates very well with es; xjjdog has a dedicated article on it: "Your Wildflowers, My Kibana".

The more of these components you adopt, the more elegant the whole pipeline becomes. The addition of kafka in particular makes the head and tail of the chain freely interchangeable, which is rather magical. The progression: ELK -> ELKB -> ELKkB.

A Practical Journey

Log Format

To string these components together, we need some data. nginx logs are the most common choice, since nginx has become the default http load-balancing service.

First, the logs need to be put into a regular format. Here is a configuration I have found useful.

log_format  main 
'$time_iso8601|$hostname|$remote_addr|$upstream_addr|$request_time|'
'$upstream_response_time|$upstream_connect_time|$status|$upstream_status|'
'$bytes_sent|$remote_user|$uri|$query_string|$http_user_agent|$http_referer|$scheme|'
'$request_method|$http_x_forwarded_for' ;

access_log logs/access.log main;

In the end, the resulting log looks like this. The content is fairly wide, but this format is convenient to process, whether in a program or with scripts.

2019-11-28T11:26:24+08:00|nginx100.server.ops.pro.dc|101.116.237.77|10.32.135.2:41015|0.062|0.060|0.000|200|200|13701|-|/api/exec|v=10&token=H8DH9Snx9877SDER5627|-|-|http|POST|112.40.255.152

Collector

Next, configure the filebeat component. As mentioned above, since it is deployed on the business machines, its resources must be strictly controlled. Complete configuration files are available in the appendix.

For example, CPU resource limits.

max_procs: 1

Memory resource limits.

queue.spool:
  file:
    path: "${path.data}/spool.dat"
    size: 512MiB
    page_size: 32KiB
  write:
    buffer_size: 10MiB
    flush.timeout: 5s
    flush.events: 1024

You can also add some extra fields.

fields:
  env: pro

Next, configure kafka. Since log volume is generally large and no single entry carries critical meaning, a replication factor above 2 is not useful; it only lengthens recovery time.
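On the producer side, a minimal filebeat output.kafka section might look like this (hosts and topic name are placeholders, not from the original configuration):

```yaml
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]   # placeholder brokers
  topic: "nginx-access"                   # placeholder topic name
  required_acks: 1       # logs are not worth waiting for full ISR acks
  compression: gzip      # trade a little CPU for network bandwidth
```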

Filter

The logstash configuration is probably the most confusing part, and it is the focus of this article. The nginx log above will be parsed into a json string that elasticsearch can recognize.

Through the input section, you can attach a number of data sources. Here, our data source is a kafka. If you have multiple kafkas, or multiple data sources, they can all be defined here.
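A minimal sketch of such an input section, assuming the placeholder broker addresses and topic from earlier:

```conf
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"  # placeholder brokers
    topics            => ["nginx-access"]           # placeholder topic
    group_id          => "logstash-nginx"           # scale out by adding consumers
  }
}
```

Because all logstash instances share the consumer group, you can add more instances at any time to increase throughput.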

Then the filter section defines the cleaning operations on the data. It has a very tedious grammar and a very bad api, especially for date handling. If the code is not formatted, the nesting levels will make you dizzy. It uses ruby syntax.

Note that event is a built-in variable representing the current row of data, including some basic attributes. You can read values from it with the get method.

For example, the main and most important piece of information: the concrete log line itself.

body = event.get('message')

Then it is split into key/value pairs. The separator | is the delimiter of our nginx logs. Doesn't it make you itch to try it?

reqhash = Hash[@mymapper.zip(body.split('|'))]  # @mymapper: array of field names

query_string = reqhash['query_string']

reqhash.delete('query_string')
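The zip trick can be verified outside logstash. FIELDS below is a hypothetical stand-in for the @mymapper array (the field names from the nginx log_format, in order), which the original script defines elsewhere:

```ruby
# Hypothetical field-name list, in the same order as the log_format above.
FIELDS = %w[
  time_local hostname remote_addr upstream_addr request_time
  upstream_response_time upstream_connect_time status upstream_status
  bytes_sent remote_user uri query_string http_user_agent http_referer
  scheme request_method http_x_forwarded_for
]

# The sample log line from earlier in the article.
line = '2019-11-28T11:26:24+08:00|nginx100.server.ops.pro.dc|' \
       '101.116.237.77|10.32.135.2:41015|0.062|0.060|0.000|200|200|' \
       '13701|-|/api/exec|v=10&token=H8DH9Snx9877SDER5627|-|-|http|' \
       'POST|112.40.255.152'

# Pair each field name with the corresponding column of the log line.
reqhash = Hash[FIELDS.zip(line.split('|'))]
```

One line of Ruby turns the whole pipe-delimited row into a named hash, which is exactly why a strict log_format pays off.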

Date handling is a heartbreaking journey.

time_local = event.get('time_local')
datetime = DateTime.strptime(time_local,'%Y-%m-%dT%H:%M:%S%z')
source_timestamp = datetime.to_time.to_i * 1000
source_date = datetime.strftime('%Y-%m-%d')
event.set('source_timestamp',source_timestamp)
event.set('source_date',source_date)
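The same conversion can be checked in plain Ruby, using the timestamp from the sample log line: ISO-8601 with a timezone offset goes in, epoch milliseconds and an index-friendly date string come out.

```ruby
require 'date'

time_local = '2019-11-28T11:26:24+08:00'
datetime = DateTime.strptime(time_local, '%Y-%m-%dT%H:%M:%S%z')

source_timestamp = datetime.to_time.to_i * 1000   # epoch milliseconds (UTC)
source_date = datetime.strftime('%Y-%m-%d')       # e.g. for daily indexes
```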

If you want to parse the query params, that works too, but it is rather roundabout.

require 'cgi'  # needed for CGI.parse

query_string = reqhash['query_string']
query_param = CGI.parse(query_string)
query_param.each { |key,value| query_param[key]=value.join() }
reqhash['query_param'] = query_param
buffer_map = LogStash::Event.new(reqhash)
event.append(buffer_map)
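Outside logstash, you can see why the join step exists: CGI.parse returns every value as an array, because keys can repeat in a query string. The query string here is the one from the sample log line.

```ruby
require 'cgi'

query_string = 'v=10&token=H8DH9Snx9877SDER5627'

# CGI.parse => {"v"=>["10"], "token"=>["H8DH9Snx9877SDER5627"]}
query_param = CGI.parse(query_string)

# Flatten each single-element array into a plain string value.
query_param.each { |key, value| query_param[key] = value.join }
```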

Where do all these exotic functions come from? logstash will not tell you; I dug them out of the official ruby documentation. Perhaps they could not be bothered to tell us.

https://ruby-doc.org/core-2.5.1/

If your log format is oddly defined, or deeply nested, watch out: parsing is destined to be a bumpy ride.

Fortunately, logstash has an output called stdout; while debugging, you can judge the results visually in real time. This rather tests your ability to get the code right in one go.
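A minimal debugging output section might look like this:

```conf
output {
  stdout { codec => rubydebug }  # pretty-print each event to the console
}
```

Swap it for the elasticsearch output (or run both) while you iterate on the filter.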

End

Finally, I am sharing these configuration files so you can refer to them during your own implementation. The repository address (also reachable via the "read the original" link):

https://github.com/xjjdog/elkb.cnf.pro

If you are not yet familiar with the theory behind all this, a number of articles are already available, covering the overall idea as well as investigations into the essentials: "So Many Monitoring Components, There Is Always One for You" and "Your Wildflowers, My Kibana".

In fact, the difficulty of implementing elkb lies not in the solution itself, but in integrating it. First, this bypass pipeline must not affect the normal operation of the existing services; it must not overwhelm them. Second, in a typical deployment the number of machines is very large, so how to deploy and update everything deserves careful study.

But the filter part, whether in flume or in logstash, is a hurdle you cannot walk around.

About the author: Xjjdog (xjjdog) is a public account that tries to keep programmers from taking detours. It focuses on infrastructure and Linux. Ten years of architecture, tens of billions of requests a day; come explore the world of high concurrency and get a different taste of it. My personal WeChat is xjjdog0; feel free to add me as a friend for further discussion.


Origin juejin.im/post/5de8c622f265da33ce455f5b