ELK: Treating Stubborn Problems

I. Practical ELK knowledge summary

1. Character-encoding conversion

The problem: mainly garbled Chinese characters (mojibake).

In the Logstash input, transcode with the plain codec:

codec => plain {
    charset => "GB2312"
}

This converts GB2312-encoded text to UTF-8.

Encoding conversion can also be done in filebeat (recommended):

filebeat.prospectors:
- input_type: log
  paths:
    - c:\Users\Administrator\Desktop\performanceTrace.txt
  encoding: GB2312

2. Dropping redundant log lines

Delete them with drop in the Logstash filter:

if ([message] =~ "^20.*- task request,.*,start time.*") {
    drop {}    # drop the redundant lines
}



Log example:

2018-03-20 10:44:01,523 [33]DEBUG Debug - task request,task Id:1cbb72f1-a5ea-4e73-957c-6d20e9e12a7a,start time:2018-03-20 10:43:59 # line to delete

-- Request String :

{"UserName":"15046699923","Pwd":"ZYjyh727","DeviceType":2,"DeviceId":"PC-20170525SADY","EquipmentNo":null,"SSID":"pc","RegisterPhones":null,"AppKey":"ab09d78e3b2c40b789ddfc81674bc24deac","Version":"2.0.5.3"} -- End

-- Response String :

{"ErrorCode":0,"Success":true,"ErrorMsg":null,"Result":null,"WaitInterval":30} -- End

3. Handling several different log line formats with grok

Log example:

2018-03-20 10:44:01,523 [33]DEBUG Debug - task request,task Id:1cbb72f1-a5ea-4e73-957c-6d20e9e12a7a,start time:2018-03-20 10:43:59

-- Request String :

{"UserName":"15046699923","Pwd":"ZYjyh727","DeviceType":2,"DeviceId":"PC-20170525SADY","EquipmentNo":null,"SSID":"pc","RegisterPhones":null,"AppKey":"ab09d78e3b2c40b789ddfc81674bc24deac","Version":"2.0.5.3"} -- End

-- Response String :

{"ErrorCode":0,"Success":true,"ErrorMsg":null,"Result":null,"WaitInterval":30} -- End

Handle the three line formats separately with grok in the Logstash filter (note that the inner double quotes must be escaped inside the config string):

match => {
    "message" => "^20.*- task request,.*,start time:%{TIMESTAMP_ISO8601:RequestTime}"
}

match => {
    "message" => "^-- Request String : \{\"UserName\":\"%{NUMBER:UserName:int}\",\"Pwd\":\"(?<Pwd>.*)\",\"DeviceType\":%{NUMBER:DeviceType:int},\"DeviceId\":\"(?<DeviceId>.*)\",\"EquipmentNo\":(?<EquipmentNo>.*),\"SSID\":(?<SSID>.*),\"RegisterPhones\":(?<RegisterPhones>.*),\"AppKey\":\"(?<AppKey>.*)\",\"Version\":\"(?<Version>.*)\"\} -- End.*"
}

match => {
    "message" => "^-- Response String : \{\"ErrorCode\":%{NUMBER:ErrorCode:int},\"Success\":(?<Success>[a-z]*),\"ErrorMsg\":(?<ErrorMsg>.*),\"Result\":(?<Result>.*),\"WaitInterval\":%{NUMBER:WaitInterval:int}\} -- End.*"
}

... and so on for the other line formats.

4. Merging multi-line logs: the multiline plugin (key point)

Example:

① Log example

2018-03-20 10:44:01,523 [33]DEBUG Debug - task request,task Id:1cbb72f1-a5ea-4e73-957c-6d20e9e12a7a,start time:2018-03-20 10:43:59

-- Request String :

{"UserName":"15046699923","Pwd":"ZYjyh727","DeviceType":2,"DeviceId":"PC-20170525SADY","EquipmentNo":null,"SSID":"pc","RegisterPhones":null,"AppKey":"ab09d78e3b2c40b789ddfc81674bc24deac","Version":"2.0.5.3"} -- End

-- Response String :

{"ErrorCode":0,"Success":true,"ErrorMsg":null,"Result":null,"WaitInterval":30} -- End

② After the merge, Logstash grok processes the combined lines as a single event, as follows:

filter {
    grok {
        match => {
            "message" => "^%{TIMESTAMP_ISO8601:InsertTime} .*- task request,.*,start time:%{TIMESTAMP_ISO8601:RequestTime}
-- Request String : \{\"UserName\":\"%{NUMBER:UserName:int}\",\"Pwd\":\"(?<Pwd>.*)\",\"DeviceType\":%{NUMBER:DeviceType:int},\"DeviceId\":\"(?<DeviceId>.*)\",\"EquipmentNo\":(?<EquipmentNo>.*),\"SSID\":(?<SSID>.*),\"RegisterPhones\":(?<RegisterPhones>.*),\"AppKey\":\"(?<AppKey>.*)\",\"Version\":\"(?<Version>.*)\"\} -- End
-- Response String : \{\"ErrorCode\":%{NUMBER:ErrorCode:int},\"Success\":(?<Success>[a-z]*),\"ErrorMsg\":(?<ErrorMsg>.*),\"Result\":(?<Result>.*),\"WaitInterval\":%{NUMBER:WaitInterval:int}\} -- End"
        }
    }
}

Using the multiline plugin in filebeat (recommended):

① multiline options

  • pattern: the regular expression that decides which lines get merged;

  • negate: true/false — merge the lines that match the pattern, or the lines that do not match it;

  • match: after/before (takes some thought to understand);

  • after: merge after the line that matches the pattern; note that in this mode the last line of the log is not processed until the next match arrives;

  • before: merge before the line that matches the pattern (recommended).

② Version 5.5 and later (example using before)

filebeat.prospectors:
- input_type: log
  paths:
    - /root/performanceTrace*
  fields:
    type: zidonghualog
  multiline.pattern: '.*"WaitInterval":.*-- End'
  multiline.negate: true
  multiline.match: before

③ Versions before 5.5 (example using after)

filebeat.prospectors:
- input_type: log
  paths:
    - /root/performanceTrace*
  multiline:
    pattern: '^20.*'
    negate: true
    match: after

Using the multiline plugin in the Logstash input (not recommended when using filebeat):

① multiline options

  • pattern: the regular expression that decides which lines get merged;

  • negate: true/false — merge the lines that match the pattern, or the lines that do not match it;

  • what: previous/next (takes some thought to understand);

  • previous: equivalent to filebeat's after;

  • next: equivalent to filebeat's before.

② Usage

input {
    file {
        path => ["/root/logs/log2"]
        start_position => "beginning"
        codec => multiline {
            pattern => "^20.*"
            negate => true
            what => "previous"
        }
    }
}

Using the multiline plugin in the Logstash filter (not recommended):

Reasons it is not recommended:

  • Once multiline is set in the filter, pipeline workers are automatically reduced to 1;

  • The official 5.5 release removed the multiline filter; to use it you must install it first:

/usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

Example:

filter {
    multiline {
        pattern => "^20.*"
        negate => true
        what => "previous"
    }
}

5. Using date in the Logstash filter

Log example:

2018-03-20 10:44:01 [33]DEBUG Debug - task request,task Id:1cbb72f1-a5ea-4e73-957c-6d20e9e12a7a,start time:2018-03-20 10:43:59

Use date:

date {
    match => ["InsertTime", "YYYY-MM-dd HH:mm:ss"]
    remove_field => "InsertTime"
}

Note: match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]

matches a timestamp field in the form day/month/year:hour:minute:second time zone; it can also be written as match => ["timestamp", "ISO8601"] (recommended).

About date:

The point is to replace @timestamp with the time parsed from the log line, because @timestamp is otherwise the time the log entry arrived at Logstash, not the real time of the event.
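As a concrete sketch (reusing the InsertTime field from the grok examples above; target defaults to @timestamp, written out explicitly here for clarity):

```conf
filter {
    date {
        # Parse the extracted InsertTime and overwrite @timestamp with it,
        # so events sort by when they happened, not when Logstash saw them.
        match        => ["InsertTime", "ISO8601"]
        target       => "@timestamp"   # this is the default target
        remove_field => "InsertTime"
    }
}
```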

6. Classifying and processing multiple log types (key point)

Add a type classification in the filebeat configuration:

filebeat:
  prospectors:
    -
      paths:
        #- /mnt/data/WebApiDebugLog.txt*
        - /mnt/data_total/WebApiDebugLog.txt*
      fields:
        type: WebApiDebugLog_total
    -
      paths:
        - /mnt/data_request/WebApiDebugLog.txt*
        #- /mnt/data/WebApiDebugLog.txt*
      fields:
        type: WebApiDebugLog_request
    -
      paths:
        - /mnt/data_report/WebApiDebugLog.txt*
        #- /mnt/data/WebApiDebugLog.txt*
      fields:
        type: WebApiDebugLog_report

Then in the Logstash filter, use if to process each type differently:

filter {
    if [fields][type] == "WebApiDebugLog_request" {    # for request-type logs
        if ([message] =~ "^20.*- task report,.*,start time.*") {    # drop report lines
            drop {}
        }
        grok {
            match => {"... ..."}
        }
    }
}

And use if in the Logstash output:

output {
    if [fields][type] == "WebApiDebugLog_total" {
        elasticsearch {
            hosts => ["6.6.6.6:9200"]
            index => "logstashl-WebApiDebugLog_total-%{+YYYY.MM.dd}"
            document_type => "WebApiDebugLog_total_logs"
        }
    }
}

II. Overall ELK performance optimization

1. Performance analysis

Server hardware: Linux, 1 CPU, 4 GB RAM.

Assume each log entry is 250 bytes.

Analysis:

① Logstash (Linux, 1 CPU, 4 GB RAM)

  • about 500 log entries per second;

  • about 660 per second with the ruby filter removed;

  • about 1000 per second with grok removed.

② filebeat (Linux, 1 CPU, 4 GB RAM)

  • 2500-3500 entries per second;

  • volume each machine can handle per day: 24 h × 60 min × 60 s × 3000 × 250 bytes = 64,800,000,000 bytes, about 64 GB.
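That daily-volume estimate can be reproduced with a few lines (a sketch; 3000 entries/s is taken as a round midpoint of the measured 2500-3500 range):

```python
# Daily log volume one filebeat instance can ship, per the figures above.
seconds_per_day = 24 * 60 * 60      # 86,400
events_per_second = 3000            # midpoint of the measured 2500-3500 range
bytes_per_event = 250               # assumed average log entry size

daily_bytes = seconds_per_day * events_per_second * bytes_per_event
print(daily_bytes)                  # 64800000000 bytes, i.e. about 64 GB
```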

The bottleneck is Logstash reading data from Redis and writing it to ES: one Logstash processes about 6000 entries per second; two process about 10,000 per second, with the CPU essentially maxed out.

Starting Logstash consumes considerable system resources, because the startup script checks the java, ruby and other environment variables; consumption returns to normal after startup.

2. Choosing a log shipper: Logstash vs. filebeat

There is no rule that says you must use filebeat or Logstash; both work as shippers.

The differences:

  • Logstash integrates many plugins, such as grok and ruby, and is heavyweight compared with beats;

  • Logstash consumes more resources once started; if hardware is plentiful, the difference may not matter;

  • Logstash runs on the JVM and is cross-platform; beats are written in Go and do not support AIX;

  • on the 64-bit AIX platform you must install a 32-bit JDK (JRE) 1.7; 64-bit is not supported;

  • filebeat can write directly to ES, but in this system Logstash already writes to ES; having both write directly would create indexes of different types and complicate retrieval, so it is better to funnel everything into ES from a single source.

Summary:

Conclusions about Logstash versus filebeat differ, but my recommendation is: deploy filebeat on every server whose logs need collecting, because it is lightweight; have all of them send to Logstash, which does the log processing; and have Logstash alone write to ES.
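A minimal sketch of that filebeat → Logstash → ES pipeline on the Logstash side (port 5044 and the ES address here are assumptions, not values from this setup):

```conf
input {
    beats {
        port => 5044              # each filebeat points output.logstash at this port
    }
}
filter {
    # centralized processing: grok, date, drop, type-based if branches, ...
}
output {
    elasticsearch {
        hosts => ["6.6.6.6:9200"]
    }
}
```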

3. Logstash configuration tuning

Parameters you can tune according to your hardware:

① Number of pipeline threads; the official advice is to match the number of CPU cores

  • default ---> pipeline.workers: 2;

  • can be tuned to ---> pipeline.workers: the number of CPU cores (or a multiple of it).

② Number of output worker threads

  • default ---> pipeline.output.workers: 1;

  • can be tuned to ---> pipeline.output.workers: no more than the number of pipeline workers.

③ Number of events per batch

  • default ---> pipeline.batch.size: 125;

  • can be tuned to ---> pipeline.batch.size: 1000.

④ Batch delay

  • default ---> pipeline.batch.delay: 5;

  • can be tuned to ---> pipeline.batch.delay: 10.
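Collected in logstash.yml, the tuned values above might look like this (a sketch assuming a 4-core host; adjust to your own hardware):

```yaml
# logstash.yml -- example tuned values (4-core host assumed)
pipeline.workers: 4           # one per CPU core, or a small multiple
pipeline.output.workers: 4    # not more than pipeline.workers
pipeline.batch.size: 1000     # events handed to filters and outputs per batch
pipeline.batch.delay: 10      # ms to wait before flushing an undersized batch
```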

Summary:

The number of pipeline workers can be set with the -w flag, or by editing logstash.yml directly. This raises the number of filter and output threads; if needed, several times the number of CPU cores is safe, since the threads idle on I/O.

By default each output runs on one pipeline worker thread; workers can be set inside the output block, but the value should not exceed the number of pipeline workers.

batch_size can also be set per output, for example to keep the ES output's batch size consistent.

Once multiline is set in the filter, pipeline workers automatically drop to 1. If you run filebeat, handle multiline in the beat; if you use Logstash as the shipper, set multiline in the input, not in the filter.

Logstash's JVM configuration file:

Logstash is a Java-based program and runs in the JVM, which can be configured through jvm.options: maximum and minimum heap, garbage-collection settings, and so on. The JVM heap should be neither too large nor too small: too large slows the operating system; too small and Logstash will not start. The defaults are:

  • -Xms256m # minimum heap;

  • -Xmx1g # maximum heap.

4. Introducing Redis as a buffer

filebeat can feed Logstash (the indexer) directly, but Logstash has no storage: to restart it you must first stop every connected beat and then stop Logstash, which is an operational nuisance, and data is lost if Logstash fails. With Redis introduced as a data buffer pool, when Logstash stops abnormally the data stays cached in Redis and can still be seen on the Redis client.

Redis can store data as a list (up to 4,294,967,295 entries) or via publish/subscribe.
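With the list mode, the shipper side pushes events onto a Redis list and the indexer side pops them off; a sketch with assumed host and key names:

```conf
# shipper-side Logstash: buffer events into a Redis list
output {
    redis {
        host      => "127.0.0.1"        # assumed Redis address
        data_type => "list"
        key       => "logstash-buffer"  # assumed key name
    }
}

# indexer-side Logstash: drain the same list
input {
    redis {
        host      => "127.0.0.1"
        data_type => "list"
        key       => "logstash-buffer"
    }
}
```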

Tuning Redis as an ELK buffer queue:

  • do not bind 0.0.0.0; listen on the local interface only;

  • requirepass ilinux.io # set a password for safe operation;

  • it is only a queue, so persistence is unnecessary; turn off all persistence for better performance:

    both snapshots (RDB files) and append-only files (AOF):

    save "" # disable RDB snapshots;

    appendonly no # disable AOF.

  • turn off the memory-eviction policy and maximize the memory space:

    maxmemory 0 # with maxmemory set to 0, Redis memory use is unrestricted.
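Put together, the queue-only redis.conf fragment would read (a sketch; the bind address is an assumption following the listen-locally advice):

```conf
# redis.conf -- tuned as a pure ELK buffer queue
bind 127.0.0.1            # do not listen on all interfaces
requirepass ilinux.io     # password from the example above
save ""                   # disable RDB snapshots
appendonly no             # disable AOF
maxmemory 0               # no limit on Redis memory use
```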

5. Elasticsearch node optimization

Server hardware configuration, OS parameters:

1) /etc/sysctl.conf configuration

vim /etc/sysctl.conf

① vm.swappiness = 1 # ES recommends setting this to 1: drastically shrink swap use and force maximum use of memory. Do not set it to 0, which can easily trigger OOM.

② net.core.somaxconn = 65535 # maximum length of each port's listen queue.

③ vm.max_map_count = 262144 # limit on the number of VMAs (virtual memory areas) one process may own. A VMA is a contiguous region of virtual address space; exceeding this limit triggers OOM.

④ fs.file-max = 518144 # maximum number of file handles the Linux kernel will allocate.

[root@elasticsearch]# sysctl -p # apply the changes.

2) limits.conf Configuration

vim /etc/security/limits.conf
elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited

3) To make the above parameters permanent, also edit two more files:

vim /etc/pam.d/common-session-noninteractive

vim /etc/pam.d/common-session

Add the following line:

session required pam_limits.so

A restart may be required for this to take effect.

Elasticsearch's JVM configuration file:

-Xms2g

-Xmx2g

  • Set the minimum heap size (Xms) and the maximum heap size (Xmx) equal to each other.

  • The more heap Elasticsearch has, the more memory it can use for caching; note, however, that too much heap can leave you with long garbage-collection pauses.

  • Set Xmx to no more than 50% of physical RAM, to leave enough physical memory for the kernel filesystem cache.

  • Do not set Xmx above the JVM's threshold for compressed object pointers; the exact cutoff varies but is close to 32 GB. Never exceed 32 GB; if you have more memory, run several instances rather than giving one instance too much.

Elasticsearch configuration-file tuning:

vim elasticsearch.yml

bootstrap.memory_lock: true # lock memory; do not use swap
# cache and thread tuning follows:
bootstrap.mlockall: true
transport.tcp.compress: true
indices.fielddata.cache.size: 40%
indices.cache.filter.size: 30%
indices.cache.filter.terms.size: 1024MB
threadpool:
    search:
        type: cached
        size: 100
        queue_size: 2000
Set the environment variable:

vim /etc/profile.d/elasticsearch.sh

export ES_HEAP_SIZE=2g # heap size: no more than half of physical memory, and less than 32 GB.

Cluster optimization (I did not use a cluster):

  • ES is distributed; nodes configured with the same cluster.name automatically discover each other and join the cluster;

  • the cluster automatically elects a master, and re-elects when the master goes down;

  • to prevent split-brain, the number of nodes is best kept odd;

  • to manage nodes effectively, disable broadcast discovery (discovery.zen.ping.multicast.enabled: false) and configure a unicast node list (discovery.zen.ping.unicast.hosts: ["ip1", "ip2", "ip3"]).
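In elasticsearch.yml those cluster settings look roughly like this (a sketch; the cluster name and IPs are placeholders, and minimum_master_nodes is an additional common split-brain guard not mentioned above):

```yaml
cluster.name: my-elk-cluster                  # nodes sharing this name join one cluster
discovery.zen.ping.multicast.enabled: false   # disable broadcast discovery
discovery.zen.ping.unicast.hosts: ["ip1", "ip2", "ip3"]
discovery.zen.minimum_master_nodes: 2         # majority of 3 master-eligible nodes
```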

6. Performance checks

Check input and output performance:

Logstash runs at a speed matched to the services it connects to, and it can only go as fast as its inputs and outputs allow.

Check system resources:

1) CPU

  • Watch whether the CPU is overloaded. On Linux/Unix, top -H shows per-thread usage as well as totals.

  • If CPU usage is too high, jump straight to checking the JVM heap and the Logstash worker settings.

2) Memory

  • Remember that Logstash runs in the Java virtual machine, so it will only use the maximum memory you allocate to it.

  • Check for other applications using large amounts of memory; once physical memory is exhausted, they can push Logstash into swapping to disk.

3) I/O: monitor disk I/O for saturation

  • Some Logstash plugins (for example, the file output) can saturate the disk.

  • When errors are frequent, Logstash writes large error logs, which can also saturate the disk.

  • On Linux, use iostat, dstat or similar commands to monitor disk I/O.

4) Monitor network I/O

  • Plugins that perform many network operations (inputs, outputs) can saturate the network.

  • On Linux, use iftop or dstat to monitor network conditions.

Check the JVM heap:

  • A heap set too small causes high CPU usage, driven by the JVM's garbage collection.

  • A quick heap check: double the heap size and see whether performance improves. Never set the heap larger than physical memory; leave at least 1 GB for the operating system and other processes.

  • Use command-line tools such as jmap, or VisualVM, to measure the JVM heap more accurately.


Origin www.cnblogs.com/jiaxiaozia/p/12199086.html