ELK series: using Flume for log collection

Part of our log collection uses Flume to gather logs from the various application servers and ship them to Tencent Cloud's message queue, ckafka; Logstash then consumes the log messages from ckafka and writes them into Elasticsearch.

Flume tool introduction
  1. Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data. It supports custom data senders for collecting data from the log sources, and it can perform simple processing of the data before writing it to a variety of receivers (such as text files, HDFS, HBase, Kafka, etc.).
  2. Data flows through Flume as events. An event is Flume's basic unit of data: it carries the log payload (as a byte array) along with header information. Events are built from data delivered to a Source by sources external to the agent; when the Source captures an event, it applies any configured formatting and pushes the event into one or more Channels. A Channel can be thought of as a buffer that holds the event until a Sink has processed it. The Sink is responsible for persisting the log or forwarding the event to another Source.
  3. On reliability: when a node fails, logs can still be delivered to other nodes without loss. Flume provides three levels of reliability guarantee, from strongest to weakest: end-to-end (on receiving data, the agent first writes the event to disk and deletes it only after delivery succeeds; failed deliveries can be retried), store-on-failure (the strategy also used by Scribe: when the receiver crashes, data is written locally and sending resumes after recovery), and best-effort (data is sent to the receiver without any subsequent verification).
Flume installation and use
  • Upload the Flume package to the server
    apache-flume-1.9.0-bin.tar.gz
  • Unpack Flume
    tar -zxvf apache-flume-1.9.0-bin.tar.gz -C /usr/local/
  • Change the ownership of the Flume directory
    chown -R yunwei:yunwei ./apache-flume-1.9.0-bin
  • Create the TAILDIR position (metadata) file
    touch /home/yunweizhrt/3wcm_fmm_visit.json
  • Modify the memory settings
    Edit the flume-ng script under the Flume installation directory:
    vi bin/flume-ng
    JAVA_OPTS="-Xmx120m"
  • Modify the configuration file
    The requirement is to collect the logs under /home/yunweizhrt/tomcat-logs/3w-cn/dwz-web-jump/fmm_visit and send them to Tencent Cloud ckafka.
    The configuration is as follows:
[yunweizhrt@VM_40_2_centos conf]$ vim  3wcm_fmm_visit

agent.sources = s1
agent.channels = c1
agent.sinks = r1

agent.sources.s1.channels = c1
agent.sinks.r1.channel = c1

agent.sources.s1.type = TAILDIR
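# TAILDIR source: tails files matching the filegroups pattern; read offsets are kept in positionFile so collection resumes where it left off after a restart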
agent.sources.s1.positionFile = /home/yunweizhrt/3wcm_fmm_visit.json
agent.sources.s1.filegroups = f1
agent.sources.s1.filegroups.f1=/home/yunweizhrt/tomcat-logs/3w-cn/dwz-web-jump/.*log
agent.sources.s1.fileHeader = true

agent.channels.c1.type = memory
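# In-memory channel: fast, but events still buffered in memory are lost if the agent process exits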
agent.channels.c1.capacity = 1000
agent.channels.c1.transactionCapacity = 100

agent.sinks.r1.type = org.apache.flume.sink.kafka.KafkaSink
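# brokerList/topic below are the older KafkaSink property names; Flume 1.9 also accepts kafka.bootstrap.servers and kafka.topic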
agent.sinks.r1.brokerList = 10.38.40.7:9092
agent.sinks.r1.topic = flume_fmm_visit_3wcn
agent.sinks.r1.flumeBatchSize = 2000
agent.sinks.r1.kafka.producer.acks = 1
# The interceptor below is a self-developed plugin and can be removed
agent.sources.s1.interceptors = i1
agent.sources.s1.interceptors.i1.type = com.zhrt.flume.interceptor.IpExtractInterceptor$CounterInterceptorBuilder
agent.sources.s1.interceptors.i1.regex = stat_.+
agent.sources.s1.interceptors.i1.value = hourly
agent.sources.s1.interceptors.i1.default = dail
[yunweizhrt@VM_40_2_centos conf]$ 
  • Start the flume process
    nohup ./bin/flume-ng agent -n agent -c conf -f conf/3wcm_fmm_visit &

  • Verify that the startup succeeded

    Check whether the Flume agent process is running.
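    A quick check (the log path assumes the agent was started from the Flume installation directory as above, with the stock log4j.properties, which writes the agent log to logs/flume.log):

    ps -ef | grep flume | grep -v grep
    tail -n 50 logs/flume.log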

  • Verify that the data is collected
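    One way to confirm that events are reaching ckafka is to consume the topic with the standard Kafka console consumer (this assumes a Kafka client installation is available on a host that can reach the broker):

    kafka-console-consumer.sh --bootstrap-server 10.38.40.7:9092 --topic flume_fmm_visit_3wcn --from-beginning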

Use Logstash to consume from Tencent Cloud ckafka
  • The configuration information is as follows:

    The configuration adds a field named types (copied from the input's type field), which the output section uses to route events to the correct index.

[root@VM_40_24_centos conf.d]# pwd
/usr/local/logstash-7.0.1/config/conf.d
[root@VM_40_24_centos conf.d]# more   logstash-jump-3wcn-fmm.conf 
input{
    kafka{
           bootstrap_servers => "10.18.40.7:9092"
           group_id => "flume_fmm_visit_3wcn"
           topics => "flume_fmm_visit_3wcn"
           consumer_threads => 1
           decorate_events => true
           auto_offset_reset => "latest"
           type => "java_3wcn_fmm_visit"
    }
}

filter {
    if [type] == "java_3wcn_fmm_visit" {
      mutate { 
        add_field => { "types" => "%{type}"}
      }
        json {
                source => "message"
        }
        date {
                match => ["visittime", "yyyy-MM-dd HH:mm:ss"]
        }
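        # The two ruby filters below shift @timestamp forward by eight hours (Beijing time, UTC+8) so that the daily index name follows local time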
        ruby {
                code => "event.set('timestamp', event.get('@timestamp').time.localtime + 8*60*60)"
        }
        ruby {   
                code => "event.set('@timestamp',event.get('timestamp'))"
        }

        mutate {
                 remove_field => "message"
                 remove_field => "content"
                 remove_field => "kafka"
                 remove_field => "tags"
                 remove_field => ["timestamp"]
        }
  }
}
output {
   if [types] == "java_3wcn_fmm_visit" {     
      elasticsearch {
            hosts => ["10.10.10.16:9200"]
            index => "logstash_jump_3wcn_fmm_%{+YYYY_MM_dd}"
                      }
    }
}
[root@VM_40_24_centos conf.d]#
  • Start logstash
    nohup ./bin/logstash -f ./config/conf.d/ &
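    The pipeline syntax can be validated beforehand with Logstash's --config.test_and_exit dry run (same paths as above):

    ./bin/logstash -f ./config/conf.d/ --config.test_and_exit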

  • Verify collection
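    One way to verify is to check that the daily index is being created in Elasticsearch (assuming the cluster at 10.10.10.16:9200 is reachable from the current host):

    curl -s 'http://10.10.10.16:9200/_cat/indices/logstash_jump_3wcn_fmm_*?v'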
