ExecSource uses readLine() to read the log line by line and puts each line into the body of a separate Flume event. This works perfectly well for most log records, where each line is a complete record:
2016-03-18 17:53:48,374 INFO namenode.FSNamesystem (FSNamesystem.java:listCorruptFileBlocks(7217)) - there are no corrupt file blocks.
2016-03-18 17:53:48,278 INFO namenode.FSNamesystem (FSNamesystem.java:listCorruptFileBlocks(7217)) - there are no corrupt file blocks.
However, for log records that contain a stack trace, such as some ERROR entries, treating each line as a separate Flume event is a serious problem. Intuitively, several lines need to be combined into one Flume event. For example, the following log record should become a single Flume event instead of 24 (one per line):
2016-03-18 17:53:40,278 ERROR [HiveServer2-Handler-Pool: Thread-26]: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
    at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
    at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
    ... 4 more
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:196)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 10 more
The way I implement it here is to examine the beginning of each line: if it satisfies a certain condition, the line starts a new log record; otherwise, it is treated as part of the previous record. For example, for logs in the standard log4j format like the one above, a line starts a new record if its beginning matches the following regular expression:
\s?\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d,\d\d\d
If a line matches, it is treated as a new log record; if not, the line is part of the previous record (the one that began with the specified pattern).
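As a rough illustration (not the project's actual code), checking a line against this pattern in Java might look like this:

```java
import java.util.regex.Pattern;

public class LogLineMatcher {
    // The regex from the post: a log4j-style "yyyy-MM-dd HH:mm:ss,SSS"
    // timestamp at the start of a line marks the start of a new record.
    private static final Pattern NEW_RECORD = Pattern.compile(
            "^\\s?\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d,\\d\\d\\d.*");

    public static boolean startsNewRecord(String line) {
        return NEW_RECORD.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Begins with a timestamp: a new record.
        System.out.println(startsNewRecord(
                "2016-03-18 17:53:40,278 ERROR [HiveServer2-Handler-Pool: Thread-26]: ...")); // true
        // A stack-trace continuation line: part of the previous record.
        System.out.println(startsNewRecord(
                "\tat java.lang.Thread.run(Thread.java:745)")); // false
    }
}
```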
Of course, I also added a configuration option for a custom regex that defines which line beginnings count as the start of a record, so different sources can be configured differently as needed.
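For illustration only, an agent configuration might look like the fragment below. The property names (in particular `lineStartRegex`) and the source class name are assumptions, not the project's documented keys; check the repository README for the real ones:

```
# All names below are hypothetical -- consult the MultiLineExecSource
# README for the actual source type and property keys.
agent.sources = tailSource
agent.sources.tailSource.type = <fully-qualified MultiLineExecSource class>
agent.sources.tailSource.command = tail -F /var/log/hive/hiveserver2.log
# Backslashes are doubled because Java .properties files treat '\' as an escape.
agent.sources.tailSource.lineStartRegex = \\s?\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d,\\d\\d\\d
```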
With this constraint in place, it is possible to write an ExecSource that treats a group of lines as a single Flume event. I have open-sourced it on GitHub as MultiLineExecSource; if you are interested, you are welcome to try it out, and any suggestions or corrections are welcome:
github.com/qwurey/flume-source-multiline |
This version is based on flume-ng-core 1.6.0.
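The grouping described above can be sketched roughly as follows (a minimal sketch of the buffering idea, not the actual MultiLineExecSource implementation; the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class MultiLineBuffer {
    // Same "start of record" pattern as in the post.
    private static final Pattern NEW_RECORD = Pattern.compile(
            "^\\s?\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d,\\d\\d\\d.*");

    // Accumulate lines until the next line matching the start-of-record
    // regex arrives, then flush the buffer as one event body.
    public static List<String> group(List<String> lines) {
        List<String> events = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (NEW_RECORD.matcher(line).matches() && current.length() > 0) {
                events.add(current.toString()); // flush the previous record
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append('\n');
            }
            current.append(line);
        }
        if (current.length() > 0) {
            events.add(current.toString()); // flush the final record
        }
        return events;
    }
}
```

Note that the real source also has to decide when to flush the *last* buffered record of a stream that has not ended yet (e.g. on a timeout); this sketch simply flushes whatever is left at the end of the input.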
Reposted from: http://blog.csdn.net/asia_kobe/article/details/51003173