Chukwa, a Hadoop-Based Log Collection Framework: Processing Flow

1. Simulating an incrementally growing log environment

/home/matrix/Program/project/log/testlog

- 10.0.0.10 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.11 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.12 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.13 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.14 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.15 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.16 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.17 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.18 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.19 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"

/home/matrix/Program/project/log/logtest

- 192.168.0.10 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.11 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.12 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.13 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.14 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.15 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.16 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.17 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.18 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.19 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
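Both sample files follow the same pattern: ten lines that differ only in the last octet of the client IP (10 through 19). As a convenience, they could be generated rather than typed by hand. The sketch below assumes bash and the `seq` utility; the function name `gen_sample_log` is hypothetical and not part of the original setup.

```shell
#!/bin/bash
# Hypothetical helper: print ten sample access-log lines for a given IP prefix.
# The timestamp, request, status and user agent are copied from the samples above.
gen_sample_log() {
  local prefix="$1"   # e.g. 10.0.0 or 192.168.0
  local i
  for i in $(seq 10 19); do
    printf '%s\n' "- ${prefix}.${i} [17/Oct/2011:23:20:40 +0800] GET /img/chukwa.jpg HTTP/1.0 \"404\" \"16\" \"Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)\""
  done
}
```

For example, `gen_sample_log 10.0.0 > testlog` and `gen_sample_log 192.168.0 > logtest`, run inside the log directory, would recreate the two files.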

/home/matrix/Program/project/log/write_log.sh

#!/bin/bash
# Append the sample entries to the files tailed by Chukwa, so that they grow on every run.
LOG_DIR=/home/matrix/Program/project/log
cat "$LOG_DIR/testlog" >> "$LOG_DIR/testlog1"
cat "$LOG_DIR/logtest" >> "$LOG_DIR/testlog2"

/etc/crontab

*/1 * * * * matrix /home/matrix/Program/project/log/write_log.sh

$CHUKWA_HOME/conf/initial_adaptors

add filetailer.CharFileTailingAdaptorUTF8 TestLog1 0 /home/matrix/Program/project/log/testlog1 0
add filetailer.CharFileTailingAdaptorUTF8 TestLog2 0 /home/matrix/Program/project/log/testlog2 0
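When more files need to be tailed, entries of the same shape can be produced by a small helper. This is only a sketch that reproduces the format of the two lines above. The function name `adaptor_line` is hypothetical, and reading the two numeric fields as a start offset and an initial offset is an assumption that should be checked against the agent documentation for the Chukwa version in use.

```shell
#!/bin/bash
# Hypothetical helper: print one initial_adaptors entry shaped like the lines above.
# Assumed field order: adaptor class, data type, start offset, file path, initial offset.
adaptor_line() {
  local datatype="$1" path="$2"
  printf 'add filetailer.CharFileTailingAdaptorUTF8 %s 0 %s 0\n' "$datatype" "$path"
}
```

For example, `adaptor_line TestLog1 /home/matrix/Program/project/log/testlog1` prints the first entry shown above; the output could be appended to $CHUKWA_HOME/conf/initial_adaptors.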

2. Chukwa directory structure

/chukwa/
   archivesProcessing/
   dataSinkArchives/
   demuxProcessing/
   finalArchives/
   logs/
   postProcess/
   repos/
   rolling/
   temp/

3. Chukwa processing flow

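(1) The adaptors watch the log files for newly appended data, in the manner of tail.
(2) The agent sends the data to the collectors.
(3) The collectors write the data received from the agents to *.chukwa files under /chukwa/logs/.
(4) When a *.chukwa file reaches a size threshold, or after a certain time interval, it is renamed to *.done.
(5) The demux process moves the /chukwa/logs/*.done files to /chukwa/demuxProcessing/mrInput/ for processing.
(6) The postProcess process moves the *.evt files produced by the demux process to /chukwa/repos/.
(7) The files that postProcess generates under /chukwa/rolling/, organized by log type, can then be used to merge the data by day or by hour.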
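The renaming in step (4) can be illustrated on a local filesystem. The sketch below is not how the collector itself is implemented: it only mimics the size-based half of the rule (the time-based trigger is left out), and the function name `close_sink_files` and its byte threshold are hypothetical.

```shell
#!/bin/bash
# Local illustration only: rename *.chukwa files in a directory to *.done
# once they reach a size threshold given in bytes.
close_sink_files() {
  local dir="$1" threshold="$2" f size
  for f in "$dir"/*.chukwa; do
    [ -e "$f" ] || continue              # the glob matched nothing
    size=$(wc -c < "$f" | tr -d ' ')     # file size in bytes
    if [ "$size" -ge "$threshold" ]; then
      mv "$f" "${f%.chukwa}.done"        # swap the .chukwa suffix for .done
    fi
  done
}
```

Calling `close_sink_files /tmp/sink 1024` on a scratch directory renames only the files of at least 1024 bytes and leaves the smaller ones open.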

Reposted from savagegarden.iteye.com/blog/1426954