1. Installation planning
See Big Data (1): Hadoop installation
2. Install flume
- Install flume in the directory /home/hadoop/apache-flume-1.7.0-bin and configure environment variables
export FLUME_HOME=/home/hadoop/apache-flume-1.7.0-bin
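Besides FLUME_HOME, it is convenient to put Flume's bin directory on the PATH. A typical fragment for ~/.bashrc, using the install path from above:

```shell
# Append to ~/.bashrc (or /etc/profile) and re-source it;
# the install path is the directory given above.
export FLUME_HOME=/home/hadoop/apache-flume-1.7.0-bin
export PATH=$PATH:$FLUME_HOME/bin
```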
- Configure the flume-hdfs.conf file in the conf directory, defining the agent's source, channel, and sink
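A minimal flume-hdfs.conf wiring an exec source through a memory channel to an HDFS sink might look like this (the agent name, component names, file paths, and NameNode address are illustrative assumptions, not the exact deployment values):

```properties
# One agent with one exec source, one memory channel, one HDFS sink
agent1.sources = s2
agent1.channels = c1
agent1.sinks = k1

# exec source: tail the Tomcat log (path is an example)
agent1.sources.s2.type = exec
agent1.sources.s2.command = tail -n +0 -F /path/to/tomcat/logs/catalina.out
agent1.sources.s2.channels = c1

# memory channel buffering events between source and sink
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 10000
agent1.channels.c1.transactionCapacity = 1000

# HDFS sink: write events as plain text, partitioned by day
agent1.sinks.k1.type = hdfs
agent1.sinks.k1.channel = c1
agent1.sinks.k1.hdfs.path = hdfs://namenode:9000/flume/logs/%Y-%m-%d
agent1.sinks.k1.hdfs.fileType = DataStream
# use the agent's local clock so the %Y-%m-%d escapes resolve
agent1.sinks.k1.hdfs.useLocalTimeStamp = true
```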
- Recently, while deploying Flume to ship a Tomcat log file to Hadoop's HDFS, I ran into a strange problem. Flume uses an exec source to tail a single Tomcat log file, but Tomcat rolls its log daily: today's log is catalina.2017-08-05.out, and after midnight it becomes catalina.2017-08-06.out. The Flume configuration cannot detect this switch, so it keeps tailing the previous day's file.
- The original Flume configuration was as follows. The backtick-quoted `date` command is expanded only once, when the exec source starts its command, so the filename is never re-evaluated after the date rolls over:
agent1.sources.s2.command = tail -n +0 -F "/home/gome_guest/10.58.61.83/cashier-service_02/logs/catalina.`date +%Y-%m-%d`.out"
- The changed configuration delegates filename resolution to a wrapper script that re-evaluates the date:
agent1.sources.s2.command = locktail_rotate.sh /home/gome_guest/10.58.61.83/cashier-service_02/logs/catalina.DATE_ROTATE.out 'date +"%Y-%m-%d"'
For locktail_rotate.sh, see https://github.com/ypenglyn/locktail/blob/master/locktail_rotate.sh
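The linked script handles the rollover-aware tailing. As a rough sketch of the underlying idea only (the function names, polling interval, and argument handling here are assumptions for illustration, not taken from locktail_rotate.sh):

```shell
#!/bin/sh
# Illustrative sketch (not the real locktail_rotate.sh): follow a
# date-stamped log file and re-attach when the date rolls over.
# Usage: rotate_tail_sketch.sh TEMPLATE [DATECMD]
#   TEMPLATE contains the literal placeholder DATE_ROTATE, e.g.
#   /var/log/tomcat/catalina.DATE_ROTATE.out
TEMPLATE="${1:-/tmp/catalina.DATE_ROTATE.out}"
DATECMD="${2:-date +%Y-%m-%d}"

# Substitute today's date into the template to get the current filename.
current_file() {
  echo "$TEMPLATE" | sed "s/DATE_ROTATE/$($DATECMD)/"
}

# Tail the current file; when the resolved name changes (the day rolled
# over), kill the old tail and start a new one on the new file.
follow() {
  while true; do
    f=$(current_file)
    tail -n +0 -F "$f" &
    pid=$!
    while [ "$(current_file)" = "$f" ]; do
      sleep 10
    done
    kill "$pid"
  done
}

# Only start following when a template argument is given on the command line.
if [ -n "$1" ]; then
  follow
fi
```

Because the date is re-resolved inside the loop rather than baked into the command line at agent startup, the exec source keeps streaming the correct file across midnight.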