Flume NG architecture and configuration

Please refer to the Flume User Guide first:

http://flume.apache.org/FlumeUserGuide.html

Also see the Cloudera Flume blog posts:

http://blog.cloudera.com/blog/category/flume/

 

How to define JAVA_HOME and Java options, and add our customized libraries to flume-ng

All of these settings are defined in FLUME_CONF_DIR/flume-env.sh, i.e. the flume-env.sh file in the directory passed to flume-ng with --conf.

For example:

JAVA_HOME=/opt/java 

JAVA_OPTS="-Xms200m -Xmx200m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=3669 -Dflume.called.from.service" 

FLUME_CLASSPATH=/opt/sponge/flume/lib/*
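The com.sun.management.jmxremote.* options in JAVA_OPTS open an unauthenticated, non-SSL JMX port (3669) on the Flume JVM. A minimal sketch for checking what the agent exposes there, assuming the JVM is reachable on that port and that Flume registers its counter MBeans under org.apache.flume.* domains; the class name FlumeJmxProbe is hypothetical:

```java
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class FlumeJmxProbe {
    // Standard RMI connector URL for a JVM started with -Dcom.sun.management.jmxremote.port=<port>
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 3669;
        JMXServiceURL url = new JMXServiceURL(jmxUrl(host, port));
        // authenticate=false and ssl=false in JAVA_OPTS, so no credentials are passed here
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Assumption: Flume's source/channel/sink counters live under org.apache.flume.* domains
            Set<ObjectName> names = conn.queryNames(new ObjectName("org.apache.flume.*:*"), null);
            for (ObjectName name : names) {
                System.out.println(name);
            }
        }
    }
}
```

Pass the host and port as arguments; with no arguments it tries localhost:3669.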

 

How to start flume-ng as an agent

Note: the agent must be named hostname_agent (where hostname is the machine's host name). The same name is used as the property prefix in flume-conf-agent.properties.

$/usr/lib/flume/bin/flume-ng agent --conf /opt/sponge/flume/config/   --conf-file /opt/sponge/flume/conf/flume-conf-agent.properties  --name hostname_agent &
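The --name value matters because an agent only reads the properties whose keys start with its own name followed by a dot. A simplified sketch of that filtering (a hypothetical AgentNameFilter class, not Flume's actual configuration loader) shows why a key with any other prefix is ignored:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class AgentNameFilter {
    // Keeps only the keys that start with "<agentName>." and strips that prefix,
    // trimming stray whitespace from the values.
    static Map<String, String> propertiesFor(Properties all, String agentName) {
        String prefix = agentName + ".";
        Map<String, String> result = new TreeMap<>();
        for (String key : all.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.put(key.substring(prefix.length()), all.getProperty(key).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Properties all = new Properties();
        all.load(new StringReader(
                "hostname_agent.channels = fileChannel\n"
                + "agent.sinkgroups = sinkGroup\n"));
        // The agent.* key does not match the hostname_agent prefix, so it is dropped.
        System.out.println(propertiesFor(all, "hostname_agent")); // {channels=fileChannel}
    }
}
```

Started with --name hostname_agent, the agent would never see a key written as agent.sinkgroups.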

 

How to start flume-ng as a collector

Note: the collector must be named hostname_collector (where hostname is the machine's host name). The same name is used as the property prefix in flume-conf-collector.properties.

$/usr/lib/flume/bin/flume-ng agent --conf /opt/sponge/flume/config/   --conf-file /opt/sponge/flume/conf/flume-conf-collector.properties  --name hostname_collector &

 

How to define the Flume agent and collector property files

I’ve already committed two property files to https://svn.nam.nsroot.net:9050/svn/153299/elf/sponge-branches/2013-03-14-FlumeNG/sponge/myflumeng/config

See flume-conf-agent.properties and flume-conf-collector.properties there.

The basic naming conventions are:

1) Each agent is named hostname_agent.

2) Each collector is named hostname_collector.

3) Sources are named source1, source2, source3, …

4) Sinks are named avroSink1, avroSink2, avroSink3, …

5) Each source's interceptor is named logIntercept1, logIntercept2, logIntercept3, …

6) All agent sinks are Avro sinks.

7) The default collector source is an Avro source.

8) Agent sinks are load balanced in round-robin order.

9) A file channel is the default for both agents and collectors.
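These conventions are mechanical enough to generate. A minimal sketch, using a hypothetical FlumeNames helper, that builds the component lists for one agent; the Flume User Guide examples separate component names in these lists with spaces, so the helper does the same:

```java
import java.util.ArrayList;
import java.util.List;

public class FlumeNames {
    // Convention 1: the agent name is <host>_agent.
    static String agentName(String host) {
        return host + "_agent";
    }

    // Convention 2: the collector name is <host>_collector.
    static String collectorName(String host) {
        return host + "_collector";
    }

    // Conventions 3 and 4: numbered names such as "source1 source2", space-separated.
    static String numbered(String prefix, int count) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            names.add(prefix + i);
        }
        return String.join(" ", names);
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "hostname";
        String agent = agentName(host);
        System.out.println(agent + ".sources = " + numbered("source", 2));
        System.out.println(agent + ".channels = fileChannel");
        System.out.println(agent + ".sinks = " + numbered("avroSink", 2));
        System.out.println("collector name: " + collectorName(host));
    }
}
```

In practice the host argument could come from InetAddress.getLocalHost().getHostName().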

 

flume-conf-agent.properties

hostname_agent.sources = source1 source2

hostname_agent.channels = fileChannel

hostname_agent.sinks = avroSink1 avroSink2

 

# For each one of the sources, the type is defined

hostname_agent.sources.source1.type = exec

hostname_agent.sources.source1.command = tail -F /var/log/audit/audit.log

hostname_agent.sources.source1.channels = fileChannel

hostname_agent.sources.source1.batchSize=10

 

hostname_agent.sources.source2.type = exec

hostname_agent.sources.source2.command = tail -F /var/log/flume/flume.log

hostname_agent.sources.source2.channels = fileChannel

hostname_agent.sources.source2.batchSize=10
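With batchSize=10 the exec source buffers the lines it reads from the tail -F output and puts them on the channel in groups instead of one transaction per line. The sketch below shows only the grouping arithmetic (ExecBatching is a hypothetical name); when and how the final, partial batch gets flushed depends on the Flume version:

```java
import java.util.ArrayList;
import java.util.List;

public class ExecBatching {
    // Splits lines into consecutive batches of at most batchSize lines each.
    static List<List<String>> batches(List<String> lines, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += batchSize) {
            int end = Math.min(start + batchSize, lines.size());
            result.add(new ArrayList<>(lines.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            lines.add("audit line " + i);
        }
        // 25 lines with batchSize=10 give batches of 10, 10 and 5
        for (List<String> batch : batches(lines, 10)) {
            System.out.println("commit " + batch.size() + " events");
        }
    }
}
```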

 

# For each one of the sources, the log interceptor is defined

hostname_agent.sources.source1.interceptors = logIntercept1

hostname_agent.sources.source1.interceptors.logIntercept1.type = com.citi.sponge.flume.sink.LogInterceptor$Builder

hostname_agent.sources.source1.interceptors.logIntercept1.preserveExisting = false

hostname_agent.sources.source1.interceptors.logIntercept1.hostName = hostname

hostname_agent.sources.source1.interceptors.logIntercept1.env = PROD

hostname_agent.sources.source1.interceptors.logIntercept1.logType = AUDIT_LOG

hostname_agent.sources.source1.interceptors.logIntercept1.appId = 111111

hostname_agent.sources.source1.interceptors.logIntercept1.logFilePath = /var/log/audit

hostname_agent.sources.source1.interceptors.logIntercept1.logFileName = audit.log

  

hostname_agent.sources.source2.interceptors = logIntercept2

hostname_agent.sources.source2.interceptors.logIntercept2.type = com.citi.sponge.flume.sink.LogInterceptor$Builder

hostname_agent.sources.source2.interceptors.logIntercept2.preserveExisting = false

hostname_agent.sources.source2.interceptors.logIntercept2.hostName = hostname

hostname_agent.sources.source2.interceptors.logIntercept2.env = PROD

hostname_agent.sources.source2.interceptors.logIntercept2.logType = FLUME

hostname_agent.sources.source2.interceptors.logIntercept2.appId = 111111

hostname_agent.sources.source2.interceptors.logIntercept2.logFilePath = /var/log/flume

hostname_agent.sources.source2.interceptors.logIntercept2.logFileName = flume.log
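The LogInterceptor class itself is not shown here, so the following is only an assumption about its behavior: that it copies each configured value (hostName, env, logType, appId, logFilePath, logFileName) into an event header of the same name, and that preserveExisting follows the same rule as Flume's built-in host and static interceptors, keeping a header the event already has only when it is true. The HeaderStamping class is hypothetical:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderStamping {
    // Copies configured values into the event headers.
    // preserveExisting=true keeps a header the event already carries;
    // preserveExisting=false (as configured above) overwrites it.
    static void stamp(Map<String, String> headers, Map<String, String> configured,
                      boolean preserveExisting) {
        for (Map.Entry<String, String> e : configured.entrySet()) {
            if (preserveExisting && headers.containsKey(e.getKey())) {
                continue;
            }
            headers.put(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        // Values taken from logIntercept1 above; header names are assumed to match the property names
        Map<String, String> configured = new LinkedHashMap<>();
        configured.put("hostName", "hostname");
        configured.put("env", "PROD");
        configured.put("logType", "AUDIT_LOG");
        configured.put("appId", "111111");
        configured.put("logFilePath", "/var/log/audit");
        configured.put("logFileName", "audit.log");

        Map<String, String> headers = new HashMap<>();
        headers.put("env", "UAT");
        stamp(headers, configured, false);
        System.out.println(headers.get("env")); // PROD, because preserveExisting=false
    }
}
```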

 

 

# For each sink, the type is defined

hostname_agent.sinks.avroSink1.type = avro

hostname_agent.sinks.avroSink1.hostname=collector1

hostname_agent.sinks.avroSink1.port=1442

hostname_agent.sinks.avroSink1.batchSize=10

hostname_agent.sinks.avroSink1.channel = fileChannel

 

hostname_agent.sinks.avroSink2.type = avro

hostname_agent.sinks.avroSink2.hostname=collector2

hostname_agent.sinks.avroSink2.port=1442

hostname_agent.sinks.avroSink2.batchSize=10

hostname_agent.sinks.avroSink2.channel = fileChannel
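Both Avro sinks connect to the collectors' Avro source on port 1442, so that port has to be reachable from every agent host. A small, hypothetical pre-flight check using a plain TCP connect; it does not speak Avro, it only confirms that something is listening on collector1 and collector2:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class CollectorCheck {
    // Returns true if a TCP connection to host:port opens within timeoutMillis.
    // Unknown hosts, refused connections and timeouts all count as unreachable.
    static boolean reachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Collector host names and port taken from avroSink1/avroSink2 above
        String[] collectors = {"collector1", "collector2"};
        int port = 1442;
        for (String host : collectors) {
            System.out.println(host + ":" + port + " reachable = " + reachable(host, port, 2000));
        }
    }
}
```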

 

 

# Specify the load-balancing configuration for the sinks

hostname_agent.sinkgroups = sinkGroup

hostname_agent.sinkgroups.sinkGroup.sinks = avroSink1 avroSink2

hostname_agent.sinkgroups.sinkGroup.processor.type = load_balance

hostname_agent.sinkgroups.sinkGroup.processor.backoff = true

hostname_agent.sinkgroups.sinkGroup.processor.selector = round_robin

hostname_agent.sinkgroups.sinkGroup.processor.selector.maxBackoffMillis=30000
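With backoff enabled, the load_balance processor rotates the starting sink on each attempt and temporarily skips a sink that has just failed, backing off longer after each consecutive failure up to a cap (30000 ms here). The sketch below illustrates that round-robin-with-exponential-backoff idea; it is not the code of Flume's LoadBalancingSinkProcessor, and the class name and the 1 s starting delay are assumptions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RoundRobinBackoff {
    private final List<String> sinks;
    private final long[] restoreAt;      // time (ms) until which a sink is skipped
    private final int[] failures;        // consecutive failures per sink
    private final long maxBackoffMillis; // upper bound on a single backoff period
    private int next = 0;                // index of the sink tried first on the next call

    RoundRobinBackoff(List<String> sinks, long maxBackoffMillis) {
        if (sinks.isEmpty()) {
            throw new IllegalArgumentException("need at least one sink");
        }
        this.sinks = new ArrayList<>(sinks);
        this.restoreAt = new long[sinks.size()];
        this.failures = new int[sinks.size()];
        this.maxBackoffMillis = maxBackoffMillis;
    }

    // Order in which sinks should be tried now: the starting sink rotates on
    // every call, and sinks still backing off are left out.
    List<String> order(long nowMillis) {
        List<String> result = new ArrayList<>();
        int n = sinks.size();
        for (int i = 0; i < n; i++) {
            int idx = (next + i) % n;
            if (restoreAt[idx] <= nowMillis) {
                result.add(sinks.get(idx));
            }
        }
        next = (next + 1) % n;
        return result;
    }

    // Exponential backoff: 1 s, 2 s, 4 s, ... capped at maxBackoffMillis.
    void failed(String sink, long nowMillis) {
        int idx = sinks.indexOf(sink);
        failures[idx]++;
        long delay = Math.min(maxBackoffMillis, 1000L << Math.min(failures[idx] - 1, 30));
        restoreAt[idx] = nowMillis + delay;
    }

    // A successful delivery clears the sink's backoff state.
    void succeeded(String sink) {
        int idx = sinks.indexOf(sink);
        failures[idx] = 0;
        restoreAt[idx] = 0;
    }

    public static void main(String[] args) {
        RoundRobinBackoff rr = new RoundRobinBackoff(Arrays.asList("avroSink1", "avroSink2"), 30000);
        System.out.println(rr.order(0));   // [avroSink1, avroSink2]
        System.out.println(rr.order(0));   // [avroSink2, avroSink1]
        rr.failed("avroSink1", 0);         // avroSink1 skipped until t=1000 ms
        System.out.println(rr.order(500)); // [avroSink2]
    }
}
```

Capping the delay keeps a collector that comes back after a long outage from being ignored for minutes, while the doubling avoids hammering a collector that is still down.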

 

 

# Each channel's type is defined.

hostname_agent.channels.fileChannel.type = file

hostname_agent.channels.fileChannel.checkpointDir = /opt/sponge/file-channel/checkpoint

hostname_agent.channels.fileChannel.dataDirs = /opt/sponge/file-channel/dataDirs

hostname_agent.channels.fileChannel.transactionCapacity = 1000

hostname_agent.channels.fileChannel.checkpointInterval = 30000

hostname_agent.channels.fileChannel.maxFileSize = 2146435071

hostname_agent.channels.fileChannel.minimumRequiredSpace = 524288000

hostname_agent.channels.fileChannel.keep-alive = 5

hostname_agent.channels.fileChannel.write-timeout = 5

hostname_agent.channels.fileChannel.checkpoint-timeout = 600
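The size values above are in bytes and checkpointInterval is in milliseconds: 2146435071 bytes is 1 MiB plus one byte short of 2 GiB, 524288000 bytes is exactly 500 MiB, and 30000 ms is 30 seconds (these appear to be the File Channel defaults from the Flume User Guide). A small sketch, with a hypothetical FileChannelSizing class, that prints these conversions and compares the data directory's usable space with minimumRequiredSpace; the strict greater-than comparison is an assumption about where the file channel draws the line:

```java
import java.io.File;

public class FileChannelSizing {
    static final double MIB = 1024.0 * 1024.0;

    static double toMiB(long bytes) {
        return bytes / MIB;
    }

    // True while the partition still has more usable space than minimumRequiredSpace.
    static boolean hasRequiredSpace(long usableBytes, long minimumRequiredSpace) {
        return usableBytes > minimumRequiredSpace;
    }

    public static void main(String[] args) {
        long maxFileSize = 2146435071L;         // bytes
        long minimumRequiredSpace = 524288000L; // bytes
        long checkpointInterval = 30000L;       // milliseconds
        System.out.printf("maxFileSize          = %.1f MiB%n", toMiB(maxFileSize));
        System.out.printf("minimumRequiredSpace = %.1f MiB%n", toMiB(minimumRequiredSpace));
        System.out.printf("checkpointInterval   = %d s%n", checkpointInterval / 1000);
        File dataDir = new File(args.length > 0 ? args[0] : "/opt/sponge/file-channel/dataDirs");
        // getUsableSpace() returns 0 if the path does not name a partition (e.g. it does not exist)
        System.out.println("enough space: " + hasRequiredSpace(dataDir.getUsableSpace(), minimumRequiredSpace));
    }
}
```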

 

flume-conf-collector.properties

hostname_collector.sources = avroSource

hostname_collector.channels = fileChannel

hostname_collector.sinks = hbaseSink

 

 

# For each one of the sources, the type is defined

hostname_collector.sources.avroSource.channels = fileChannel

hostname_collector.sources.avroSource.type = avro

hostname_collector.sources.avroSource.bind = hostname

hostname_collector.sources.avroSource.port = 1442

 

hostname_collector.sinks.hbaseSink.type=org.apache.flume.sink.hbase.HBaseSink

hostname_collector.sinks.hbaseSink.table=spong_flumeng_log2

hostname_collector.sinks.hbaseSink.columnFamily=content

hostname_collector.sinks.hbaseSink.serializer=com.citi.sponge.flume.sink.LogHbaseEventSerializer

hostname_collector.sinks.hbaseSink.timeout=120

hostname_collector.sinks.hbaseSink.column=log

hostname_collector.sinks.hbaseSink.batchSize=2

hostname_collector.sinks.hbaseSink.channel=fileChannel
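The HBase sink hands each event to the configured serializer, which decides the row key and the cells written into the column family (content here); Flume's HbaseEventSerializer interface does this through initialize(event, columnFamily) and getActions(). The LogHbaseEventSerializer source is not shown, so the sketch below is entirely hypothetical: a dependency-free illustration of one plausible layout, with a row key built from the appId, hostName and logType headers plus a timestamp, and the event body stored under a log qualifier:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class LogRowSketch {
    // Hypothetical row-key layout: appId|hostName|logType|timestamp.
    static String rowKey(Map<String, String> headers, long timestampMillis) {
        return headers.get("appId") + "|" + headers.get("hostName") + "|"
                + headers.get("logType") + "|" + timestampMillis;
    }

    // Hypothetical cell layout as "family:qualifier" -> value: the body goes under
    // the "log" qualifier and every header becomes its own qualifier.
    static Map<String, String> cells(String family, Map<String, String> headers, byte[] body) {
        Map<String, String> cells = new TreeMap<>();
        cells.put(family + ":log", new String(body, StandardCharsets.UTF_8));
        for (Map.Entry<String, String> h : headers.entrySet()) {
            cells.put(family + ":" + h.getKey(), h.getValue());
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new TreeMap<>();
        headers.put("appId", "111111");
        headers.put("hostName", "hostname");
        headers.put("logType", "AUDIT_LOG");
        byte[] body = "sample audit line".getBytes(StandardCharsets.UTF_8);
        System.out.println(rowKey(headers, 1363219200000L)); // 111111|hostname|AUDIT_LOG|1363219200000
        System.out.println(cells("content", headers, body));
    }
}
```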

 

 

# Each channel's type is defined.

hostname_collector.channels.fileChannel.type = file

hostname_collector.channels.fileChannel.checkpointDir = /opt/sponge/file-channel/checkpoint

hostname_collector.channels.fileChannel.dataDirs = /opt/sponge/file-channel/dataDirs

hostname_collector.channels.fileChannel.transactionCapacity = 1000

hostname_collector.channels.fileChannel.checkpointInterval = 30000

hostname_collector.channels.fileChannel.maxFileSize = 2146435071

hostname_collector.channels.fileChannel.minimumRequiredSpace = 524288000

hostname_collector.channels.fileChannel.keep-alive = 5

hostname_collector.channels.fileChannel.write-timeout = 5

hostname_collector.channels.fileChannel.checkpoint-timeout = 600

Reposted from ilnba.iteye.com/blog/1834917