Flume parameters:
# example.conf: single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel that buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console.
The parameters are described below, grouped into sources, channels, and sinks.
1, Sources
Flume's commonly used sources are NetCat, Avro, Exec, Spooling Directory, and Taildir; a custom source can also be written for specific business scenarios. Each is described in detail below.
1) NetCat Source
A NetCat source can use either of two protocols, TCP or UDP, in essentially the same way: it listens on a specified IP and port, turns each received line of data into an Event, and writes it to the channel. (Parameters marked @ are required; likewise below.)
Property Name      Default      Description
channels           @            -
type               @            must be set to netcat
bind               @            host name or IP address to bind to
port               @            port number to listen on
max-line-length    512          maximum number of bytes per line
ack-every-event    true         respond with "OK" after each event is received
selector.type      replicating  selector type: replicating or multiplexing
selector.*         -            other selector parameters
interceptors       -            list of interceptors, separated by spaces
interceptors.*     -            interceptor parameters
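As a minimal sketch, a NetCat source using these parameters might be configured as follows (the agent and component names a1, r1, c1 are assumed, matching the example configuration above):

```properties
# NetCat source: listen on TCP port 44444 and turn each received line into an Event
a1.sources = r1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.max-line-length = 1024
a1.sources.r1.channels = c1
```

You can then test it with `telnet localhost 44444`: each line you type arrives as one event.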
2) Avro Source

The Avro source is used to transmit data over the network between agents on different hosts. It generally receives data either from an Avro client or from the Avro sink of an upstream agent, with which it is paired.
Property Name      Default      Description
channels           @            -
type               @            must be set to avro
bind               @            host name or IP address to listen on
port               @            port number
threads            -            maximum number of worker threads for transmission
selector.type      replicating  selector type
selector.*         -            selector parameters
interceptors       -            list of interceptors
interceptors.*     -            interceptor parameters
compression-type   none         may be "none" or "deflate"; must match the compression type of the sending Avro sink
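A minimal Avro source, again assuming the a1/r1/c1 naming from the example above, might look like this:

```properties
# Avro source: accept data sent by an Avro client or an upstream Avro sink
a1.sources = r1
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
# must match the compression-type of the sending Avro sink
a1.sources.r1.compression-type = none
a1.sources.r1.channels = c1
```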
3) Exec Source

The Exec source transmits the output of a given Unix command, e.g. cat or tail -F. Latency is low, but if the agent process has a problem, data may be lost.
Property Name      Default      Description
channels           @            -
type               @            must be set to exec
command            @            command to execute
shell              -            shell used to run the command
restartThrottle    10000        time (ms) to wait before attempting a restart
restart            false        whether to restart the command if it dies
logStdErr          false        whether to log the command's stderr
batchSize          20           maximum number of lines to read and write to the channel per batch
batchTimeout       3000         maximum time (ms) to wait before writing a partial batch
selector.type      replicating  selector type: replicating or multiplexing
selector.*         -            other selector parameters
interceptors       -            list of interceptors, separated by spaces
interceptors.*     -            interceptor parameters
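A sketch of an Exec source following a log file with tail -F (the log path /var/log/app/app.log and the a1/r1/c1 names are assumptions for illustration):

```properties
# Exec source: follow a log file with tail -F
a1.sources = r1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.shell = /bin/sh -c
# restart the command if it exits, waiting 10 s between attempts
a1.sources.r1.restart = true
a1.sources.r1.restartThrottle = 10000
a1.sources.r1.channels = c1
```

Note that even with restart enabled, lines written while the agent is down are lost, as described above.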
4) Spooling Directory Source

The Spooling Directory source transfers data by watching a folder for new files and converting their contents into Events. It is characterized by no data loss, but two points must be observed when using it:
1) files must not be modified after they are placed in the monitored folder;
2) file names added to the monitored folder must be unique.
Because it only picks up whole new files, the Spooling Directory source has relatively high latency, but near-real-time behavior can be approached by splitting data into smaller files.
Property Name      Default      Description
channels           @            -
type               @            must be set to spooldir
spoolDir           @            directory of the monitored folder
fileSuffix         .COMPLETED   suffix appended to files whose transfer is complete
deletePolicy       never        when to delete completed files: never or immediate
fileHeader         false        whether to add a header containing the file's full path
fileHeaderKey      file         header key for the full path, if fileHeader is enabled
basenameHeader     false        whether to add a header containing the file's base name
basenameHeaderKey  basename     header key for the base name, if basenameHeader is enabled
includePattern     ^.*$         regular expression matching new files whose data should be transmitted
ignorePattern      ^$           regular expression matching new files to ignore
trackerDir         .flumespool  directory where metadata is stored
consumeOrder       oldest       order in which files are consumed: oldest, youngest, or random
maxBackoff         4000         maximum time (ms) to back off when the channel is full; a ChannelException is thrown if writing still fails
batchSize          100          batch size
inputCharset       UTF-8        character set of the input files
decodeErrorPolicy  FAIL         handling of undecodable characters: FAIL, REPLACE, or IGNORE
selector.type      replicating  selector type: replicating or multiplexing
selector.*         -            other selector parameters
interceptors       -            list of interceptors, separated by spaces
interceptors.*     -            interceptor parameters
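A minimal Spooling Directory source might look like the following sketch (the directory /data/flume/spool and the a1/r1/c1 names are assumptions):

```properties
# Spooling Directory source: pick up completed files dropped into the spool directory
a1.sources = r1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /data/flume/spool
a1.sources.r1.fileSuffix = .COMPLETED
a1.sources.r1.deletePolicy = never
# add each file's full path to the event headers
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1
```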
5) Taildir Source

The Taildir source monitors one or more files in real time for appended content. Because it stores the read offsets in a designated JSON file, no data is lost even if the agent is killed or crashes. Note that this source cannot be used on Windows.
Property Name      Default      Description
channels           @            -
type               @            must be set to TAILDIR
filegroups         @            names of the file groups, separated by spaces
filegroups.<filegroupName>  @   absolute path of the files to monitor
positionFile       ~/.flume/taildir_position.json  path of the JSON file storing the read offsets
headers.<filegroupName>.<headerKey>  -  header value set with the given key
byteOffsetHeader   false        whether to add the byte offset to a header whose key is 'byteoffset'
skipToEnd          false        whether to skip to the end of a file when its offset is not recorded in the position file
idleTimeout        120000       time (ms) after which a file with no new content is closed
writePosInterval   3000         interval (ms) between writes of the last position to the position file
batchSize          100          maximum number of lines per batch
fileHeader         false        whether to add a header storing the file's absolute path
fileHeaderKey      file         header key used when fileHeader is enabled
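As a sketch, a Taildir source tailing two file groups might be configured like this (the paths, group names f1/f2, and a1/r1/c1 names are assumptions for illustration):

```properties
# Taildir source: tail two file groups and record read offsets in a JSON file
a1.sources = r1
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /var/flume/taildir_position.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /var/log/app/access.log
a1.sources.r1.filegroups.f2 = /var/log/other/.*log
# tag events from the first group with a header
a1.sources.r1.headers.f1.source = access
a1.sources.r1.channels = c1
```

If the agent restarts, reading resumes from the offsets saved in positionFile, which is what makes this source loss-free.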
2, Channels
The official documentation provides many channel types to choose from; the Memory Channel and the File Channel are introduced here.
1) Memory Channel
The Memory Channel stores Events in memory. Using memory means the data transfer rate is very fast, but if the agent dies, the data held in the channel is lost.
Property Name      Default      Description
type               @            must be set to memory
capacity           100          maximum number of events stored in the channel
transactionCapacity  100        maximum number of events per transaction taken from a source or given to a sink
keep-alive         3            timeout (s) for adding or removing an event
byteCapacityBufferPercentage  20  percentage of byteCapacity reserved as a buffer
byteCapacity       see description  maximum total number of bytes the channel is allowed to store
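A Memory Channel sized for a modest load might look like this sketch (channel name c1 assumed):

```properties
# Memory channel: fast, but events are lost if the agent dies
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100
```

transactionCapacity must not exceed capacity, and it bounds the batch size any attached source or sink can use.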
2) File Channel

The File Channel uses disk to store Events. It is slower than the Memory Channel, but data is not lost.
Property Name      Default      Description
type               @            must be set to file
checkpointDir      ~/.flume/file-channel/checkpoint  directory where checkpoints are stored
useDualCheckpoints false        whether to back up the checkpoint; if true, backupCheckpointDir must be set
backupCheckpointDir  -          directory for the checkpoint backup
dataDirs           ~/.flume/file-channel/data  directories where data is stored
transactionCapacity  10000      maximum number of events per transaction
checkpointInterval 30000        interval (ms) between checkpoints
maxFileSize        2146435071   maximum size (bytes) of a single log file
minimumRequiredSpace  524288000 minimum required free space (bytes)
capacity           1000000      maximum capacity of the channel
keep-alive         3            timeout (s) for a store operation
use-log-replay-v1  false        expert: use the old replay logic
use-fast-replay    false        expert: replay without using the queue
checkpointOnClose  true         whether a checkpoint is written when the channel is closed
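A durable File Channel might be sketched as follows (the directories under /var/flume are assumptions; in practice put checkpointDir and dataDirs on separate disks for throughput):

```properties
# File channel: slower than memory, but durable across agent restarts
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 10000
```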
3, Sinks
Flume's commonly used sinks include the Logger Sink, HDFS Sink, Avro Sink, and Kafka Sink; of course, a custom sink can also be written.
1) Logger Sink

The Logger Sink writes events into the log at INFO level. This approach is generally used for testing.
Property Name      Default      Description
channel            @            -
type               @            must be set to logger
maxBytesToLog      16           maximum number of bytes of the Event body to log
2) HDFS Sink

The HDFS Sink writes data to HDFS. It currently supports two file formats, text and sequence files, both with optional compression, and data can be partitioned or bucketed for storage.
Property Name      Default      Description
channel            @            -
type               @            must be set to hdfs
hdfs.path          @            HDFS path, e.g. hdfs://namenode/flume/webdata/
hdfs.filePrefix    FlumeData    prefix of the data files
hdfs.fileSuffix    -            suffix of the data files
hdfs.inUsePrefix   -            prefix of temporary files while they are being written
hdfs.inUseSuffix   .tmp         suffix of temporary files while they are being written
hdfs.rollInterval  30           seconds after which the temporary file is rolled into the target file; 0 means never roll based on time
hdfs.rollSize      1024         size (bytes) at which the temporary file is rolled into the target file; 0 means never roll based on size
hdfs.rollCount     10           number of events at which the temporary file is rolled into the target file; 0 means never roll based on event count
hdfs.idleTimeout   0            seconds without writes after which the open temporary file is closed and renamed to the target file
hdfs.batchSize     100          number of events flushed to HDFS per batch
hdfs.codeC         -            compression codec: gzip, bzip2, lzo, lzop, or snappy
hdfs.fileType      SequenceFile file format: SequenceFile, DataStream, or CompressedStream; with DataStream files are not compressed and hdfs.codeC need not be set; with CompressedStream a valid hdfs.codeC must be set
hdfs.maxOpenFiles  5000         maximum number of HDFS files allowed open; when this limit is reached, the oldest open file is closed
hdfs.minBlockReplicas  -        minimum number of replicas per HDFS file block; this parameter affects file rolling, and is generally set to 1 so that files roll according to the configuration
hdfs.writeFormat   Writable     write format for sequence files: Text or Writable (default)
hdfs.callTimeout   10000        timeout (ms) for HDFS operations
hdfs.threadsPoolSize  10        number of threads the sink uses for HDFS operations
hdfs.rollTimerPoolSize  1       number of threads the sink uses for timed file rolling
hdfs.kerberosPrincipal  -       Kerberos principal for secure HDFS authentication
hdfs.kerberosKeytab  -          Kerberos keytab for secure HDFS authentication
hdfs.proxyUser     -            proxy user
hdfs.round         false        whether to round down the timestamp
hdfs.roundValue    1            value to which the time is rounded down
hdfs.roundUnit     second       unit of the rounding value: second, minute, or hour
hdfs.timeZone      Local Time   time zone used when resolving the path
hdfs.useLocalTimeStamp  false   whether to use the local time instead of the timestamp from the event header
hdfs.closeTries    0            number of times the sink attempts to close a file; if set to 1 and the close fails, the sink does not try again and the file stays open forever; if set to 0, the sink keeps retrying until the close succeeds
hdfs.retryInterval 180          interval (s) between attempts to close a file; 0 means no retries, equivalent to setting hdfs.closeTries to 1
serializer         TEXT         serializer type
serializer.*       -            serializer parameters
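As a sketch, an HDFS Sink writing uncompressed text into hourly partitions might look like this (the path, the a1/k1/c1 names, and the 10-minute roll interval are assumptions for illustration):

```properties
# HDFS sink: write events under a time-partitioned path, rolling every 10 minutes
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/webdata/%Y%m%d/%H
a1.sinks.k1.hdfs.filePrefix = FlumeData
a1.sinks.k1.hdfs.fileType = DataStream
# roll only on time, not on size or event count
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.minBlockReplicas = 1
# the %Y%m%d/%H escapes need a timestamp; use the local clock to supply one
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```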
3) Avro Sink

Property Name      Default      Description
channel            @            -
type               @            must be set to avro
hostname           @            host name or IP of the target
port               @            port number
batch-size         100          number of events per batch
connect-timeout    20000        connection timeout (ms)
request-timeout    20000        request timeout (ms)
compression-type   none         compression type, "none" or "deflate"
compression-level  6            compression level; 0 means no compression, 1-9: the higher the number, the higher the compression ratio
ssl                false        whether to use SSL encryption
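An Avro Sink forwarding to a downstream agent's Avro source might be sketched like this (the host name collector01, port 4141, and a1/k1/c1 names are assumptions):

```properties
# Avro sink: forward events to the Avro source of a downstream agent
a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector01
a1.sinks.k1.port = 4141
a1.sinks.k1.batch-size = 100
# must match the compression-type of the receiving Avro source
a1.sinks.k1.compression-type = none
a1.sinks.k1.channel = c1
```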
4) Kafka Sink

The Kafka Sink transfers data to Kafka. Note that the Flume version must be compatible with the Kafka version.
Property Name      Default      Description
type               @            must be set to org.apache.flume.sink.kafka.KafkaSink
kafka.bootstrap.servers  @      address of the Kafka service
kafka.topic        default-flume-topic  Kafka topic to write to
flumeBatchSize     100          number of events per batch written to Kafka
kafka.producer.acks  1          number of replica acknowledgements required before a message delivery is confirmed: 0 means no acknowledgement, 1 means only the leader replica acknowledges, -1 means all replicas must acknowledge
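A minimal Kafka Sink might be sketched as follows (the broker addresses, topic name, and a1/k1/c1 names are assumptions for illustration):

```properties
# Kafka sink: publish events to a Kafka topic
a1.sinks = k1
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = kafka01:9092,kafka02:9092
a1.sinks.k1.kafka.topic = flume-events
a1.sinks.k1.flumeBatchSize = 100
# 1 = wait for the leader replica to acknowledge each batch
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.channel = c1
```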
Startup parameters:
Command:

bin/flume-ng agent --conf conf -z zkhost:2181,zkhost1:2181 -p /flume --name a1 -Dflume.root.logger=INFO,console
1, flume-ng agent runs a Flume agent.
2, --conf specifies the configuration directory; the global options (such as the log4j settings) must live in this directory.
3, -z is the ZooKeeper connection string, a comma-separated list of hostname:port.
4, -p is the base path in ZooKeeper where the agent configuration is stored.
5, --name a1 names the agent a1.
6, -Dflume.root.logger=INFO,console sends Flume's log output to the console; to send it to a log file instead (by default under $FLUME_HOME/logs), replace console with LOGFILE.
The exact logging behavior can be adjusted in $FLUME_HOME/conf/log4j.properties.
-Dflume.log.file=./WchatAgent.logs directs the log output to the given file.
Details:
https://blog.csdn.net/realoyou/article/details/81514128
Reference:
https://www.jianshu.com/p/4252fbcdce79