Flume parameter parsing + startup parameter analysis

Flume parameters:

# example.conf: single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console.

 

Breakdown by source, channel, and sink

1. Sources
  Flume's commonly used sources include NetCat, Avro, Exec, and Spooling Directory; custom sources may also be needed for particular business scenarios. Details below.

  1) NetCat Source
    The NetCat source supports both the TCP and UDP protocols, used in essentially the same way: it listens on a specified IP and port and writes each line of received data into the channel as an Event. (Parameters marked with @ are required; the same convention applies below.)
    

Property Name      Default      Description
channels @         -
type @             -            component type: netcat
bind @             -            host name or IP address to bind to
port @             -            port number
max-line-length    512          maximum number of bytes per line
ack-every-event    true         respond with "OK" after each Event is received
selector.type      replicating  selector type: replicating or multiplexing
selector.*         -            selector parameters
interceptors       -            list of interceptors, separated by spaces
interceptors.*     -            interceptor parameters
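
For instance, a NetCat source with the optional parameters spelled out might look like the sketch below; it reuses the a1/r1/c1 names from the example at the top, and the values are illustrative:

# NetCat source with optional parameters set explicitly (illustrative values)
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.max-line-length = 1024
a1.sources.r1.ack-every-event = true
a1.sources.r1.channels = c1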

  2) Avro Source

    The Avro source can be used to transfer data over the network between agents on different hosts; it generally receives data sent by an Avro client, or is paired with the Avro sink of another agent.

Property Name     Default  Description
channels @        -
type @            -        component type: avro
bind @            -        host name or IP address to listen on
port @            -        port number
threads           -        maximum number of worker threads for receiving data
selector.type     -
selector.*        -
interceptors      -        list of interceptors
interceptors.*    -
compression-type  none     "none" or "deflate"; must match the compression type of the sending Avro client or sink
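
As a minimal sketch (the bind address and port are illustrative, and the a1/r1/c1 names reuse the example at the top), an Avro source could be declared like this:

# Avro source receiving events from an Avro client or an upstream Avro sink
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
a1.sources.r1.compression-type = none
a1.sources.r1.channels = c1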

  3) Exec Source

    The Exec source transfers data by executing a given Unix command, such as cat or tail -F, and reading its output. It is highly real-time, but if the agent process runs into problems, data may be lost.

Property Name    Default      Description
channels @       -
type @           -            component type: exec
command @        -            command to execute
shell            -            shell used to run the command
restartThrottle  10000        time (ms) to wait before attempting a restart
restart          false        whether to restart the command if it dies
logStdErr        false        whether to log the command's stderr
batchSize        20           maximum number of lines to read and write to the channel at a time
batchTimeout     3000         maximum time (ms) to wait before writing a partial batch
selector.type    replicating  selector type: replicating or multiplexing
selector.*       -            other selector parameters
interceptors     -            list of interceptors, separated by spaces
interceptors.*   -
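
A minimal sketch of an Exec source tailing a log file (the file path is hypothetical):

# Exec source streaming appended lines from a log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.restart = true
a1.sources.r1.batchSize = 20
a1.sources.r1.channels = c1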

  4) Spooling Directory Source

    The Spooling Directory source transfers data by watching a folder and converting the contents of new files into Events. Its advantage is that no data is lost. Two points to note when using it:

      1) files in the monitored folder must not be modified in any way once added;

      2) file names added to the monitored folder must be unique.

    Because it only picks up whole new files, the Spooling Directory source is relatively weak in real-time terms, but splitting files into smaller pieces can bring it close to real time. A configuration sketch follows the table below.

Property Name      Default      Description
channels @         -
type @             -            component type: spooldir
spoolDir @         -            directory of the monitored folder
fileSuffix         .COMPLETED   suffix appended to files whose transfer has completed
deletePolicy       never        when to delete completed files: never or immediate
fileHeader         false        whether to add a header with the file's full path
fileHeaderKey      file         header key name for the full path, when fileHeader is enabled
basenameHeader     false        whether to add a header with the file's base name
basenameHeaderKey  basename     header key name for the base name, when basenameHeader is enabled
includePattern     ^.*$         regex matching new files whose data should be transferred
ignorePattern      ^$           regex matching new files to ignore
trackerDir         .flumespool  directory where metadata is stored
consumeOrder       oldest       order in which files are consumed: oldest, youngest, or random
maxBackoff         4000         maximum backoff (ms) when the channel is full; if a write still fails, a ChannelException is thrown
batchSize          100          batch size
inputCharset       UTF-8        character set of the input
decodeErrorPolicy  FAIL         handling of undecodable characters: FAIL, REPLACE, or IGNORE
selector.type      replicating  selector type: replicating or multiplexing
selector.*         -            other selector parameters
interceptors       -            list of interceptors, separated by spaces
interceptors.*     -
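
A minimal sketch of a Spooling Directory source (the spool directory path is hypothetical):

# Spooling Directory source watching a folder for new, completed files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/flume-spool
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1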

  5) Taildir Source

    The Taildir source monitors one or more specified files in real time for appended content. Because it stores the read offsets in a designated JSON file, no data is lost even if the agent is killed or hangs. Note that this source cannot be used on Windows.

Property Name                        Default                         Description
channels @                           -
type @                               -                               component type: TAILDIR
filegroups @                         -                               space-separated list of file group names
filegroups.<filegroupName> @         -                               absolute path of the file(s) to monitor
positionFile                         ~/.flume/taildir_position.json  path where read offsets are stored
headers.<filegroupName>.<headerKey>  -                               header key name
byteOffsetHeader                     false                           whether to add the byte offset under the header key 'byteoffset'
skipToEnd                            false                           whether to skip to the end of a file whose offset is not recorded in the position file
idleTimeout                          120000                          timeout (ms) after which a file with no new content is closed
writePosInterval                     3000                            interval (ms) between writes of the last position to the position file
batchSize                            100                             maximum number of lines per batch
fileHeader                           false                           whether to add a header storing the file's absolute path
fileHeaderKey                        file                            header key to use when fileHeader is enabled
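
A minimal sketch of a Taildir source following two file groups (the paths are hypothetical):

# Taildir source; read offsets are persisted in positionFile across restarts
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /var/flume/taildir_position.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /var/log/app/app.log
a1.sources.r1.filegroups.f2 = /var/log/nginx/access.log
a1.sources.r1.channels = c1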

 

2. Channels
  The official site provides many channel types to choose from; the Memory channel and the File channel are introduced here.
  1) Memory Channel
    The Memory channel stores Events in memory. Using memory makes the data transfer rate fast, but if the agent dies, the data held in the channel is lost.

Property Name                 Default          Description
type @                        -                component type: memory
capacity                      100              maximum number of events stored in the channel
transactionCapacity           100              maximum number of events per transaction, from a source or to a sink
keep-alive                    3                timeout (s) for adding or removing an event
byteCapacityBufferPercentage  20               percentage of byteCapacity reserved as a buffer
byteCapacity                  see description  maximum total number of bytes the channel is allowed to store
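
A memory channel sized above the defaults might look like this sketch (the values are illustrative and should be tuned to the expected event rate and JVM heap):

# Memory channel with enlarged capacity (illustrative values)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000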

  2) File Channel

    The File channel uses disk to store Events. It is slower than the Memory channel, but data is not lost.

Property Name         Default                           Description
type @                -                                 component type: file
checkpointDir         ~/.flume/file-channel/checkpoint  checkpoint directory
useDualCheckpoints    false                             whether to back up the checkpoint; if true, backupCheckpointDir must be set
backupCheckpointDir   -                                 backup checkpoint directory
dataDirs              ~/.flume/file-channel/data        directory (or directories) where data is stored
transactionCapacity   10000                             maximum number of Events per transaction
checkpointInterval    30000                             checkpoint interval (ms)
maxFileSize           2146435071                        maximum size (bytes) of a single log file
minimumRequiredSpace  524288000                         minimum required free space (bytes)
capacity              1000000                           maximum capacity of the channel
keep-alive            3                                 time (s) to wait for a put operation
use-log-replay-v1     false                             expert: use the old replay logic
use-fast-replay       false                             expert: replay without using the queue
checkpointOnClose     true                              whether to write a checkpoint when the channel is closed
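
A minimal sketch of a file channel with explicit directories (the paths are hypothetical; keeping checkpointDir and dataDirs on separate disks can improve throughput):

# File channel persisting events on disk
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 10000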

3. Sinks

Flume's commonly used sinks include the Logger, HDFS, Avro, and Kafka sinks; of course, custom sinks are also possible.

  1) Logger Sink

    The Logger sink records events in the log at INFO level; this approach is generally used for testing.

Property Name  Default  Description
channel @      -
type @         -        component type: logger
maxBytesToLog  16       maximum number of bytes of the Event body to log

  2) HDFS Sink

    The HDFS sink writes data to HDFS. It currently supports two file formats, text and sequence files, both with optional compression, and data can be partitioned and bucketed for storage.

Property Name           Default       Description
channel @               -
type @                  -             component type: hdfs
hdfs.path @             -             HDFS path, e.g. hdfs://namenode/flume/webdata/
hdfs.filePrefix         FlumeData     prefix for data files
hdfs.fileSuffix         -             suffix for data files
hdfs.inUsePrefix        -             prefix for temporary files being written
hdfs.inUseSuffix        .tmp          suffix for temporary files being written
hdfs.rollInterval       30            interval (s) after which the temporary file is rolled into the target file; 0 means never roll based on time
hdfs.rollSize           1024          size (bytes) at which the temporary file is rolled into the target file; 0 means never roll based on size
hdfs.rollCount          10            number of events at which the temporary file is rolled into the target file; 0 means never roll based on event count
hdfs.idleTimeout        0             if no data is written to the currently open temporary file within this time (s), the file is closed and renamed to the target file
hdfs.batchSize          100           number of events flushed to HDFS per batch
hdfs.codeC              -             compression codec: gzip, bzip2, lzo, lzop, or snappy
hdfs.fileType           SequenceFile  file format: SequenceFile, DataStream, or CompressedStream; with DataStream the file is not compressed and hdfs.codeC need not be set; with CompressedStream a valid hdfs.codeC must be set
hdfs.maxOpenFiles       5000          maximum number of open HDFS files; when this limit is reached, the oldest open file is closed
hdfs.minBlockReplicas   -             minimum number of HDFS replicas per file block; this parameter affects file rolling and is generally set to 1 so that files roll correctly
hdfs.writeFormat        Writable      format of records written to sequence files: Text or Writable (default)
hdfs.callTimeout        10000         timeout (ms) for HDFS operations
hdfs.threadsPoolSize    10            number of threads the HDFS sink uses for HDFS operations
hdfs.rollTimerPoolSize  1             number of threads the HDFS sink uses for scheduling timed file rolls
hdfs.kerberosPrincipal  -             Kerberos principal for secure HDFS access
hdfs.kerberosKeytab     -             Kerberos keytab for secure HDFS access
hdfs.proxyUser          -             proxy user
hdfs.round              false         whether to round down the timestamp
hdfs.roundValue         1             value to round the timestamp down to
hdfs.roundUnit          second        unit of the rounding value: second, minute, or hour
hdfs.timeZone           Local Time    time zone
hdfs.useLocalTimeStamp  false         whether to use the local time
hdfs.closeTries         0             number of times the HDFS sink tries to close a file; if set to 1 and the close fails, the sink does not retry, and the file stays open with its temporary name; if set to 0, the sink keeps retrying until the close succeeds
hdfs.retryInterval      180           interval (s) between attempts to close a file; 0 means no retries, equivalent to setting hdfs.closeTries to 1
serializer              TEXT          serializer type
serializer.*            -
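
A sketch of an HDFS sink with time-based partitioning and explicit roll settings (the path and roll values are illustrative; escape sequences such as %Y%m%d require a timestamp header on each event, or hdfs.useLocalTimeStamp = true):

# HDFS sink writing uncompressed text, partitioned by day
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/webdata/%Y%m%d
a1.sinks.k1.hdfs.filePrefix = events
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.channel = c1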

  3) Avro Sink

Property Name      Default  Description
channel @          -
type @             -        component type: avro
hostname @         -        host name or IP
port @             -        port number
batch-size         100      number of Events per batch
connect-timeout    20000    connection timeout (ms)
request-timeout    20000    request timeout (ms)
compression-type   none     compression type: "none" or "deflate"
compression-level  6        compression level: 0 means no compression; 1-9, where a higher number gives a higher compression ratio
ssl                false    whether to use SSL encryption
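
A minimal sketch of an Avro sink forwarding to a downstream agent (the hostname and port are illustrative and must match that agent's Avro source):

# Avro sink paired with the Avro source of a downstream agent
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector-host
a1.sinks.k1.port = 4141
a1.sinks.k1.batch-size = 100
a1.sinks.k1.channel = c1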

  4) Kafka Sink

    The Kafka sink transfers data to Kafka; note that the Flume version must be compatible with the Kafka version.

Property Name            Default              Description
type                     -                    component type: org.apache.flume.sink.kafka.KafkaSink
kafka.bootstrap.servers  -                    Kafka broker addresses
kafka.topic              default-flume-topic  Kafka topic to write to
flumeBatchSize           100                  number of Events written to Kafka per batch
kafka.producer.acks      1                    how many replicas must confirm a message before delivery is considered successful: 0 means no confirmation, 1 means only the leader replica confirms, -1 means wait for all replicas
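
A minimal sketch of a Kafka sink (the broker addresses and topic name are illustrative):

# Kafka sink; acks = 1 waits only for the leader replica's confirmation
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = kafka1:9092,kafka2:9092
a1.sinks.k1.kafka.topic = flume-events
a1.sinks.k1.flumeBatchSize = 100
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.channel = c1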

 

 Startup parameters:

 

Command:

 bin/flume-ng agent --conf conf -z zkhost:2181,zkhost1:2181 -p /flume --name a1 -Dflume.root.logger=INFO,console

1. flume-ng agent runs a Flume agent.

2. --conf specifies the configuration directory; global options must be defined in configuration files under this directory.

3. -z is the ZooKeeper connection string: a comma-separated list of hostname:port.

4. -p is the base path in ZooKeeper where agent configurations are stored.

5. --name a1 gives the name of the agent, here a1.

6. -Dflume.root.logger=INFO,console sends Flume's log output to the console; to send it to a log file instead (by default under $FLUME_HOME/logs), replace console with LOGFILE.

    The specific configuration can be modified in $FLUME_HOME/conf/log4j.properties.

    -Dflume.log.file=./WchatAgent.logs directs the log output to the specified file.

 

Details:

https://blog.csdn.net/realoyou/article/details/81514128

Reference: 

https://www.jianshu.com/p/4252fbcdce79

 


Origin: www.cnblogs.com/51python/p/10966341.html