Flume knowledge expansion

1 Common regular expression syntax

Metacharacter Description
^ Matches the beginning of the string. If the RegExp object's Multiline property is set, ^ also matches the position after "\n" or "\r".
$ Matches the end of the string. If the RegExp object's Multiline property is set, $ also matches the position before "\n" or "\r".
* Matches the preceding subexpression zero or more times. For example, "zo*" matches "z", "zo", and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
[a-z] Character range. Matches any single character within the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z".
Note: a hyphen denotes a range only when it appears inside a character set between two characters; at the beginning of the set it matches a literal hyphen.
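In Flume, these metacharacters show up in the regex_filter interceptor, which keeps or drops events by matching a pattern against the event body. A minimal sketch (the agent name a1, source name r1, and the pattern itself are hypothetical):

```properties
# Keep only events whose body starts with one or more lowercase letters
# followed by one or more digits, e.g. "error404".
# The pattern uses ^ (start anchor), [a-z] and [0-9] (ranges), and + (one or more).
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = ^[a-z]+[0-9]+
# false = keep matching events; true would drop them instead
a1.sources.r1.interceptors.i1.excludeEvents = false
```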

Questions:

1 How do you monitor Flume's data transmission?

Use a third-party framework such as Ganglia to monitor Flume in real time.
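Flume reports its internal counters to Ganglia via JVM system properties passed at startup. A sketch of the launch command (the agent name a1, config paths, and the gmond host/port are hypothetical):

```shell
# Start the agent with metrics reported to a Ganglia gmond at ganglia-host:8649
bin/flume-ng agent \
  --conf conf --conf-file conf/flume-conf.properties --name a1 \
  -Dflume.monitoring.type=ganglia \
  -Dflume.monitoring.hosts=ganglia-host:8649
```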

2 What are the roles of Flume's Source, Sink, and Channel? Which Source types have you used?

 

1. Roles

(1) The Source component is designed to collect data; it can handle log data of various types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, and legacy.

(2) The Channel component buffers the collected data; it can store them in Memory or in a File.

(3) The Sink component sends data onward to the destination, which can be HDFS, Logger, avro, thrift, ipc, file, HBase, solr, or a custom sink.

2. The Source types I have used:

(1) monitoring a log produced by a background process: exec

(2) monitoring a port log produced by a background process: netcat

(3) monitoring new files appearing in a directory: spooldir
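The first two source types above can be sketched in one agent configuration (the agent name a1, file path, and port are hypothetical):

```properties
# r1 tails a log file (exec source), r2 listens for lines on a TCP port (netcat source)
a1.sources = r1 r2

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log

a1.sources.r2.type = netcat
a1.sources.r2.bind = 0.0.0.0
a1.sources.r2.port = 44444
```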

3 Flume Channel Selectors

A Channel Selector decides which Channel(s) a Source writes each event to. Flume ships with two: the Replicating selector (the default), which copies every event to all configured channels, and the Multiplexing selector, which routes each event to a channel based on the value of an event header.
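A sketch of a multiplexing selector (the agent/source/channel names and the "state" header values are hypothetical):

```properties
# Route events by the value of the "state" header; unmatched events go to c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2
a1.sources.r1.selector.default = c2
```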

 

 

4 Flume tuning parameters

1. Source

Increasing the number of Sources (or, when using the Taildir Source, increasing the number of filegroups) can increase the Source's capacity to read data. For example, when a single directory produces too many files, split it into several directories and configure a Source for each, so that there is enough Source capacity to pick up newly generated data.

The batchSize parameter determines the maximum number of events the Source ships to the Channel per batch; increasing it appropriately can improve the performance of moving events from Source to Channel.
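Both points can be sketched with a Taildir Source split into two filegroups (the agent name a1 and the paths are hypothetical):

```properties
# Two filegroups read two directories in parallel;
# batchSize raises the number of events shipped to the channel per put
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /var/log/app1/.*log
a1.sources.r1.filegroups.f2 = /var/log/app2/.*log
a1.sources.r1.batchSize = 1000
```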

2. Channel 

The memory channel gives the best performance, but if the Flume process dies unexpectedly, data may be lost. The file channel has better fault tolerance, but worse performance than the memory channel.

Configuring dataDirs with multiple directories on different disks can improve file channel performance.

The capacity parameter determines the maximum number of events the Channel can hold. The transactionCapacity parameter determines the maximum number of events the Source writes to the Channel per transaction and the maximum number of events the Sink reads from the Channel per transaction. transactionCapacity must be larger than the batchSize of both the Source and the Sink.
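A sketch of a file channel tuned along these lines (the agent name a1 and the directory paths, assumed to live on two different disks, are hypothetical):

```properties
# dataDirs on two disks for parallel I/O; transactionCapacity must
# exceed the batchSize of the attached Source and Sink
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /data1/flume/checkpoint
a1.channels.c1.dataDirs = /data1/flume/data,/data2/flume/data
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 10000
```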

3. Sink 

Increasing the number of Sinks can increase the capacity to consume events. More Sinks are not always better, though: too many Sinks consume system resources and cause unnecessary waste.

The batchSize parameter determines the number of events the Sink reads from the Channel per batch; increasing it appropriately can improve the Sink's performance in draining events from the Channel.
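A sketch of a sink-side batchSize setting, using an HDFS sink (the agent name a1, channel name c1, and HDFS path are hypothetical):

```properties
# hdfs.batchSize controls how many events are taken from the channel
# per transaction before being flushed to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
a1.sinks.k1.hdfs.batchSize = 1000
a1.sinks.k1.channel = c1
```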

 

5 Flume transaction mechanism

Flume's transaction mechanism is similar to a database's: Flume uses two separate transactions, one responsible for delivering events from Source to Channel and one for delivering them from Channel to Sink. For example, the spooling directory source creates an event for each line of a file; only once all events in a transaction have been delivered to the Channel and the commit has succeeded does the Source mark the file as complete. Transactions handle the Channel-to-Sink transfer in the same way: if for some reason an event cannot be recorded, the transaction is rolled back, and all its events remain in the Channel, pending redelivery.

6 Can data collected by Flume be lost?

No: the Channel can persist data in files (file channel), and the data transfer itself is transactional.

 


Origin www.cnblogs.com/tesla-turing/p/11668200.html