【翻译】Flume 1.8.0 User Guide(用户指南) 二

接上文:【翻译】Flume 1.8.0 User Guide(用户指南)

3.6.11 Sequence Generator Source

一个简单的序列生成器,它使用计数器连续生成事件,计数器从0开始,递增1,并在totalEvents停止。无法将事件发送到channel时重试。主要用于测试。在重试期间,它保持重试消息的主体与以前相同,这样,在目的地重复数据删除之后,唯一事件的数量预期将等于指定的totalEvents。所需属性以粗体显示。

Property Name Default Description
channels  
type The component type name, needs to be seq
selector.type   replicating or multiplexing
selector.* replicating Depends on the selector.type value
interceptors Space-separated list of interceptors
interceptors.*    
batchSize 1 Number of events to attempt to process per request loop.
totalEvents Long.MAX_VALUE Number of unique events sent by the source.

agent a1 示例:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.channels = c1

3.6.12  Syslog Sources

读取syslog数据并生成Flume事件。UDP源将整个消息视为单个事件。TCP源为用换行符(' n ')分隔的每个字符串创建一个新事件。

所需属性以粗体显示。

3.6.12.1  Syslog TCP Source

原始的、可靠的syslog TCP源.

Property Name Default Description
channels  
type The component type name, needs to be syslogtcp
host Host name or IP address to bind to
port Port # to bind to
eventSize 2500 Maximum size of a single event line, in bytes
keepFields none Setting this to ‘all’ will preserve the Priority, Timestamp and Hostname in the body of the event. A spaced separated list of fields to include is allowed as well. Currently, the following fields can be included: priority, version, timestamp, hostname. The values ‘true’ and ‘false’ have been deprecated in favor of ‘all’ and ‘none’.
selector.type   replicating or multiplexing
selector.* replicating Depends on the selector.type value
interceptors Space-separated list of interceptors
interceptors.*    

例如,agent a1 syslog TCP source

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

3.6.12.2 Multiport Syslog TCP Source

这是一个更新、更快、支持多端口的Syslog TCP源版本。请注意,端口配置设置已经替换了端口。多端口功能意味着它可以以一种有效的方式同时监听多个端口。这个源代码使用Apache Mina库来实现这一点。提供对RFC-3164和许多常见的RFC-5424格式消息的支持。还提供按端口配置所使用的字符集的功能。

Property Name Default Description
channels  
type The component type name, needs to be multiport_syslogtcp
host Host name or IP address to bind to.
ports Space-separated list (one or more) of ports to bind to.
eventSize 2500 Maximum size of a single event line, in bytes.
keepFields none Setting this to ‘all’ will preserve the Priority, Timestamp and Hostname in the body of the event. A spaced separated list of fields to include is allowed as well. Currently, the following fields can be included: priority, version, timestamp, hostname. The values ‘true’ and ‘false’ have been deprecated in favor of ‘all’ and ‘none’.
portHeader If specified, the port number will be stored in the header of each event using the header name specified here. This allows for interceptors and channel selectors to customize routing logic based on the incoming port.
charset.default UTF-8 Default character set used while parsing syslog events into strings.
charset.port.<port> Character set is configurable on a per-port basis.
batchSize 100 Maximum number of events to attempt to process per request loop. Using the default is usually fine.
readBufferSize 1024 Size of the internal Mina read buffer. Provided for performance tuning. Using the default is usually fine.
numProcessors (auto-detected) Number of processors available on the system for use while processing messages. Default is to auto-detect # of CPUs using the Java Runtime API. Mina will spawn 2 request-processing threads per detected CPU, which is often reasonable.
selector.type replicating replicating, multiplexing, or custom
selector.* Depends on the selector.type value
interceptors Space-separated list of interceptors.
interceptors.*    

For example, a multiport syslog TCP source for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = multiport_syslogtcp
a1.sources.r1.channels = c1
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.ports = 10001 10002 10003
a1.sources.r1.portHeader = port

3.6.12.3 Syslog UDP Source

Property Name Default Description
channels  
type The component type name, needs to be syslogudp
host Host name or IP address to bind to
port Port # to bind to
keepFields false Setting this to true will preserve the Priority, Timestamp and Hostname in the body of the event.
selector.type   replicating or multiplexing
selector.* replicating Depends on the selector.type value
interceptors Space-separated list of interceptors
interceptors.*    

For example, a syslog UDP source for agent named a1: 

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = syslogudp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

3.6.13 HTTP source

通过HTTP POST和GET接受Flume事件的源。GET应该只用于实验。HTTP请求通过必须实现HTTPSourceHandler接口的可插入“处理程序”转换为flume事件。这个处理程序接受HttpServletRequest并返回一个flume事件列表。从一个Http请求处理的所有事件都提交给一个事务中的通道,从而提高了文件通道等通道的效率。如果处理程序抛出异常,该源将返回400的HTTP状态。如果channel已满,或者源无法向channel追加事件,源将返回一个HTTP 503—暂时不可用状态。

在一个post请求中发送的所有事件都被视为一个批处理,并在一个事务中插入到channel中。

Property Name Default Description
type   The component type name, needs to be http
port The port the source should bind to.
bind 0.0.0.0 The hostname or IP address to listen on
handler org.apache.flume.source.http.JSONHandler The FQCN of the handler class.
handler.* Config parameters for the handler
selector.type replicating replicating or multiplexing
selector.*   Depends on the selector.type value
interceptors Space-separated list of interceptors
interceptors.*    
enableSSL false Set the property true, to enable SSL. HTTP Source does not support SSLv3.
excludeProtocols SSLv3 Space-separated list of SSL/TLS protocols to exclude. SSLv3 is always excluded.
keystore   Location of the keystore includng keystore file name
keystorePassword Keystore password

例如, a http source for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = http
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1
a1.sources.r1.handler = org.example.rest.RestHandler
a1.sources.r1.handler.nickname = random props

3.6.13.1 JSONHandler

提供了一个开箱即用的处理程序,它可以处理JSON格式表示的事件,并支持UTF-8、UTF-16和UTF-32字符集。处理程序接受事件数组(即使只有一个事件,也必须以数组的形式发送事件),并根据请求中指定的编码将其转换为Flume事件。如果没有指定编码,则假定为UTF-8。JSON处理程序支持UTF-8、UTF-16和UTF-32。事件表示如下:

[{
  "headers" : {
             "timestamp" : "434324343",
             "host" : "random_host.example.com"
             },
  "body" : "random_body"
  },
  {
  "headers" : {
             "namenode" : "namenode.example.com",
             "datanode" : "random_datanode.example.com"
             },
  "body" : "really_random_body"
  }]

要设置字符集,请求必须具有指定为application/json的内容类型;charset=UTF-8(根据需要将UTF-8替换为UTF-16或UTF-32)。

按照此处理程序所期望的格式创建事件的一种方法是使用Flume SDK中提供的JSONEvent,并使用谷歌Gson使用Gson#fromJson(对象、类型)方法创建JSON字符串。传递给事件列表的方法的第二个参数的类型令牌可以通过以下方式创建:

Type type = new TypeToken<List<JSONEvent>>() {}.getType();

3.6.13.2  BlobHandler

默认情况下,HTTPSource将JSON输入拆分为Flume事件。另一种选择是,BlobHandler是HTTPSource的处理程序,它返回一个事件,该事件包含请求参数以及随此请求上载的二进制大对象(BLOB)。例如PDF或JPG文件。注意,这种方法不适用于非常大的对象,因为它在RAM中缓冲整个BLOB。

Property Name Default Description
handler The FQCN of this class: org.apache.flume.sink.solr.morphline.BlobHandler
handler.maxBlobLength 100000000 The maximum number of bytes to read and buffer for a given request

3.6.24 Stress Source

----------未完待续-------------

 

猜你喜欢

转载自www.cnblogs.com/Springmoon-venn/p/10341169.html