1. Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 9999
# Describe the sink
a1.sinks.k1.type = org.apache.spark.streaming.flume.sink.SparkSink
# IP address of the host running the Flume agent
a1.sinks.k1.hostname = 192.168.25.145
a1.sinks.k1.port = 8888
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
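To feed the netcat source some test data without a separate tool, a small client can write newline-terminated lines to the configured port. The following is a minimal sketch, assuming the agent is already running with the source bound to localhost:9999 as configured above; `NetcatTestClient` and `sendLines` are hypothetical names introduced only for illustration.

```scala
import java.io.{OutputStreamWriter, PrintWriter}
import java.net.Socket
import java.nio.charset.StandardCharsets

// Hypothetical helper: sends each line, terminated by '\n', to a TCP port.
object NetcatTestClient {

  def sendLines(host: String, port: Int, lines: Seq[String]): Unit = {
    val socket = new Socket(host, port)
    try {
      val out = new PrintWriter(
        new OutputStreamWriter(socket.getOutputStream, StandardCharsets.UTF_8))
      // The netcat source treats every newline-terminated line as one event.
      lines.foreach(line => out.print(line + "\n"))
      out.flush()
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit =
    // Host and port are taken from the configuration above.
    sendLines("localhost", 9999, Seq("hello spark", "hello flume"))
}
```

Each line sent this way should show up as one word-count input in the next streaming batch.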
2. Preparing the jars
See the official documentation: http://spark.apache.org/docs/latest/streaming-flume-integration.html
The Flume side of this test uses the following jar versions:
spark-streaming-flume-sink_2.11-2.2.0.jar scala-library-2.11.8.jar commons-lang3-3.5.jar
After downloading these jars, put them in the lib directory of the Flume installation (./flume/lib/).
The Spark Streaming side uses the following jar version:
spark-streaming-flume-assembly_2.11-2.2.0.jar
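For compiling the streaming job itself, the matching library can be declared as a build dependency. This is only a sketch that assumes an sbt build, which the original notes do not mention; the versions mirror the jars listed above, and the `provided` scope assumes the cluster supplies Spark at run time.

```scala
// build.sbt (assumed sbt project; not part of the original setup)
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark Streaming core, expected to be supplied by the cluster at run time
  "org.apache.spark" %% "spark-streaming" % "2.2.0" % "provided",
  // Flume integration that provides FlumeUtils
  "org.apache.spark" %% "spark-streaming-flume" % "2.2.0"
)
```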
Start the agent to test:
hadoop@1:/usr/local/flume$ bin/flume-ng agent -c conf -f conf/flume-spark.conf -n a1 -Dflume.root.logger=DEBUG,console
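Error encountered:
Unsupported major.minor version 52.0: this is a JDK version problem. Class-file version 52 corresponds to JDK 8 and 51 to JDK 7, so the jars need JDK 8 but Flume is running on an older JVM. Change the JDK configured in flume-env.sh to the matching version.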
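The version numbers in that error follow a simple rule: from Java 5 (major version 49) onward, the Java release is the class-file major version minus 44. The sketch below applies that rule and prints what the current JVM supports, using the standard `java.class.version` system property; `ClassVersionCheck` and its method names are hypothetical.

```scala
// Hypothetical helper for relating class-file versions to Java releases.
object ClassVersionCheck {

  // Valid from Java 5 (major 49) onward: 51 -> Java 7, 52 -> Java 8.
  def javaRelease(major: Int): Int = {
    require(major >= 49, s"unexpected class-file major version: $major")
    major - 44
  }

  // Extracts the major part from a string such as "52.0".
  def majorOf(classVersion: String): Int =
    classVersion.takeWhile(_ != '.').toInt

  def main(args: Array[String]): Unit = {
    val running = majorOf(System.getProperty("java.class.version"))
    println(s"This JVM loads class files up to major version $running " +
      s"(Java ${javaRelease(running)})")
  }
}
```

Running it with the same JDK that flume-env.sh points to shows whether that JVM can load classes of version 52.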
package com.imooc.spark

import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * The second way of integrating Spark Streaming with Flume (pull-based)
  */
object FlumePullWordCount {

  def main(args: Array[String]): Unit = {

    if (args.length != 2) {
      System.err.println("Usage: FlumePullWordCount <hostname> <port>")
      System.exit(1)
    }

    val Array(hostname, port) = args

    // Uncomment for a local run; otherwise pass the master and app name to spark-submit
    val sparkConf = new SparkConf() //.setMaster("local[2]").setAppName("FlumePullWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // TODO... how to integrate Flume with Spark Streaming
    // Poll events from the SparkSink listening on hostname:port
    val flumeStream = FlumeUtils.createPollingStream(ssc, hostname, port.toInt)

    // Decode each event body, split it into words, and count them per batch
    flumeStream.map(x => new String(x.event.getBody.array()).trim)
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
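The per-batch transformation in the job can be tried out without Spark or Flume. The sketch below mirrors the trim, split, and count steps on an ordinary Scala collection; `WordCountLogic` and `countWords` are hypothetical names, `groupBy` stands in for `reduceByKey`, and the filter for empty tokens is an addition that the streaming job above does not have.

```scala
// Hypothetical stand-alone version of the word-count steps, for local experiments.
object WordCountLogic {

  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .map(_.trim)            // same as trimming each event body
      .flatMap(_.split(" "))  // split each line into words
      .filter(_.nonEmpty)     // addition: drop empty tokens from blank lines
      .groupBy(identity)      // plain-collection stand-in for reduceByKey
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit =
    println(countWords(Seq("hello spark", "hello flume")))
}
```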
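Errors
> 1, java.lang.IllegalStateException: begin() called when transaction is OPEN!
Fix:
Flume's lib directory contains an extra scala-library version. Delete every version other than the one currently in use.
> 2, no further information flume streaming
Fix:
This is a connection problem with Flume: the Flume agent has failed or is not running properly, so the connection to it cannot be established.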
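One quick way to tell whether the sink is listening at all is to try a plain TCP connection to it before starting the streaming job. This is a minimal sketch using only the JDK socket API; `PortCheck` and `isReachable` are hypothetical names, and the host and port in `main` are the sink address from the configuration in section 1.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Hypothetical helper: reports whether a TCP connection to host:port succeeds.
object PortCheck {

  def isReachable(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      // connect throws on refusal or timeout; Try turns that into false
      Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"SparkSink reachable: ${isReachable("192.168.25.145", 8888)}")
}
```

A `false` result suggests the agent is down or the sink failed to start, which is worth checking in the Flume console log first.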