Kafka notes

1. Check version compatibility between Kafka and CDH
  https://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html#pcm_kafka
  
2. Download the CSD package KAFKA-1.2.0.jar
   http://archive.cloudera.com/csds/kafka/
   and upload it to the /opt/cloudera/csd directory
   
3. Download KAFKA-3.0.0-1.3.0.0.p0.40-el6.parcel, KAFKA-3.0.0-1.3.0.0.p0.40-el6.parcel.sha1 and manifest.json
  http://archive.cloudera.com/kafka/parcels/latest/
  and upload them to the /opt/cloudera/parcel-repo directory
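
Cloudera Manager validates the parcel against the checksum file; you can pre-check it by hand before distributing (paths as above; note that some Cloudera Manager versions expect the checksum file to be renamed from .sha1 to .sha):
cd /opt/cloudera/parcel-repo
sha1sum KAFKA-3.0.0-1.3.0.0.p0.40-el6.parcel
# compare the printed hash with the contents of the .sha1 file
cat KAFKA-3.0.0-1.3.0.0.p0.40-el6.parcel.sha1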

## Create a topic (--replication-factor sets the replica count)
kafka-topics --create --topic kafka-test --zookeeper host0:2181,host2:2181,host7:2181 --partitions 1 --replication-factor 1
# List topics
kafka-topics --list --zookeeper host0:2181,host2:2181,host7:2181
# Describe a topic
kafka-topics --zookeeper host0:2181,host2:2181,host7:2181 --topic "direct_test" --describe
# Start a Kafka console producer
kafka-console-producer --broker-list host0:9092,host2:9092,host7:9092 --topic direct_test
## Start a console consumer
kafka-console-consumer --zookeeper host0:2181,host2:2181,host7:2181 --topic direct_test
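# To replay the topic from the earliest offset instead of reading only new messages, add --from-beginning:
kafka-console-consumer --zookeeper host0:2181,host2:2181,host7:2181 --topic direct_test --from-beginning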
## Delete a topic
kafka-topics --delete --zookeeper host0:2181,host2:2181,host7:2181 --topic direct_test
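# Deletion only completes if the brokers run with delete.topic.enable=true; otherwise the topic is merely marked for deletion. Topics pending deletion show up under the /admin/delete_topics znode (ZooKeeper CLI):
ls /admin/delete_topics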
## Change the number of partitions of a topic (partitions can only be increased)
kafka-topics --alter --zookeeper host0:2181,host2:2181,host7:2181  --partitions 5 --topic flink_hbase
# Note: kafka-topics --alter cannot change the replication factor; use kafka-reassign-partitions instead, as sketched below.
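A minimal sketch of raising the replication factor of flink_hbase to 3 with kafka-reassign-partitions; the single partition and the broker ids 0, 2 and 7 are assumptions, substitute the cluster's real ids:
cat > raise-rf.json <<'EOF'
{"version":1,"partitions":[{"topic":"flink_hbase","partition":0,"replicas":[0,2,7]}]}
EOF
kafka-reassign-partitions --zookeeper host0:2181,host2:2181,host7:2181 --reassignment-json-file raise-rf.json --execute
# re-run with --verify to check completion
kafka-reassign-partitions --zookeeper host0:2181,host2:2181,host7:2181 --reassignment-json-file raise-rf.json --verify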
## Query the earliest offset of each partition (--time -2):
kafka-run-class kafka.tools.GetOffsetShell --broker-list host0:9092,host2:9092,host7:9092 --topic flink_hbase --time -2
## Query the latest offset of each partition (--time -1):
kafka-run-class kafka.tools.GetOffsetShell --broker-list host0:9092,host2:9092,host7:9092 --topic flink_hbase --time -1
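# GetOffsetShell prints one topic:partition:offset line per partition, e.g. flink_hbase:0:42 (illustrative values).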
## Show the consumed offset and lag of each partition for a consumer group
kafka-run-class kafka.tools.ConsumerOffsetChecker --group client_kafka-test_0 --topic kafka-test --zookeeper host0:2181,host2:2181,host7:2181
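# ConsumerOffsetChecker was deprecated and later removed; on newer Kafka releases the equivalent check is kafka-consumer-groups (whether the --zookeeper option is available depends on the installed version):
kafka-consumer-groups --zookeeper host0:2181,host2:2181,host7:2181 --describe --group client_kafka-test_0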
## Delete a consumer group (ZooKeeper CLI commands)
ls  /consumers
rmr  /consumers/group1
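The two commands above and the set/get commands below run inside the ZooKeeper CLI, which on a CDH host can be opened with:
zookeeper-client -server host0:2181,host2:2181,host7:2181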

Set the offset of consumer group DynamicRangeGroup, topic DynamicRange, partition 0 to 1288 with the following command:
set /consumers/DynamicRangeGroup/offsets/DynamicRange/0 1288
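Stop the group's consumers before changing the offset, otherwise they will overwrite it; verify the new value with get:
get /consumers/DynamicRangeGroup/offsets/DynamicRange/0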

spark-shell --master local[2] --jars /root/tanj/kafka_2.11-0.8.2.1.jar,/root/tanj/spark-streaming-kafka_2.11-1.6.0.jar

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Durations, StreamingContext}

// 5-second micro-batches on top of the shell's SparkContext (sc)
val ssc = new StreamingContext(sc, Durations.seconds(5))

// The 0.8 direct stream reads from the brokers directly, so it needs
// metadata.broker.list (broker addresses), not a ZooKeeper quorum.
val kafkaParams = Map(
  "metadata.broker.list" -> "host0:9092,host2:9092,host7:9092",
  "group.id" -> "StreamingWordCountSelfKafkaDirectStreamScala")

KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Set("direct_test"))
  .map(_._2)               // keep only the message value
  .flatMap(_.split(" "))   // word count over the stream
  .map((_, 1))
  .reduceByKey(_ + _)
  .print()

ssc.start()
ssc.awaitTermination()
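To test the job, type a few space-separated words into the console producer started earlier; the word counts print every 5 seconds.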

References:
http://blog.csdn.net/sun_qiangwei/article/details/52080147#t2
http://blog.csdn.net/qq_21234493/article/details/51340138

Reposted from my.oschina.net/u/2510243/blog/1796730