I. Kafka
------------------------------------------------------------
1. A distributed stream-processing platform.
2. Builds real-time data pipelines between systems and applications.
3. Stores streams of records, organized into topics.
4. Each record consists of a key, a value, and a timestamp.
5. Throughput on the order of a million messages per second.
6. Supports message partitioning across servers, multiple clients, and real-time processing.
II. Terminology
--------------------------------------------------------------
producer        //message producer
consumer        //message consumer
consumer group  //group of consumers
kafka server    //broker, a Kafka server
topic           //topic: carries the replication factor and partition info
III. Installing Kafka
----------------------------------------------------------------
1. Prepare ZooKeeper
    (omitted)
2. JDK
    (omitted)
3. Download kafka_2.11-0.10.2.1.tgz
4. Untar it, set the environment variables, and create a symlink (see the sketch below)
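    [A minimal sketch of step 4; the /soft prefix and ~/.bashrc follow the paths used elsewhere in these notes]
    $> tar -xzvf kafka_2.11-0.10.2.1.tgz -C /soft
    $> ln -s /soft/kafka_2.11-0.10.2.1 /soft/kafka
    $> echo 'export KAFKA_HOME=/soft/kafka' >> ~/.bashrc
    $> echo 'export PATH=$PATH:$KAFKA_HOME/bin' >> ~/.bashrc
    $> source ~/.bashrc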
5. Configure Kafka [/soft/kafka/config/server.properties]
broker.id=100
listeners=PLAINTEXT://:9092
log.dirs=/home/ubuntu/kafka/logs
zookeeper.connect=s100:2181,s200:2181,s300:2181
6. Distribute the installation to the other Kafka machines [s200,s300,s400]
   Change broker.id in each copy's config file to 200 / 300 / 400 respectively
7. Start the Kafka servers
   a. Start ZooKeeper
   b. Start Kafka on each broker [s200,s300,s400]
$> /soft/kafka/bin/kafka-server-start.sh /soft/kafka/config/server.properties
   c. Verify that Kafka started successfully
$> netstat -ano | grep 9092
8. Create a topic
$>bin/kafka-topics.sh --create --zookeeper s100:2181 --replication-factor 3 --partitions 3 --topic tstopic
9. List topics
$>bin/kafka-topics.sh --list --zookeeper s100:2181
10. Start a console producer and send messages [for testing]
$>bin/kafka-console-producer.sh --broker-list s200:9092 --topic tstopic
11. Start a console consumer
$>bin/kafka-console-consumer.sh --bootstrap-server s200:9092 --topic tstopic --from-beginning --zookeeper s100:2181
    [If this fails with "zookeeper is not a recognized option", start the consumer with the command below instead]
$>bin/kafka-console-consumer.sh --bootstrap-server s200:9092 --topic tstopic --from-beginning
12. Type hello world in the producer console; it should show up in the consumer console
IV. Kafka cluster metadata stored in ZooKeeper
-----------------------------------------------------------------------
/controller ===> {"version":1,"brokerid":202,"timestamp":"1490926369148"}
/controller_epoch ===> 1
/brokers
/brokers/ids
/brokers/ids/202 ===> {"jmx_port":-1,"timestamp":"1490926370304","endpoints":["PLAINTEXT://s202:9092"],"host":"s202","version":3,"port":9092}
/brokers/ids/203
/brokers/ids/204
/brokers/topics/test/partitions/0/state ===>{"controller_epoch":1,"leader":203,"version":1,"leader_epoch":0,"isr":[203,204,202]}
/brokers/topics/test/partitions/1/state ===>...
/brokers/topics/test/partitions/2/state ===>...
/brokers/seqid ===> null
/admin
/admin/delete_topics/test ===> topics marked for deletion
/isr_change_notification
/consumers/xxxx/
/config
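These znodes can be inspected with the ZooKeeper CLI (a sketch; hostnames follow the cluster above):
    $> zkCli.sh -server s100:2181
    ls /brokers/ids
    get /brokers/ids/202
    get /brokers/topics/test/partitions/0/state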
V. Creating topics
------------------------------------------------------------------------------
1. --replication-factor: how many copies of each partition the topic keeps
2. --partitions: how many partitions the topic is split into, spread across different hosts
3. Total partition replicas stored for a topic = replication factor × partitions (the command in section III creates 3 × 3 = 9)
4. A topic can also be created with its brokers assigned by hand (see section VI)
VI. Manual rebalancing
---------------------------------------------------------------------------------
1. --replica-assignment
   $> kafka-topics.sh --create --zookeeper s200:2181 --topic ts01 --replica-assignment 203:204,203:204,203:204,203:204,203:204
   Each comma-separated group defines one partition (five here); within a group, the colon-separated broker IDs are that partition's replicas, the first being the preferred leader. The command therefore creates ts01 with 5 partitions, each replicated on brokers 203 and 204.
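To rebalance an existing topic rather than create one, Kafka ships kafka-reassign-partitions.sh. A sketch, assuming a hand-written reassign.json (hypothetical file name):
    [reassign.json]
    {"version":1,"partitions":[
        {"topic":"ts01","partition":0,"replicas":[203,204]},
        {"topic":"ts01","partition":1,"replicas":[204,203]}
    ]}
    $> kafka-reassign-partitions.sh --zookeeper s200:2181 --reassignment-json-file reassign.json --execute
    $> kafka-reassign-partitions.sh --zookeeper s200:2181 --reassignment-json-file reassign.json --verify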
VII. Replicas
-----------------------------------------------------------------------------------
1. The message producer decides how messages are partitioned.
2. Brokers partition records according to the scheme the producer chose; the number of partitions is configurable on the broker.
3. Within a partition, the broker stores messages in the order they arrive.
4. Both producers and consumers are replica-aware, which ensures that publishing and consuming continue when a broker fails.
5. n nodes tolerate n-1 failures.
6. Each partition has one leader and some number of followers.
7. When a leader dies, a new leader is elected, an acknowledgement is sent back to the producer, and the producer then sends its messages to the new leader.
8. The new leader is elected through the ISR (in-sync replicas): the first follower registered in the ISR becomes the leader.
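The current leader and ISR of each partition can be checked with --describe (hostname follows the cluster above); the output lists Leader, Replicas, and Isr per partition:
    $> kafka-topics.sh --describe --zookeeper s100:2181 --topic test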
VIII. Replication modes -- synchronous and asynchronous
---------------------------------------------------------------------------------
1. Synchronous mode
   a. The producer looks up the partition leader in ZooKeeper and publishes the message; the leader writes it to its local log.
   b. The leader then notifies all followers, which pull the message over the same channel, preserving message order.
   c. Once a follower has replicated the message, it sends an acknowledgement back to the leader.
   d. Once every follower has acknowledged, the leader acknowledges to the producer that the message is fully processed.
2. Asynchronous mode
   a. The producer looks up the partition leader in ZooKeeper and publishes the message.
   b. The leader writes the message to its local log, notifies the followers to start pulling, and immediately acknowledges to the producer without waiting for the followers.
   c. This mode therefore cannot guarantee message delivery if a broker fails.
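With the old Scala producer API used in section IX below, the mode is selected through producer properties. A minimal sketch, assuming the 0.8-style property names (producer.type, request.required.acks):
    Properties props = new Properties();
    props.put("metadata.broker.list", "s200:9092");
    props.put("serializer.class", "kafka.serializer.StringEncoder");
    // synchronous mode: block until the broker acknowledges
    props.put("producer.type", "sync");
    // -1 = wait for the whole ISR, 1 = leader only, 0 = no acknowledgement
    props.put("request.required.acks", "-1");
    // asynchronous mode instead: batch in the background, return immediately
    // props.put("producer.type", "async");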
IX. API demo
-------------------------------------------------------------------------------
1. Kafka message producer: sends messages
2. Kafka message consumer: consumes messages
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.javaapi.producer.Producer;
import kafka.message.MessageAndMetadata;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import org.junit.Test;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
/**
 * Tests the old Kafka producer/consumer API (kafka_2.11-0.10.2.1).
 */
public class TestProducer {
    /**
     * Tests the message producer.
     */
    @Test
    public void testProducer() {
        Properties props = new Properties();
        // broker list
        props.put("metadata.broker.list", "s200:9092");
        // serializer for keys and values
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // wait for the leader's acknowledgement (acks=1)
        props.put("request.required.acks", "1");
        // build the producer config
        ProducerConfig config = new ProducerConfig(props);
        // create the producer from the config
        Producer<String, String> producer = new Producer<String, String>(config);
        // topic "test", key "100", value "hello world tom567"
        KeyedMessage<String, String> msg = new KeyedMessage<String, String>("test", "100", "hello world tom567");
        producer.send(msg);
        System.out.println("send over !");
    }
    /**
     * Tests the message consumer.
     */
    @Test
    public void testConsumer() {
        Properties props = new Properties();
        // zookeeper quorum
        props.put("zookeeper.connect", "s100:2181");
        // consumer group
        props.put("group.id", "g11");
        props.put("zookeeper.session.timeout.ms", "500");
        props.put("zookeeper.sync.time.ms", "250");
        props.put("auto.commit.interval.ms", "1000");
        // build the consumer config
        ConsumerConfig config = new ConsumerConfig(props);
        // create the consumer from the config
        ConsumerConnector consumer = Consumer.createJavaConsumerConnector(config);
        // subscribe to the topic with one stream
        String topic = "test";
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put(topic, new Integer(1));
        // start consuming
        Map<String, List<KafkaStream<byte[], byte[]>>> kafkaMsg = consumer.createMessageStreams(map);
        List<KafkaStream<byte[], byte[]>> msgList = kafkaMsg.get(topic);
        for (KafkaStream<byte[], byte[]> msg : msgList) {
            ConsumerIterator<byte[], byte[]> mm = msg.iterator();
            while (mm.hasNext()) {
                MessageAndMetadata<byte[], byte[]> next = mm.next();
                byte[] m = next.message();
                System.out.println(new String(m));
            }
        }
    }
}
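Since the producer sends a KeyedMessage, the key decides the partition. A minimal sketch of the idea behind the old API's default partitioner (hash of the key modulo the partition count); HashPartitionDemo is a hypothetical class for illustration, not part of Kafka:
    public class HashPartitionDemo {
        // roughly what the old default partitioner does with a keyed message
        public static int partitionFor(String key, int numPartitions) {
            return Math.abs(key.hashCode() % numPartitions);
        }

        public static void main(String[] args) {
            // with the 3 partitions created in section III, key "100" always maps to the same partition
            System.out.println(partitionFor("100", 3));
        }
    }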
X. Integrating Flume with Kafka
-------------------------------------------------------------------------------
1. Kafka as a Flume sink [KafkaSink]
   Flume acts as a Kafka message producer
[/soft/flume/conf/KafkaSink.conf]
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=8888
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = test
a1.sinks.k1.kafka.bootstrap.servers = s200:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.channels.c1.type=memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
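   To test the sink, start the agent and feed it lines over netcat; the console consumer from section III should print them:
   $> flume-ng agent -f /soft/flume/conf/KafkaSink.conf -n a1
   $> nc localhost 8888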
2. Kafka as a Flume source [KafkaSource]
   Flume acts as a Kafka message consumer
[/soft/flume/conf/KafkaSource.conf]
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = s200:9092
a1.sources.r1.kafka.topics = test
a1.sources.r1.kafka.consumer.group.id = g1
a1.sinks.k1.type = logger
a1.channels.c1.type=memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
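   To test the source, start the agent with console logging, then publish with the console producer from section III:
   $> flume-ng agent -f /soft/flume/conf/KafkaSource.conf -n a1 -Dflume.root.logger=INFO,console
   $> kafka-console-producer.sh --broker-list s200:9092 --topic test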
3. Kafka as a Flume channel [KafkaChannel]
   Flume acts as both producer and consumer of Kafka messages
[/soft/flume/conf/KafkaChannel.conf]
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888
a1.sinks.k1.type = logger
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = s200:9092
a1.channels.c1.kafka.topic = test
a1.channels.c1.kafka.consumer.group.id = g1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
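   To test the channel, start the agent and type into netcat; each line travels through the Kafka topic and comes back out of the logger sink:
   $> flume-ng agent -f /soft/flume/conf/KafkaChannel.conf -n a1 -Dflume.root.logger=INFO,console
   $> nc localhost 8888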