Related Concepts and Terminology
- Broker: A Kafka cluster consists of one or more servers; each such server is called a broker.
- Topic: Every message published to a Kafka cluster belongs to a category, called a Topic.
- Partition: A physical concept; each Topic consists of one or more Partitions.
- Producer: A client that publishes messages to a Kafka broker.
- Consumer: A client that reads messages from a Kafka broker.
- Consumer Group: Each Consumer belongs to a specific Consumer Group (a group name can be assigned to each Consumer; if none is specified, the Consumer belongs to the default group).
Notes (based on newer versions):
- In a cluster, one of the brokers acts as the cluster controller, elected from among the active brokers. It plays a management role: assigning partitions to brokers, monitoring brokers, and so on.
- In a cluster, a partition is owned by a single broker, which is the leader for that partition. The partition may also be assigned to other brokers as replicas. If the leader broker for a partition fails, another broker holding a replica of that partition takes over as leader.
- All producer and consumer operations on a partition go through its leader.
- Brokers can be configured with retention limits for topics or messages, such as a time window or a size threshold for a topic; once a limit is reached, messages are deleted. Retention can also be configured for an individual topic, as shown in the example below.
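For example, a per-topic retention override can be set from the command line. A minimal sketch, assuming the topic test created later in this guide already exists (in the 0.9.x series, topic-level configs are altered through kafka-topics.sh):
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic test --config retention.ms=86400000
This overrides the broker-wide log.retention.hours for that one topic, here keeping messages for 24 hours.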
Standalone Mode
Suitable for local development or proof-of-concept verification.
Install Java
Before installing ZooKeeper or Kafka, first confirm that the system already has a Java environment (details omitted). The output here:
C:\Users\Administrator>java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
Install ZooKeeper
ZooKeeper stores broker and topic metadata, and (in older versions) consumer client details. You can also start a ZooKeeper server with the script bundled in the Kafka distribution. Here the package zookeeper-3.4.11.tar.gz was downloaded from the Alibaba Cloud open-source mirror (the Apache site was not reachable from this network). The layout:
[root@localhost zookeeper1]# pwd
/usr/local/software/zookeeper/zookeeper1
[root@localhost zookeeper1]# ll
total 1612
drwxr-xr-x. 2 502 games 4096 Feb 2 01:06 bin
-rw-r--r--. 1 502 games 87943 Nov 1 11:47 build.xml
drwxr-xr-x. 2 502 games 104 Feb 6 04:20 conf
drwxr-xr-x. 10 502 games 4096 Nov 1 11:47 contrib
drwxr-xr-x. 3 root root 60 Feb 6 04:22 data
drwxr-xr-x. 2 502 games 4096 Nov 1 11:54 dist-maven
drwxr-xr-x. 6 502 games 4096 Nov 1 11:52 docs
-rw-r--r--. 1 502 games 1709 Nov 1 11:47 ivysettings.xml
-rw-r--r--. 1 502 games 8197 Nov 1 11:47 ivy.xml
drwxr-xr-x. 4 502 games 4096 Nov 1 11:52 lib
-rw-r--r--. 1 502 games 11938 Nov 1 11:47 LICENSE.txt
drwxr-xr-x. 3 root root 22 Feb 2 01:00 logs
-rw-r--r--. 1 502 games 3132 Nov 1 11:47 NOTICE.txt
-rw-r--r--. 1 502 games 1585 Nov 1 11:47 README.md
-rw-r--r--. 1 502 games 1770 Nov 1 11:47 README_packaging.txt
drwxr-xr-x. 5 502 games 44 Nov 1 11:47 recipes
drwxr-xr-x. 8 502 games 4096 Nov 1 11:52 src
drwxr-xr-x. 2 root root 6 Feb 2 01:00 version-2
-rw-r--r--. 1 502 games 1478279 Nov 1 11:49 zookeeper-3.4.11.jar
-rw-r--r--. 1 502 games 195 Nov 1 11:52 zookeeper-3.4.11.jar.asc
-rw-r--r--. 1 502 games 33 Nov 1 11:49 zookeeper-3.4.11.jar.md5
-rw-r--r--. 1 502 games 41 Nov 1 11:49 zookeeper-3.4.11.jar.sha1
-rw-r--r--. 1 root root 5 Feb 6 02:33 zookeeper_server.pid
[root@localhost conf]# ll /usr/local/software/zookeeper/zookeeper1/conf
total 20
-rw-r--r--. 1 502 games 535 Nov 1 11:47 configuration.xsl
-rw-r--r--. 1 502 games 2161 Nov 1 11:47 log4j.properties
-rw-r--r--. 1 root root 922 Feb 5 22:39 zoo22.cfg
-rw-r--r--. 1 root root 1323 Feb 6 04:20 zoo.cfg
-rw-r--r--. 1 502 games 922 Nov 1 11:47 zoo_sample.cfg
dataDir sets the directory where data is stored, and clientPort defaults to 2181. zoo.cfg reads as follows:
[root@localhost conf]# vi zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/local/software/zookeeper/zookeeper-3.4.11
# the directory where the transaction log is stored
dataLogDir=/usr/local/software/zookeeper/zookeeper-3.4.11/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
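The file above covers a single instance. For the three local instances started below, each copy of zoo.cfg additionally needs its own clientPort (2181/2182/2183) and dataDir, plus the list of ensemble members, and each dataDir needs a myid file holding that instance's id. A sketch under those assumptions (the peer/election ports are conventional choices, not taken from this setup):
# appended to each instance's zoo.cfg (peer/election ports assumed for illustration)
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
# then, for instance 1 (assuming its dataDir is the data directory listed earlier):
echo 1 > /usr/local/software/zookeeper/zookeeper1/data/myid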
Introduction to Kafka
Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data, letting you pass messages from one endpoint to another. Kafka is suited to both offline and online message consumption. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka is built on top of the ZooKeeper synchronization service, and it integrates very well with Apache Storm and Spark for real-time streaming data analysis.
Kafka is a distributed messaging system written in Scala at LinkedIn, where it serves as the foundation of LinkedIn's Activity Stream and operational data processing Pipeline. It offers horizontal scalability and high throughput.
A growing number of open-source distributed processing systems, such as Apache Flume, Apache Storm, Spark, and Elasticsearch, support integration with Kafka. Download and extract Kafka; the version used here is kafka_2.11-0.9.0.1.
Kafka Installation and Configuration
Download address: http://kafka.apache.org/downloads
Download and extract:
[root@log1 local]# wget https://archive.apache.org/dist/kafka/0.9.0.1/kafka_2.11-0.9.0.1.tgz
[root@log1 local]# tar zxvf kafka_2.11-0.9.0.1.tgz
Start ZooKeeper
/usr/local/software/zookeeper/zookeeper1/bin/zkServer.sh start
/usr/local/software/zookeeper/zookeeper2/bin/zkServer.sh start
/usr/local/software/zookeeper/zookeeper3/bin/zkServer.sh start
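Each instance's role can then be checked with zkServer.sh status; in a healthy three-node ensemble, one instance reports leader and the other two report follower:
/usr/local/software/zookeeper/zookeeper1/bin/zkServer.sh status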
Modify the configuration file config/server.properties:
############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1

############################# Socket Server Settings #############################

listeners=PLAINTEXT://:9092

# The number of threads handling network requests
num.network.threads=3

# The number of threads doing disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Log Flush Policy #############################

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The minimum age of a log file to be eligible for deletion
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

# By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.
# If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.
log.cleaner.enable=false

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
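As the comment above notes, an optional chroot path can be appended to the connection string so that all Kafka znodes live under one root (illustrative only; this setup does not use a chroot):
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183/kafka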
Start the Kafka server:
bin/kafka-server-start.sh config/server.properties
(If producers and consumers will access Kafka from other hosts, remember to change listeners in config/server.properties accordingly, e.g.
listeners=PLAINTEXT://192.168.33.152:9092)
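To keep the broker running after the shell exits, the same start script also accepts a -daemon switch:
bin/kafka-server-start.sh -daemon config/server.properties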
Create a topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
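The leader and replica assignment discussed in the notes earlier can be inspected for the new topic with --describe:
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test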
List topics
bin/kafka-topics.sh --list --zookeeper localhost:2181
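As an end-to-end smoke test, messages can be produced and consumed with the console clients bundled with Kafka (in the 0.9.x series the console consumer still connects through ZooKeeper):
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Typing a few lines into the producer and seeing them echoed by the consumer confirms the broker is working.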