Kafka集群构建——伪分布式

Kafka集群构建

前言：本次构建为伪分布式即在同一台服务器的不同端口上启动多个服务，构成多个“伪节点”来达到伪分布式目的，区别与完全分布式的一台机器一个节点

环境要求：
centOs 7服务器一台，
jdk 1.8，
zookeeper-3.4.12
kafka-1.1.1

注意：

不要忘记配置 hosts（命令 vim /etc/hosts）例如：192.168.1.100 pc1，
不要忘记开启以下操作用到的端口或者关闭防火墙（CentOs7不推荐关闭）

1、Kafka的定义

What is Kafka：它是一个分布式消息系统，由linkedin使用scala编写，用作LinkedIn的活动流（Activity Stream）和运营数据处理管道（Pipeline）的基础。具有高水平扩展和高吞吐量。

2、Kafka基于zookeeper

Kafka集群是把状态保存在Zookeeper中的，因此在构建Kafka集群时，首先要构建zookeeper集群

Zookeeper集群构建

#我的zookeeper目录统一放在/opt下面
#首先创建Zookeeper根目录
cd /opt
mkdir /opt/zookeeper #根目录
#进入zookeeper文件夹创建节点1 节点2 节点3 所需的文件夹
cd ./zookeeper
mkdir zoo_1 #存放节点1 的快照日志
mkdir zoo_1_dataLog #存放节点1 的事物日志

mkdir zoo_2 #存放节点2 的快照日志
mkdir zoo_2_dataLog #存放节点2 的事物日志

mkdir zoo_3 #存放节点3 的快照日志
mkdir zoo_3_dataLog #存放节点3 的事物日志

下载zookeeper-3.4.12.tar.gz
将zookeeper-3.4.12.tar.gz 上传到 /opt/zookeeper 下，并解压。
tar -zxvf zookeeper-3.4.12.tar.gz
解压后的层次结构如图所示：
伪分布式zookeeper目录结构
创建myid文件

#server1
echo "1" > /opt/zookeeper/zoo_1/myid
#server2
echo "2" > /opt/zookeeper/zoo_2/myid
#server3
echo "3" > /opt/zookeeper/zoo_3/myid

进入/zookeeper-3.4.12/conf
创建三个节点的配置文件 zoo1.cfg zoo2.cfg zoo3.cfg

cp zoo_sample.cfg ./zoo1.cfg
cp zoo_sample.cfg ./zoo2.cfg
cp zoo_sample.cfg ./zoo3.cfg

zoo1.cfg 的配置如下：

#发送心跳的间隔时间，单位：毫秒
tickTime=2000
#zookeeper保存数据的目录
dataDir=/opt/zookeeper/zoo_1
#日志目录
dataLogDir=/opt/zookeeper/zoo_1_dataLog
#端口
clientPort=2181
#leader和follower初始化连接时最长能忍受多少个心跳时间的间隔数
initLimit=5
#leader和follower之间发送消息，请求和英达时间长度，最长不能超过多少个tickTime的时间长度
syncLimit=2
#zookeeper机器列表，server.order这里的Order依据集群的机器个数依次进行递增，这里的server1、server2、server3表示机器IP地址
server.1=pc1:2887:3887
server.2=pc1:2888:3888
server.3=pc1:2889:3889

zoo2.cfg 的配置如下：

#发送心跳的间隔时间，单位：毫秒
tickTime=2000
#zookeeper保存数据的目录
dataDir=/opt/zookeeper/zoo_2
#日志目录
dataLogDir=/opt/zookeeper/zoo_2_dataLog
#端口
clientPort=2182
#leader和follower初始化连接时最长能忍受多少个心跳时间的间隔数
initLimit=5
#leader和follower之间发送消息，请求和英达时间长度，最长不能超过多少个tickTime的时间长度
syncLimit=2
#zookeeper机器列表，server.order这里的Order依据集群的机器个数依次进行递增，这里的server1、server2、server3表示机器IP地址
server.1=pc1:2887:3887
server.2=pc1:2888:3888
server.3=pc1:2889:3889

zoo3.cfg 的配置如下：

#发送心跳的间隔时间，单位：毫秒
tickTime=2000
#zookeeper保存数据的目录
dataDir=/opt/zookeeper/zoo_3
#日志目录
dataLogDir=/opt/zookeeper/zoo_3_dataLog
#端口
clientPort=2183
#leader和follower初始化连接时最长能忍受多少个心跳时间的间隔数
initLimit=5
#leader和follower之间发送消息，请求和英达时间长度，最长不能超过多少个tickTime的时间长度
syncLimit=2
#zookeeper机器列表，server.order这里的Order依据集群的机器个数依次进行递增，这里的server1、server2、server3表示机器IP地址
server.1=pc1:2887:3887
server.2=pc1:2888:3888
server.3=pc1:2889:3889

配置解释：

#tickTime：
这个时间是作为 Zookeeper 服务器之间或客户端与服务器之间维持心跳的时间间隔，也就是每个 tickTime 时间就会发送一个心跳。
#initLimit：
这个配置项是用来配置 Zookeeper 接受客户端（这里所说的客户端不是用户连接 Zookeeper 服务器的客户端，而是 Zookeeper 服务器集群中连接到 Leader 的 Follower 服务器）初始化连接时最长能忍受多少个心跳时间间隔数。当已经超过 5个心跳的时间（也就是 tickTime）长度后 Zookeeper 服务器还没有收到客户端的返回信息，那么表明这个客户端连接失败。总的时间长度就是 5*2000=10 秒
#syncLimit：
这个配置项标识 Leader 与Follower 之间发送消息，请求和应答时间长度，最长不能超过多少个 tickTime 的时间长度，总的时间长度就是5*2000=10秒
#dataDir：
快照日志的存储路径
#dataLogDir：
事物日志的存储路径，如果不配置这个那么事物日志会默认存储到dataDir制定的目录，这样会严重影响zk的性能，当zk吞吐量较大的时候，产生的事物日志、快照日志太多
#clientPort：
这个端口就是客户端连接 Zookeeper 服务器的端口，Zookeeper 会监听这个端口，接受客户端的访问请求。修改他的端口改大点

重要配置说明：
1、myid文件和server.myid 在快照目录下存放的标识本台服务器的文件，他是整个zk集群用来发现彼此的一个重要标识。
2、zoo.cfg 文件是zookeeper配置文件zoo_simple.cfg 复制而来，在conf目录里。
3、log4j.properties文件是zk的日志输出文件在conf目录里用java写的程序基本上有个共同点日志都用log4j，来进行管理。

启动zookeeper

cd /opt/zookeeper/zookeeper-3.4.12
bin/zkServer.sh start zoo1.cfg
bin/zkServer.sh start zoo2.cfg
bin/zkServer.sh start zoo3.cfg

可以用“jps”查看zk的进程，这个是zk的整个工程的main

#执行命令jps
13571 Jps
7464 QuorumPeerMain
7515 QuorumPeerMain 
7581 QuorumPeerMain

说明：
zk集群一般只有一个leader，多个follower，主一般是相应客户端的读写请求，而从主同步数据，当主挂掉之后就会从follower里投票选举一个leader出来。
zk遵循半数规则，奇数规则，第一次启动时zk会根据网络延迟的大小来选举出leader。

查看zookeeper状态

#检查服务器状态
bin/zkServer.sh status

通过status就能看到状态

bin/zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/zookeeper-3.4.12/bin/../conf/zoo.cfg  #配置文件
Mode: follower  #他是否为领导

Kafka集群搭建

软件环境
1、linux一台或多台，本次为一台机器启动三个kfk服务构成“伪分布式”
2、已经搭建好的zookeeper集群
3、软件版本kafka_2.11-1.1.1.tgz，下载
Binary downloads，直接可以用；
Source download 则需要scala编译
如图所示：
在这里插入图片描述
创建kafka根目录并上传kafka包并解压

#创建目录
cd /opt/
mkdir kafka #创建项目目录
cd kafka
mkdir kafkalog_1 #创建kafka消息目录，主要存放kafka 1节点消息
mkdir kafkalog_2 #创建kafka消息目录，主要存放kafka 2节点消息
mkdir kafkalog_3 #创建kafka消息目录，主要存放kafka 3节点消息

#上传kafka_2.11-1.1.1.tgz  到 /opt/kafka下 并解压
tar -zxvf kafka_2.11-1.1.1.tgz

如图所示:
在这里插入图片描述
修改配置文件

cd /opt/kafka/kafka_2.11-1.1.1/config

主要关注：server.properties 这个文件即可，我们可以发现在目录下：
有很多文件，这里可以发现有Zookeeper文件，我们可以根据Kafka内带的zk集群来启动，但是建议使用独立的zk集群（我们上面已经搭好的 zookeeper集群）

接下来通过复制server.properties配置文件，创造出我们伪分布式3个节点所需的配置文件server_1.properties , server_2.properties , server_3.properties

cp server.properties ./server_1.properties 
cp server.properties ./server_2.properties
cp server.properties ./server_3.properties

我的配置：
server_1.properties

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
port=9092
############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://pc1:9092

# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/opt/kafka/kafkalog_1


# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=pc1:2181,pc1:2182,pc1:2183 

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0

我的配置：
server_2.properties

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=2
port=9093
############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://pc1:9093

# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/opt/kafka/kafkalog_2

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=pc1:2181,pc1:2182,pc1:2183

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0

我的配置：
server_3.properties

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=3
port=9094
############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://pc1:9094

# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/opt/kafka/kafkalog_3

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=pc1:2181,pc1:2182,pc1:2183

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0

配置说明：

broker.id=1 #当前机器在集群中的唯一标识，和zookeeper的myid性质一样
port=9092 #当前kafka对外提供服务的端口默认是9092
host.name=192.168.1.100 #这个参数默认是关闭的，在0.8.1有个bug，DNS解析问题，失败率的问题。
num.network.threads=3 #这个是borker进行网络处理的线程数
num.io.threads=8 #这个是borker进行I/O处理的线程数
log.dirs=/opt/kafka/kafkalogs/ #消息存放的目录，这个目录可以配置为“，”逗号分割的表达式，上面的num.io.threads要大于这个目录的个数这个目录，如果配置多个目录，新创建的topic他把消息持久化的地方是，当前以逗号分割的目录中，那个分区数最少就放那一个
socket.send.buffer.bytes=102400 #发送缓冲区buffer大小，数据不是一下子就发送的，先回存储到缓冲区了到达一定的大小后在发送，能提高性能
socket.receive.buffer.bytes=102400 #kafka接收缓冲区大小，当数据到达一定大小后在序列化到磁盘
socket.request.max.bytes=104857600 #这个参数是向kafka请求消息或者向kafka发送消息的请请求的最大数，这个值不能超过java的堆栈大小
num.partitions=1 #默认的分区数，一个topic默认1个分区数
log.retention.hours=168 #默认消息的最大持久化时间，168小时，7天
message.max.byte=5242880  #消息保存的最大值5M
default.replication.factor=2  #kafka保存消息的副本数，如果一个副本失效了，另一个还可以继续提供服务
replica.fetch.max.bytes=5242880  #取消息的最大直接数
log.segment.bytes=1073741824 #这个参数是：因为kafka的消息是以追加的形式落地到文件，当超过这个值的时候，kafka会新起一个文件
log.retention.check.interval.ms=300000 #每隔300000毫秒去检查上面配置的log失效时间（log.retention.hours=168 ），到目录查看是否有过期的消息如果有，删除
log.cleaner.enable=false #是否启用log压缩，一般不用启用，启用的话可以提高性能
zookeeper.connect=192.168.1.100:2181,192.168.1.101:2181,192.168.1.102:2181 #设置zookeeper的连接端口

上面是参数的解释，实际的修改项重点关注：

#broker.id=1  每台服务器的broker.id都不能相同
#hostname
host.name=192.168.1.100
#在log.retention.hours=168 下面新增下面三项（一周时间）
message.max.byte=5242880
default.replication.factor=2
replica.fetch.max.bytes=5242880
#设置zookeeper的连接端口
zookeeper.connect=192.168.1.100:2181,192.168.1.101:2181,192.168.1.102:2181

启动Kafka集群

#后台启动Kafka集群（3台都需要启动）
cd /opt/kafka/kafka_2.11-1.1.1
#执行启动命令 指定配置文件位置启动
bin/kafka-server-start.sh -daemon config/server_1.properties #启动 kafka节点1
bin/kafka-server-start.sh -daemon config/server_2.properties #启动 kafka节点2
bin/kafka-server-start.sh -daemon config/server_3.properties #启动 kafka节点3

检查服务是否启动

#执行命令 jps 查看进程
20309 Jps
7464 QuorumPeerMain
7515 QuorumPeerMain 
7581 QuorumPeerMain
8017 Kafka
8398 Kafka
8780 Kafka

创建Topic来验证Kafka

bin/kafka-topics.sh --create --zookeeper pc1:2181 --replication-factor 3 --partitions 3 --topic testTopic

#解释
–replication-factor 3 #创建3个备份
–partitions 3 #创建3个分区
–topic #主题为 testTopic

创建一个生产者

#创建一个broker，producer（生产者）
bin/kafka-console-producer.sh --broker-list pc1:9092 --topic testTopic
#在窗口中 输入消息
123456 + 回车
654321 + 回车

注意：
生产者创建完成后在窗口中输入消息例如：123456 + 回车； 654321 + 回车；将消息插入 Kafka消息队列 (此时 testTopic中就有了两条消息，然后创建消费者)

创建一个消费者

注意：可以另开启一个ssh连接工具窗口来创建消费者，这样可以体验到生产者窗口中插入消息，在消费者窗口可以立即被消费。或者 ctrl + C 退出当前生产者

#创建一个 consumer （消费者）
bin/kafka-console-consumer.sh --zookeeper pc1:2181 --topic testTopic --from-beginning

消费者创建完成后可以看到窗口中由上至下的消费了两条消息
123456
654321

相关链接：
删除Kafka topic的一些方法

至此为止 Kafka集群搭建完毕。