Official Kafka Tutorial (Kafka 1.1 Documentation)

Kafka

Quickstart

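Translated from the Kafka 1.1 Documentation.
This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data. Because the Kafka console scripts differ between Unix-based and Windows platforms, on Windows use bin\windows\ instead of bin/ and change the script extension to .bat.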
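The Windows substitution rule above can be sketched as a small text rewrite. The helper name to_windows_cmd is hypothetical, and it only rewrites the script path and extension; any other paths on the command line are left untouched.

```shell
# Hypothetical helper: rewrite a Unix-style Kafka command line into its
# Windows form by swapping the script directory and the script extension.
to_windows_cmd() {
    printf '%s\n' "$1" | sed -e 's#^bin/#bin\\windows\\#' -e 's#\.sh#.bat#'
}

# Prints the rewritten command; nothing is executed.
to_windows_cmd "bin/kafka-topics.sh --list --zookeeper localhost:2181"
```

This is only a reading aid for the commands in the rest of the tutorial; on Windows you would type the rewritten command directly.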

Step 1: Download the code

Download the 1.1.0 release and un-tar it.

> tar -xzf kafka_2.11-1.1.0.tgz
> cd kafka_2.11-1.1.0

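Step 2: Start the server

Kafka uses ZooKeeper, so you need to start a ZooKeeper server first if you don't already have one. You can use the convenience script packaged with Kafka to get a quick single-node ZooKeeper instance.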

> bin/zookeeper-server-start.sh config/zookeeper.properties

Now start the Kafka server:

> bin/kafka-server-start.sh config/server.properties

Step 3: Create a topic

Let's create a topic named "test" with a single partition and only one replica:

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

We can now see that topic if we run the list topic command:

> bin/kafka-topics.sh --list --zookeeper localhost:2181
test

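Alternatively, instead of creating topics manually, you can configure your brokers to auto-create a topic when a message is published to a topic that does not exist yet.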
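Auto-creation is controlled by the broker setting auto.create.topics.enable, which defaults to true when it is not set. As a minimal sketch, the hypothetical helper get_prop below reads a key from a properties file and falls back to a default, which is one way to check what a given server.properties would do. It treats the dots in the key as regex wildcards and ignores line continuations, so it is a convenience for simple files only.

```shell
# Hypothetical helper: print the value of KEY from a .properties FILE,
# or DEFAULT when the key is absent, commented out, or set to an empty value.
get_prop() {
    file=$1; key=$2; default=$3
    # Keep the last uncommented "key=value" line, stripped of "key=".
    value=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1)
    printf '%s\n' "${value:-$default}"
}

# Only runs when executed from the Kafka directory.
if [ -f config/server.properties ]; then
    get_prop config/server.properties auto.create.topics.enable true
fi
```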

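Step 4: Send some messages

Kafka comes with a command line client that takes input from a file or from standard input and sends it out as messages to the Kafka cluster. By default, each line is sent as a separate message.

Run the producer and then type a few messages into the console to send to the server.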

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message

Step 5: Start a consumer

Kafka also has a command line consumer that dumps messages to standard output.

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message

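If you run each of the above commands in a different terminal, you should now be able to type messages into the producer terminal and see them appear in the consumer terminal.
All of the command line tools have additional options; running a command with no arguments displays usage information that documents them in more detail.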

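Step 6: Set up a multi-broker cluster

So far we have been running against a single broker, but that's no fun. To Kafka, a single broker is simply a cluster of size one. Let's expand our cluster to three nodes (still all on our local machine).

First we make a config file for each of the new brokers (on Windows use the copy command instead):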

> cp config/server.properties config/server-1.properties
> cp config/server.properties config/server-2.properties

Now edit these new files and set the following properties:

config/server-1.properties:
    broker.id=1
    listeners=PLAINTEXT://:9093
    log.dir=/tmp/kafka-logs-1

config/server-2.properties:
    broker.id=2
    listeners=PLAINTEXT://:9094
    log.dir=/tmp/kafka-logs-2

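The broker.id property is the unique and permanent name of each node in the cluster. We have to override the port and log directory only because we are running all of the brokers on the same machine, and we want to keep them from trying to register on the same port or overwriting each other's data.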
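The copy-and-edit steps above can also be scripted. The sketch below uses a hypothetical helper, make_broker_config, that removes any existing broker.id, listeners, log.dir, and log.dirs lines (including commented-out ones) from the base file and appends the three overrides. Dropping log.dirs is a deliberate assumption: as far as I know the stock server.properties sets log.dirs (plural), which would take precedence over log.dir if it were left in place.

```shell
# Hypothetical helper: derive a per-broker config from a base file by
# overriding broker.id, listeners and log.dir.
make_broker_config() {
    base=$1; id=$2; port=$3; out=$4
    # Drop existing (or commented-out) definitions of the overridden keys.
    sed -E '/^#?(broker\.id|listeners|log\.dirs?)=/d' "$base" > "$out"
    # Append the per-broker values used in this tutorial.
    {
        echo "broker.id=$id"
        echo "listeners=PLAINTEXT://:$port"
        echo "log.dir=/tmp/kafka-logs-$id"
    } >> "$out"
}

# Only runs when executed from the Kafka directory.
if [ -f config/server.properties ]; then
    make_broker_config config/server.properties 1 9093 config/server-1.properties
    make_broker_config config/server.properties 2 9094 config/server-2.properties
fi
```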

We already have ZooKeeper and our single node started, so we just need to start the two new nodes:

> bin/kafka-server-start.sh config/server-1.properties &
...
> bin/kafka-server-start.sh config/server-2.properties &
...

Now create a new topic with a replication factor of three:

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic

Now that we have a cluster, how can we know which broker is doing what? To see that, run the "describe topics" command:

> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic   PartitionCount:1    ReplicationFactor:3 Configs:
    Topic: my-replicated-topic  Partition: 0    Leader: 1   Replicas: 1,2,0 Isr: 1,2,0

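Here is an explanation of the output. The first line gives a summary of all the partitions, and each additional line gives information about one partition. Since we have only one partition for this topic, there is only one such line.

"leader" is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.
"replicas" is the list of nodes that replicate the log for this partition, regardless of whether they are the leader or even whether they are currently alive.
"isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive and caught up to the leader.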
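The fields described above can be pulled out of the describe output with a short awk filter. This is a minimal sketch that assumes the Kafka 1.1 output layout shown in this tutorial; the function name partition_summary is hypothetical.

```shell
# Hypothetical helper: read `kafka-topics.sh --describe` output on stdin and
# print "partition leader isr" for each per-partition line.
partition_summary() {
    awk '/Partition:/ {
        for (i = 1; i < NF; i++) {
            if ($i == "Partition:") p = $(i + 1)
            if ($i == "Leader:")    l = $(i + 1)
            if ($i == "Isr:")       s = $(i + 1)
        }
        print p, l, s
    }'
}

# Demo on the sample line from this tutorial; prints "0 1 1,2,0".
printf '%s\n' '    Topic: my-replicated-topic  Partition: 0    Leader: 1   Replicas: 1,2,0 Isr: 1,2,0' | partition_summary
```

Against a live cluster, the same filter could be fed by piping the describe command into partition_summary.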

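Note that in my example node 1 is the leader for the only partition of the topic. We can run the same command on the original topic we created to see where it is: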

> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
Topic:test  PartitionCount:1    ReplicationFactor:1 Configs:
    Topic: test Partition: 0    Leader: 0   Replicas: 0 Isr: 0

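No surprise there: the original topic has no replicas and lives on server 0, the only server in our cluster when we created it.

Let's publish a few messages to our new topic: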

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
...
my test message 1
my test message 2
^C

Now let's consume these messages:

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C

Now let's test fault tolerance. Broker 1 was acting as the leader, so let's kill it:

> ps aux | grep server-1.properties
7564 ttys002    0:15.91 /System/Library/Frameworks/JavaVM.framework/Versions/1.8/Home/bin/java...
> kill -9 7564

On Windows use:

> wmic process where "caption = 'java.exe' and commandline like '%server-1.properties%'" get processid
ProcessId
6016
> taskkill /pid 6016 /f

Leadership has switched to one of the followers, and node 1 is no longer in the in-sync replica set:

> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic   PartitionCount:1    ReplicationFactor:3 Configs:
    Topic: my-replicated-topic  Partition: 0    Leader: 2   Replicas: 1,2,0 Isr: 2,0
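To make the change in the in-sync set explicit, the sketch below lists the replicas that appear in "Replicas" but not in "Isr". It assumes the same output layout as shown above; the function name out_of_sync is hypothetical.

```shell
# Hypothetical helper: for each per-partition line of `--describe` output on
# stdin, print the replica ids that are missing from the ISR ("-" if none).
out_of_sync() {
    awk '/Partition:/ {
        replicas = ""; isr = ""; missing = ""
        for (i = 1; i < NF; i++) {
            if ($i == "Replicas:") replicas = $(i + 1)
            if ($i == "Isr:")      isr = $(i + 1)
        }
        n = split(replicas, r, ",")
        for (j = 1; j <= n; j++) {
            # Wrap in commas so that id 1 does not match id 11.
            if (index("," isr ",", "," r[j] ",") == 0) {
                if (missing != "") missing = missing ","
                missing = missing r[j]
            }
        }
        if (missing == "") missing = "-"
        print missing
    }'
}

# Demo on the sample line after broker 1 was killed; prints "1".
printf '%s\n' '    Topic: my-replicated-topic  Partition: 0    Leader: 2   Replicas: 1,2,0 Isr: 2,0' | out_of_sync
```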

But the messages are still available for consumption even though the leader that originally took the writes is down:

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C

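Step 7: Use Kafka Connect to import/export data

Writing data from the console and writing it back to the console is a convenient place to start, but you will probably want to use data from other sources or export data from Kafka to other systems. For many systems, instead of writing custom integration code you can use Kafka Connect to import or export data.

Kafka Connect is a tool included with Kafka that imports and exports data to Kafka. It is an extensible tool that runs connectors, which implement the custom logic for interacting with an external system. In this quickstart we'll see how to run Kafka Connect with simple connectors that import data from a file to a Kafka topic and export data from a Kafka topic to a file.

First, we'll start by creating some seed data to test with: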

> echo -e "foo\nbar" > test.txt

Or on Windows:

> echo foo> test.txt
> echo bar>> test.txt

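Next, we'll start two connectors running in standalone mode, which means they run in a single, local, dedicated process. We provide three configuration files as parameters. The first is always the configuration for the Kafka Connect process, containing common configuration such as the Kafka brokers to connect to and the serialization format for data. Each of the remaining configuration files specifies a connector to create. These files include a unique connector name, the connector class to instantiate, and any other configuration required by the connector.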

> bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
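For orientation, a connector configuration file is a plain properties file. The sketch below writes one for a file source using a hypothetical helper, write_file_source_config. The keys and values mirror what I recall of the connect-file-source.properties sample shipped with Kafka; treat them as an assumption and compare against the file in your own distribution.

```shell
# Hypothetical helper: write a file-source connector config to OUT that reads
# lines from INPUT_FILE and produces them to TOPIC.
write_file_source_config() {
    out=$1; input_file=$2; topic=$3
    cat > "$out" <<EOF
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=$input_file
topic=$topic
EOF
}
```

Calling write_file_source_config my-source.properties test.txt connect-test would produce a file that could be passed to connect-standalone.sh in place of the shipped source config.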

These sample configuration files, included with Kafka, use the default local cluster configuration you started earlier and create two connectors: the first is a source connector that reads lines from an input file and produces each to a Kafka topic, and the second is a sink connector that reads messages from a Kafka topic and writes each as a line in an output file.
During startup you’ll see a number of log messages, including some indicating that the connectors are being instantiated. Once the Kafka Connect process has started, the source connector should start reading lines from test.txt and producing them to the topic connect-test, and the sink connector should start reading messages from the topic connect-test and write them to the file test.sink.txt. We can verify the data has been delivered through the entire pipeline by examining the contents of the output file:

> more test.sink.txt
foo
bar

Note that the data is being stored in the Kafka topic connect-test, so we can also run a console consumer to see the data in the topic (or use custom consumer code to process it):

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}
...
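Each record above is wrapped in a JSON envelope with "schema" and "payload" fields, which, as far as I recall, comes from the sample worker configuration using the JSON converter with schemas enabled. As a rough sketch, the hypothetical filter extract_payload below recovers the original lines from the consumer output. It assumes simple string payloads without escaped quotes; a real JSON parser would be the safer choice for anything else.

```shell
# Hypothetical helper: print the string payload of each JSON envelope read
# from stdin. Lines without a "payload" string field are skipped.
extract_payload() {
    sed -n 's/.*"payload":"\([^"]*\)".*/\1/p'
}

# Demo on the first sample record from this tutorial; prints "foo".
printf '%s\n' '{"schema":{"type":"string","optional":false},"payload":"foo"}' | extract_payload
```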

The connectors continue to process data, so we can add data to the file and see it move through the pipeline:

> echo Another line>> test.txt

You should see the line appear in the console consumer output and in the sink file.

Step 8: Use Kafka Streams to process data

Kafka Streams is a client library for building mission-critical real-time applications and microservices, where the input and/or output data is stored in Kafka clusters. Kafka Streams combines the simplicity of writing and deploying standard Java and Scala applications on the client side with the benefits of Kafka’s server-side cluster technology to make these applications highly scalable, elastic, fault-tolerant, distributed, and much more. This quickstart example will demonstrate how to run a streaming application coded in this library.
