Kafka installation and startup

We have covered plenty of Kafka background; now let's get some hands-on practice. This guide assumes you do not yet have an existing Kafka or ZooKeeper environment.

Step 1: Download the code

Download the 0.10.0.0 release and extract it:

> tar -xzf kafka_2.11-0.10.0.0.tgz 
> cd kafka_2.11-0.10.0.0

Step 2: Start the service

Running Kafka requires ZooKeeper, so you need to start a ZooKeeper server first. If you don't already have one, you can use the single-node ZooKeeper instance that is packaged and preconfigured with Kafka:

> bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...

Now start the Kafka server:

> bin/kafka-server-start.sh config/server.properties &
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...

Step 3: Create a topic

Create a topic named "test" with a single partition and a single replica:

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

Once the topic is created, you can see it by listing the existing topics:

> bin/kafka-topics.sh --list --zookeeper localhost:2181
test

Alternatively, instead of creating topics manually, you can configure your brokers to create topics automatically when a message is published to a topic that does not yet exist.
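
Automatic topic creation is controlled by the auto.create.topics.enable broker setting (it defaults to true in this version), for example in config/server.properties:

    auto.create.topics.enable=true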

Step 4: Send a message

Kafka ships with a command-line producer that reads input from a file or from standard input and sends each line as a separate message to the Kafka cluster.
Run the producer, then type a few messages into the console to send them to the server:

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test 
This is a message
This is another message
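
The console producer is just a thin wrapper around Kafka's Java producer client. For reference, here is a minimal sketch of sending the same two messages from Java (assuming the org.apache.kafka.clients.producer API that ships with this release; the class name is illustrative):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Send two messages to the "test" topic, then close the producer.
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "This is a message"));
            producer.send(new ProducerRecord<>("test", "This is another message"));
        }
    }
}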

Step 5: Consume the message

Kafka also provides a command-line consumer that reads messages from a topic and prints them to standard output:

> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
This is a message
This is another message

If you run the producer and the consumer in two separate terminals, messages typed into the producer terminal will appear in the consumer terminal.
All of the command-line tools have additional options; see the documentation for more details.
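
Similarly, topics can be consumed from Java code. Here is a minimal sketch using the Java consumer client (note that this uses the newer bootstrap.servers-based consumer rather than the ZooKeeper-based console consumer shown above; the group id and class name are illustrative):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "quickstart-group");      // illustrative consumer group
        props.put("auto.offset.reset", "earliest");     // roughly equivalent to --from-beginning
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset = %d, value = %s%n", record.offset(), record.value());
                }
            }
        }
    }
}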

Step 6: Set up multiple broker clusters

So far we have only been running a single broker, which is not very interesting. For Kafka, a single broker is just a cluster of size one, so let's launch a few more broker instances.
First create a configuration file for each broker:

> cp config/server.properties config/server-1.properties 
> cp config/server.properties config/server-2.properties

Now edit these newly created files and set the following properties:

config/server-1.properties: 
    broker.id=1 
    listeners=PLAINTEXT://:9093 
    log.dir=/tmp/kafka-logs-1

config/server-2.properties: 
    broker.id=2 
    listeners=PLAINTEXT://:9094 
    log.dir=/tmp/kafka-logs-2

The broker.id property is the unique and permanent name of each node in the cluster. We have to override the port and the log directory because we are running all of the brokers on the same machine, and we want to keep them from all trying to register on the same port or overwrite each other's data.

We already have ZooKeeper and one Kafka node running, so we just need to start the two new nodes:

> bin/kafka-server-start.sh config/server-1.properties &
... 
> bin/kafka-server-start.sh config/server-2.properties &
...

Now create a new topic with a replication factor of 3:

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic

Well, now that we have a cluster, how can we find out which broker is doing what? Run the "describe topics" command:

> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic    PartitionCount:1    ReplicationFactor:3    Configs:
Topic: my-replicated-topic    Partition: 0    Leader: 1    Replicas: 1,2,0    Isr: 1,2,0

Here is an explanation of the output. The first line is a summary of all partitions; each following line describes one partition. Since this topic has only one partition, there is only one such line.

  • "leader": This node is responsible for reading and writing all specified partitions, and the leader of each node is randomly selected.
  • "replicas": Backup nodes, regardless of whether the node is the leader or currently alive, just displayed.
  • "isr": The set of backup nodes, that is, the set of live nodes.

Let's run the same command on the topic we created at the beginning:

> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
Topic:test    PartitionCount:1    ReplicationFactor:1    Configs:
Topic: test    Partition: 0    Leader: 0    Replicas: 0    Isr: 0

No surprises here: the original topic has no extra replicas and lives entirely on server 0, the only server in the cluster when it was created.

Let's publish a few messages to our new topic:

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
 ...
my test message 1
my test message 2
^C

Now, consume these messages.

> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic my-replicated-topic
 ...
my test message 1
my test message 2
^C

Now let's test the fault tolerance of the cluster by killing the leader. Broker 1 is the current leader, so we kill broker 1:

> ps | grep server-1.properties
7564 ttys002    0:15.91 /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin/java... 
> kill -9 7564

One of the remaining replicas becomes the new leader, and broker 1 is no longer in the in-sync replica set:

> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic    PartitionCount:1    ReplicationFactor:3    Configs:
Topic: my-replicated-topic    Partition: 0    Leader: 2    Replicas: 1,2,0    Isr: 2,0

However, the messages are not lost:

> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C

Step 7: Use Kafka Connect to import/export data

Writing data from the console and reading it back on the console is a convenient way to start, but you will probably want to import data from other sources into Kafka, or export data from Kafka into other systems. For many systems you can use Kafka Connect instead of writing custom integration code. Kafka Connect is an extensible tool for importing and exporting data: it runs connectors, which implement the custom logic for interacting with an external system. In this quickstart we will run Kafka Connect with simple connectors that import data from a file into a Kafka topic and export data from a Kafka topic back into a file. First, create some seed data to test with:

echo -e "foo\nbar" > test.txt

Next, we start two connectors running in standalone mode, which means they run in a single, local, dedicated process. We provide three configuration files as parameters. The first is always the configuration for the Kafka Connect process itself, containing common settings such as the Kafka brokers to connect to and the serialization format for data. Each remaining configuration file specifies a connector to create; these files include a unique connector name, the connector class to instantiate, and any other configuration the connector requires.
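
For reference, the file source and sink connector configurations shipped with the Kafka distribution are already set up for this example; their contents look roughly like this (exact contents may vary slightly between versions):

config/connect-file-source.properties:
    name=local-file-source
    connector.class=FileStreamSource
    tasks.max=1
    file=test.txt
    topic=connect-test

config/connect-file-sink.properties:
    name=local-file-sink
    connector.class=FileStreamSink
    tasks.max=1
    file=test.sink.txt
    topics=connect-test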

> bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties

These sample configuration files, included with Kafka, use the default local cluster configuration you started earlier and create two connectors: the first is a source connector that reads lines from an input file and publishes each line to a Kafka topic, and the second is a sink connector that reads messages from a Kafka topic and writes each message as a line in an output file.

During startup you will see a number of log messages, including some indicating that the connectors are being instantiated. Once the Kafka Connect process has started, the source connector should begin reading lines from test.txt and writing them to the topic connect-test, and the sink connector should begin reading messages from connect-test and writing them to the file test.sink.txt. We can verify that the data has made it through the whole pipeline by examining the contents of the output file:

> cat test.sink.txt
foo
bar

Note that the imported data is also stored in the Kafka topic connect-test, so we can also view it with the console consumer:

> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}
...

The connectors continue to process data, so we can append lines to the file and watch them move through the pipeline:

echo "Another line" >> test.txt

You should see the new line appear both in the console consumer output and in the sink file.

Step 8: Use Kafka Streams to process data

Kafka Streams is Kafka's client library for real-time stream processing and analysis of data stored in Kafka brokers. This quickstart demonstrates how to run a streaming application coded with this library. Here is the gist of the WordCountDemo example (using Java 8 lambda expressions for readability):

KTable<String, Long> wordCounts = textLines
    // Split each text line, by whitespace, into words.
    .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))

    // Ensure the words are available as record keys for the next aggregate operation.
    .map((key, value) -> new KeyValue<>(value, value))

    // Count the occurrences of each word (record key) and store the results into a table named "Counts".
    .countByKey("Counts");

It implements the WordCount algorithm, which computes a word occurrence histogram from the input text. However, unlike other WordCount examples you may have seen, which operate on a bounded data set, this demo behaves a little differently because it is designed to operate on an infinite, unbounded stream of data. It is a dynamic algorithm that continuously tracks and updates the word counts. Since it must assume potentially unbounded input data, it periodically outputs its current state and results while continuing to process more data, because it cannot know when it has processed "all" of the input.
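
The fragment above only defines the processing topology. As a rough sketch of how such a fragment is typically wired into a runnable Streams application against this release's API (the application id, serde settings, and class name below are illustrative; see the WordCountDemo source in the Kafka distribution for the authoritative version):

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

public class WordCountSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-sketch");   // illustrative id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        KStreamBuilder builder = new KStreamBuilder();

        // Read the input topic as a stream of text lines.
        KStream<String, String> textLines = builder.stream("streams-file-input");

        // The word-count topology from the fragment above.
        KTable<String, Long> wordCounts = textLines
            .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
            .map((key, value) -> new KeyValue<>(value, value))
            .countByKey("Counts");

        // Write the continuously updated counts to the output topic.
        wordCounts.to(Serdes.String(), Serdes.Long(), "streams-wordcount-output");

        KafkaStreams streams = new KafkaStreams(builder, props);
        streams.start();
    }
}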

Now let's prepare some input data for a Kafka topic, which the Kafka Streams application will then process.

> echo -e "all streams lead to kafka\nhello kafka streams\njoin kafka summit" > file-input.txt

Next, create the input topic (streams-file-input) and use the console producer to send the input data to it. (In practice, stream data would likely keep flowing into Kafka while the application is up and running.)

> bin/kafka-topics.sh --create \
            --zookeeper localhost:2181 \
            --replication-factor 1 \
            --partitions 1 \
            --topic streams-file-input
> cat file-input.txt | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-file-input

Now, we run WordCount to process the input data:

> bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo

The demo will not produce any STDOUT output other than log entries; its results are continuously written back to another Kafka topic named streams-wordcount-output. The demo runs for a few seconds and then, unlike a typical stream processing application, terminates automatically.

Now we can inspect the output of the WordCountDemo application by reading from its output topic:

> bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
            --topic streams-wordcount-output \
            --from-beginning \
            --formatter kafka.tools.DefaultMessageFormatter \
            --property print.key=true \
            --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
            --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

This prints the output data to the console (you can stop it with Ctrl-C):

all     1
streams 1
lead    1
to      1
kafka   1
hello   1
kafka   2
streams 2
join    1
kafka   3
summit  1
^C

The first column is the message key and the second column is the message value. Note that the output is actually a continuous stream of updates, in which each record (i.e. each line in the output above) is the updated count of a single word such as the record key "kafka". For multiple records with the same key, each later record is an update of the previous one.



Author: Orc
Link : http://orchome.com/6
Source: OrcHome
Copyright belongs to the author. For commercial reprints, please contact the author for authorization, and for non-commercial reprints, please indicate the source.
 
