How to set up a Kafka environment on Windows

Introduction: Kafka is a distributed, partitioned, replicated messaging system that is often used to process log data. It is an open source project under Apache (I think Apache is simply too awesome...)

 

(1) Basic message terminology:

    Kafka categorizes messages by topic.

  Programs that publish messages to Kafka topics are called producers.

  Programs that subscribe to topics and consume messages are called consumers.

  Kafka runs as a cluster of one or more servers, each of which is called a broker.

    Topics and Partitions: A topic is a category for a set of messages. For each topic, Kafka maintains a partitioned log. Each partition is an ordered, immutable sequence of messages, and new messages are always appended to the end.

Each message in a partition is assigned a sequential number called the offset, which uniquely identifies the message within that partition.
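The idea of a partition as an append-only log can be sketched in a few lines of Python. This is only a toy model for illustration, not Kafka's storage code; the class name Partition and its methods are made up here.

```python
class Partition:
    """Toy append-only log: the position of a message is its offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message to the end and return the offset assigned to it."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self._messages[offset]


partition = Partition()
first = partition.append("login user=alice")
second = partition.append("logout user=alice")
# Offsets grow by one per message, and each offset identifies one message.
print(first, second, partition.read(second))
```

Messages are never changed or reordered once appended, which is why an offset is enough to identify a message within its partition.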

    Distributed: Each partition is replicated across several servers in the Kafka cluster, so that these servers jointly handle data and requests, and the number of replicas is configurable. Replication is what makes Kafka fault-tolerant.

Each partition has one server acting as the "leader" and zero or more servers acting as "followers". The leader handles all reads and writes for the partition, and the followers replicate the leader. If the leader goes down, one of the followers automatically becomes the new leader.

Each server in the cluster plays both roles at once: it is the leader for some of the partitions it holds and a follower for the others, which keeps the load across the cluster balanced.
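A rough sketch of the leader/follower idea, under simplifying assumptions: the replica assignment below is invented, and the rule "take the first live replica" is a stand-in. Real Kafka chooses the new leader from the replicas that are in sync with the old one.

```python
def elect_leader(replicas, alive):
    """Return the first replica in the ordered list that is still alive, or None."""
    for broker in replicas:
        if broker in alive:
            return broker
    return None


# Hypothetical assignment: partition id -> ordered list of broker ids holding a replica.
assignment = {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}

alive = {1, 2, 3}
leaders = {p: elect_leader(r, alive) for p, r in assignment.items()}

# Broker 1 goes down: partition 0 fails over to one of its followers.
alive.discard(1)
leaders_after = {p: elect_leader(r, alive) for p, r in assignment.items()}
print(leaders, leaders_after)
```

With all three brokers up, each broker leads one partition and follows the other two, which is the load balancing effect described above.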

    Producers send messages to the Kafka cluster over the network, and the cluster provides messages to consumers. As shown in the figure:



 

    The role of ZooKeeper in Kafka is to do soft load balancing.

Clients and servers communicate over TCP. Kafka provides a Java client, and clients are available for many other languages.

 

(2) Producer & Consumer

The producer publishes messages to a topic of its choice and is responsible for deciding which partition each message goes to. The simplest load balancing approach picks a partition at random, but the partition can also be chosen by a specific partition function. The second approach is used more often.

There are usually two models for messaging: queuing and publish-subscribe. In the queuing model, several consumers can read from the server at the same time, and each message is read by only one of them; in the publish-subscribe model, each message is broadcast to all consumers.
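The two ways of choosing a partition can be sketched as follows. This is an illustration only: the function names are made up, and zlib.crc32 is just a convenient stable hash, not the hash Kafka's own partitioner uses.

```python
import random
import zlib


def random_partition(num_partitions):
    """Pick any partition; spreads load but ignores the message key."""
    return random.randrange(num_partitions)


def keyed_partition(key, num_partitions):
    """Map a key to a partition with a stable hash, so equal keys share a partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# All messages for the same (hypothetical) key land in the same partition.
p1 = keyed_partition("user-42", 4)
p2 = keyed_partition("user-42", 4)
print(p1 == p2, random_partition(4))
```

A partition function is useful when messages with the same key must stay in order, since ordering is only guaranteed within one partition.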

Consumers can join a consumer group to share a topic, and each message in the topic is delivered to one member of the group. Consumers in the same group can be in different programs or on different machines. If all consumers are in the same group, this becomes the traditional queuing model, which load balances across consumers.

If all consumers are in different groups, this becomes the publish-subscribe model, and every message is delivered to every consumer.

More commonly, each topic has a small number of consumer groups, and each group is a logical "subscriber". For fault tolerance and better stability, each group consists of several consumers. This is still the publish-subscribe model, except that each subscriber is a group rather than a single consumer.
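The difference between the two models can be seen in a small simulation. It is a simplification: here each message is handed round-robin to one member of every group, whereas real Kafka assigns whole partitions to group members. The function name deliver and the group and consumer names are invented.

```python
from collections import defaultdict
from itertools import cycle


def deliver(messages, groups):
    """groups maps a group name to its list of consumers.

    Every group sees every message; inside a group each message goes to
    exactly one member (chosen round-robin in this sketch).
    """
    received = defaultdict(list)
    pickers = {name: cycle(members) for name, members in groups.items()}
    for message in messages:
        for name in groups:
            received[next(pickers[name])].append(message)
    return dict(received)


# One group with two consumers behaves like a queue.
queue = deliver(["m1", "m2", "m3", "m4"], {"g": ["c1", "c2"]})
# One consumer per group behaves like publish-subscribe.
pubsub = deliver(["m1", "m2"], {"g1": ["c1"], "g2": ["c2"]})
print(queue, pubsub)
```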

 

(3) Setting up Kafka on Windows

Well, it took a lot of work to set up Kafka on Windows. There are not many articles on the Internet about configuring Kafka on Windows, and even following the ones I found, I failed many times. Later, with the help of Brother F, I finally got it working, and I would like to express my heartfelt thanks to him again.

 

step1: Go to the download page of Kafka's official website, http://kafka.apache.org/downloads.html, and download the Kafka binary package (note: not the source package kafka-xx-src.tgz). The version I downloaded is kafka_2.9.2-0.8.1.tgz. Unzip it after downloading.

 

step2: After unzipping, check several configuration files in the config directory:

1) producer.properties: metadata.broker.list. If you configure a cluster, list multiple broker nodes here, separated by commas. For example: localhost:9092, ip2:9093, ip3:9092 (broker nodes on different machines) or ip:9092, ip:9093, ip:9094 (broker nodes on different ports of the same machine)

 

2) server.properties: log.dirs specifies the directory where the Kafka server stores its logs after it starts. The default after downloading is log.dirs=/tmp/kafka-logs; if you do not change it, starting kafka-server-start.bat is likely to report a Log4j error. It is recommended to create a tmp directory in the root of the unzipped Kafka directory with two subdirectories, kafka-logs and zookeeper, to hold the Kafka and ZooKeeper logs respectively.

Also check that zookeeper.connect points to your local ZooKeeper.

 

3) As in 2), check dataDir in zookeeper.properties (the default after unzipping is /tmp/zookeeper); it can be changed to the zookeeper directory under tmp created in 2).
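Checking these three files by eye is error-prone, so a small helper script can do it. The following is a sketch, assuming plain key=value lines in the .properties files; the function names are made up, and the script only reports problems without changing anything.

```python
import os


def load_properties(path):
    """Parse a simple key=value .properties file, skipping blanks and # comments."""
    props = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def missing_dirs(props, keys):
    """Return (key, path) pairs whose directory does not exist on this machine."""
    missing = []
    for key in keys:
        for path in props.get(key, "").split(","):
            path = path.strip()
            if path and not os.path.isdir(path):
                missing.append((key, path))
    return missing


def parse_broker_list(value):
    """Split 'host1:9092, host2:9093' into [('host1', 9092), ('host2', 9093)]."""
    brokers = []
    for entry in value.split(","):
        host, port = entry.strip().rsplit(":", 1)
        brokers.append((host, int(port)))
    return brokers
```

For example, missing_dirs(load_properties("config/server.properties"), ["log.dirs"]) lists the log directories that still need to be created, and the same call with "config/zookeeper.properties" and ["dataDir"] covers item 3).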

 

Step3: Modify the kafka-run-class.bat file under bin/windows. This file is also the most likely cause of Kafka startup errors!

Change the path assigned by set ivyPath to point to your unzipped libs directory, such as E:\kafka_2.9.2-0.8.1\libs; otherwise errors are very likely!

Then check each of the set/call pairs that follow, such as:

 

 

set snappy=%ivyPath%\snappy-java-1.0.5.jar
call :concat %snappy%

 The script may contain set xxx=%ivyPath%\yyy.jar even though the libs directory you downloaded has no yyy.jar. If so, simply delete the lines in the .bat file that set jars which do not exist under your libs!
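Comparing every set line against the libs directory by hand is tedious. Here is a sketch of a script that lists the jars named in the .bat file but absent from libs. It assumes the lines look exactly like the set xxx=%ivyPath%\yyy.jar form shown above; the function names are invented.

```python
import os
import re

# Matches lines such as: set snappy=%ivyPath%\snappy-java-1.0.5.jar
JAR_LINE = re.compile(r"^\s*set\s+\w+\s*=\s*%ivyPath%\\(\S+\.jar)\s*$", re.IGNORECASE)


def referenced_jars(bat_text):
    """Return the jar file names that the script sets relative to %ivyPath%."""
    jars = []
    for line in bat_text.splitlines():
        match = JAR_LINE.match(line)
        if match:
            jars.append(match.group(1))
    return jars


def missing_jars(bat_text, libs_dir):
    """Return referenced jars that are not present in libs_dir (case-insensitive)."""
    present = {name.lower() for name in os.listdir(libs_dir)}
    return [jar for jar in referenced_jars(bat_text) if jar.lower() not in present]
```

To use it, read bin/windows/kafka-run-class.bat into a string and pass it together with your libs path; each name it returns corresponds to a set/call pair that can be deleted from the script.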

 

Next, check:

 

IF ["%KAFKA_OPTS%"] EQU [""] (
	set KAFKA_OPTS=-Xmx512M -server -Dlog4j.configuration=file:"%BASE_DIR%\config\log4j.properties"
)

 Check that the path given in KAFKA_OPTS is correct. It is recommended to change it manually to the path of log4j.properties under config.

 

 

step4: Check the config/log4j.properties file: see whether the path given by kafka.logs.dir= exists locally. It is recommended to change it to your local log directory.

 

step5: Once these main configuration files are modified, copy server.properties and zookeeper.properties from config to bin/windows/. Then create two .bat files under bin/windows to start ZooKeeper and Kafka. Their contents are as follows:

zookeeper-start.bat:

 

zookeeper-server-start.bat zookeeper.properties

kafka-start.bat:

 

 

kafka-server-start.bat server.properties

 

 

With that, the Kafka environment on Windows is ready. To start Kafka, run zookeeper-start.bat first, and then run kafka-start.bat.
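Because the order matters, it can help to confirm that ZooKeeper is actually listening before launching Kafka. A minimal sketch, assuming ZooKeeper runs locally on port 2181 (the clientPort in the default zookeeper.properties); the function name port_open is made up.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (not executed here): check ZooKeeper before running kafka-start.bat.
# if not port_open("localhost", 2181):
#     print("ZooKeeper is not reachable yet; run zookeeper-start.bat first")
```

An open port only shows that something is listening there; it does not prove that ZooKeeper started without errors, so the console output of zookeeper-start.bat is still worth reading.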

 

Here is an exception that came up when running Kafka on Windows:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/zookeeper/server/quorum/QuorumPeerMain
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.server.quorum.QuorumPeerMain
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.zookeeper.server.quorum.QuorumPeerMain.  Program will exit.

This is one of the errors you get if you download a Kafka package from the official website, unzip it, and run it without modifying any configuration files or .bat files.
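A ClassNotFoundException like this means that no jar on the classpath built by the .bat file contains the class. Since a jar is a zip archive, one way to check is to look inside the jars under libs. The following is a sketch; the function name is invented, and the libs path you pass in is whatever your ivyPath points to.

```python
import os
import zipfile


def jars_containing(libs_dir, class_name):
    """Return the names of the jars in libs_dir that contain the given class."""
    entry = class_name.replace(".", "/") + ".class"
    found = []
    for name in sorted(os.listdir(libs_dir)):
        if not name.lower().endswith(".jar"):
            continue
        with zipfile.ZipFile(os.path.join(libs_dir, name)) as jar:
            if entry in jar.namelist():
                found.append(name)
    return found
```

Calling jars_containing with your libs directory and "org.apache.zookeeper.server.quorum.QuorumPeerMain" should name a zookeeper jar. If it does and the error still appears, the likely cause is that ivyPath in kafka-run-class.bat does not point at that directory.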

 

(4) Points to note:

1) Because Kafka is open source, when a version you download reports an error at startup, you should open the corresponding .bat file and see where the error comes from. Most likely some configuration has not been modified; do not expect everything to work right after downloading;

2) A .bat file is the Windows counterpart of a Linux shell script and is executable. To see the values of the variables defined in it, learn to use the echo %variable_name% command and the pause command (which works like a breakpoint)!

3) ZooKeeper must be started before Kafka, but this does not mean you have to install a separate ZooKeeper, because if you look at zookeeper-server-start.bat you will find:

kafka-run-class.bat org.apache.zookeeper.server.quorum.QuorumPeerMain %*

And look at kafka-run-class.bat again and you will find:

 

set zookeeper=%ivyPath%\zookeeper-3.3.4.jar
	call :concat %zookeeper%

 This shows that Kafka ships with ZooKeeper support built in: the libs directory contains a zookeeper-xx.jar package.

 
