1. Experimental purpose
1. Understand the functions of each component of Kafka
2. Master the installation and deployment of Kafka
3. Understand the use of the Kafka Java API
4. Understand the Producer API and Consumer API in the Kafka Java API
2. Experimental requirements
1. Kafka installation and deployment. Kafka depends on Scala and ZooKeeper, so you need to install Scala and ZooKeeper first. Then, on top of the environment where Scala and ZooKeeper are installed, install Kafka.
2. Use the Kafka Java API. Use a simple Java API to simulate Kafka's producer and consumer: the producer generates content through a while loop and passes it to Kafka, and the consumer reads the content from Kafka and outputs it in the Console interface.
3. (Optional) Install and deploy the Kafka environment on your computer.
3. Experimental process and results records
(1) Kafka installation and deployment
1. First, create a new /data/kafka1 directory locally on Linux to store the files required for the experiment.
mkdir -p /data/kafka1
Switch the directory to /data/kafka1 and use the wget command to download the required installation packages scala-2.10.4.tgz, kafka_2.10-0.8.2.2.tgz and zookeeper-3.4.5-cdh5.4.5.tar.gz.
cd /data/kafka1
wget http://172.16.103.12:60000/allfiles/kafka1/scala-2.10.4.tgz
wget http://172.16.103.12:60000/allfiles/kafka1/kafka_2.10-0.8.2.2.tgz
wget http://172.16.103.12:60000/allfiles/kafka1/zookeeper-3.4.5-cdh5.4.5.tar.gz
2. Install Scala.
Switch to the /data/kafka1 directory, extract the Scala installation package scala-2.10.4.tgz to the /apps directory, and rename the extracted directory to scala.
cd /data/kafka1
tar -xzvf /data/kafka1/scala-2.10.4.tgz -C /apps/
cd /apps
mv /apps/scala-2.10.4/ /apps/scala
Use vim to open user environment variables.
sudo vim ~/.bashrc
Append the following Scala path information to the user environment variables.
#scala
export SCALA_HOME=/apps/scala
export PATH=$SCALA_HOME/bin:$PATH
Execute the source command to make the environment variables take effect.
source ~/.bashrc
3. Switch to the /data/kafka1 directory, extract Kafka's compressed package kafka_2.10-0.8.2.2.tgz to the /apps directory, and rename the extracted directory to kafka.
cd /data/kafka1
tar -xzvf /data/kafka1/kafka_2.10-0.8.2.2.tgz -C /apps/
cd /apps
mv /apps/kafka_2.10-0.8.2.2/ /apps/kafka
Use vim to open user environment variables.
sudo vim ~/.bashrc
Append the following Kafka path information to the user environment variables.
#kafka
export KAFKA_HOME=/apps/kafka
export PATH=$KAFKA_HOME/bin:$PATH
Execute the source command to make the environment variables take effect.
source ~/.bashrc
4. Since Kafka stores some of its data in ZooKeeper, you must either install a separate ZooKeeper or use the ZooKeeper program that comes with the Kafka installation package.
First, let's demonstrate the use of an external ZooKeeper program.
Extract zookeeper-3.4.5-cdh5.4.5.tar.gz in the /data/kafka1 directory to the /apps directory, and rename the extracted directory to zookeeper.
cd /data/kafka1
tar -xzvf /data/kafka1/zookeeper-3.4.5-cdh5.4.5.tar.gz -C /apps/
cd /apps
mv /apps/zookeeper-3.4.5-cdh5.4.5/ /apps/zookeeper
Use vim to open user environment variables.
sudo vim ~/.bashrc
Append the following ZooKeeper path information to the user environment variables.
#zookeeper
export ZOOKEEPER_HOME=/apps/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Execute the source command to make the environment variables take effect.
source ~/.bashrc
Modify ZooKeeper's configuration file to run ZooKeeper in standalone mode.
Switch to the directory /apps/zookeeper/conf where the ZooKeeper configuration file is located, and rename zoo_sample.cfg to zoo.cfg.
cd /apps/zookeeper/conf/
mv /apps/zookeeper/conf/zoo_sample.cfg /apps/zookeeper/conf/zoo.cfg
Use vim to open the zoo.cfg file and modify the dataDir item.
vim zoo.cfg
Change:
dataDir=/tmp/zookeeper
to:
dataDir=/data/tmp/zookeeper-outkafka/data
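Putting the change in context, a minimal standalone zoo.cfg might look like the following (tickTime, initLimit, syncLimit and clientPort keep the defaults shipped in zoo_sample.cfg; only dataDir is the value we changed):

```properties
# minimal standalone ZooKeeper configuration (illustrative)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/tmp/zookeeper-outkafka/data
clientPort=2181
```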
The /data/tmp/zookeeper-outkafka/data directory here needs to be created in advance.
mkdir -p /data/tmp/zookeeper-outkafka/data
Start ZooKeeper and check the running status of ZooKeeper.
cd /apps/zookeeper/bin
./zkServer.sh start
./zkServer.sh status
Shut down ZooKeeper.
cd /apps/zookeeper/bin
./zkServer.sh stop
5. To use Kafka's built-in ZooKeeper instead, change the directory to the /apps/kafka/config directory.
cd /apps/kafka/config
This directory contains zookeeper.properties, a configuration file that serves the same purpose as ZooKeeper's zoo.cfg. Use vim to open the zookeeper.properties configuration file.
vim zookeeper.properties
Change the dataDir directory to the /data/tmp/zookeeper-inkafka/data directory.
dataDir=/data/tmp/zookeeper-inkafka/data
The /data/tmp/zookeeper-inkafka/data directory here must be created in advance.
mkdir -p /data/tmp/zookeeper-inkafka/data
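For reference, the zookeeper.properties file that ships with Kafka is only a few lines; after the edit it might read roughly as follows (clientPort and maxClientCnxns keep their shipped values):

```properties
dataDir=/data/tmp/zookeeper-inkafka/data
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections
maxClientCnxns=0
```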
Next, start the ZooKeeper service. Switch to the /apps/kafka directory; the ZooKeeper startup script is located in Kafka's bin directory. (If the script is run in the foreground, press Ctrl+C to exit it.)
cd /apps/kafka
bin/zookeeper-server-start.sh config/zookeeper.properties &
The ampersand at the end puts zookeeper-server-start.sh into the background for execution. Enter jps to view the ZooKeeper process QuorumPeerMain:
jps
zhangyu@8461bfd6a537:/apps/kafka$ jps
375 Jps
293 QuorumPeerMain
zhangyu@8461bfd6a537:/apps/kafka$
Next, close the ZooKeeper process.
cd /apps/kafka
bin/zookeeper-server-stop.sh stop
6. You can choose either of the above two ways of using ZooKeeper according to your needs. In follow-up courses, we will use an external ZooKeeper to manage Kafka's data by default.
At this point, Kafka has been installed.
Next, test Kafka to see if it can run normally.
7. Switch to the /apps/zookeeper directory and start the ZooKeeper service.
cd /apps/zookeeper
bin/zkServer.sh start
8. Switch to the /apps/kafka/config directory, where Kafka-related configuration files are placed.
Use vim to open the configuration file server.properties of the Kafka service.
cd /apps/kafka/config
vim server.properties
The configuration items in the server.properties file include basic server configuration, socket service configuration, log configuration, log flush policy, log retention policy, and ZooKeeper configuration.
The basic server configuration mainly includes the ID of the current node.
The ZooKeeper configuration includes the IP and port number of the ZooKeeper service.
We modify the value of the zookeeper.connect item to:
zookeeper.connect=localhost:2181
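For orientation, the handful of server.properties keys relevant to this experiment might look roughly like this (the values other than zookeeper.connect are illustrative defaults from Kafka 0.8.x):

```properties
# id of the broker; must be unique per node
broker.id=0
# port the broker listens on for clients
port=9092
# where Kafka stores its log segments
log.dirs=/tmp/kafka-logs
# ZooKeeper connection string
zookeeper.connect=localhost:2181
```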
The IP and port here are those the ZooKeeper service uses to receive and send messages. The IP must be the IP of the ZooKeeper service (we set it to localhost), and the port must match the clientPort in zoo.cfg under /apps/zookeeper/conf.
9. Change the directory to the /apps/kafka directory and start the Kafka service. When starting, the Kafka service reads the server.properties file in the Kafka configuration directory.
cd /apps/kafka
bin/kafka-server-start.sh config/server.properties &
This starts the Kafka server and runs it in the background.
10. Open another window and call the kafka-topics.sh script in the /apps/kafka/bin directory to create a topic.
cd /apps/kafka
bin/kafka-topics.sh \
  --create \
  --zookeeper localhost:2181 \
  --replication-factor 1 \
  --topic sayaword \
  --partitions 1
The kafka-topics.sh command requires several parameters, such as the ZooKeeper address and the topic name.
Let's check which topics are available in Kafka:
bin/kafka-topics.sh --list --zookeeper localhost:2181
11. Call kafka-console-producer.sh in the /apps/kafka/bin directory to produce some messages; the producer is the message producer.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic sayaword
The localhost here is Kafka's IP, and 9092 is the port of the broker node. Content entered by the user on the console interface is handed to the producer and sent to the consumer.
12. Open another window, call kafka-console-consumer.sh in the bin directory, and start the consumer; the consumer is used to consume data.
cd /apps/kafka
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic sayaword --from-beginning
kafka-console-consumer.sh also requires some parameters, such as ZooKeeper's IP and port, the topic name, and the position to start reading from.
13. In the window running the kafka-console-producer.sh command, enter a few lines of text and press Enter. You can see the same content output on the consumer side.
Producer side:
Consumer side:
14. Exit the test.
In the kafka-console-consumer.sh, kafka-console-producer.sh and kafka-server-start.sh command-line interfaces, press Ctrl+C to exit the consumer, producer and server respectively.
Change the directory to the /apps/zookeeper/bin directory and stop ZooKeeper.
cd /apps/zookeeper/bin
./zkServer.sh stop
At this point, the installation and testing of Kafka are complete!
(2) Kafka Java API
1. Create the /kafka3 folder in the /data directory.
mkdir /data/kafka3
2. Switch to the /data/kafka3 directory in Linux and use the wget command to download the compressed file kafkalib.tar.gz from http://172.16.103.12:60000/allfiles/kafka3/kafkalib.tar.gz.
cd /data/kafka3
wget http://172.16.103.12:60000/allfiles/kafka3/kafkalib.tar.gz
3. After the download completes, extract the package to the current directory.
tar zxvf kafkalib.tar.gz
4. Open Eclipse and create a new Java project named kafka3.
5. Right-click the project name and create a new package named my.kafka.
6. Add the jar packages the project depends on. Right-click the project and create a new folder named kafkalib to store the required jar packages. Copy all jar packages from the kafkalib folder in the /data/kafka3 directory into the kafkalib folder under the kafka3 project in Eclipse. Select all jar packages in the kafkalib folder and add them to the Build Path.
7. Start ZooKeeper. Switch to the /apps/zookeeper/bin directory and execute the ZooKeeper startup script.
cd /apps/zookeeper/bin
./zkServer.sh start
Check the running status of ZooKeeper.
./zkServer.sh status
8. Change the directory to the /apps/kafka directory and start the kafka server.
cd /apps/kafka
bin/kafka-server-start.sh config/server.properties &
9. Open another window, switch to /apps/kafka, and create a topic in Kafka named testkafkaapi.
cd /apps/kafka
bin/kafka-topics.sh \
  --create \
  --zookeeper localhost:2181 \
  --replication-factor 1 \
  --topic testkafkaapi \
  --partitions 1
View the topics:
bin/kafka-topics.sh --list --zookeeper localhost:2181
10. Create a Kafka producer for producing data. Under the my.kafka package of the kafka3 project, create a class named MyProducer and write the code for the MyProducer class.
package my.kafka;

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class MyProducer {
    private final Producer<String, String> producer;
    public final static String TOPIC = "testkafkaapi";

    public MyProducer() {
        Properties props = new Properties();
        // The address and port of the Kafka broker
        props.put("metadata.broker.list", "localhost:9092");
        // Serializer class for message values
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Serializer class for message keys
        props.put("key.serializer.class", "kafka.serializer.StringEncoder");
        // request.required.acks:
        // 0, which means that the producer never waits for an acknowledgement
        //    from the broker (the same behavior as 0.7). This option provides
        //    the lowest latency but the weakest durability guarantees (some
        //    data will be lost when a server fails).
        // 1, which means that the producer gets an acknowledgement after the
        //    leader replica has received the data. This option provides better
        //    durability as the client waits until the server acknowledges the
        //    request as successful (only messages that were written to the
        //    now-dead leader but not yet replicated will be lost).
        // -1, which means that the producer gets an acknowledgement after all
        //    in-sync replicas have received the data. This option provides the
        //    best durability: we guarantee that no messages will be lost as
        //    long as at least one in-sync replica remains.
        props.put("request.required.acks", "-1");
        producer = new Producer<String, String>(new ProducerConfig(props));
    }

    void produce() {
        int messageNo = 1000;
        final int COUNT = 10000;
        while (messageNo < COUNT) {
            String key = String.valueOf(messageNo);
            String data = "hello,kafka message:" + key;
            producer.send(new KeyedMessage<String, String>(TOPIC, key, data));
            System.out.println(data);
            messageNo++;
        }
    }

    public static void main(String[] args) {
        new MyProducer().produce();
    }
}
Producer-side code: first define a topic name, then create a Properties instance to hold the producer's parameters. Next, create a Producer instance, passing the configured props as a parameter. In the produce() method, build a key and a data value, create a KeyedMessage instance with the key, data and topic as parameters, and hand the KeyedMessage to the producer with send(). The main function calls MyProducer's produce() method directly to send the messages.
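The sending loop in produce() is plain Java and can be checked without a running broker. The sketch below (ProduceLoopSketch is a hypothetical helper, not part of the Kafka API) mirrors the loop bounds and payload format to show that exactly 9000 messages, keyed 1000 through 9999, are produced:

```java
public class ProduceLoopSketch {
    // Mirrors produce(): keys run from `start` up to (but not including) `count`,
    // and each payload has the form "hello,kafka message:<key>".
    static int countMessages(int start, int count) {
        int sent = 0;
        for (int messageNo = start; messageNo < count; messageNo++) {
            String key = String.valueOf(messageNo);
            String data = "hello,kafka message:" + key; // same payload format as MyProducer
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        // With MyProducer's bounds (1000 and 10000), 9000 messages are sent.
        System.out.println(countMessages(1000, 10000));
    }
}
```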
11. Create a Kafka consumer for consuming data. In the kafka3 project, under the my.kafka package, create a class named MyConsumer and write the code for the MyConsumer class.
package my.kafka;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.serializer.StringDecoder;
import kafka.utils.VerifiableProperties;

public class MyConsumer {
    private final ConsumerConnector consumer;

    private MyConsumer() {
        Properties props = new Properties();
        // ZooKeeper configuration
        props.put("zookeeper.connect", "localhost:2181");
        // group.id identifies a consumer group
        props.put("group.id", "mygroup");
        // ZooKeeper session timeout
        props.put("zookeeper.session.timeout.ms", "4000");
        props.put("zookeeper.sync.time.ms", "200");
        props.put("auto.commit.interval.ms", "1000");
        props.put("auto.offset.reset", "smallest");
        // Serializer class
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        ConsumerConfig config = new ConsumerConfig(props);
        consumer = kafka.consumer.Consumer.createJavaConsumerConnector(config);
    }

    void consume() {
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(MyProducer.TOPIC, new Integer(1));

        StringDecoder keyDecoder = new StringDecoder(new VerifiableProperties());
        StringDecoder valueDecoder = new StringDecoder(new VerifiableProperties());

        Map<String, List<KafkaStream<String, String>>> consumerMap =
                consumer.createMessageStreams(topicCountMap, keyDecoder, valueDecoder);
        KafkaStream<String, String> stream = consumerMap.get(MyProducer.TOPIC).get(0);
        ConsumerIterator<String, String> it = stream.iterator();
        while (it.hasNext())
            System.out.println(it.next().message());
    }

    public static void main(String[] args) {
        new MyConsumer().consume();
    }
}
The MyConsumer side is divided into two parts: the MyConsumer() constructor and the consume() method. In MyConsumer(), create a Properties instance to configure the consumer, then create the consumer instance that receives messages, passing the Properties instance as a parameter. In consume(), call the consumer's createMessageStreams() method to receive messages from Kafka, then iterate over the stream and output each message to the console.
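The bookkeeping in consume() is ordinary collections code. As a toy stand-in (ConsumeSketch is hypothetical and uses no Kafka classes), the topic-to-stream-count map and the iterator-draining pattern can be sketched as follows; note that the real ConsumerIterator blocks in hasNext() waiting for new messages instead of terminating like this finite list does:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ConsumeSketch {
    // topicCountMap: topic name -> number of KafkaStreams to open for it.
    // MyConsumer asks for exactly one stream for MyProducer.TOPIC.
    static Map<String, Integer> topicCountMap(String topic) {
        Map<String, Integer> m = new HashMap<String, Integer>();
        m.put(topic, Integer.valueOf(1));
        return m;
    }

    // Stand-in for draining one stream: print every message and count them.
    static int drain(List<String> messages) {
        int n = 0;
        Iterator<String> it = messages.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(topicCountMap("testkafkaapi"));
        System.out.println(drain(Arrays.asList("hello,kafka message:1000",
                                               "hello,kafka message:1001")));
    }
}
```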
12. Execute the program.
In Eclipse, right-click the MyProducer class and choose Run As ==> Java Application.
In the window that appears, click OK.
13. Then right-click the MyConsumer class and choose Run As ==> Java Application.
You can then see the output results in the console interface.
Consumer side:
4. Analysis of experimental results
We changed the environment variables and configured Kafka, then used a simple Java API to simulate Kafka's producer and consumer: the producer generates content through a while loop and passes it to Kafka, and the consumer reads the content from Kafka and outputs it in the Console interface.