Related Concepts and Terms
- Broker: a Kafka cluster consists of one or more servers, each of which is called a broker.
- Topic: every message published to a Kafka cluster belongs to a category called a topic.
- Partition: a physical concept; each topic is divided into one or more partitions.
- Producer: the client that publishes messages to Kafka brokers.
- Consumer: the client that reads messages from Kafka brokers.
- Consumer Group: every consumer belongs to a consumer group (a group name can be specified for each consumer; if none is specified, the consumer belongs to the default group).
Notes (based on the newer version):
- In a cluster, one broker is elected by the active brokers to act as the cluster controller. Similar to a manager, it assigns partitions to brokers, monitors brokers, and so on;
- In a cluster, each partition is owned by a single broker, which is the leader for that partition. The partition may also be assigned to other brokers as replicas, so that when the leader of a partition fails, one of the other brokers holding a replica of that partition takes over as the new leader;
- Both consumers and producers read and write a partition through its leader;
- You can configure how long brokers retain messages, for example by age or by the total size of a topic; once the configured limit is reached, old messages are deleted. Retention can also be configured for a specific topic (see the sketch after this list).
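To make these notes concrete, here is a minimal sketch, assuming the kafka_2.11-0.9.0.1 scripts installed later in this post, ZooKeeper on localhost:2181, and a hypothetical topic named test; kafka-topics.sh --describe shows the leader, replicas, and in-sync replicas (ISR) of each partition, and kafka-configs.sh sets a per-topic retention override:
# show which broker is the leader of each partition and which brokers hold replicas
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
# per-topic override: keep this topic's messages for at most one day (86400000 ms)
bin/kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name test --add-config retention.ms=86400000
# list the current per-topic overrides
bin/kafka-configs.sh --zookeeper localhost:2181 --describe --entity-type topics --entity-name test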
Standalone mode
Suitable for local development or a proof of concept.
Install Java
Before installing ZooKeeper or Kafka, first check whether a Java environment is already installed (installation details are omitted). The output should look like this:
C:\Users\Administrator>java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
Install ZooKeeper
ZooKeeper stores broker and topic metadata, as well as consumer client details (in older Kafka versions). You can also start a ZooKeeper server with the script bundled with Kafka. Here, the installation package zookeeper-3.4.11.tar.gz is downloaded from the Alibaba Cloud open source mirror site (because apache.org is not reachable from this network); the installation looks like this:
[root@localhost zookeeper1]# pwd
/usr/local/software/zookeeper/zookeeper1
[root@localhost zookeeper1]# ll
total 1612
drwxr-xr-x. 2 502 games 4096 Feb 2 01:06 bin
-rw-r--r--. 1 502 games 87943 Nov 1 11:47 build.xml
drwxr-xr-x. 2 502 games 104 Feb 6 04:20 conf
drwxr-xr-x. 10 502 games 4096 Nov 1 11:47 contrib
drwxr-xr-x. 3 root root 60 Feb 6 04:22 data
drwxr-xr-x. 2 502 games 4096 Nov 1 11:54 dist-maven
drwxr-xr-x. 6 502 games 4096 Nov 1 11:52 docs
-rw-r--r--. 1 502 games 1709 Nov 1 11:47 ivysettings.xml
-rw-r--r--. 1 502 games 8197 Nov 1 11:47 ivy.xml
drwxr-xr-x. 4 502 games 4096 Nov 1 11:52 lib
-rw-r--r--. 1 502 games 11938 Nov 1 11:47 LICENSE.txt
drwxr-xr-x. 3 root root 22 Feb 2 01:00 logs
-rw-r--r--. 1 502 games 3132 Nov 1 11:47 NOTICE.txt
-rw-r--r--. 1 502 games 1585 Nov 1 11:47 README.md
-rw-r--r--. 1 502 games 1770 Nov 1 11:47 README_packaging.txt
drwxr-xr-x. 5 502 games 44 Nov 1 11:47 recipes
drwxr-xr-x. 8 502 games 4096 Nov 1 11:52 src
drwxr-xr-x. 2 root root 6 Feb 2 01:00 version-2
-rw-r--r--. 1 502 games 1478279 Nov 1 11:49 zookeeper-3.4.11.jar
-rw-r--r--. 1 502 games 195 Nov 1 11:52 zookeeper-3.4.11.jar.asc
-rw-r--r--. 1 502 games 33 Nov 1 11:49 zookeeper-3.4.11.jar.md5
-rw-r--r--. 1 502 games 41 Nov 1 11:49 zookeeper-3.4.11.jar.sha1
-rw-r--r--. 1 root root 5 Feb 6 02:33 zookeeper_server.pid
[root@localhost conf]# ll /usr/local/software/zookeeper/zookeeper1/conf
total 20
-rw-r--r--. 1 502 games 535 Nov 1 11:47 configuration.xsl
-rw-r--r--. 1 502 games 2161 Nov 1 11:47 log4j.properties
-rw-r--r--. 1 root root 922 Feb 5 22:39 zoo22.cfg
-rw-r--r--. 1 root root 1323 Feb 6 04:20 zoo.cfg
-rw-r--r--. 1 502 games 922 Nov 1 11:47 zoo_sample.cfg
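zoo.cfg is not part of the ZooKeeper distribution itself; it was presumably created here by copying the bundled sample configuration and then editing it, roughly like this:
[root@localhost conf]# cp zoo_sample.cfg zoo.cfg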
In zoo.cfg, dataDir sets the directory where data is stored, and clientPort (2181 by default) is the port clients connect to; the file is as follows:
[root@localhost conf]# vi zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/local/software/zookeeper/zookeeper-3.4.11
#log info
dataLogDir=/usr/local/software/zookeeper/zookeeper-3.4.11/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
Introduction to Kafka
Apache Kafka is a distributed publish-subscribe messaging system and a powerful queue that can handle large volumes of data and lets you pass messages from one endpoint to another. Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka is built on top of the ZooKeeper synchronization service, and it integrates very well with Apache Storm and Spark for real-time streaming data analysis.
Kafka is a distributed messaging system written in Scala by LinkedIn and used as the basis for LinkedIn's activity stream and operational data processing pipelines; it offers high scalability and high throughput.
Currently, more and more open source distributed processing systems such as Apache Flume, Apache Storm, Spark, and Elasticsearch support integration with Kafka. The version downloaded and unpacked below is kafka_2.11-0.9.0.1.
Kafka installation and configuration
Download address: http://kafka.apache.org/downloads
Download and unzip
[root@log1 local]# wget https://archive.apache.org/dist/kafka/0.9.0.1/kafka_2.11-0.9.0.1.tgz
[root@log1 local]# tar zxvf kafka_2.11-0.9.0.1.tgz
Start ZooKeeper
/usr/local/software/zookeeper/zookeeper1/bin/zkServer.sh start
/usr/local/software/zookeeper/zookeeper2/bin/zkServer.sh start
/usr/local/software/zookeeper/zookeeper3/bin/zkServer.sh start
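These three instances are assumed to form a local ensemble listening on ports 2181, 2182, and 2183 (matching the zookeeper.connect setting below). Alternatively, for a quick single-node test, Kafka ships its own ZooKeeper start script and sample config, roughly:
# start the single-node ZooKeeper bundled with Kafka (config/zookeeper.properties listens on 2181 by default)
bin/zookeeper-server-start.sh config/zookeeper.properties
# verify that ZooKeeper answers (should print "imok")
echo ruok | nc 127.0.0.1 2181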
Modify the configuration file server.properties
############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1

############################# Socket Server Settings #############################

listeners=PLAINTEXT://:9092

# The number of threads handling network requests
num.network.threads=3

# The number of threads doing disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma seperated list of directories under which to store log files
log.dirs=/tmp/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Log Flush Policy #############################

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The minimum age of a log file to be eligible for deletion
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

# By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.
# If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.
log.cleaner.enable=false

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
Start the Kafka server
bin/kafka-server-start.sh config/server.properties
(For producers and consumers on other machines to access Kafka, remember to change the listeners setting in config/server.properties, for example
listeners=PLAINTEXT://192.168.33.152:9092)
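To keep the broker running after the terminal is closed and to check that it came up, something like the following works (a sketch; the -daemon flag and the logs/ directory are the defaults shipped with this Kafka version):
# run the broker in the background
bin/kafka-server-start.sh -daemon config/server.properties
# the broker should appear as "Kafka" in jps; the server log is written under logs/
jps
tail -f logs/server.log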
Create a topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
View topic
bin/kafka-topics.sh --list --zookeeper localhost:2181
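To verify the installation end to end, the console clients shipped with Kafka can be used (a minimal sketch; host, port, and topic name follow the examples above):
# start a console producer and type a few messages, one per line
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# in another terminal, read the messages back from the beginning (Ctrl+C to stop)
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning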