Getting Started with Apache Kafka Concepts

Related Concept Terms

  • Broker: A Kafka cluster consists of one or more servers, each of which is called a broker.
  • Topic: Every message published to a Kafka cluster belongs to a category called a Topic.
  • Partition: A physical concept; each Topic contains one or more Partitions.
  • Producer: Publishes messages to a Kafka broker.
  • Consumer: The client that reads messages from a Kafka broker.
  • Consumer Group: Each Consumer belongs to a specific Consumer Group (a group name can be specified for each Consumer; if none is specified, the Consumer belongs to the default group). See the example after this list.
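
Consumer groups are easiest to see with the console tools that ship with Kafka (installed later in this article). A minimal sketch, assuming a topic named test already exists and using the sample config/consumer.properties file that ships with Kafka (it sets group.id=test-consumer-group):

# Run this same command in two terminals: both consumers join the same
# group, the topic's partitions are divided between them, and each
# message is delivered to only one consumer in the group.
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test \
  --consumer.config config/consumer.properties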

Notes (based on the new version):

  • In a cluster environment, one of the brokers acts as the controller of the cluster. It is elected from the active brokers in the cluster and plays a manager-like role, assigning partitions to brokers, monitoring brokers, and so on;
  • In a cluster environment, a partition belongs to a single broker, and that broker is the leader for the partition. The partition may also be assigned to other brokers as replicas; this way, when the leader broker of a partition fails, another broker holding a replica of that partition takes over as the leader;
  • Both consumers and producers operate on a partition through its leader;
  • You can configure how long brokers retain topics or messages, for example for a period of time or until a topic reaches a certain size; once these conditions are met, old messages are deleted. You can also configure the retention period for a specific topic, as shown below.
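
A minimal sketch of per-topic retention using the topic tooling from this Kafka version; the topic name my-topic and the retention values are illustrative assumptions:

# Create a topic that keeps messages for one day (retention.ms is in milliseconds)
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic my-topic \
  --config retention.ms=86400000

# Or cap the topic at roughly 1 GiB; the oldest segments are pruned beyond this
bin/kafka-topics.sh --alter --zookeeper localhost:2181 \
  --topic my-topic --config retention.bytes=1073741824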

Standalone mode

Suitable for local development or proof-of-concept work.

Install Java

Before installing ZooKeeper or Kafka, first check whether the system already has a Java environment (details omitted). Example output:

C:\Users\Administrator>java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)

Install ZooKeeper

ZooKeeper stores Broker and Topic metadata, as well as consumer client details (in older Kafka versions). You can also run the ZooKeeper script bundled with Kafka to start a ZooKeeper server. Here, the zookeeper-3.4.11.tar.gz installation package was downloaded from the Alibaba Cloud open source mirror site (useful on networks where apache.org is not reachable). The installation directory looks like this:

[root@localhost zookeeper1]# pwd
/usr/local/software/zookeeper/zookeeper1

[root@localhost zookeeper1]# ll
total 1612
drwxr-xr-x.  2  502 games    4096 Feb  2 01:06 bin
-rw-r--r--.  1  502 games   87943 Nov  1 11:47 build.xml
drwxr-xr-x.  2  502 games     104 Feb  6 04:20 conf
drwxr-xr-x. 10  502 games    4096 Nov  1 11:47 contrib
drwxr-xr-x.  3 root root       60 Feb  6 04:22 data
drwxr-xr-x.  2  502 games    4096 Nov  1 11:54 dist-maven
drwxr-xr-x.  6  502 games    4096 Nov  1 11:52 docs
-rw-r--r--.  1  502 games    1709 Nov  1 11:47 ivysettings.xml
-rw-r--r--.  1  502 games    8197 Nov  1 11:47 ivy.xml
drwxr-xr-x.  4  502 games    4096 Nov  1 11:52 lib
-rw-r--r--.  1  502 games   11938 Nov  1 11:47 LICENSE.txt
drwxr-xr-x.  3 root root       22 Feb  2 01:00 logs
-rw-r--r--.  1  502 games    3132 Nov  1 11:47 NOTICE.txt
-rw-r--r--.  1  502 games    1585 Nov  1 11:47 README.md
-rw-r--r--.  1  502 games    1770 Nov  1 11:47 README_packaging.txt
drwxr-xr-x.  5  502 games      44 Nov  1 11:47 recipes
drwxr-xr-x.  8  502 games    4096 Nov  1 11:52 src
drwxr-xr-x.  2 root root        6 Feb  2 01:00 version-2
-rw-r--r--.  1  502 games 1478279 Nov  1 11:49 zookeeper-3.4.11.jar
-rw-r--r--.  1  502 games     195 Nov  1 11:52 zookeeper-3.4.11.jar.asc
-rw-r--r--.  1  502 games      33 Nov  1 11:49 zookeeper-3.4.11.jar.md5
-rw-r--r--.  1  502 games      41 Nov  1 11:49 zookeeper-3.4.11.jar.sha1
-rw-r--r--.  1 root root        5 Feb  6 02:33 zookeeper_server.pid

[root@localhost conf]# ll /usr/local/software/zookeeper/zookeeper1/conf
total 20
-rw-r--r--. 1  502 games  535 Nov  1 11:47 configuration.xsl
-rw-r--r--. 1  502 games 2161 Nov  1 11:47 log4j.properties
-rw-r--r--. 1 root root   922 Feb  5 22:39 zoo22.cfg
-rw-r--r--. 1 root root  1323 Feb  6 04:20 zoo.cfg
-rw-r--r--. 1  502 games  922 Nov  1 11:47 zoo_sample.cfg

dataDir configures the data storage directory, and clientPort defaults to 2181. zoo.cfg is as follows:

[root@localhost conf]# vi zoo.cfg
    # The number of milliseconds of each tick  
    tickTime=2000  

    # The number of ticks that the initial  
    # synchronization phase can take  
    initLimit=10  

    # The number of ticks that can pass between  
    # sending a request and getting an acknowledgement  
    syncLimit=5  

    # the directory where the snapshot is stored.  
    # do not use /tmp for storage, /tmp here is just  
    # example sakes.  
    dataDir=/usr/local/software/zookeeper/zookeeper-3.4.11  

    #log info  
    dataLogDir=/usr/local/software/zookeeper/zookeeper-3.4.11/logs  

    # the port at which the clients will connect  
    clientPort=2181  

    # the maximum number of client connections.  
    # increase this if you need to handle more clients  
    #maxClientCnxns=60  
    #  
    # Be sure to read the maintenance section of the  
    # administrator guide before turning on autopurge.  
    #  
    # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance  
    #  
    # The number of snapshots to retain in dataDir  
    #autopurge.snapRetainCount=3  
    # Purge task interval in hours  
    # Set to "0" to disable auto purge feature  
    #autopurge.purgeInterval=1  
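
After saving zoo.cfg, start the server and check that it is up; a minimal verification, assuming the default clientPort of 2181 and that nc is installed:

# start the ZooKeeper server
bin/zkServer.sh start
# "ruok" is one of ZooKeeper's four-letter-word commands;
# a healthy server answers "imok"
echo ruok | nc 127.0.0.1 2181
# or report the server's current mode (standalone / leader / follower)
bin/zkServer.sh status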

Introduction to Kafka

Apache Kafka is a distributed publish-subscribe messaging system and a powerful queue that can handle large volumes of data, enabling you to pass messages from one endpoint to another. Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka is built on top of the ZooKeeper synchronization service, and it integrates very well with Apache Storm and Spark for real-time streaming data analysis.

Kafka is a distributed messaging system written by LinkedIn in Scala and used as the basis for LinkedIn's Activity Stream and operational data processing pipelines, offering high scalability and high throughput.

Currently, more and more open source distributed processing systems such as Apache Flume, Apache Storm, Spark, and Elasticsearch support integration with Kafka. Download and decompress Kafka; the version used here is kafka_2.11-0.9.0.1.

Kafka installation and configuration

Download address: http://kafka.apache.org/downloads

Download and unzip

[root@log1 local]# wget https://archive.apache.org/dist/kafka/0.9.0.1/kafka_2.11-0.9.0.1.tgz
[root@log1 local]# tar zxvf kafka_2.11-0.9.0.1.tgz

Start ZooKeeper

/usr/local/software/zookeeper/zookeeper1/bin/zkServer.sh start
/usr/local/software/zookeeper/zookeeper2/bin/zkServer.sh start  
/usr/local/software/zookeeper/zookeeper3/bin/zkServer.sh start  

Modify the configuration file server.properties

    ############################# Server Basics #############################

    # The id of the broker. This must be set to a unique integer for each broker.
    broker.id=1

    ############################# Socket Server Settings #############################

    listeners=PLAINTEXT://:9092

    # The number of threads handling network requests
    num.network.threads=3

    # The number of threads doing disk I/O
    num.io.threads=8

    # The send buffer (SO_SNDBUF) used by the socket server
    socket.send.buffer.bytes=102400

    # The receive buffer (SO_RCVBUF) used by the socket server
    socket.receive.buffer.bytes=102400

    # The maximum size of a request that the socket server will accept (protection against OOM)
    socket.request.max.bytes=104857600

    ############################# Log Basics #############################

    # A comma separated list of directories under which to store log files
    log.dirs=/tmp/kafka-logs

    # The default number of log partitions per topic. More partitions allow greater
    # parallelism for consumption, but this will also result in more files across
    # the brokers.
    num.partitions=1

    # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
    # This value is recommended to be increased for installations with data dirs located in RAID array.
    num.recovery.threads.per.data.dir=1

    ############################# Log Flush Policy #############################

    # The number of messages to accept before forcing a flush of data to disk
    #log.flush.interval.messages=10000

    # The maximum amount of time a message can sit in a log before we force a flush
    #log.flush.interval.ms=1000

    ############################# Log Retention Policy #############################

    # The minimum age of a log file to be eligible for deletion
    log.retention.hours=168

    # A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
    # segments don't drop below log.retention.bytes.
    #log.retention.bytes=1073741824

    # The maximum size of a log segment file. When this size is reached a new log segment will be created.
    log.segment.bytes=1073741824

    # The interval at which log segments are checked to see if they can be deleted according
    # to the retention policies
    log.retention.check.interval.ms=300000

    # By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.
    # If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.
    log.cleaner.enable=false

    ############################# Zookeeper #############################

    # Zookeeper connection string (see zookeeper docs for details).
    # This is a comma separated host:port pairs, each corresponding to a zk
    # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
    # You can also append an optional chroot string to the urls to specify the
    # root directory for all kafka znodes.
    zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183

    # Timeout in ms for connecting to zookeeper
    zookeeper.connection.timeout.ms=6000
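
The zookeeper.connect setting above points at a three-node ZooKeeper pseudo-cluster on one machine. If you later want multiple Kafka brokers on the same machine as well, each broker needs its own broker.id, listener port, and log directory; a minimal sketch (the file name and the port/path values are illustrative assumptions):

# copy the config and override the per-broker settings
cp config/server.properties config/server-2.properties
# in config/server-2.properties, change:
#   broker.id=2
#   listeners=PLAINTEXT://:9093
#   log.dirs=/tmp/kafka-logs-2
bin/kafka-server-start.sh config/server-2.properties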

Start the Kafka server

bin/kafka-server-start.sh config/server.properties

(For non-local producers and consumers to access Kafka, remember to modify listeners in config/server.properties, e.g. listeners=PLAINTEXT://192.168.33.152:9092.)
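
The command above runs the broker in the foreground. To run it in the background and verify that it is up, a minimal sketch (jps comes with the JDK):

# start the broker as a background daemon
bin/kafka-server-start.sh -daemon config/server.properties
# a JVM process named "Kafka" should now be listed
jps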

Create a topic

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

View topics

bin/kafka-topics.sh --list --zookeeper localhost:2181
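
To inspect a single topic's partition and replica assignment, and to send and read a few test messages, you can use the console tools that ship with Kafka; a quick smoke test (in 0.9 the classic console consumer still tracks offsets via ZooKeeper, hence --zookeeper):

# show partitions, leader, and replicas for the topic
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test

# Terminal 1: type messages, one per line; Ctrl+C to stop
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

# Terminal 2: read the topic from the beginning
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning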

