Getting Started with Apache Kafka Concepts

Related Concepts and Terminology

  • Broker: A Kafka cluster consists of one or more servers; each such server is called a broker.
  • Topic: Every message published to a Kafka cluster has a category, called a Topic.
  • Partition: A physical concept; each Topic consists of one or more Partitions.
  • Producer: Publishes messages to Kafka brokers.
  • Consumer: A client that reads messages from Kafka brokers.
  • Consumer Group: Each Consumer belongs to a specific Consumer Group (a group name can be assigned to each Consumer; if none is specified, the Consumer belongs to the default group).

Notes (based on newer versions):

  • In a cluster, one of the brokers acts as the cluster controller, elected from among the active brokers. It plays a management role: assigning partitions to brokers, monitoring brokers, and so on.
  • In a cluster, each partition is owned by a single broker, which is the leader for that partition. The partition may also be assigned to other brokers as replicas; if the partition's leader broker fails, another broker holding a replica of that partition takes over as leader.
  • Producers and consumers operate on a partition only through its leader.
  • Brokers can be configured with retention limits for topics or messages, such as a time window or a size cap for the topic; once these conditions are met, messages are deleted. Retention can also be configured for a specific topic (see the sketch after this list).
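For example, retention is bounded broker-wide by time or size in server.properties, and can be overridden per topic. A minimal sketch, assuming a local ZooKeeper at localhost:2181 and the topic test created later in this guide (values are illustrative):

    # server.properties: broker-wide defaults
    log.retention.hours=168          # delete log segments older than 7 days
    #log.retention.bytes=1073741824  # or cap retained data per partition at ~1 GB

    # per-topic override, using the 0.9-era kafka-topics.sh syntax
    bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic test \
      --config retention.ms=86400000   # keep messages on this topic for 1 day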

Standalone Mode

Suitable for local development or for validating concepts.

Installing Java

Before installing ZooKeeper or Kafka, first make sure the system already has a Java environment (installation details omitted). Example output:

C:\Users\Administrator>java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)

Installing ZooKeeper

ZooKeeper stores the metadata of brokers and topics and, in older versions, the details of consumer clients. You can also start a ZooKeeper server with the script bundled with Kafka. Here, the package zookeeper-3.4.11.tar.gz was downloaded from the Alibaba Cloud open-source mirror (the Apache site was unreachable from this network).
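The download and extraction steps were roughly as follows (the exact mirror path is an assumption; adjust it to whatever the mirror actually carries):

    # illustrative: fetch ZooKeeper from a mirror, unpack it, and create a config from the sample
    wget https://mirrors.aliyun.com/apache/zookeeper/zookeeper-3.4.11/zookeeper-3.4.11.tar.gz
    tar zxvf zookeeper-3.4.11.tar.gz
    cd zookeeper-3.4.11
    cp conf/zoo_sample.cfg conf/zoo.cfg

After extraction, the installation directory looks like this: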

[root@localhost zookeeper1]# pwd
/usr/local/software/zookeeper/zookeeper1

[root@localhost zookeeper1]# ll
total 1612
drwxr-xr-x.  2  502 games    4096 Feb  2 01:06 bin
-rw-r--r--.  1  502 games   87943 Nov  1 11:47 build.xml
drwxr-xr-x.  2  502 games     104 Feb  6 04:20 conf
drwxr-xr-x. 10  502 games    4096 Nov  1 11:47 contrib
drwxr-xr-x.  3 root root       60 Feb  6 04:22 data
drwxr-xr-x.  2  502 games    4096 Nov  1 11:54 dist-maven
drwxr-xr-x.  6  502 games    4096 Nov  1 11:52 docs
-rw-r--r--.  1  502 games    1709 Nov  1 11:47 ivysettings.xml
-rw-r--r--.  1  502 games    8197 Nov  1 11:47 ivy.xml
drwxr-xr-x.  4  502 games    4096 Nov  1 11:52 lib
-rw-r--r--.  1  502 games   11938 Nov  1 11:47 LICENSE.txt
drwxr-xr-x.  3 root root       22 Feb  2 01:00 logs
-rw-r--r--.  1  502 games    3132 Nov  1 11:47 NOTICE.txt
-rw-r--r--.  1  502 games    1585 Nov  1 11:47 README.md
-rw-r--r--.  1  502 games    1770 Nov  1 11:47 README_packaging.txt
drwxr-xr-x.  5  502 games      44 Nov  1 11:47 recipes
drwxr-xr-x.  8  502 games    4096 Nov  1 11:52 src
drwxr-xr-x.  2 root root        6 Feb  2 01:00 version-2
-rw-r--r--.  1  502 games 1478279 Nov  1 11:49 zookeeper-3.4.11.jar
-rw-r--r--.  1  502 games     195 Nov  1 11:52 zookeeper-3.4.11.jar.asc
-rw-r--r--.  1  502 games      33 Nov  1 11:49 zookeeper-3.4.11.jar.md5
-rw-r--r--.  1  502 games      41 Nov  1 11:49 zookeeper-3.4.11.jar.sha1
-rw-r--r--.  1 root root        5 Feb  6 02:33 zookeeper_server.pid

[root@localhost conf]# ll /usr/local/software/zookeeper/zookeeper1/conf
total 20
-rw-r--r--. 1  502 games  535 Nov  1 11:47 configuration.xsl
-rw-r--r--. 1  502 games 2161 Nov  1 11:47 log4j.properties
-rw-r--r--. 1 root root   922 Feb  5 22:39 zoo22.cfg
-rw-r--r--. 1 root root  1323 Feb  6 04:20 zoo.cfg
-rw-r--r--. 1  502 games  922 Nov  1 11:47 zoo_sample.cfg

dataDir sets the directory where data is stored, and clientPort defaults to 2181. zoo.cfg is as follows:

[root@localhost conf]# vi zoo.cfg
    # The number of milliseconds of each tick  
    tickTime=2000  

    # The number of ticks that the initial  
    # synchronization phase can take  
    initLimit=10  

    # The number of ticks that can pass between  
    # sending a request and getting an acknowledgement  
    syncLimit=5  

    # the directory where the snapshot is stored.  
    # do not use /tmp for storage, /tmp here is just  
    # example sakes.  
    dataDir=/usr/local/software/zookeeper/zookeeper-3.4.11  

    #log info  
    dataLogDir=/usr/local/software/zookeeper/zookeeper-3.4.11/logs  

    # the port at which the clients will connect  
    clientPort=2181  

    # the maximum number of client connections.  
    # increase this if you need to handle more clients  
    #maxClientCnxns=60  
    #  
    # Be sure to read the maintenance section of the  
    # administrator guide before turning on autopurge.  
    #  
    # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance  
    #  
    # The number of snapshots to retain in dataDir  
    #autopurge.snapRetainCount=3  
    # Purge task interval in hours  
    # Set to "0" to disable auto purge feature  
    #autopurge.purgeInterval=1  
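The file above is a single-node configuration. The startup commands below launch three instances (zookeeper1, zookeeper2, zookeeper3) on the same host, so each instance additionally needs a unique clientPort and dataDir, the ensemble member list, and a myid file. A minimal sketch, assuming client ports 2181-2183 and loopback peers:

    # appended to every instance's zoo.cfg (the same three lines in each)
    server.1=127.0.0.1:2888:3888
    server.2=127.0.0.1:2889:3889
    server.3=127.0.0.1:2890:3890

    # per instance: set clientPort to 2181/2182/2183, point dataDir at that
    # instance's own directory, and write the matching id into dataDir/myid:
    echo 1 > /usr/local/software/zookeeper/zookeeper1/data/myid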

Introduction to Kafka

Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and lets you pass messages from one endpoint to another. Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka is built on top of the ZooKeeper synchronization service, and it integrates very well with Apache Storm and Spark for real-time streaming data analysis.

Kafka is a distributed messaging system written in Scala at LinkedIn, where it serves as the foundation of LinkedIn's Activity Stream and operational data processing Pipeline. It offers horizontal scalability and high throughput.

More and more open-source distributed processing systems, such as Apache Flume, Apache Storm, Spark, and Elasticsearch, now support integration with Kafka. Download and extract Kafka; the version used here is kafka_2.11-0.9.0.1.

Kafka Installation and Configuration

Download: http://kafka.apache.org/downloads

Download and extract

[root@log1 local]# wget https://archive.apache.org/dist/kafka/0.9.0.1/kafka_2.11-0.9.0.1.tgz
[root@log1 local]# tar zxvf kafka_2.11-0.9.0.1.tgz

Start ZooKeeper

/usr/local/software/zookeeper/zookeeper1/bin/zkServer.sh start
/usr/local/software/zookeeper/zookeeper2/bin/zkServer.sh start  
/usr/local/software/zookeeper/zookeeper3/bin/zkServer.sh start  
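To confirm that each instance came up and which one was elected leader, zkServer.sh also accepts a status argument:

    /usr/local/software/zookeeper/zookeeper1/bin/zkServer.sh status
    # prints "Mode: follower" or "Mode: leader" once a quorum has formed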

Edit the configuration file server.properties

    ############################# Server Basics #############################

    # The id of the broker. This must be set to a unique integer for each broker.
    broker.id=1

    ############################# Socket Server Settings #############################

    listeners=PLAINTEXT://:9092

    # The number of threads handling network requests
    num.network.threads=3

    # The number of threads doing disk I/O
    num.io.threads=8

    # The send buffer (SO_SNDBUF) used by the socket server
    socket.send.buffer.bytes=102400

    # The receive buffer (SO_RCVBUF) used by the socket server
    socket.receive.buffer.bytes=102400

    # The maximum size of a request that the socket server will accept (protection against OOM)
    socket.request.max.bytes=104857600

    ############################# Log Basics #############################

    # A comma seperated list of directories under which to store log files
    log.dirs=/tmp/kafka-logs

    # The default number of log partitions per topic. More partitions allow greater
    # parallelism for consumption, but this will also result in more files across
    # the brokers.
    num.partitions=1

    # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
    # This value is recommended to be increased for installations with data dirs located in RAID array.
    num.recovery.threads.per.data.dir=1

    ############################# Log Flush Policy #############################

    # The number of messages to accept before forcing a flush of data to disk
    #log.flush.interval.messages=10000

    # The maximum amount of time a message can sit in a log before we force a flush
    #log.flush.interval.ms=1000

    ############################# Log Retention Policy #############################

    # The minimum age of a log file to be eligible for deletion
    log.retention.hours=168

    # A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
    # segments don't drop below log.retention.bytes.
    #log.retention.bytes=1073741824

    # The maximum size of a log segment file. When this size is reached a new log segment will be created.
    log.segment.bytes=1073741824

    # The interval at which log segments are checked to see if they can be deleted according
    # to the retention policies
    log.retention.check.interval.ms=300000

    # By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.
    # If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.
    log.cleaner.enable=false

    ############################# Zookeeper #############################

    # Zookeeper connection string (see zookeeper docs for details).
    # This is a comma separated host:port pairs, each corresponding to a zk
    # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
    # You can also append an optional chroot string to the urls to specify the
    # root directory for all kafka znodes.
    zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183

    # Timeout in ms for connecting to zookeeper
    zookeeper.connection.timeout.ms=6000
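The file above configures a single broker. If you later run several brokers on the same host (to match the three-node ZooKeeper ensemble listed in zookeeper.connect), each broker's copy of server.properties must differ in at least the following settings; a sketch with illustrative values for a second broker:

    # server-2.properties: what has to change per broker on one host
    broker.id=2
    listeners=PLAINTEXT://:9093
    log.dirs=/tmp/kafka-logs-2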

Start the Kafka server

bin/kafka-server-start.sh config/server.properties

(If producers and consumers access Kafka from a different host, remember to change listeners in config/server.properties, e.g. listeners=PLAINTEXT://192.168.33.152:9092.)
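The start command above keeps the broker in the foreground. The start script also accepts a -daemon flag to run it in the background:

    bin/kafka-server-start.sh -daemon config/server.properties
    jps | grep Kafka    # verify the broker JVM is running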

Create a topic

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

List topics

bin/kafka-topics.sh --list --zookeeper localhost:2181
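To smoke-test the setup end to end, you can describe the topic and then exchange a message with the console clients bundled with 0.9.x (shown here in their old-consumer, --zookeeper form):

    # show the partition and replica assignment of the topic
    bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test

    # terminal 1: every line typed becomes one message
    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

    # terminal 2: read the topic from the beginning
    bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning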

