Kafka: characteristics, application scenarios, and the cluster setup process

1. Why use message queues?

Decoupling: The message queue inserts an interface layer between producers and consumers, so the two sides only need to agree on the message format and can then be developed, deployed, and changed independently.

Redundancy: In some cases the process handling the data will fail, and unless the data is persisted it is lost. A message queue persists data until it has been completely processed, avoiding the risk of data loss. In the "insert-get-delete" paradigm adopted by many message queues, before a message is deleted from the queue, your processing system must explicitly indicate that the message has been processed, ensuring your data is stored safely until you are finished with it.
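The "insert-get-delete" paradigm can be sketched as follows (a minimal in-memory model, not any particular queue's API; all class and method names here are illustrative assumptions):

```python
import queue

class AckQueue:
    """Sketch of the insert-get-delete paradigm: a message is only deleted
    once the processor explicitly acknowledges it, so unacknowledged work
    survives a processor failure and can be redelivered."""
    def __init__(self):
        self._q = queue.Queue()
        self._inflight = {}      # id -> message handed out but not yet acked
        self._next_id = 0

    def put(self, msg):
        self._q.put(msg)

    def get(self):
        """Hand out a message but keep it until it is acknowledged."""
        msg = self._q.get()
        self._next_id += 1
        self._inflight[self._next_id] = msg
        return self._next_id, msg

    def ack(self, msg_id):
        """Delete the message only after processing is confirmed."""
        del self._inflight[msg_id]

    def unacked(self):
        return list(self._inflight.values())
```

If the processor crashes before calling `ack`, the message is still present in `unacked()` and is not lost.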

Scalability: Because the message queue decouples the processing stages, it is easy to scale the rate at which messages are enqueued and processed: just add more processing instances. No code changes or parameter tuning are required.

Peak processing capacity:

During a traffic surge the application must keep functioning, but such burst traffic is uncommon, and it would be a huge waste to keep enough standby resources to handle peak load as the norm. A message queue lets key components absorb sudden access pressure instead of collapsing completely under overloaded requests.

Recoverability:

When some components of the system fail, the whole system is not affected. Because the message queue reduces coupling between processes, even if a process handling messages crashes, the messages added to the queue can still be processed after the system recovers.

Order guarantee:

In most usage scenarios the order of data processing matters. Most message queues are inherently ordered and guarantee that data is processed in a specific order. Kafka guarantees message ordering within a partition.

Asynchronous communication:

In many cases the user does not want, and does not need, to process a message immediately. The message queue provides an asynchronous processing mechanism: users can put a message into the queue without processing it right away, put in as many messages as desired, and process them when needed.

Buffering: In any significant system there will be stages that take different amounts of processing time; for example, loading an image takes less time than applying a filter to it. The message queue acts as a buffer layer that helps tasks execute as efficiently as possible: writes to the queue proceed as fast as they can. This buffer helps control and optimize the speed of data flowing through the system.

Understanding data flow:

In a distributed system, getting an overall picture of how long user operations take, and why, is a huge challenge. A message queue makes it easy, via the rate at which messages are processed, to identify the processing stages or areas that perform poorly, where the data flow is not well optimized.

2. Why use Kafka?

Support multiple producers

Kafka handles multiple producers seamlessly, whether the clients use a single topic or many topics. This makes it well suited to collecting data from many front-end systems and exposing it in a unified format.

 

Multiple consumers

In addition to supporting multiple producers, Kafka supports multiple consumers reading from a single stream of messages without interfering with one another. This differs from other queuing systems, where once a message is read by one client, it is no longer available to other clients. Furthermore, multiple consumers can form a group that shares a message stream, and the group as a whole is guaranteed to process each message only once.
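The group semantics above can be sketched as follows (a simplified simulation, not Kafka code: every group sees the full stream, while members within a group split it; real Kafka divides work by partition, the round-robin split here is only illustrative):

```python
from collections import defaultdict

def deliver(messages, groups):
    """Simulate consumer-group delivery.
    messages: list of messages in stream order.
    groups: dict of group name -> list of member names.
    Returns a dict mapping (group, member) -> messages that member receives."""
    seen = defaultdict(list)
    for group, members in groups.items():
        # Each group independently receives the whole stream...
        for i, msg in enumerate(messages):
            # ...but within the group, members share it.
            member = members[i % len(members)]
            seen[(group, member)].append(msg)
    return seen

out = deliver(["m1", "m2", "m3"], {"g1": ["a"], "g2": ["b", "c"]})
```

Group g1's sole member sees all three messages, while g2's two members split them between themselves, so each group processes every message exactly once.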

Disk-based data storage

Kafka not only supports multiple consumers, it also allows consumers to read messages non-real-time, thanks to Kafka's data-retention feature. Messages are committed to disk and kept according to configured retention rules. Retention rules can be set per topic, so each topic can retain a different number of messages to meet the needs of different consumers. A consumer may fall behind, due to slow processing or a sudden traffic spike, and fail to read messages in time; persisted data guarantees that nothing is lost. Consumers can be taken offline briefly for application maintenance without worrying about losing messages or blocking the producer. A consumer can be shut down, but the messages remain in Kafka, and on restart the consumer continues processing from where it left off.

Scalability

To handle large amounts of data easily, Kafka was designed from the start as a flexible, scalable system. During development, users can start with a single broker, expand to a small development cluster of 3 brokers, and then, as data volume keeps growing, deploy a production cluster that may contain hundreds of brokers. Expanding an online cluster does not affect overall system availability; in other words, a cluster of multiple brokers can keep serving clients even if individual brokers fail. To improve a cluster's fault tolerance, a higher replication factor needs to be configured.

High performance

All of the features above make Kafka a high-performance publish/subscribe messaging system. By scaling producers, consumers, and brokers horizontally, Kafka can easily handle enormous message streams, while still guaranteeing sub-second message latency even when processing large volumes of data.

3. Introduction to related concepts

3.1 Partition

Kafka achieves data redundancy and scalability through partitions.

Partitions can be distributed across different servers; that is, a single topic can span multiple servers, providing more performance than a single server could.

The term "stream" is commonly used to describe the data in systems like Kafka.

3.2 Producer

Under normal circumstances, a message is published to a specific topic. By default the producer distributes messages evenly across all partitions of the topic, without caring which partition a particular message lands in. In some cases, however, the producer writes a message directly to a specified partition. This is usually achieved through a message key and a partitioner: the partitioner generates a hash of the key and maps it to a specific partition, which guarantees that messages with the same key are written to the same partition. Producers can also use a custom partitioner to map messages to partitions according to business rules.
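The key-to-partition mapping can be sketched as follows (a minimal stand-in: Kafka's default partitioner actually uses a murmur2 hash of the key bytes; md5 is used here only to keep the sketch deterministic and self-contained):

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index, in the spirit of Kafka's
    default partitioner: hash the key, then take it modulo the partition
    count, so equal keys always land in the same partition."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition.
p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)
```

Note a consequence of the modulo step: if the number of partitions changes, the same key may map to a different partition, which is why keyed topics are usually created with their final partition count up front.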

3.3 Consumer

Consumers read one or more topics, reading messages in the order in which they were produced. The consumer uses the offset to distinguish messages that have already been read. The offset is another kind of metadata: a continuously increasing integer that Kafka adds to each message as it is created. Within a given partition, each message's offset is unique. The consumer saves the last-read offset of each partition in ZooKeeper or in Kafka itself, so if the consumer shuts down or restarts, its reading state is not lost.

 

3.4 Consumer group relations

Consumers are part of a consumer group; that is, one or more consumers read a topic together. The group guarantees that each partition is consumed by only one consumer in the group. In the group shown in Figure 1-6, three consumers read a single topic simultaneously: two of them each read one partition, and the third reads the other two partitions. The mapping between consumers and partitions is often called the consumer's ownership of the partitions.

 

3.5 Broker

A standalone Kafka server is called a broker. The broker receives messages from producers, sets offsets for them, and commits the messages to disk storage. The broker also serves consumers, responding to requests to read a partition by returning the messages that have been committed to disk. Depending on the specific hardware and its performance characteristics, a single broker can easily handle thousands of partitions and millions of messages per second.

Brokers are members of a cluster. Each cluster has one broker that also acts as the cluster controller (elected automatically from the active members of the cluster). The controller is responsible for administrative work, including assigning partitions to brokers and monitoring brokers. In a cluster, a partition belongs to one broker, which is called the partition's leader. A partition can be assigned to multiple brokers, in which case partition replication occurs (see Figure 1-7). This replication mechanism provides message redundancy for the partition: if one broker fails, another broker can take over leadership. The relevant consumers and producers, however, must reconnect to the new leader. Chapter 6 covers cluster operation in detail, including partition replication.

Retention (keeping messages for a given period) is an important characteristic of Kafka. The default retention policy of a Kafka broker is: keep messages for a period of time (for example, 7 days), or keep them until the total size in bytes reaches a certain limit (for example, 1 GB). When either limit is reached, old messages expire and are deleted, so at any moment the total amount of available messages does not exceed the configured size. Each topic can configure its own retention policy, keeping messages until they are no longer needed. For example, data used to track user activity may need to be kept for a few days, while application metrics may only need to be kept for a few hours. A topic can also be configured as a compacted log, in which only the last message for each specific key is retained. This is appropriate for changelog-style data, where only the latest change matters.
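Log compaction as described above can be sketched as follows (a simulation of the rule "keep only the last message per key", not broker code):

```python
def compact(log):
    """Sketch of Kafka log compaction over a list of (key, value) records
    in offset order: later records for a key supersede earlier ones, so
    only the last value per key survives."""
    latest = {}
    for key, value in log:
        latest[key] = value          # later records overwrite earlier ones
    # dicts preserve first-insertion order in Python 3.7+
    return list(latest.items())

compacted = compact([("a", 1), ("b", 2), ("a", 3)])
```

After compaction the record ("a", 1) is gone, because a newer value for key "a" exists; this matches the changelog use case where only the latest state per key matters.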

(Figure: message replication between brokers)

4. Environment setup

Overall environment

Virtual machine: CentOS (64-bit)

JDK: /usr/java

Download link:

https://pan.baidu.com/s/18IicPYf7W0j-sHBXvfKyyg?errno=0&errmsg=Auth%20Login%20Sucess&&bduss=&ssnerror=0&traceid=

 

ZooKeeper:

/usr/local/zookeepercluster/zookeeper1

/usr/local/zookeepercluster/zookeeper2

/usr/local/zookeepercluster/zookeeper3

Download link:

https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/

4.1 ZooKeeper cluster setup

Create the directory /usr/local/zookeepercluster/zookeeper1 and extract ZooKeeper into it, then copy it twice:

cp -r zookeeper1/ zookeeper2

cp -r zookeeper1/ zookeeper3

 

4.1.1 zookeeper data/myid

[root@localhost zookeepercluster]# cd zookeeper1

[root@localhost zookeeper1]# mkdir data

[root@localhost zookeeper1]# cd data

[root@localhost data]# echo 1 > myid

Follow the same steps for zookeeper2 and zookeeper3, writing ids 2 and 3 respectively.

4.1.2 Configure zoo.cfg

  1. Copy the zoo_sample.cfg file in the conf directory under zookeeper1 and rename it to zoo.cfg

  2. Modify the configuration in zoo.cfg

zookeeper1

zookeeper2

(2881 is the quorum communication port, 3881 is the leader-election port)
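The original post shows the zoo.cfg contents only as screenshots. A plausible reconstruction for zookeeper1, assuming a standard three-node layout on one host (client ports 2181-2183, quorum/election ports 2881-2883/3881-3883; all values here are assumptions to adjust to your layout):

```
# zoo.cfg for zookeeper1 (sketch)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeepercluster/zookeeper1/data
clientPort=2181
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883
```

For zookeeper2 and zookeeper3, change dataDir accordingly and use clientPort 2182 and 2183; the server.N lines stay identical across all three nodes.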

 

4.1.3 Create a quick start zookeepercluster/start.sh

./zookeeper1/bin/zkServer.sh start

./zookeeper2/bin/zkServer.sh start

./zookeeper3/bin/zkServer.sh start

 

4.1.4 Create a shortcut to close shutdown.sh

./zookeeper1/bin/zkServer.sh stop

./zookeeper2/bin/zkServer.sh stop

./zookeeper3/bin/zkServer.sh stop

4.1.5 Start the cluster

[root@localhost zookeepercluster]# ./start.sh

4.1.6 Check cluster status

[root@localhost zookeepercluster]# zookeeper1/bin/zkServer.sh status

Output:

JMX enabled by default

Using config: /usr/local/zookeepercluster/zookeeper1/bin/../conf/zoo.cfg

Mode: follower

 

4.1.7 ZooKeeper logging configuration

Note: by default, ZooKeeper writes its console output to zookeeper.out under the directory it was started from. In production this is clearly unacceptable; with the following changes, ZooKeeper will instead write size-rotated log files:

    In conf/log4j.properties, change

    zookeeper.root.logger=INFO, CONSOLE

    to

    zookeeper.root.logger=INFO, ROLLINGFILE

    In bin/zkEnv.sh, change

    ZOO_LOG4J_PROP="INFO,CONSOLE"

    to

    ZOO_LOG4J_PROP="INFO,ROLLINGFILE"

    Then restart ZooKeeper.


 

4.2 Kafka installation

Download link:

http://archive.apache.org/dist/kafka/0.9.0.1/

Directory layout:

/usr/local/kafkacluster/kafka1

/usr/local/kafkacluster/kafka2

4.2.1 Edit the configuration file

vi  kafka1/config/server.properties

Mainly modify the following (using localhost instead of the IP also works):

broker.id=0

listeners=PLAINTEXT://:9092

port=9092

host.name=192.168.0.107

advertised.host.name=192.168.0.107

advertised.port=9092

zookeeper.connect=localhost:2181

4.2.2 Start node 1

[root@localhost kafkacluster]# kafka1/bin/kafka-server-start.sh -daemon kafka1/config/server.properties

4.2.3 Test: create a topic

[root@localhost kafkacluster]# kafka1/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

 

[root@localhost bin]# ./kafka-topics.sh --create --zookeeper localhost:2181,localhost:2182,localhost:2183 --replication-factor 1 --partitions 1 --topic lytest

4.2.4 Test: describe topics

[root@localhost kafkacluster]# kafka1/bin/kafka-topics.sh --describe --zookeeper localhost:2181,localhost:2182,localhost:2183

 

4.2.5 Test: producer

kafka1/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test   

[root@localhost kafkacluster]# kafka1/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testhome

Test Message 1

Test Message 2

^C[root@localhost kafkacluster]#

4.2.6 Test: consumer

kafka1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning 

[root@localhost kafkacluster]# kafka1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testhome --from-beginning

Test Message 1

Test Message 2

^CProcessed a total of 2 messages

4.2.7 Stop Kafka

kafka1/bin/kafka-server-stop.sh

4.2.8 Complete run-through: create a topic, produce, and consume

[root@localhost kafkacluster]# kafka1/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic ly

Created topic "ly".

[root@localhost kafkacluster]# kafka1/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic ly

this a message

^C[root@localhost kafkacluster]# kafka1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic ly --from-beginning

this a message

5. Kafka cluster

5.1 Broker configuration

To add a broker to a cluster, only two configuration parameters need to be changed. First, all brokers must be configured with the same zookeeper.connect, which specifies the ZooKeeper ensemble and path where metadata is stored. Second, each broker must set a unique value for the broker.id parameter; if two brokers use the same broker.id, the second one will fail to start. When running a cluster, other parameters can also be configured, especially those that control data replication.

Specifically, for kafka2:

broker.id=1

listeners=PLAINTEXT://:9093

port=9093

host.name=localhost

advertised.host.name=localhost

advertised.port=9093

# must not be the same as kafka1's log.dirs
log.dirs=/tmp/kafka-logs-1

zookeeper.connect=localhost:2181

Start node 2:

kafka2/bin/kafka-server-start.sh -daemon kafka2/config/server.properties

Verify that it started by listing the Java processes with jps -lm:

 

[root@localhost kafkacluster]# jps -lm

3505 org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/local/zookeepercluster/zookeeper3/bin/../conf/zoo.cfg

6311 sun.tools.jps.Jps -lm

3482 org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/local/zookeepercluster/zookeeper2/bin/../conf/zoo.cfg

3835 kafka.Kafka kafka1/config/server.properties

6172 kafka.Kafka kafka2/config/server.properties

3469 org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/local/zookeepercluster/zookeeper1/bin/../conf/zoo.cfg

6. How partition assignment works

When a consumer wants to join a group, it sends a JoinGroup request to the group coordinator. The first consumer to join becomes the group leader. The leader obtains the membership list from the coordinator (the list contains all consumers that have recently sent heartbeats and are therefore considered alive) and is responsible for assigning partitions to each consumer. It uses a class that implements the PartitionAssignor interface to decide which partitions should be assigned to which consumers.

Kafka has two built-in assignment strategies (Range and RoundRobin). Once assignment is complete, the leader sends the assignment list to the group coordinator, and the coordinator forwards this information to all consumers. Each consumer sees only its own assignment; only the group leader knows the assignments of every consumer in the group. This process repeats on every rebalance.
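One of the built-in strategies, Range assignment, can be sketched for a single topic as follows (a simplified simulation, not the actual assignor, which works per topic across the whole subscription; member and partition names are illustrative):

```python
def range_assign(consumers, partitions):
    """Sketch of Range assignment for one topic: sort the members, split the
    partition list into contiguous chunks, and give the first consumers one
    extra partition when the count doesn't divide evenly."""
    consumers = sorted(consumers)
    n, c = len(partitions), len(consumers)
    per, extra = divmod(n, c)
    assignment, start = {}, 0
    for i, member in enumerate(consumers):
        count = per + (1 if i < extra else 0)   # first `extra` members get +1
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment

plan = range_assign(["c1", "c2", "c3"], [0, 1, 2, 3])
```

With 4 partitions and 3 consumers, the first consumer in sorted order owns two partitions and the others one each, mirroring the uneven ownership shown in Figure 1-6.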

7. The role of offsets

Consumers send messages to a special topic called __consumer_offsets; the messages contain the committed offset for each partition. As long as a consumer keeps running, these offsets are of little use. But if a consumer crashes, or a new consumer joins the group, a rebalance is triggered; after the rebalance, each consumer may be assigned new partitions rather than the ones it processed before. To pick up where it left off, the consumer reads the last committed offset of each partition and continues processing from there. If the committed offset is smaller than the offset of the last message the client actually processed, the messages between the two offsets are processed twice; if the committed offset is larger than the offset of the last message processed, the messages between the two offsets are lost.
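The duplicate/loss window described above can be sketched as follows (a toy model of the arithmetic only; offset numbers are illustrative):

```python
def resume_from(committed, last_processed):
    """After a rebalance the consumer resumes at the committed offset.
    Compare it with last_processed + 1 (the next message actually needed):
    resuming earlier reprocesses messages, resuming later skips them."""
    next_needed = last_processed + 1
    if committed < next_needed:
        return ("duplicates", next_needed - committed)
    if committed > next_needed:
        return ("lost", committed - next_needed)
    return ("exact", 0)
```

For example, if the client had processed through offset 9 but only offset 5 was committed, messages 5-9 are reprocessed; if offset 12 was committed, messages 10 and 11 are never seen.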
 


Origin blog.csdn.net/liyang_nash/article/details/103376552