In-Depth Understanding of Kafka Series (1): A First Look at Kafka

Series Article Directory

Kafka: The Definitive Guide series articles

Preface

This series contains my notes and thoughts from reading the book "Kafka: The Definitive Guide".

Main Text

Introduction to Kafka

Kafka is a publish/subscribe-based messaging system, often described as a "distributed streaming platform". Kafka persists data in a well-defined order, and users can read it on demand. Kafka also supports a distributed architecture, providing protection against data loss on failure and the ability to scale performance.

Several related concepts of Kafka

  1. Message: the unit of data in Kafka.
  2. Batch: a group of messages, all belonging to the same topic and partition. Sending each message over the network individually is time-consuming, so messages are sent in batches instead.
  3. Topic: messages in Kafka are categorized by topic. A topic can be thought of as a table in a database.
  4. Partition: a topic can be divided into several partitions; a partition is a commit log.
  5. Producer: Kafka clients are the users of the Kafka system, and they fall into two categories: producers and consumers. A producer creates messages.
  6. Consumer: reads messages; also called a subscriber. A consumer can subscribe to one or more topics and reads messages in the order in which they were produced.
  7. Offset: a kind of metadata, a monotonically increasing integer value. Kafka adds the offset to each message as the message is written.
  8. Consumer group: a group containing one or more consumers.
  9. Broker: an independent Kafka server is called a broker. A broker receives messages from producers, assigns offsets to them, and commits the messages to disk for storage; it also serves consumers by responding to requests to read from partitions.

(Figure: Kafka's message production and consumption model)
(Figure: overview of Kafka)

Notes:

  1. Messages are written to a partition in append-only fashion and are read in first-in, first-out order.
  2. Because a topic generally contains multiple partitions, message order cannot be guaranteed across an entire topic, but the messages within a single partition are guaranteed to be in order (see the example after this list).
  3. A producer publishes a message to a specific topic and, by default, distributes messages evenly across all partitions of that topic.
  4. Consumers distinguish messages that have already been read by checking each message's offset.
  5. Consumers within a consumer group are mutually exclusive, ensuring that each partition is consumed by only one consumer in the group.
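As a quick illustration of point 2, once a broker is running (see the installation walkthrough below), the kafka-topics.sh tool that ships with Kafka can show how many partitions a topic has; ordering is only guaranteed within each of the listed partitions. A minimal sketch, using the zookeeper address and topic name from the installation steps below:

./bin/kafka-topics.sh --zookeeper 192.168.135.237:2181 --describe --topic test2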

Why choose Kafka

There are many publish/subscribe-based messaging systems, so why do so many large companies use Kafka?
The reasons:

  1. Kafka supports multiple producers and consumers: clients can write to a single topic or to multiple topics, and multiple consumers can read from a single message stream without affecting one another. (In many other queue systems, once a message has been read by one client, it can no longer be read by other clients.)
  2. Disk-based data retention: messages are committed to disk and stored in order, and each topic can have its own message retention rules.
  3. Scalability: Kafka supports clustered, distributed deployments, and the number of nodes can be scaled out horizontally.
  4. High performance: among comparable systems (ActiveMQ, RabbitMQ, RocketMQ), Kafka offers the highest concurrent processing capacity.

Kafka Installation

The following link provides the installation packages for both zookeeper and kafka.
Link: click on me
Extraction code: cvhs

Install zookeeper

  1. Unzip the installation package: tar -zxf zookeeper-3.4.6.tar.gz
  2. Enter the installation directory and copy the sample configuration file (zoo.cfg is used by default): cp zoo_sample.cfg zoo.cfg
  3. Modify the configuration file: vi conf/zoo.cfg
# First create a folder zkData under the installation directory (mkdir zkData) to hold data and logs
dataDir=/opt/modules/zookeeper-3.4.6/zkData
  4. Start zookeeper: bin/zkServer.sh start . If output indicating a successful start appears, zookeeper is up.
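You can also verify the server's state with the status subcommand of the same script:

bin/zkServer.sh status
# a standalone instance should report its mode, e.g. Mode: standalone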

Install kafka

  1. Unzip: tar -zxf kafka_2.11-0.11.0.0.tgz
  2. Enter the directory and create a log directory: mkdir logs
  3. Modify the configuration file: vi config/server.properties
    Modify the log storage path and the zookeeper address:
zookeeper.connect=192.168.135.237:2181
log.dirs=/opt/modules/kafka_2.11-0.11.0.0/logs
# Add this parameter
default.replication.factor=1
# Enable the listener, otherwise messages cannot be written in
listeners=PLAINTEXT://192.168.135.237:9092
  4. Start kafka:
./bin/kafka-server-start.sh config/server.properties
  5. Create a topic:
./bin/kafka-topics.sh --zookeeper 192.168.135.237:2181 --partitions 1 --replication-factor 1 --create --topic test2

If the command prints a confirmation such as Created topic "test2", the creation succeeded.
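You can also confirm from the command line that the topic now exists by listing all topics registered in zookeeper:

./bin/kafka-topics.sh --zookeeper 192.168.135.237:2181 --list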

  6. Produce messages with the console producer (each line typed into the console is sent to the topic as one message):
./bin/kafka-console-producer.sh --broker-list 192.168.135.237:9092 --topic test2

  7. Consume messages with the console consumer (the messages produced above should be printed to the console):
./bin/kafka-console-consumer.sh --zookeeper 192.168.135.237:2181 --from-beginning --topic test2
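Note that the --zookeeper option of the console consumer belongs to the old consumer and only works on older Kafka versions such as the 0.11 release used here; newer releases connect to the broker directly instead:

./bin/kafka-console-consumer.sh --bootstrap-server 192.168.135.237:9092 --from-beginning --topic test2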


Detailed explanation of several common options in the Kafka configuration file

Broker-related configuration:

  1. broker.id

The unique identifier of a broker; it must not be duplicated within a cluster. The default value is 0, so if you want to build a cluster, this id must be changed on each broker.
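For instance, a hypothetical three-broker cluster would assign a distinct id in each broker's server.properties (the values here are illustrative):

# server.properties on the first broker
broker.id=0
# server.properties on the second broker
broker.id=1
# server.properties on the third broker
broker.id=2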

  2. zookeeper.connect

The Zookeeper address used to store broker metadata. For a zookeeper cluster, separate the addresses with commas.
The format is: hostname:port/path, where
hostname: the Zookeeper server's hostname or IP
port: Zookeeper's client connection port, usually 2181
path: optional; if not specified, the root path is used by default.
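A sketch of a cluster address with an optional chroot path (the host names and the /kafka path are placeholders):

zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka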

  3. log.dirs

Kafka stores all messages on disk, and this parameter specifies the directories in which the log segments are kept.
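Several directories can be listed, separated by commas, in which case the broker spreads partitions across them. The paths below are illustrative:

log.dirs=/data/kafka-logs-1,/data/kafka-logs-2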

  4. num.recovery.threads.per.data.dir

Kafka uses a configurable thread pool to handle log segments in three situations:
1. When the server starts normally, to open each partition's log segments.
2. When the server restarts after a crash, to check and truncate each partition's log segments.
3. When the server shuts down normally, to close log segments.
By default, only one thread per log directory is used.
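Since these threads are only used at startup and shutdown, a larger value can speed up recovery after a crash. Note that the setting is per log directory; a sketch with an illustrative value:

# 8 recovery threads per log directory; with 3 directories in log.dirs,
# a total of 8 x 3 = 24 threads would be used
num.recovery.threads.per.data.dir=8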

  5. auto.create.topics.enable

By default, Kafka automatically creates a topic in three situations:
1. When a producer starts writing messages to the topic.
2. When a consumer starts reading messages from the topic.
3. When any client requests metadata for the topic.
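If topics should only ever be created explicitly (for example with the kafka-topics.sh command shown earlier), automatic creation can be switched off:

auto.create.topics.enable=false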

Some default configurations of topics

  1. num.partitions

An integer that specifies how many partitions a newly created topic will contain. The default value is 1.

  2. log.retention.ms

Kafka usually decides how long data is retained based on time.
By default the time is configured through the log.retention.hours parameter, whose default is 168 hours, i.e. one week.
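The time can also be set via log.retention.minutes or log.retention.ms; if more than one of these is specified, the parameter with the smallest unit (log.retention.ms) takes precedence. An illustrative override:

# retain messages for one day (86400000 ms)
log.retention.ms=86400000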

  3. log.retention.bytes

Determines message expiry by the number of bytes retained. It applies to each partition.
Example: if a topic has 8 partitions and log.retention.bytes is set to 1GB, the topic can retain at most 8GB of data.
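If both size-based and time-based retention are configured, messages are deleted as soon as either condition is met. A sketch combining the two (values are illustrative):

# keep at most 1GB per partition, and nothing older than a week
log.retention.bytes=1073741824
log.retention.hours=168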

  4. log.segment.bytes

The retention settings above operate on log segments, not on individual messages.
When a message arrives at the broker, it is appended to the partition's current log segment. When the size of the segment reaches the upper limit specified by log.segment.bytes (1GB by default), the current segment is closed and a new one is opened.

  5. log.segment.ms

Controls when log segments are closed based on time: it specifies how long after a segment is opened it will be closed, even if the size limit has not yet been reached. Whichever of log.segment.bytes and log.segment.ms is satisfied first triggers the close.

  6. message.max.bytes

Limits the size of a single message. The default is 1000000, roughly 1MB. If a producer sends a message larger than this, the broker will not accept it, and the producer will receive an error message returned by the broker.
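This limit applies to the message as the broker receives it, i.e. the compressed size when compression is used, so the uncompressed payload may be larger. Raising the limit is a one-line change in server.properties (the value below is illustrative):

# allow individual messages of up to ~2 MB
message.max.bytes=2000000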


Summary

This article gave an overview of:
1. Kafka-related concepts.
2. A simple installation and usage walkthrough of Kafka.
3. Kafka's broker- and topic-related configurations and their analysis.
The next article will give a detailed introduction to Kafka's producers, consumers, and the corresponding APIs.

Origin blog.csdn.net/Zong_0915/article/details/109258714