SparkStreaming Integration with Kafka, Chapter 2

Kafka quick review
Core concepts:

Broker: a machine running the Kafka service is a broker.
Producer: the producer of messages; responsible for writing (pushing) data into the brokers.
Consumer: the consumer of messages; responsible for pulling data from Kafka. Old consumer versions depended on ZooKeeper; new versions do not.
Topic: a logical category of data; different topics store data for different businesses. Topic = business separation.
Replication: replicas; how many copies of the data are kept (ensures data is not lost). Replica = data safety.
Partition: a physical partition; each partition corresponds to a directory of log files on disk. A topic can have 1 to n partitions, and each partition has its own replicas. Partition = concurrent reads and writes.
Consumer Group: a topic can be consumed by multiple consumers / consumer groups at the same time, but consumers within the same group never consume the same data twice. Consumer group = higher consumption speed and unified management.
Note: a topic can be subscribed by multiple consumers or groups, and a consumer / group can also subscribe to multiple topics.
Note: reads go only to the leader replica, and writes go only to the leader replica; followers synchronize the data from the leader to keep a copy!
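To make the partition concept concrete, here is a minimal, hypothetical sketch of how a producer maps a message key to a partition. Kafka's real default partitioner uses murmur2 hashing; `crc32` below is just a stand-in for a stable hash, and `choose_partition` is an illustrative name, not a Kafka API.

```python
# Minimal sketch of key-based partition selection, in the spirit of
# Kafka's default partitioner. The real partitioner uses murmur2;
# crc32 here is only a stand-in for a stable integer hash.
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Deterministically map a key to one of num_partitions."""
    return zlib.crc32(key) % num_partitions

# Messages with the same key always land in the same partition,
# which is what preserves per-key ordering.
p1 = choose_partition(b"user-42", 3)
p2 = choose_partition(b"user-42", 3)
assert p1 == p2
print(p1 in range(3))  # True
```

Because the mapping is deterministic, all records for one key stay in one partition, while different keys spread across partitions for concurrent reads and writes.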

Common commands


# Start Kafka
/export/servers/kafka/bin/kafka-server-start.sh -daemon /export/servers/kafka/config/server.properties

# Stop Kafka
/export/servers/kafka/bin/kafka-server-stop.sh

# List topic information
/export/servers/kafka/bin/kafka-topics.sh --list --zookeeper node01:2181

# Create a topic
/export/servers/kafka/bin/kafka-topics.sh --create --zookeeper node01:2181 --replication-factor 3 --partitions 3 --topic test

# View information about a topic
/export/servers/kafka/bin/kafka-topics.sh --describe --zookeeper node01:2181 --topic test

# Delete a topic
/export/servers/kafka/bin/kafka-topics.sh --zookeeper node01:2181 --delete --topic test

# Start a console producer (generally used for testing)
/export/servers/kafka/bin/kafka-console-producer.sh --broker-list node01:9092 --topic spark_kafka

# Start a console consumer (generally used for testing)
/export/servers/kafka/bin/kafka-console-consumer.sh --zookeeper node01:2181 --topic spark_kafka --from-beginning

# Console consumer connecting to the brokers' address
/export/servers/kafka/bin/kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --topic spark_kafka --from-beginning

The two modes of integrating Kafka
Summary:
Receiver-based approach
1. Multiple Receivers can receive data efficiently, but there is a risk of data loss.
2. Enabling the write-ahead log (WAL) prevents data loss, but writing the data twice is inefficient.
3. ZooKeeper maintains the offsets, so data may be consumed repeatedly.
4. Uses the high-level API.
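Point 3 can be illustrated with a toy simulation (all names are illustrative; no real Kafka or ZooKeeper is involved): when offsets are committed to an external store separately from processing, a crash between processing and the next commit means the same records are processed again on restart.

```python
# Toy simulation of at-least-once delivery when offsets live in an
# external store (as ZooKeeper does for the Receiver approach).
# All names are illustrative; no real Kafka/ZooKeeper is involved.

log = ["a", "b", "c", "d"]       # the partition's message log
zk_offset = 0                     # offset as recorded "in ZooKeeper"
processed = []

# Consume two records, but crash before the offset is committed.
for i in range(zk_offset, 2):
    processed.append(log[i])
# ... crash here: zk_offset is still 0 ...

# On restart the consumer resumes from the stale committed offset,
# so records "a" and "b" are processed a second time.
for i in range(zk_offset, len(log)):
    processed.append(log[i])

print(processed)  # ['a', 'b', 'a', 'b', 'c', 'd'] -> duplicates
```

The duplicates in the output are exactly the "repeated consumption" risk of keeping offsets in ZooKeeper out of sync with the actual processing progress.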
Direct approach
1. Does not use a Receiver; reads data directly from the Kafka partitions.
2. Does not use the write-ahead log (WAL) mechanism.
3. Spark maintains the offsets itself.
4. Uses the low-level API.
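By contrast, point 3 of the direct approach can be sketched as the application storing the next offset together with its results, so a restart resumes exactly where processing actually finished. This is a hypothetical toy model, not the real Spark/Kafka API:

```python
# Toy sketch of the Direct approach's offset handling: results and the
# next offset are committed together, so a crash between batches loses
# nothing and duplicates nothing. Names are hypothetical; no real
# Spark/Kafka API is used.

log = ["a", "b", "c", "d"]

# "Checkpoint" holding results and the next offset, updated together.
state = {"results": [], "offset": 0}

def process_batch(state, batch_size):
    start = state["offset"]
    batch = log[start:start + batch_size]
    # One assignment stands in for an atomic/transactional store of
    # results plus offset.
    return {"results": state["results"] + batch,
            "offset": start + len(batch)}

state = process_batch(state, 2)   # processes 'a', 'b'
# A crash here loses nothing: state already records offset 2.
state = process_batch(state, 2)   # resumes at 'c'

print(state["results"], state["offset"])  # ['a', 'b', 'c', 'd'] 4
```

Because the offset advances in the same step as the results, each record is reflected in the output exactly once, unlike the ZooKeeper-managed case.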


Origin blog.csdn.net/qq_45765882/article/details/105563344