Preparations: Spark Streaming with Kafka as a data source

Kafka is a high-throughput, distributed publish-subscribe messaging system that serves both real-time processing and batch offline processing.
Acting as a hub for data transmission, Kafka spares external data sources and the components of the Hadoop ecosystem from having to interact with each other directly: through Kafka, any external data source and any Hadoop component can exchange data.
Kafka components
Broker: a Kafka cluster consists of one or more servers, each of which is called a broker.
Topic: every message is published to a topic, and subscribers read messages from the topics they subscribe to. The data of one topic can be divided into many partitions and stored across many servers; producers and consumers only need to care about the topic, not about where the data is actually stored.
Partition: each topic is made up of a number of partitions.
Producer: responsible for publishing data.
Consumer: responsible for reading data (in this setup, Spark Streaming plays the consumer role); every consumer belongs to a consumer group.
Zookeeper: Kafka depends on ZooKeeper to run; the registration information of the components above is stored in ZooKeeper (see the sketch after this list).
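To see that last point for yourself, you can browse the registration data that Kafka keeps in ZooKeeper with the zookeeper-shell.sh tool shipped in Kafka's bin directory (a minimal sketch, assuming Kafka is installed at /usr/local/kafka and ZooKeeper is already running):
cd /usr/local/kafka
./bin/zookeeper-shell.sh localhost:2181
# inside the shell, list the ids of the registered brokers
ls /brokers/ids
# and the topics that have been registered
ls /brokers/topics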
In the package name kafka_2.11-0.8.2.2.tgz, the leading 2.11 is the Scala version this build supports, and what follows is Kafka's own version number.
Starting Kafka
# start ZooKeeper
cd /usr/local/kafka
./bin/zookeeper-server-start.sh config/zookeeper.properties
# do not close the terminal above, or ZooKeeper will be shut down; in a new terminal, start the Kafka server
cd /usr/local/kafka
./bin/kafka-server-start.sh config/server.properties
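Before going further, it is worth confirming that both processes are actually running. One quick check (a sketch; jps ships with the JDK) is:
jps
# the output should include QuorumPeerMain (ZooKeeper) and Kafka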
# then open a second new terminal to check whether Kafka works: create a topic
cd /usr/local/kafka
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wordsendertest
# this creates a topic named wordsendertest with one replica and one partition
./bin/kafka-topics.sh --list --zookeeper localhost:2181
# list all topics; if wordsendertest appears in the output, the topic was created successfully
# open a third new terminal and create a producer
cd /usr/local/kafka
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wordsendertest
# this opens the producer prompt; type the data to send, one line at a time
# open a fourth new terminal and create a consumer
cd /usr/local/kafka
./bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wordsendertest --from-beginning
# this consumer receives the data of the topic wordsendertest; --from-beginning means reading from the start of the topic, so with this parameter the consumer also gets data produced before it was created, while without it a newly created consumer only receives data produced after it starts
This command has a version problem: newer Kafka releases removed the --zookeeper option from the console consumer. Judging by what a Baidu search turns up, in new versions the consumer in the fourth terminal is created with:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic wordsendertest --from-beginning
To use Kafka from Spark, download the spark-streaming-kafka jar package and place it under spark/jars/kafka. Pay attention to the version problem: the jar must be aligned with your Spark and Scala versions. Then copy all the jar packages in Kafka's libs directory into spark/jars/kafka as well.
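Put together, the steps look roughly like this (a sketch, assuming Spark is installed at /usr/local/spark, Kafka at /usr/local/kafka, and that the downloaded jar is spark-streaming-kafka-0-8_2.11-2.1.0.jar, a hypothetical choice for Spark 2.1.0 with Scala 2.11; substitute the jar that matches your own versions):
cd /usr/local/spark/jars
mkdir kafka
# the jar name below is an assumption; use the one you actually downloaded
cp ~/Downloads/spark-streaming-kafka-0-8_2.11-2.1.0.jar ./kafka
cp /usr/local/kafka/libs/* ./kafka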
Modify Spark's configuration file:
cd spark/conf
vim spark-env.sh
# add the environment variable shown below
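The variable to add puts the copied Kafka jars on Spark's classpath. A minimal sketch, assuming Hadoop is installed at /usr/local/hadoop and the jars were copied to /usr/local/spark/jars/kafka as above:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath):/usr/local/spark/jars/kafka/*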
