Kafka + PySpark

Kafka installation

A Kafka setup has three parts: the server (broker), the producer, and the consumer.

PySpark plays the consumer side here: it monitors the topic and computes over the incoming stream.

Part 1: Deployment environment

1. Download the spark-streaming-kafka-*-*.jar that matches your Spark version.

2. Add that jar to SPARK_DIST_CLASSPATH (a sketch follows).
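
A minimal sketch of step 2, assuming the jar was copied to /usr/local/spark/jars/kafka (the path is illustrative); append this to conf/spark-env.sh:

# make the Kafka integration jar visible to Spark (example path)
export SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/local/spark/jars/kafka/*"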

Part 2: Kafka + Spark test

1. Start the Kafka server and a producer (example commands below).
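
Starting from a fresh install, the stock Kafka scripts look roughly like this; the topic name wordcount and the default ports are assumptions:

# start ZooKeeper, then the broker (each in its own terminal)
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

# create a test topic and attach a console producer to it
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wordcount
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wordcount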

2. Code

from __future__ import print_function

import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr)
        sys.exit(-1)

    sc = SparkContext(appName="PythonStreamingKafkaWordCount")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval

    zkQuorum, topic = sys.argv[1:]
    # receiver-based stream: connects through ZooKeeper, one receiver thread for the topic
    kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
    lines = kvs.map(lambda x: x[1])  # each record is a (key, value) pair; keep the value
    counts = lines.flatMap(lambda line: line.split(" ")) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()

3. Type messages into the producer console; the streaming job picks them up and prints per-batch word counts as they arrive. A sample submit command is sketched below.
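
One way to submit the job, assuming Spark 2.x built against Scala 2.11 and the 0-8 assembly jar (the file name and version are illustrative):

bin/spark-submit --jars jars/spark-streaming-kafka-0-8-assembly_2.11-2.4.0.jar kafka_wordcount.py localhost:2181 wordcount

Words typed into the producer terminal should then appear as (word, count) pairs in the streaming console every second.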

4. Watch out for version mismatches: the jar's Kafka integration (the receiver-based createStream used above comes from the 0-8 artifact) and its Scala version must both match your Spark build.

