Write the stream computation program

#!/usr/bin/env python3

from __future__ import print_function
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
# Note: pyspark.streaming.kafka exists in Spark 2.x only (it was removed in Spark 3.0)
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: KafkaWordCount.py <zk> <topic>", file=sys.stderr)
        sys.exit(-1)
    sc = SparkContext(appName="PythonStreamingKafkaWordCount")
    ssc = StreamingContext(sc, 1)  # batch interval of 1 second
    # Command-line arguments passed in from the Linux shell: sys.argv[0] is the script itself,
    # sys.argv[1] is the ZooKeeper address, sys.argv[2] is the topic name
    zkQuorum, topic = sys.argv[1:]
    # Build the input source: the third argument is the consumer group,
    # the last maps the topic name to its number of partitions to consume
    kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
    # Each record is a (key, value) pair; keep only the message value
    lines = kvs.map(lambda x: x[1])
    counts = lines.flatMap(lambda x: x.split(" ")) \
                  .map(lambda y: (y, 1)) \
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()
    ssc.start()
    ssc.awaitTermination()
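
To see what the flatMap / map / reduceByKey chain computes for each one-second batch, here is a minimal sketch in plain Python, with no Spark or Kafka involved. The function name count_words_in_batch and the sample messages are hypothetical and chosen only for illustration; the sketch mirrors the logic of the three transformations, not Spark's distributed execution.

```python
from collections import defaultdict


def count_words_in_batch(messages):
    """Mimic lines.flatMap(...).map(...).reduceByKey(...) for one batch."""
    # flatMap: split every message on a single space and flatten the result
    words = [word for line in messages for word in line.split(" ")]
    # map: pair every word with the count 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: add up the counts that share the same word
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical message values, as they might arrive from the topic
    batch = ["hello spark", "hello kafka"]
    print(count_words_in_batch(batch))
```

Because the counts are reduced per batch, a word typed in one batch interval is not added to the count from an earlier interval.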

Open a new terminal to run the stream computation
First, change into the directory that holds the code:
cd /usr/local/spark/mycode/streaming/kafka
/usr/local/spark/bin/spark-submit KafkaWordCount.py localhost:2181 wordsendertest  # the first argument is the ZooKeeper server address, the second is the name of the subscribed topic
Then type words in the data source terminal, and the word counts will be printed in this terminal.
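
As a small illustration of how the two command-line arguments above reach the program, the following sketch unpacks an argument list the same way the script does. The helper name parse_args is hypothetical and not part of the original program; it only shows that localhost:2181 becomes the ZooKeeper address and wordsendertest becomes the key of the topic dictionary given to createStream.

```python
def parse_args(argv):
    """Return (zk_quorum, topics) from an argv-style list."""
    # argv[0] is the script name, so exactly three entries are expected
    if len(argv) != 3:
        raise ValueError("expected: KafkaWordCount.py <zk> <topic>")
    zk_quorum, topic = argv[1:]
    # Map the topic name to 1 partition, as the script does
    return zk_quorum, {topic: 1}


if __name__ == "__main__":
    print(parse_args(["KafkaWordCount.py", "localhost:2181", "wordsendertest"]))
```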

Origin blog.csdn.net/qq_45371603/article/details/104653591