Create a file stream in PySpark, then run a word count

cd /usr/local/spark/mycode
mkdir streaming
cd streaming
mkdir logfile
cd logfile
pyspark   # start the interactive PySpark shell
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
ssc = StreamingContext(sc, 10)  # batch interval: one batch every 10 seconds
lines = ssc.textFileStream('file:///usr/local/spark/mycode/streaming/logfile')
words = lines.flatMap(lambda x: x.split(' '))
wordcounts = words.map(lambda x: (x, 1)).reduceByKey(lambda a, b: a + b)
wordcounts.pprint()
ssc.start()  # start listening; the word-frequency counts for each batch are printed in this window
ssc.awaitTermination()
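The flatMap → map → reduceByKey chain above can be sketched in plain Python to show what Spark computes for each batch of lines. This is only an illustration of the logic on an ordinary list, not Spark itself; the function name `wordcount` is invented for the sketch:

```python
from collections import defaultdict

def wordcount(lines):
    """Mimic the flatMap -> map -> reduceByKey chain on a list of lines."""
    # flatMap: split each line on spaces and flatten into one list of words
    words = [w for line in lines for w in line.split(' ')]
    # map: pair each word with the count 1
    pairs = [(w, 1) for w in words]
    # reduceByKey: sum the counts per word
    counts = defaultdict(int)
    for w, n in pairs:
        counts[w] += n
    return dict(counts)

print(wordcount(["hello spark", "hello streaming"]))
# -> {'hello': 2, 'spark': 1, 'streaming': 1}
```

To see output from the real stream, create a new text file in the `logfile` directory while `ssc` is running (e.g. `echo "hello spark" > log1.txt` from another terminal); `textFileStream` only picks up files created after the stream starts.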


Origin blog.csdn.net/qq_45371603/article/details/104615345