3.1 What is a DStream
A Discretized Stream (DStream) is the basic abstraction in Spark Streaming. It represents a continuous stream of data: either the input stream received from a source, or the result stream produced by applying Spark operations to another stream. Internally, a DStream is represented as a continuous series of RDDs, and each RDD holds the data that arrived during one time interval.
Operations on a DStream are carried out on the underlying RDDs, one RDD at a time.
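The idea of cutting a stream into per-interval batches can be sketched in plain Python. This is only a toy model and does not use the Spark API: `discretize`, `map_stream`, the sample records, and the one-second interval are all hypothetical names and values chosen for illustration, and each Python list merely stands in for one RDD.

```python
from collections import defaultdict


def discretize(records, batch_interval):
    """Group (timestamp, value) records into batches by interval index.

    Assumption: timestamps are seconds since the stream started.
    """
    batches = defaultdict(list)
    for timestamp, value in records:
        # Records whose timestamps fall in the same interval share a batch.
        batches[int(timestamp // batch_interval)].append(value)
    # Return batches in time order; each list stands in for one RDD.
    # Simplification: intervals that received no data are skipped here.
    return [batches[i] for i in sorted(batches)]


def map_stream(batches, fn):
    """Apply fn to every element, one batch at a time."""
    return [[fn(value) for value in batch] for batch in batches]


records = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d"), (2.7, "e")]
batches = discretize(records, batch_interval=1.0)
upper = map_stream(batches, str.upper)

print(batches)  # [['a', 'b'], ['c'], ['d', 'e']]
print(upper)    # [['A', 'B'], ['C'], ['D', 'E']]
```

Note that `map_stream` never looks across batch boundaries: the operation on the stream is simply the same operation repeated on each batch, which is the point the text makes about RDDs.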
Spark Streaming uses DStreams to represent the data streams produced by input sources, and applying an operation to an existing DStream creates a new DStream.
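How an operation on one stream yields a new stream can be illustrated with a small stand-in class. `MiniDStream` is hypothetical and is not part of Spark; its `flat_map`, `map`, and `count_by_value` methods are only loosely modeled on the DStream operations `flatMap`, `map`, and `countByValue`, and the input lines are made-up sample data.

```python
class MiniDStream:
    """Hypothetical stand-in for a DStream: a fixed sequence of batches."""

    def __init__(self, batches):
        # Copy the batches so that derived streams never share lists.
        self.batches = [list(batch) for batch in batches]

    def flat_map(self, fn):
        """Return a new stream; fn maps one item to zero or more items."""
        return MiniDStream(
            [[out for item in batch for out in fn(item)] for batch in self.batches]
        )

    def map(self, fn):
        """Return a new stream with fn applied to every item."""
        return MiniDStream([[fn(item) for item in batch] for batch in self.batches])

    def count_by_value(self):
        """Return a new stream of sorted (value, count) pairs per batch."""
        result = []
        for batch in self.batches:
            counts = {}
            for item in batch:
                counts[item] = counts.get(item, 0) + 1
            result.append(sorted(counts.items()))
        return MiniDStream(result)


# Two batches of text lines, as if produced by an input source.
lines = MiniDStream([["a b", "a"], ["b b c"]])
words = lines.flat_map(str.split)      # new stream derived from `lines`
word_counts = words.count_by_value()   # new stream derived from `words`

print(words.batches)        # [['a', 'b', 'a'], ['b', 'b', 'c']]
print(word_counts.batches)  # [[('a', 2), ('b', 1)], [('b', 2), ('c', 1)]]
```

Each method returns a fresh `MiniDStream` and leaves the original untouched, so `lines`, `words`, and `word_counts` are three separate streams, and the counts are computed per batch rather than over the whole stream.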
Its workflow is shown in the following diagrams: Spark Streaming receives real-time data and divides it into batches, and the Spark engine then processes each batch to produce the final results, which are also emitted in batches.