DataSource
- Collection-based
  fromCollection(Collection)
- File-based
  readTextFile(path)
Transformation
- Map
- FlatMap
- MapPartition: processes one whole partition of data per function call
- Filter
- Reduce
- Aggregations
- Distinct: returns the distinct elements of the data set (duplicates removed)
- Join
- OuterJoin
- Cross
- Union
- First-n: returns the first n elements of the data set
- Sort Partition: locally sorts every partition of the data set
- Rebalance: evenly redistributes the data set across partitions to eliminate data skew
- Hash-Partition: partitions the data set by the hash value of the specified key
  partitionByHash()
- Range-Partition: partitions the data set into ranges of the specified key
  partitionByRange()
- Custom Partition: partitions the data set with a user-supplied partitioner on the specified key
  partitionCustom(partitioner, "someKey")
  partitionCustom(partitioner, 0)
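The three partitioning strategies above differ only in how a key is mapped to a partition index. The following is a minimal plain-Java sketch of that idea, not Flink code and not how Flink implements it internally. The class PartitioningSketch, the interface KeyPartitioner and the explicit upperBounds array are hypothetical names introduced for illustration. In particular, the range boundaries are passed in by hand here as a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of the three partitioning ideas; not Flink code. */
final class PartitioningSketch {

    /** Hypothetical stand-in for a user-supplied partitioner: key -> partition index. */
    interface KeyPartitioner<K> {
        int partition(K key, int numPartitions);
    }

    /** Creates numPartitions empty buckets. */
    private static <T> List<List<T>> emptyPartitions(int numPartitions) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        return parts;
    }

    /** Custom partitioning: the caller decides which partition each key goes to. */
    static <K> List<List<K>> byCustom(List<K> keys, int numPartitions,
                                      KeyPartitioner<K> partitioner) {
        List<List<K>> parts = emptyPartitions(numPartitions);
        for (K key : keys) {
            parts.get(partitioner.partition(key, numPartitions)).add(key);
        }
        return parts;
    }

    /** Hash partitioning: the partition index is derived from the key's hash code. */
    static <K> List<List<K>> byHash(List<K> keys, int numPartitions) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return byCustom(keys, numPartitions,
                (key, n) -> Math.floorMod(key.hashCode(), n));
    }

    /**
     * Range partitioning: sorted upper bounds split the key space into
     * contiguous ranges. Partition i holds keys <= upperBounds[i]; the last
     * partition holds everything above the final bound.
     */
    static List<List<Integer>> byRange(List<Integer> keys, int[] upperBounds) {
        return byCustom(keys, upperBounds.length + 1, (key, n) -> {
            int idx = 0;
            while (idx < upperBounds.length && key > upperBounds[idx]) {
                idx++;
            }
            return idx;
        });
    }
}
```

With three partitions, byHash on the keys 1 to 6 places 3 and 6 in partition 0, 1 and 4 in partition 1, and 2 and 5 in partition 2, so equal keys always land together but neighbouring keys are scattered. byRange with upper bounds 10 and 20 keeps neighbouring keys together instead: 5 and 7 go to partition 0, 12 to partition 1, and 25 and 30 to partition 2.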
Sink
- writeAsText()
- writeAsCsv()
- print()