Article Directory
- 1. Spark Transformation Operators
- 1.1. map
- 1.2. lookup
- 1.3. mapPartitions
- 1.4. flatMap
- 1.5. mapPartitionsWithIndex
- 1.6. mapPartitionsWithContext
- 1.7. combineByKey [pair]
- 1.8. reduceByKey
- 1.9. groupByKey
- 1.10. aggregateByKey
- 2. Spark Operators in Action
- 3. [Spark operator blog link](http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html)
- 4. [Flink DataStream Transformations operators](https://ci.apache.org/projects/flink/flink-docs-release-1.10/zh/dev/stream/operators/)
1. Spark Transformation Operators
1.1. map
- The supplied function is called once for each element.
- Takes a function, applies it to every element of the RDD, and each call's return value becomes the corresponding element of the resulting RDD.
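As a local-collection sketch of these semantics (a plain Scala `Seq` standing in for an RDD):

```scala
// map: the function is applied once per element; each return value
// becomes the corresponding element of the result.
val nums = Seq(1, 2, 3)          // stands in for an RDD[Int]
val doubled = nums.map(_ * 2)    // Seq(2, 4, 6)
```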
1.2. lookup
- For an RDD of type (K, V), lookup(K) returns all values V in the RDD whose key equals the given K.
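A local analogue of what lookup does; the helper function here is illustrative, not Spark's API (in Spark, lookup is a method on pair RDDs):

```scala
// lookup(key): collect every value V whose key equals the given key.
val pairs = Seq(("a", 1), ("b", 2), ("a", 3))   // stands in for an RDD[(String, Int)]
def lookup[K, V](data: Seq[(K, V)], key: K): Seq[V] =
  data.collect { case (`key`, v) => v }
val found = lookup(pairs, "a")                  // Seq(1, 3)
```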
1.3. mapPartitions
- The supplied function is called once per partition.
- Returns a new RDD by applying a function to each partition of this RDD.
- Details links
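A rough local sketch, with partitions simulated as nested Seqs:

```scala
// mapPartitions: the function runs once per partition and sees that
// partition's whole iterator, producing one result per call here.
val partitions = Seq(Seq(1, 2), Seq(3, 4, 5))          // two simulated partitions
val partitionSums = partitions.map(part => part.iterator.sum)
// one output per partition: Seq(3, 12)
```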
1.4. flatMap
- Applies the function f to every element; each call produces a collection (Seq), and the elements of all those generated collections are taken out and flattened into a single new collection, which is returned.
- Details links
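The flattening behavior on a local collection:

```scala
// flatMap: f returns a collection per element; all of those
// collections are flattened into one result collection.
val lines = Seq("a b", "c d e")
val words = lines.flatMap(_.split(" "))   // Seq("a", "b", "c", "d", "e")
```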
1.5. mapPartitionsWithIndex
- Same as mapPartitions, but the function also receives the index number of the partition it is processing.
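A local sketch of the same idea, pairing each simulated partition with its index:

```scala
// mapPartitionsWithIndex: the function sees (partition index, partition).
val parts = Seq(Seq("x", "y"), Seq("z"))
val tagged = parts.zipWithIndex.flatMap { case (part, idx) =>
  part.map(elem => (idx, elem))
}
// Seq((0, "x"), (0, "y"), (1, "z"))
```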
1.6. mapPartitionsWithContext
- Same as mapPartitions, but the function additionally receives context information (the TaskContext).
1.7. combineByKey [pair]
- combineByKey() is the most general aggregation function for key-value RDDs. Like aggregate(), combineByKey() allows the return type to differ from the type of the input values.
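A single-partition sketch of the idea; the helper name is illustrative, and mergeCombiners is omitted because there is only one simulated partition:

```scala
// combineByKey: createCombiner builds a combiner C from the first V seen
// for a key; mergeValue folds further Vs into that C. Here C = (sum, count),
// a different type from the input value Int.
def combineByKey[K, V, C](data: Seq[(K, V)])(
    createCombiner: V => C,
    mergeValue: (C, V) => C): Map[K, C] =
  data.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
    val c = acc.get(k) match {
      case Some(existing) => mergeValue(existing, v)   // key seen before
      case None           => createCombiner(v)         // first value for this key
    }
    acc.updated(k, c)
  }

val kvData = Seq(("a", 1), ("a", 3), ("b", 5))
val sumCount = combineByKey(kvData)(v => (v, 1),
  (c: (Int, Int), v: Int) => (c._1 + v, c._2 + 1))
// Map("a" -> (4, 2), "b" -> (5, 1))
```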
1.8. reduceByKey
- reduceByKey is implemented underneath via combineByKeyWithClassTag.
- The first parameter passed to combineByKeyWithClassTag is the identity function (v: V) => v, so it has no effect on the elements.
- The second and third parameters are the same function: the (V, V) => V passed to reduceByKey, which merges two values into one.
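A local sketch of the special case reduceByKey represents: the combiner type equals V, createCombiner is the identity, and the same (V, V) => V merges values everywhere:

```scala
// reduceByKey sketch: group values per key, then fold them with
// the user's (V, V) => V function.
val kvData = Seq(("a", 1), ("a", 3), ("b", 5))
val merge: (Int, Int) => Int = _ + _        // the (V, V) => V passed to reduceByKey
val reduced = kvData.groupBy(_._1).map { case (k, kvs) =>
  k -> kvs.map(_._2).reduce(merge)
}
// Map("a" -> 4, "b" -> 5)
```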
1.9. groupByKey
- groupByKey is implemented underneath via combineByKeyWithClassTag.
- groupByKey returns RDD[(K, Iterable[V])]; the value is an iterable containing all values V from the tuples whose key equals K.
- The implementation is similar to reduceByKey, except that the three functions are already written for you, and the parameter mapSideCombine = false is passed, meaning no combining is performed on the map side; all of it happens on the reduce side.
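The result shape on a local collection:

```scala
// groupByKey: each key maps to the collection of all its values.
val kvData = Seq(("a", 1), ("b", 2), ("a", 3))
val grouped = kvData.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
// Map("a" -> Seq(1, 3), "b" -> Seq(2))
```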
1.10. aggregateByKey
- aggregateByKey is similar to aggregate: both aggregate in two phases (within each partition, then across partitions). The difference is that aggregate operates only at the partition level, while aggregateByKey further subdivides each partition by key. It is also implemented on top of combineByKey underneath.
- Details links
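The two-phase, per-key aggregation can be sketched locally (partitions simulated as nested Seqs):

```scala
// aggregateByKey sketch: fold within each simulated partition per key
// (seqOp, starting from zeroValue), then merge the per-partition
// results per key (combOp).
val kvParts = Seq(Seq(("a", 1), ("b", 2)), Seq(("a", 3)))
val zeroValue = 0
val seqOp  = (acc: Int, v: Int) => acc + v   // within a partition
val combOp = (x: Int, y: Int) => x + y       // across partitions
val perPartition = kvParts.map { part =>
  part.groupBy(_._1).map { case (k, kvs) =>
    k -> kvs.map(_._2).foldLeft(zeroValue)(seqOp)
  }
}
val byKey = perPartition.flatten.groupBy(_._1).map { case (k, kvs) =>
  k -> kvs.map(_._2).reduce(combOp)
}
// Map("a" -> 4, "b" -> 2)
```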
2. Spark Operators in Action
2.1. aggregate
- In aggregate, the two functions (seqOp and combOp) must produce a consistent output type: both return the result type U of zeroValue.
- Details links
Note: in Spark's source, aggregate merges each task's result into the job result as follows:

```scala
val mergeResult = (index: Int, taskResult: U) =>
  jobResult = combOp(jobResult, taskResult)
```
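The semantics above can be sketched on local collections (partitions simulated as nested Seqs; the final fold mirrors the mergeResult line, which repeatedly applies combOp to jobResult):

```scala
// aggregate sketch: seqOp folds each partition starting from zeroValue;
// combOp then merges each partition's result into the job result.
val numParts = Seq(Seq(1, 2, 3), Seq(4, 5))
val zero = (0, 0)                                   // U = (sum, count), U != Int
val seqOp  = (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1)
val combOp = (a: (Int, Int), b: (Int, Int)) => (a._1 + b._1, a._2 + b._2)
val jobResult = numParts
  .map(_.foldLeft(zero)(seqOp))                     // one U per partition
  .foldLeft(zero)(combOp)                           // merge into the job result
// (15, 5)
```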