Spark action notes


1.reduce(func): The elements within each partition are first aggregated with the function func, and then the per-partition results are aggregated with the same function. func takes two arguments and returns a new value; that value is passed back to func together with the next element, and so on until the last element has been processed.
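The two-stage behaviour of reduce can be sketched in plain Scala, with no Spark dependency. This is only a model under stated assumptions: the RDD is represented as a sequence of partitions, and the names ReduceSketch, reduceLike and the sample data are hypothetical, not part of Spark.

```scala
object ReduceSketch {
  // An RDD modeled as a list of partitions (hypothetical sample data).
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5), Seq(6, 7, 8, 9, 10))

  // Stage 1: reduce the elements inside each non-empty partition.
  // Stage 2: reduce the per-partition results with the same function.
  def reduceLike[T](parts: Seq[Seq[T]])(func: (T, T) => T): T =
    parts.filter(_.nonEmpty).map(_.reduce(func)).reduce(func)

  // Models what rdd.reduce(_ + _) would compute on the same data.
  val total: Int = reduceLike(partitions)(_ + _)

  def main(args: Array[String]): Unit =
    println(total) // 55
}
```

Because the per-partition results can be merged in any order, func should be commutative and associative, otherwise the result may depend on how the data is partitioned.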

2.collect(): Returns all elements of the dataset to the Driver program as an array. To keep the Driver program from running out of memory, the size of the returned dataset generally needs to be kept small.

3.count(): Returns the number of elements in the dataset.

4.first(): Returns the first element of the dataset.

6.top(n): Returns the top n elements, output in descending order by default.

7.takeOrdered(n,[ordering]): Returns the first n elements in natural order or according to the specified ordering.
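The effect of the optional ordering can be sketched in plain Scala. This is a model only, not Spark code: takeOrderedLike, TakeOrderedSketch and the sample data are hypothetical names, and the sketch sorts the whole collection for simplicity.

```scala
object TakeOrderedSketch {
  // Hypothetical sample data standing in for the elements of an RDD.
  val data: Seq[Int] = Seq(5, 1, 9, 3, 7)

  // Models rdd.takeOrdered(n): the first n elements under the given ordering.
  def takeOrderedLike[T](xs: Seq[T], n: Int)(implicit ord: Ordering[T]): Seq[T] =
    xs.sorted(ord).take(n)

  // Natural (ascending) order.
  val smallest: Seq[Int] = takeOrderedLike(data, 3)

  // A reversed ordering yields the largest elements in descending order,
  // which is the result described for the descending-by-default action.
  val largest: Seq[Int] = takeOrderedLike(data, 3)(Ordering[Int].reverse)

  def main(args: Array[String]): Unit = {
    println(smallest) // List(1, 3, 5)
    println(largest)  // List(9, 7, 5)
  }
}
```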

8.countByKey(): Acts on an RDD of KV type, counts the number of elements for each K, and returns (K, the number of K).

9.collectAsMap(): Acts on an RDD of KV type. Unlike collect, the result of collectAsMap contains no duplicate keys: for a duplicate key, the later element overwrites the earlier one.

10.lookup(k): Acts on an RDD of KV type and returns all V values for the specified K.
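The behaviour of these KV-type actions on duplicate keys can be sketched in plain Scala. This is a model under stated assumptions, not Spark code: the KV dataset is a plain sequence of pairs, and PairActionsSketch, lookupLike and the sample data are hypothetical names.

```scala
object PairActionsSketch {
  // Hypothetical KV data with a duplicate key "a".
  val pairs: Seq[(String, Int)] = Seq("a" -> 1, "b" -> 2, "a" -> 3)

  // Models collectAsMap(): for a duplicate key, the later value overwrites the earlier one.
  val asMap: Map[String, Int] = pairs.toMap

  // Models lookup(k): every value stored under the given key.
  def lookupLike[K, V](kvs: Seq[(K, V)], key: K): Seq[V] =
    kvs.collect { case (k, v) if k == key => v }

  val valuesOfA: Seq[Int] = lookupLike(pairs, "a")

  // Models a per-key count, as Spark's countByKey() returns: key -> number of elements.
  val counts: Map[String, Long] =
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.size.toLong }

  def main(args: Array[String]): Unit = {
    println(asMap("a"))  // 3
    println(valuesOfA)   // List(1, 3)
    println(counts("a")) // 2
  }
}
```

The map keeps only 3 for key "a", while the lookup still sees both values, which is the practical difference between the two actions.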

11.aggregate(zeroValue:U)(seqOp:(U,T) => U,combOp:(U,U) => U):
The seqOp function aggregates the elements of each partition into a value of type U, starting from zeroValue, and the combOp function then merges the per-partition U values into a single value of type U.
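A common use of aggregate is computing a sum and a count in one pass, where the result type U differs from the element type T. The sketch below models this in plain Scala rather than calling Spark: the RDD is a sequence of partitions, the merge function is called combOp as in Spark's API, and AggregateSketch, aggregateLike and the sample data are hypothetical.

```scala
object AggregateSketch {
  // An RDD modeled as a list of partitions (hypothetical sample data).
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5, 6))

  // seqOp folds each partition starting from zeroValue;
  // combOp then merges the per-partition results, again starting from zeroValue.
  def aggregateLike[T, U](parts: Seq[Seq[T]])(zeroValue: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U =
    parts.map(_.foldLeft(zeroValue)(seqOp)).foldLeft(zeroValue)(combOp)

  // U = (sum, count), T = Int.
  val (sum, count) = aggregateLike(partitions)((0, 0))(
    (acc: (Int, Int), x: Int) => (acc._1 + x, acc._2 + 1),
    (a: (Int, Int), b: (Int, Int)) => (a._1 + b._1, a._2 + b._2)
  )

  val mean: Double = sum.toDouble / count

  def main(args: Array[String]): Unit =
    println(s"sum=$sum count=$count mean=$mean") // sum=21 count=6 mean=3.5
}
```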

12.fold(zeroValue:T)(op:(T,T) => T): Aggregates the elements within each partition and then merges the per-partition results, both with the op function. op takes two arguments; the first argument passed in initially is zeroValue, and T is the element type of the RDD. fold is equivalent to aggregate with the same function used as both seqOp and combOp.
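One consequence of this definition is that zeroValue is applied once per partition and once more when the partition results are merged, so it should be the neutral element of op. The plain-Scala sketch below models that behaviour. It is not Spark code, and FoldSketch, foldLike and the sample data are hypothetical.

```scala
object FoldSketch {
  // An RDD modeled as two partitions (hypothetical sample data).
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4))

  // zeroValue seeds the fold inside every partition and the final merge.
  def foldLike[T](parts: Seq[Seq[T]])(zeroValue: T)(op: (T, T) => T): T =
    parts.map(_.foldLeft(zeroValue)(op)).foldLeft(zeroValue)(op)

  // 0 is neutral for addition, so the result is the plain sum.
  val neutral: Int = foldLike(partitions)(0)(_ + _)

  // 1 is not neutral: it is added once per partition (2) and once at the merge (1).
  val shifted: Int = foldLike(partitions)(1)(_ + _)

  def main(args: Array[String]): Unit = {
    println(neutral) // 10
    println(shifted) // 13
  }
}
```

With a non-neutral zeroValue the result changes with the number of partitions, which is why values such as 0 for addition or 1 for multiplication are the usual choice.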

13.saveAsTextFile(path:String): Saves the final result data as text files in the specified HDFS directory.

14.saveAsSequenceFile(path:String): Saves the final result data to the specified HDFS directory in SequenceFile format.
