Python PySpark toLocalIterator() function

pyspark.RDD.toLocalIterator()

RDD.toLocalIterator(prefetchPartitions=False)

It is a method of PySpark's RDD class.
It returns an iterator that yields all the elements in the RDD.
The iterator consumes only as much driver memory as the largest partition in the RDD.
If prefetchPartitions is set to True, it may consume up to the memory of the two largest partitions.
With this function, you can easily turn the data in an RDD into a local iterator and traverse it element by element.

Parameter:

Parameter name: prefetchPartitions
Parameter type: bool, default False
Parameter required: optional
Meaning: whether Spark should prefetch the next partition before it is needed
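For example, here is a minimal sketch of enabling prefetching (the data and partition count below are purely illustrative), so Spark can start fetching the next partition while the current one is still being consumed:

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Split the data into 4 partitions; with prefetchPartitions=True,
# the driver may hold up to two partitions in memory at once.
rdd = sc.parallelize(range(100), numSlices=4)
for value in rdd.toLocalIterator(prefetchPartitions=True):
    print(value)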

Example:

rdd = sc.parallelize(range(10))   # sc is an existing SparkContext
[x for x in rdd.toLocalIterator()]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
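
By comparison, collect() pulls every element into the driver at once, while toLocalIterator() streams one partition at a time. A small illustrative sketch reusing the rdd above:

# collect() materializes all elements in driver memory at once
all_values = rdd.collect()            # [0, 1, 2, ..., 9]

# toLocalIterator() only keeps one partition (or two, with
# prefetching enabled) in the driver at a time
total = 0
for x in rdd.toLocalIterator():
    total += x
print(total)                          # 45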

Origin blog.csdn.net/weixin_42072754/article/details/115122881