The Spark framework and the difference between Spark and MapReduce

2019-12-11

The Spark framework

Three core components: Spark Core, Spark SQL, and Spark Streaming.

Spark has three deployment modes: Standalone, YARN, and Mesos.
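The master URL is what selects the deployment mode. Below is a minimal sketch using the standard Spark Scala API; the commented-out master URLs, host name, and ports are placeholders, and in practice the master is usually supplied via spark-submit rather than hard-coded.

```scala
import org.apache.spark.sql.SparkSession

object DeployModeDemo {
  def main(args: Array[String]): Unit = {
    // The master URL selects the deployment mode; "cluster-host" is a
    // placeholder. Usually this is passed with spark-submit --master.
    val spark = SparkSession.builder()
      .appName("deploy-mode-demo")
      // .master("spark://cluster-host:7077") // Standalone
      // .master("yarn")                      // YARN
      // .master("mesos://cluster-host:5050") // Mesos
      .master("local[*]")                     // local testing
      .getOrCreate()

    println(s"running against master: ${spark.sparkContext.master}")
    spark.stop()
  }
}
```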


Differences between Spark and MapReduce


1. Spark keeps intermediate data in memory, so iterative computation is much more efficient; MapReduce has to write intermediate results to disk, and the heavy disk I/O hurts performance.
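As a minimal sketch of why this matters for iterative work (the dataset and iteration count are made up for illustration): caching an RDD keeps its partitions in memory after the first action, so later passes reuse them instead of recomputing or re-reading from disk.

```scala
import org.apache.spark.sql.SparkSession

object IterativeCacheDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("iterative-cache-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Toy dataset; a real iterative job (e.g. gradient descent)
    // would load its training data here.
    val points = sc.parallelize(1 to 1000000).map(_.toDouble).cache()

    // Each pass reuses the in-memory partitions instead of rebuilding them.
    var total = 0.0
    for (_ <- 1 to 10) {
      total = points.map(_ / 1000000).sum()
    }
    println(s"total = $total")
    spark.stop()
  }
}
```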

2. Spark has high fault tolerance, achieved through the Resilient Distributed Dataset (RDD). An RDD is a read-only collection of data held in memory across the nodes of the cluster, and it is resilient: if part of the data is lost or corrupted, it can be rebuilt by replaying the lineage, the recorded chain of transformations that produced the dataset. MapReduce achieves fault tolerance by recomputing work, which is costly.
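To make the lineage idea concrete, here is a small sketch using the standard RDD API: each transformation records its parent RDD, and toDebugString prints the recorded lineage that Spark would replay to rebuild a lost partition.

```scala
import org.apache.spark.sql.SparkSession

object LineageDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lineage-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Each transformation yields a new read-only RDD that remembers
    // its parent; nothing has been computed yet.
    val base     = sc.parallelize(1 to 100)
    val doubled  = base.map(_ * 2)
    val filtered = doubled.filter(_ % 3 == 0)

    // The lineage Spark would replay to rebuild a lost partition.
    println(filtered.toDebugString)
    spark.stop()
  }
}
```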

3. Spark is more general. Its API falls into two categories, transformations and actions, and beyond Spark Core it offers modules for stream processing (Spark Streaming), machine learning, and graph computation; MapReduce offers only the map and reduce methods, has no such modules, and in practice almost nobody uses MapReduce for machine learning.
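A short sketch of the two API categories (the numbers are arbitrary): transformations are lazy and only describe the computation, while actions trigger execution and return results to the driver.

```scala
import org.apache.spark.sql.SparkSession

object TransformActionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transform-action-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val nums = sc.parallelize(1 to 10)

    // Transformations: lazy, return new RDDs, nothing runs yet.
    val squared = nums.map(n => n * n)
    val evens   = squared.filter(_ % 2 == 0)

    // Actions: trigger execution and return results to the driver.
    println(s"count = ${evens.count()}")
    evens.collect().foreach(println)
    spark.stop()
  }
}
```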

4. Spark's ecosystem and framework are richer: it has RDDs, lineage for recovery, DAG (directed acyclic graph) execution, stage division, and so on. Jobs on Spark often run under different scenarios and can be tuned to each; the MapReduce computing framework is simpler, with comparatively weak performance, but it runs stably for long background jobs.
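A small word-count sketch of DAG execution and stage division (the input strings are made up): narrow transformations such as flatMap and map stay within one stage, while the shuffle behind reduceByKey starts a new one, which toDebugString makes visible.

```scala
import org.apache.spark.sql.SparkSession

object DagStageDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dag-stage-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val lines = sc.parallelize(Seq("a b a", "b c", "a c c"))
    // Narrow transformations: no shuffle, same stage.
    val ones   = lines.flatMap(_.split(" ")).map(word => (word, 1))
    // reduceByKey shuffles by key, which ends the first stage.
    val counts = ones.reduceByKey(_ + _)

    // The DAG; indentation in the output marks the shuffle boundary.
    println(counts.toDebugString)
    counts.collect().foreach(println)
    spark.stop()
  }
}
```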
