Difference between Spark and MapReduce

Performance:

Spark keeps working data in memory, while MapReduce writes intermediate results to disk between its map and reduce phases. For workloads that fit in memory, Spark is therefore usually faster than MapReduce. When the data set is too large to fit in memory, Spark has to spill to disk and much of that advantage disappears, so MapReduce can be competitive or even preferable. Spark's advantage is greatest for iterative computation that reads the same data repeatedly, because the data can be cached after the first read. For a single pass over the data, as in typical ETL jobs, there is little to cache and MapReduce is a suitable choice.
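The effect of caching on iterative jobs can be illustrated with a toy model. The sketch below is plain Python, not Spark or Hadoop code; CountingSource, iterate_from_disk and iterate_cached are hypothetical names invented for this illustration. It only counts how often the input is read under each approach and does not measure real performance.

```python
class CountingSource:
    """Stand-in for a dataset on disk that counts how often it is fully read."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def read(self):
        self.reads += 1
        return list(self._records)


def iterate_from_disk(source, iterations):
    """MapReduce-style: every pass is a separate job that re-reads its input."""
    total = 0
    for _ in range(iterations):
        total += sum(x * x for x in source.read())
    return total


def iterate_cached(source, iterations):
    """Spark-style: read once, keep the data in memory, reuse it on every pass."""
    cached = source.read()
    total = 0
    for _ in range(iterations):
        total += sum(x * x for x in cached)
    return total


disk_source = CountingSource(range(5))
cached_source = CountingSource(range(5))
disk_total = iterate_from_disk(disk_source, 10)
cached_total = iterate_cached(cached_source, 10)

print(disk_total, disk_source.reads)
print(cached_total, cached_source.reads)
```

Both functions compute the same result, but the first reads the input once per iteration and the second reads it once in total. With a single iteration the two are equivalent, which is why caching brings little benefit to one-pass ETL jobs.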

 

Fault tolerance:

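When a job fails partway through, MapReduce can resume close to the point of failure: the output of completed tasks has already been written to disk, so only the failed tasks need to be rerun. Spark keeps intermediate data in memory, so data lost in a failure has to be recomputed. It does not necessarily start the whole job from scratch, though: Spark records the lineage of each RDD and recomputes only the lost partitions, and checkpointing to disk can shorten long lineage chains. For long-running jobs where recomputation is expensive, MapReduce's disk-based recovery can still save time.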
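Spark's recovery is generally described in terms of lineage: the chain of transformations that produced a dataset is recorded, so a lost partition can be rebuilt from the durable input. The following is a minimal sketch of that idea in plain Python, not Spark's actual implementation. LineageDataset and all of its methods are hypothetical names used only for illustration.

```python
class LineageDataset:
    """Toy partitioned dataset that remembers how it was derived."""

    def __init__(self, source_partitions, lineage=()):
        self.source_partitions = source_partitions  # durable input, e.g. on disk
        self.lineage = tuple(lineage)               # functions applied in order
        self.computed = {}                          # partition index -> in-memory result
        self.recomputations = 0

    def map(self, fn):
        # A transformation only extends the lineage; nothing is computed yet.
        return LineageDataset(self.source_partitions, self.lineage + (fn,))

    def _compute(self, index):
        data = self.source_partitions[index]
        for fn in self.lineage:
            data = [fn(x) for x in data]
        return data

    def materialize(self):
        for i in range(len(self.source_partitions)):
            self.computed[i] = self._compute(i)

    def lose_partition(self, index):
        # Simulates a worker failure that drops one in-memory partition.
        del self.computed[index]

    def get_partition(self, index):
        if index not in self.computed:
            # Replay the lineage for the missing partition only.
            self.recomputations += 1
            self.computed[index] = self._compute(index)
        return self.computed[index]


source = [[1, 2], [3, 4], [5, 6]]
dataset = LineageDataset(source).map(lambda x: x + 1).map(lambda x: x * 10)
dataset.materialize()
dataset.lose_partition(1)
recovered = dataset.get_partition(1)
untouched = dataset.get_partition(0)
print(recovered, dataset.recomputations)
```

Only the lost partition is rebuilt; the surviving partitions are served from memory. The cost of recovery grows with the length of the lineage, which is the situation checkpointing is meant to address.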

 

Application scenarios:

MapReduce is mainly used for offline batch processing of data that already exists, such as analyzing historical orders or logs. Spark also suits near-real-time queries and iterative analysis, such as the model training behind recommendation systems.
