Differences between MapReduce and Spark on YARN

  • MapReduce's multi-process model: a MapReduce application requests resources dynamically for each task and releases them as soon as the task finishes.

    • Each task runs in its own JVM process.
    • Resource amounts can be set independently for each task type; memory and CPU are the two resource types currently supported.
    • When a task finishes, it releases the resources it occupied, and those resources cannot be reused by other tasks, not even tasks of the same type in the same job. In other words, every task goes through the full cycle of "request resources -> run task -> release resources".
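The per-task-type resource settings described above map directly to Hadoop's standard configuration properties. The sketch below is a hypothetical job submission (the jar, class name, and paths are placeholders); it assumes the job's main class uses ToolRunner so that `-D` options are picked up:

```shell
# Request 2 GB / 1 vcore for each map task and 4 GB / 2 vcores for each
# reduce task. YARN grants a fresh container per task and reclaims it
# as soon as that task exits -- nothing is reused between tasks.
hadoop jar my-job.jar com.example.MyJob \
  -D mapreduce.map.memory.mb=2048 \
  -D mapreduce.map.cpu.vcores=1 \
  -D mapreduce.reduce.memory.mb=4096 \
  -D mapreduce.reduce.cpu.vcores=2 \
  /input /output
```

Note how memory and CPU are configured separately for maps and reduces, matching the "different amounts per task type" point above.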

  • Spark's multi-threading model: Spark first builds a reusable resource pool, then runs all ShuffleMapTasks and ReduceTasks inside that pool.

    • Each node can run one or more Executor services.

    • Each Executor has a fixed number of slots, which determines how many ShuffleMapTasks or ReduceTasks that Executor can run concurrently.

    • Each Executor runs in a single JVM process; each task is a thread running inside that Executor.

    • Tasks inside the same Executor can share memory: a broadcast file or data structure is loaded only once per Executor, rather than once per task as in MapReduce.

    • Once started, an Executor keeps running and its resources are reused by one task after another; it does not exit and release its resources until the Spark application finishes.
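The executor pool described above is sized at submission time. The sketch below is a hypothetical `spark-submit` invocation (the application jar and class name are placeholders):

```shell
# Ask YARN for 4 long-lived executors, each a single JVM with 4 task
# slots (cores) and 8 GB of heap. Tasks run as threads inside these
# JVMs, so the executors stay up and are reused for the whole job.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 4 \
  --executor-memory 8g \
  --class com.example.MyApp my-app.jar
```

With these settings the pool offers 4 × 4 = 16 concurrent task slots, and a broadcast variable is materialized once in each of the 4 executors, not once per task.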

Origin www.cnblogs.com/xiangyuguan/p/11353169.html