Performance comparison of Hadoop and Spark

What is the difference between Hadoop and Spark in terms of performance?

Think of Hadoop as a large, long-established contracting team: it can organize many workers to cooperate, hauling bricks to build a house, but its drawback is that it works slowly.

Spark is another contracting team, founded later, but its workers move bricks more flexibly and can build the house interactively in real time, so it gets the work done much faster than Hadoop.

When Hadoop was upgraded, it brought in a dedicated scheduling expert, YARN, to assign work to its workers. Spark can haul bricks from multiple warehouses (HDFS, Cassandra, S3, HBase) and can likewise hand resource and task scheduling over to different experts such as YARN or Mesos.
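To make the warehouse analogy concrete, here is a minimal PySpark sketch showing that HDFS and S3 are addressed through the same DataFrame API (only the URI scheme changes) and that the cluster manager is picked at submit time. The paths, bucket name, and cluster addresses are placeholders for illustration only.

```python
# Minimal PySpark sketch; the paths, bucket, and cluster addresses below are
# placeholders, not real endpoints.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi-warehouse-example").getOrCreate()

# HDFS and S3 are read through the same DataFrame API; only the URI scheme
# changes (Cassandra and HBase need their own connectors, omitted here).
hdfs_df = spark.read.text("hdfs:///data/logs/part-00000")        # bricks from HDFS
s3_df = spark.read.text("s3a://example-bucket/logs/part-00000")  # bricks from S3

print(hdfs_df.count(), s3_df.count())
spark.stop()

# The "scheduling expert" is chosen when the job is submitted, e.g.:
#   spark-submit --master yarn               multi_warehouse_example.py
#   spark-submit --master mesos://host:5050  multi_warehouse_example.py
```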

Of course, when the Spark and Hadoop teams work together, things become more complicated. As two separate contracting teams, each has its own strengths, weaknesses, and suitable business use cases.

So the performance differences between Hadoop and Spark can be summarized as follows:

Spark runs up to 100 times faster than Hadoop in memory and about 10 times faster on disk. It is well known that Spark sorted 100 TB of data three times faster than Hadoop MapReduce while using only one-tenth the number of machines. Spark is also faster for machine-learning applications, such as Naive Bayes and k-means.
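As an illustration of the kind of iterative machine-learning job mentioned above, here is a minimal PySpark MLlib sketch of k-means on a tiny synthetic dataset; the column names and parameters are chosen for the example only.

```python
# Minimal PySpark MLlib sketch of an iterative workload (k-means) where keeping
# data in memory pays off; the data here is synthetic.
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("kmeans-example").getOrCreate()

df = spark.createDataFrame(
    [(0.0, 0.1), (0.2, 0.0), (9.0, 9.1), (9.2, 8.9)],
    ["x", "y"])

# Assemble the raw columns into the single feature vector MLlib expects.
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)

# k-means passes over the same dataset many times, so caching it in memory
# avoids re-reading it from disk on every iteration.
features.cache()

model = KMeans(k=2, seed=1).fit(features)
print(model.clusterCenters())
spark.stop()
```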

The reason Spark performs better than Hadoop is that it is not constrained by disk input and output every time a MapReduce-style task runs, which makes applications much faster. In addition, Spark's DAG allows optimizations across steps, whereas Hadoop MapReduce has no cyclical connection between steps, so performance tuning cannot happen at that level. However, if Spark runs on YARN alongside other shared services, performance may degrade and the RAM overhead may lead to memory leaks. For this reason, if the user's workload is mainly batch processing, Hadoop is considered the more efficient system.
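To illustrate the DAG point, here is a minimal PySpark sketch in which several transformations are recorded lazily and executed as one optimized job, with the intermediate result cached in memory rather than written to disk between steps; the data and names are made up for the example.

```python
# Minimal sketch of the DAG idea: transformations are chained lazily and only
# materialized when an action runs, with the intermediate result kept in
# memory instead of written to disk between steps as MapReduce would do.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dag-example").getOrCreate()

words = spark.createDataFrame([("hadoop",), ("spark",), ("spark",)], ["word"])

# Nothing runs yet: Spark just records these steps in a DAG.
counts = (words
          .groupBy("word")
          .count()
          .filter(F.col("count") > 1))

counts.cache()         # keep the result in memory for later reuse
counts.show()          # first action triggers the whole optimized DAG
print(counts.count())  # second action reuses the cached result, no disk round trip
spark.stop()
```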

