Five Reasons Spark Has Become the New Core of Big Data

In the past few years, as Hadoop solutions have gradually come to dominate the big data processing field, many of the controversies that raged early on have been settled. First, the Hadoop Distributed File System (HDFS) is the right storage platform for big data. Second, YARN is the ideal choice for resource allocation and management in big data environments. Third, and most important, no single processing framework can solve every problem. MapReduce is a remarkable technical achievement, but it is far from a cure-all.

Businesses that rely on Hadoop need a whole series of analytic infrastructure and processes to reach conclusions and answers on critical questions. Enterprise customers need data preparation, descriptive analytics, more advanced search capabilities, predictive analytics, machine learning, and graph processing. At the same time, they need tools that fit their actual requirements and let them make full use of the skills and other resources they already have. For now, no single standardized processing framework is sufficient to deliver all of this. From that perspective, Spark's strengths are a perfect fit.

Although Spark is still a relatively young data project, it meets all of the needs above, and can do even more. In this article, we list the top five reasons why the Spark era has arrived.

1. Spark turns the vision of advanced analytics into reality

Although most large, innovative companies are working to expand their advanced analytics capabilities, at a recent big data analytics conference in New York only 20% of participants said their companies had deployed advanced analytics solutions internally. The other 80% reported that they still had only basic data preparation and analysis capabilities. In these enterprises, the handful of data scientists on staff spend much of their time implementing and managing descriptive analytics.

The Spark project provides a framework that makes advanced analytics available out of the box. The framework includes a wide range of tools, such as accelerated queries, a machine learning library, a graph processing engine, and a streaming analytics engine. For enterprises, even with truly outstanding data scientists on staff (itself a premise that is hard to satisfy), these analytic goals are nearly impossible to reach with MapReduce. On top of that, Spark offers prebuilt libraries that are easy to use and remarkably fast. This frees data scientists to concentrate on the critical work beyond data preparation and quality control. With Spark's help, they can even make sure analytical results are interpreted correctly.
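To make the "out of the box" claim concrete, here is a minimal PySpark sketch (Python being one of the two scripting languages the article mentions below). It assumes a running Spark installation; the file `customers.csv` and the columns `age`, `income`, and `churned` are hypothetical. The point is that the query engine and the MLlib machine learning library ship together behind a single SparkSession:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# One entry point exposes SQL-style queries and MLlib together.
spark = SparkSession.builder.appName("advanced-analytics").getOrCreate()

# Data preparation with the DataFrame API (hypothetical schema).
df = spark.read.csv("customers.csv", header=True, inferSchema=True)
adults = df.filter(df["age"] >= 18)

# Machine learning from the built-in MLlib library: assemble feature
# vectors and fit a classifier, no external framework required.
assembler = VectorAssembler(inputCols=["age", "income"], outputCol="features")
model = LogisticRegression(labelCol="churned").fit(assembler.transform(adults))
print(model.coefficients)
```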

2. Spark makes everything easier

For a long time, Hadoop's biggest problem has been its difficulty; companies have struggled even to find people capable of running it. Although successive new releases have made real progress in convenience and functionality, complaints about the learning curve persist. Rather than forcing users to master a set of highly complex subjects, such as Java and the MapReduce programming model, Spark is designed so that anyone who understands databases and has some scripting skill (in Python or Scala) can get started easily, as the sketch below shows. That makes it easier for enterprises to find hires who understand both their data and the tools that process it, and it lets vendors build analytics solutions quickly and deliver innovative results to customers in short order.
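To illustrate the gap in difficulty, here is the canonical word count, which takes several dozen lines of Java in classic MapReduce, written as a short PySpark sketch. The input file `logs.txt` is hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("wordcount").getOrCreate()

# Read raw text, split each line into words, and count occurrences.
lines = spark.read.text("logs.txt")
words = lines.select(explode(split(lines.value, r"\s+")).alias("word"))
words.groupBy("word").count().orderBy("count", ascending=False).show()
```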

3. Spark offers multiple language options

On this topic it is worth asking: if SQL did not already exist, would we invent SQL to meet the challenges of big data analytics? The answer is probably no; at the very least, SQL would not be the only language we invented. We would want more, and more flexible, options tailored to the problem at hand, ways to organize and retrieve data from multiple angles, and more efficient ways to move data into the analytics framework. Spark discards the rigid SQL-centric mindset and opens the door to the data vault to the fastest and most refined analytical methods, and that willingness to meet data and business challenges head-on deserves real credit.
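As a sketch of what having options beyond SQL looks like in practice, the snippet below asks the same question of one hypothetical dataset (`sales.parquet`) twice: once in plain SQL, once through the DataFrame API. Neither path is privileged; both run on the same engine:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg

spark = SparkSession.builder.appName("many-languages").getOrCreate()
sales = spark.read.parquet("sales.parquet")  # hypothetical dataset

# Option 1: classic SQL, for those who think in queries.
sales.createOrReplaceTempView("sales")
spark.sql("SELECT region, AVG(amount) AS avg_amount "
          "FROM sales GROUP BY region").show()

# Option 2: the DataFrame API in Python (Scala, Java, and R work too).
sales.groupBy("region").agg(avg("amount").alias("avg_amount")).show()
```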

4. Spark delivers results faster

As the pace of business keeps accelerating, enterprises have an increasingly urgent need for real-time analytical results. Spark's concurrent in-memory processing delivers results several times faster than solutions that rely on disk access. The high latency of traditional approaches badly slows incremental analysis and business processes, and makes the operations built on top of them hard to run. As more vendors build applications on Spark, the execution efficiency of analytics workflows will improve dramatically. Fast delivery of results means analysts can repeatedly test their conclusions and give more precise, more complete answers. In short, Spark lets analysts focus on their core job: finding better answers to hard problems, faster.
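The speedup described here comes largely from caching working data in memory. A minimal sketch, with a hypothetical `events.parquet` dataset: the first action materializes the cache, and every later pass reads from memory instead of disk:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iterative-analysis").getOrCreate()

# cache() pins the dataset in cluster memory after the first action.
events = spark.read.parquet("events.parquet").cache()

# Iterative, interactive analysis: each pass reuses the in-memory copy
# rather than re-reading from disk, which is where the latency win is.
for threshold in (10, 100, 1000):
    n = events.filter(events["clicks"] > threshold).count()
    print(threshold, n)
```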

5. Spark imposes no hard requirements on your choice of Hadoop vendor

All the major Hadoop distributions now support Spark, and for good reason: Spark is a neutral solution that does not tie users to any single vendor. Because Spark is open source, enterprise customers can freely build Spark analytics infrastructure without worrying about being held hostage to any one Hadoop vendor's roadmap. And if a customer decides to switch platforms, their analytical data can migrate along with them.
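One concrete expression of that neutrality (a sketch, assuming a standard spark-submit deployment and a hypothetical HDFS path): the application code never names a distribution or a cluster manager, so the same script runs unchanged on any vendor's Hadoop:

```python
from pyspark.sql import SparkSession

# No master URL is hard-coded here; it is supplied at launch time, e.g.
#   spark-submit --master yarn portable_job.py              (any Hadoop distro)
#   spark-submit --master spark://host:7077 portable_job.py (standalone)
spark = SparkSession.builder.appName("portable-job").getOrCreate()

spark.read.parquet("hdfs:///data/events.parquet") \
     .groupBy("type").count().show()
```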

The Spark project holds enormous potential, and in a short time it has already withstood scrutiny and proven that it closely matches the real requirements of big data analytics. What we are seeing now is only the beginning of the "Spark era." As enterprises tap more and more of Spark's potential, we will watch it cement its place as a core technology in any big data analytics environment, and the ecosystem built around it will continue to thrive. For enterprise customers giving serious thought to advanced real-time analytics, bringing Spark to their big data sets has become all but inevitable.
