Hadoop, recognized as the best big data processing tool

Software engineers who architect big data solutions know that one technology for business analytics spans SQL databases, NoSQL databases, unstructured data, document-oriented stores, and large-scale data storage and processing. If you guessed Hadoop, you answered correctly. Hadoop is also the one thing many giant companies have in common, among them Amazon, Yahoo, AOL, Netflix, eBay, Microsoft, Google, Twitter and Facebook. IBM has even been at the forefront of promoting Hadoop for enterprise analytics. That this open source framework is now everywhere, and has held its place in this arena for five years, is a remarkable achievement that is hard not to be impressed by.

Hadoop's future

To understand what has happened over the past few years, we talked to Chuck Lam, author of the book "Hadoop in Action". Chuck says Hadoop has not stood still. "The whole ecosystem has really evolved and changed a lot, and there is now even an official 1.0 release. More importantly, the MapReduce programming model has been fundamentally revised, with a lot of changes." In general, these changes are for the better. The direction of development has made the framework easier to deploy in the enterprise and has addressed a series of problems, such as security, which is the first concern of risk-averse companies.

The benefits keep growing, and they include a high level of scalability. The framework's distributed computing model means you can add more and more data without changing the way you add it. There is no need to change the format, to edit or rearrange jobs, or to decide how the work will be applied. You simply add more nodes as you go. You do not have to be picky about the type of data you store or where it comes from. Schema-less is the name of the game. The framework's parallel computing power also makes effective use of storage on commodity servers. This means companies can keep, and use, more data. It does not matter which node fails. Even if part of the system fails, data is not lost and performance does not degrade.
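
The claim that a single node failure does not cost you data can be illustrated with a toy model of block replication. The sketch below is a simplification under stated assumptions, not HDFS's actual placement policy: the function names place_blocks and readable_blocks, the round-robin placement, and the five node names are all made up for illustration. The replication factor of 3 matches the HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: {nodes[(block + i) % len(nodes)] for i in range(replication)}
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return {block for block, holders in placement.items() if holders - failed}


# Hypothetical five-node cluster holding ten blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(10, nodes)

# Losing any one node leaves every block readable from another replica.
for node in nodes:
    assert len(readable_blocks(placement, {node})) == 10

# Scaling out is just a longer node list; the data and its format are untouched.
bigger = place_blocks(10, nodes + ["node6", "node7"])
```

In this model, losing three nodes that happen to hold every replica of a block would still lose that block, so the guarantee is about tolerating individual failures rather than arbitrary ones.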

Technologies powering Hadoop

Hadoop is now also more flexible, allowing businesses to do more with it and to handle more types of data. Much of this power comes from Hadoop's many peer projects, including languages such as Pig and scalable solutions such as the following:

1. Hive (data warehouse)
2. Mahout (data mining and machine learning)
3. HBase (structured storage for large tables)
4. Cassandra (multi-master database)

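Of course, this type of solution is not always smooth sailing. Lam says the main pitfall lies in the assumptions made about processing. In other words, the fault is not in our systems but in ourselves. "New technology is not a panacea for every problem. As simple as something like NoSQL may seem, you still have to dig a level deeper and work out the problem you are trying to solve." This may mean looking carefully at your algorithms, rather than just throwing your staff at MapReduce and expecting Hadoop to scale automatically. The usage pattern of your data affects how you scale, especially when usage is uneven. In that case linear scaling may no longer work. Again, this is not a problem with Hadoop itself. Lam believes the tools available to enterprises are already mature enough. It is only a matter of making sure that IT administrators are familiar with these tools and that the software architects who use Hadoop know how to use the technology more effectively.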
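
The point about uneven usage breaking linear scaling can be made concrete with a small sketch. It assumes integer key IDs and routes each key to reducer key % num_reducers, a simplified stand-in for hash partitioning; the function names and both workloads are hypothetical. Because every record for a key lands on the same reducer, a job is only as fast as its busiest reducer.

```python
def reducer_loads(keys, num_reducers):
    """Count records per reducer when a key always goes to reducer key % num_reducers."""
    loads = [0] * num_reducers
    for key in keys:
        loads[key % num_reducers] += 1
    return loads


def slowest_reducer(keys, num_reducers):
    """The job finishes when the busiest reducer does, so report the maximum load."""
    return max(reducer_loads(keys, num_reducers))


# 1000 records, every key used exactly once.
uniform = list(range(1000))
# 1000 records, but key 0 accounts for half of them.
skewed = [0] * 500 + list(range(1, 501))

for name, keys in (("uniform", uniform), ("skewed", skewed)):
    for reducers in (4, 8):
        print(name, reducers, slowest_reducer(keys, reducers))
```

With the uniform workload, doubling the reducers from 4 to 8 halves the busiest reducer's load from 250 to 125 records. With the skewed workload the same change only moves it from 625 to 562, because the hot key cannot be split across reducers by adding nodes.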

Origin blog.csdn.net/sdddddddddddg/article/details/91357805