Programmers who want to work in big data must master these 10 skills!

New projects spring up around the world every day, especially ones related to big data, and their importance cannot be overstated. Without the right technologies behind them, programmers simply cannot keep pace. Here are 10 foundational open-source big data technologies to add to your toolbox!

1.Apache Beam

Apache Beam provides a unified programming model for data pipelines. Developed in Java, it supports Spark and Flink as execution engines, among others. By abstracting over many frameworks with one model, it saves developers a great deal of the time and effort of learning each framework separately.
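Beam's core idea is composing a pipeline from a chain of transforms that a runner then executes. A minimal sketch of that pipeline idea in plain Python (the names here are illustrative, not the real Beam API):

```python
# A toy sketch of Beam's pipeline model: a collection flows through an
# ordered chain of transforms. Illustrative only, not the Beam API.

def run_pipeline(records, *transforms):
    """Apply each transform to the whole collection, in order."""
    for transform in transforms:
        records = transform(records)
    return records

# Transforms loosely analogous to Beam's Filter and Map.
drop_empty = lambda recs: [r for r in recs if r]
to_upper = lambda recs: [r.upper() for r in recs]

result = run_pipeline(["spark", "", "flink"], drop_empty, to_upper)
print(result)  # ['SPARK', 'FLINK']
```

In real Beam the same chain would be portable: the identical pipeline definition can run on Spark, Flink, or another supported runner.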





2.Apache Hive 2.1

Hive is a data warehouse infrastructure built on top of Hadoop. With its latest release, Apache Hive's performance and functionality have been enhanced, making it one of the best solutions for SQL on big data. It provides a range of tools for extract-transform-load (ETL) work: mechanisms for storing, querying, and analyzing large-scale data kept in Hadoop.
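For simple aggregates, Hive's query language (HiveQL) closely resembles standard SQL. As a sketch of the kind of warehouse query Hive runs over Hadoop-scale tables, here sqlite3 stands in for the warehouse (the table and data are made up for illustration):

```python
import sqlite3

# sqlite3 stands in for Hive here; for a simple GROUP BY aggregate like
# this one, HiveQL and standard SQL look essentially the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)",
                 [("home", 3), ("docs", 5), ("home", 2)])

rows = conn.execute(
    "SELECT page, SUM(views) FROM page_views GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('docs', 5), ('home', 5)]
```

The difference is where the query runs: Hive compiles such a statement into jobs executed across a Hadoop cluster rather than in a single local engine.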

3.Hadoop

Efficient, reliable, and scalable, Hadoop provides YARN, HDFS, and the infrastructure projects you need for data storage and for running core big data services and applications.
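The processing model Hadoop popularized is MapReduce: map records to key-value pairs, shuffle by key, then reduce each group. A toy word count showing that map, shuffle, and reduce flow in a single process (Hadoop runs the same shape of computation across a cluster over HDFS):

```python
from itertools import groupby
from operator import itemgetter

# A single-process sketch of the MapReduce word count that Hadoop
# distributes across a cluster.

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word.
    return [(word, 1) for line in lines for word in line.split()]

def reduce_phase(pairs):
    # Shuffle: bring equal keys together by sorting, then reduce each group.
    pairs.sort(key=itemgetter(0))
    return {key: sum(count for _, count in group)
            for key, group in groupby(pairs, key=itemgetter(0))}

counts = reduce_phase(map_phase(["big data", "big deal"]))
print(counts)  # {'big': 2, 'data': 1, 'deal': 1}
```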


4.Kafka

Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the activity-stream data of a consumer-scale website. From Spark to NiFi to third-party plug-in tools, and from Java to Scala, Kafka provides strong glue between systems; it has become the best choice for asynchronous messaging between big data and distributed systems.
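The publish-subscribe model at Kafka's core is simple: producers append messages to a topic's log, and each consumer group reads from its own offset. An in-memory toy sketch of that idea (this is an illustration of the model, not the Kafka client API):

```python
from collections import defaultdict

# A toy in-memory broker illustrating Kafka's model: topics are
# append-only logs, and each consumer group tracks its own read offset.

class Broker:
    def __init__(self):
        self.topics = defaultdict(list)   # topic -> append-only log
        self.offsets = defaultdict(int)   # (group, topic) -> next offset

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def poll(self, group, topic):
        """Return messages this group has not yet seen, advancing its offset."""
        log = self.topics[topic]
        start = self.offsets[(group, topic)]
        self.offsets[(group, topic)] = len(log)
        return log[start:]

broker = Broker()
broker.publish("clicks", "page=home")
broker.publish("clicks", "page=docs")
print(broker.poll("analytics", "clicks"))  # ['page=home', 'page=docs']
print(broker.poll("analytics", "clicks"))  # [] (already consumed)
```

Because offsets are per group, two independent consumer groups can each receive every message, which is what makes the model publish-subscribe rather than a simple queue.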

5.NiFi

Apache NiFi, often called the Swiss Army knife of the big data toolbox, is an open-source project contributed to the Apache Foundation by the US National Security Agency (NSA). Its design goal is to automate the flow of data between systems. Its two most important features are a powerful user interface and good data-provenance tooling. Built on a flow-based programming model, NiFi is very easy to use, as well as powerful, reliable, and highly configurable.

6.Phoenix

As the SQL driver for HBase, Phoenix has been adopted by a large number of companies and continues to grow. This HDFS-backed NoSQL option integrates well with the rest of the toolchain: the Phoenix query engine translates a SQL query into one or more HBase scans and orchestrates their execution to produce standard JDBC result sets.
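The essence of translating SQL into HBase scans is turning a predicate on the row key into a bounded range scan over a sorted key-value store. A toy illustration of that translation (real Phoenix query plans are far more sophisticated):

```python
from bisect import bisect_left

# A toy sorted key-value "table", standing in for an HBase table whose
# rows are ordered by row key.
table = sorted([("user001", "alice"), ("user002", "bob"), ("user003", "carol")])

def scan(rows, start_key, stop_key):
    """Return rows with start_key <= key < stop_key, like an HBase scan."""
    i = bisect_left(rows, (start_key,))
    return [row for row in rows[i:] if row[0] < stop_key]

# Conceptually the scan below answers:
#   SELECT * FROM users WHERE pk >= 'user001' AND pk < 'user003'
print(scan(table, "user001", "user003"))
# [('user001', 'alice'), ('user002', 'bob')]
```

Because the store is sorted by key, the engine can seek directly to the start key and stop at the end key instead of reading the whole table, which is what makes SQL over HBase efficient for key-range queries.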


7.Spark

Spark is easy to use and supports all the important big data languages, including Scala, Python, Java, and R. It also has a powerful, fast-growing ecosystem, and its support for micro-batching, batching, and SQL is straightforward. Most importantly, Spark is well suited to iterative MapReduce-style algorithms of the kind used in data mining and machine learning.
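Spark's core programming style is chaining transformations over a collection and then triggering them with an action. A sketch of that style using plain Python built-ins in place of an RDD and the Spark API:

```python
from functools import reduce

# Plain Python standing in for Spark: filter and map are chained
# transformations; reduce plays the role of the action that produces
# a result.
data = [1, 2, 3, 4, 5]
squared_evens = map(lambda x: x * x, filter(lambda x: x % 2 == 0, data))
total = reduce(lambda a, b: a + b, squared_evens)
print(total)  # 2*2 + 4*4 = 20
```

In real Spark the chained transformations are lazy and distributed: nothing executes until an action such as `reduce` or `collect` runs, and the work is spread across the cluster.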

8.Sparkling Water

H2O fills the gaps in Spark's machine learning capabilities; Sparkling Water can cover all of your machine learning needs.

9.Stanford Core NLP

Natural language processing has enormous room for growth, and Stanford keeps improving its framework; Stanford CoreNLP has emerged as a standout.

10.Zeppelin

Zeppelin is a web-based notebook for interactive data analytics that lets users create beautiful documents that are data-driven, interactive, and collaborative. It also supports multiple languages, including Scala (with Apache Spark), Python (Apache Spark), SparkSQL, Hive, Markdown, Shell, and more.

Everyone in the tech world knows that big data, one of today's hottest technologies, is growing explosively. Fortunately, open source lets more and more projects adopt big data technologies directly, which also gives programmers one more career path.


The future prospects of big data are promising, and many people are entering the field. How to complete the transition quickly, and how to get into big data quickly, are questions that newcomers making the switch need to think through carefully.



There is a lot that beginners learning big data should pay attention to, but in any case, having chosen to enter the big data industry, only hard work will carry you through the struggles. As the saying goes, stay true to your original aspiration; a persistent heart is what you need most when learning big data.



Origin blog.51cto.com/13854477/2406329