Top 10 Big Data Technologies to Start Using Today

Big data is exploding; new projects emerge from companies around the world every day.

The good news is that all of these technologies are open source, so you can start using them today.

Hadoop

Solid, enterprise-grade, and the foundation for everything else. You need HDFS as your primary big data store and Hadoop YARN as the infrastructure for running your critical servers and applications.

Spark

Easy to use, with great support for all the important data languages (Scala, Python, Java, R), a huge and rapidly growing ecosystem, and straightforward micro-batch, batch, and SQL support. Another wise choice.
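Spark's core abstraction boils down to mapping over partitioned data and then reducing the results. As a toy illustration of that model using only the Python standard library (this is not the PySpark API, just a sketch of the idea behind it), a word count might look like:

```python
from collections import Counter
from functools import reduce

# Hypothetical input, split into "partitions" the way Spark splits an RDD.
partitions = [
    ["big data is big", "spark is fast"],
    ["spark supports sql", "data is data"],
]

# Map phase: count words within each partition independently.
mapped = [Counter(word for line in part for word in line.split())
          for part in partitions]

# Reduce phase: merge the per-partition counts, like reduceByKey.
totals = reduce(lambda a, b: a + b, mapped)

print(totals["data"])  # 'data' appears 3 times across all partitions
```

In real Spark the partitions live on different machines and the map phase runs in parallel, but the shape of the computation is the same.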


NiFi

An NSA-built tool that lets you ingest data from many sources, then store and process it with minimal coding and a flexible user interface: social media, JMS, NoSQL, SQL, REST/JSON feeds, AMQP, SQS, FTP, Flume, Elasticsearch, S3, MongoDB, Splunk, email, HBase, Hive, HDFS, Azure Event Hubs, Kafka, and dozens of other sources. If the source or sink you need is missing, writing your own processor is straightforward Java code. Another great Apache project for your toolbox; this is the Swiss Army knife of big data tools.

Apache Hive 2.1

Apache Hive has long been the SQL solution on Hadoop. The latest version adds performance and functionality enhancements that make Hive a first-class SQL data solution.
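As a sketch of what Hive's SQL looks like over data already sitting in HDFS (the table, columns, and path here are hypothetical, and the exact DDL depends on your file format):

```sql
-- Define a table over existing HDFS files (schema-on-read).
CREATE EXTERNAL TABLE page_views (
  user_id STRING,
  url     STRING,
  ts      TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/page_views';

-- Standard SQL, executed as a distributed job on the cluster.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```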

Kafka

The asynchronous distributed messaging system of choice between big data systems. It is integrated into most of the stack: from Spark to NiFi to third-party tools, from Java to Scala, it is the glue between systems. Your stack needs it.
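Kafka itself needs a running broker, but the decoupling it provides, where producers append messages and consumers read them at their own pace without ever calling each other directly, can be sketched in miniature with the standard library (a toy illustration of the pattern, not the Kafka API):

```python
import queue
import threading

# A queue standing in for a Kafka topic: the producer and consumer
# share only the "topic", never a direct reference to each other.
topic = queue.Queue()

def producer():
    for i in range(5):
        topic.put(f"event-{i}")   # analogous to producer.send(topic, msg)
    topic.put(None)               # sentinel marking end of stream

received = []

def consumer():
    while True:
        msg = topic.get()         # analogous to polling the topic
        if msg is None:
            break
        received.append(msg)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()

print(received)  # all five events arrive, in order
```

Real Kafka adds the parts the toy lacks: durable, replicated logs, many independent consumer groups, and replayable offsets.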

Phoenix

HBase is the open-source BigTable, and a large number of companies are committed to running it at sheer scale: a NoSQL store backed by HDFS that integrates seamlessly with the rest of the tools. Adding Phoenix on top of HBase makes it a first choice for NoSQL, bringing SQL, JDBC, OLTP, and operational analytics to HBase.
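Phoenix exposes HBase tables through ordinary SQL over JDBC. A sketch of its distinctive syntax (the table and columns are hypothetical); note that Phoenix uses UPSERT rather than INSERT, matching HBase's put semantics:

```sql
CREATE TABLE metrics (
  host      VARCHAR NOT NULL,
  metric_ts TIMESTAMP NOT NULL,
  value     DOUBLE
  CONSTRAINT pk PRIMARY KEY (host, metric_ts)
);

-- UPSERT inserts or overwrites, like an HBase put.
UPSERT INTO metrics
VALUES ('web01', TO_TIMESTAMP('2019-06-01 00:00:00'), 0.75);

SELECT host, AVG(value) AS avg_value
FROM metrics
GROUP BY host;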

Zeppelin

An easy-to-integrate notebook tool for working with Hive, Spark, SQL, Shell, Scala, Python, and numerous other data exploration and machine learning tools. It is very easy to use and a good way to explore and query data. The tool is gaining support and functionality; it just needs to upgrade its graphing and plotting.

H2O

H2O fills the machine learning gaps in Spark, and it works well. It can handle all the machine learning you need.

Apache Beam

A unified framework for developing data processing pipelines in Java. The same pipeline can run on Spark or Flink, and other runners are coming online, so you do not have to learn too many frameworks.
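Beam's central idea is that a pipeline is a description of transforms, separate from the engine (the "runner") that executes it. A toy standard-library sketch of that separation (not the Beam API itself):

```python
# A "pipeline" is just data describing what to do, not how to run it.
pipeline = [
    ("map", lambda x: x * x),
    ("filter", lambda x: x % 2 == 0),
]

def direct_runner(pipeline, data):
    """A trivial local 'runner'. A Spark or Flink runner would execute
    the same transform list on a cluster instead of a Python list."""
    out = list(data)
    for kind, fn in pipeline:
        if kind == "map":
            out = [fn(x) for x in out]
        elif kind == "filter":
            out = [x for x in out if fn(x)]
    return out

result = direct_runner(pipeline, range(1, 6))
print(result)  # the even squares of 1..5
```

Because the pipeline never mentions its runner, swapping execution engines means swapping one function, which is exactly the portability Beam promises.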

Stanford CoreNLP

Natural language processing is already enormous and still growing, and Stanford keeps improving its framework.

Obviously, there are a lot of big data projects, so your best bet is to start from a distribution: the various project versions it contains have been tested together and are known to work smoothly with security and management. I recommend Hortonworks Connected Data Platforms as your foundation. If this were a top 20, I would add more projects, especially Storm, Solr, Apache Oozie, and Apache HAWQ. There is also a lot of great technology that in most cases you never see or even know about, like Apache Tez (although you need to configure it when you run Hive), Apache Calcite, Apache Slider, Apache ZooKeeper, and Livy. These projects are essential to running big data infrastructure.

Origin blog.51cto.com/14296550/2409243