The open source projects built around Hadoop: Pig, Hive, and HBase

If you are new to Hadoop, the open source projects that have grown up around it are bound to be confusing. I can guarantee that Hive, Pig, and HBase will leave you a little puzzled. Beginner posts keep asking the same question: when should I use HBase, and when should I use Hive? It doesn't matter. Here I will help you sort out the principles and ideas behind each technology.

Pig

Pig is a lightweight scripting language for working with Hadoop. It was originally launched by Yahoo, but it is now in decline. As Yahoo gradually withdrew from maintaining Pig, it handed the project over to the open source community, where enthusiasts keep it going. Some companies still use it, but I think Hive is the better choice. :)

Pig is a dataflow language for processing very large data sets quickly and easily.

Pig consists of two parts: the Pig interface, which is the environment that runs your scripts, and Pig Latin, the language you write them in.

Pig can conveniently process data stored in HDFS and HBase. Like Hive, Pig handles the work it is given efficiently, and writing Pig queries directly can save a lot of labor and time.
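To make "dataflow language" a bit more concrete, here is a minimal Python sketch of the load, filter, group, count shape that such a script tends to have. It is not Pig and it does not touch Hadoop; the records, field names, and function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical input records; in Pig these would come from HDFS or HBase.
RECORDS = [
    {"user": "alice", "page": "/home", "ms": 120},
    {"user": "bob", "page": "/home", "ms": 340},
    {"user": "alice", "page": "/cart", "ms": 95},
    {"user": "carol", "page": "/cart", "ms": 410},
]


def load(records):
    """Step 1: produce a stream of records (stand-in for loading data)."""
    yield from records


def keep_fast(stream, max_ms):
    """Step 2: keep only records at or below the threshold (a filter step)."""
    return (r for r in stream if r["ms"] <= max_ms)


def group_by_page(stream):
    """Step 3: collect records under their page key (a grouping step)."""
    groups = defaultdict(list)
    for r in stream:
        groups[r["page"]].append(r)
    return groups


def count_per_group(groups):
    """Step 4: reduce each group to a row count (an aggregation step)."""
    return {key: len(rows) for key, rows in groups.items()}


def run_pipeline(records, max_ms=350):
    # Each step consumes the previous step's output: that is the dataflow idea.
    return count_per_group(group_by_page(keep_fast(load(records), max_ms)))


if __name__ == "__main__":
    print(run_pipeline(RECORDS))
```

Each function corresponds to one step of the flow, so the whole job reads as a chain of transformations rather than hand-written map and reduce code.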

Hive

If you would rather not write MapReduce jobs in a programming language (DBAs, for example), or you are simply comfortable with SQL, you can use Hive for offline data processing and analysis.

Note that Hive is currently suited to offline data operations. That means it is not suitable for real-time online queries or operations in a real production environment, for one reason: in a word, it is "slow".

Hive originated at Facebook and plays the role of the data warehouse in Hadoop. It sits on top of the Hadoop cluster and provides a SQL-like interface for working with the data stored there. With HiveQL you can run select, join, and similar operations.
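For readers who want to see what a select with a join looks like, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in. This is an assumption made purely so the example can run anywhere: SQLite is not Hive, nothing here runs on Hadoop, and the table and column names are invented. The point is only the shape of the query.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER, country TEXT);
        CREATE TABLE page_views (user_id INTEGER, page TEXT);
        INSERT INTO users VALUES (1, 'CN'), (2, 'US'), (3, 'CN');
        INSERT INTO page_views VALUES
            (1, '/home'), (1, '/cart'), (2, '/home'), (3, '/home');
        """
    )
    return conn


# A select with a join and an aggregate, the kind of query the text mentions.
VIEWS_PER_COUNTRY = """
    SELECT u.country, COUNT(*) AS views
    FROM page_views v
    JOIN users u ON v.user_id = u.user_id
    GROUP BY u.country
    ORDER BY u.country
"""


def views_per_country(conn):
    """Run the query and return a list of (country, views) tuples."""
    return conn.execute(VIEWS_PER_COUNTRY).fetchall()


if __name__ == "__main__":
    print(views_per_country(build_demo_db()))
```

In Hive, a query of this kind would be compiled into batch jobs over data in the cluster, which is why it suits analysis rather than interactive lookups.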

If you have data warehousing needs, are good at writing SQL, and don't want to write MapReduce jobs, you can use Hive instead.

HBase

HBase runs on top of HDFS as a column-oriented database. HDFS lacks random read and write operations, which is why HBase exists. HBase is modeled on Google's Bigtable and stores data as key-value pairs. The goal of the project is to quickly locate and access the desired data among the billions of rows it hosts.

HBase is a database, specifically a NoSQL database. Like other databases, it provides random reads and writes. Hadoop on its own cannot meet real-time needs, but HBase can. If you need to access some data in real time, store it in HBase.
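The key-value idea with random reads and writes can be sketched in a few lines of Python. This is a toy in-memory model, not the HBase client API: the class name, the method names, and the "family:qualifier" style column names are assumptions chosen for illustration.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a key-value table with sorted row keys."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {column name: value}

    def put(self, row_key, column, value):
        """Random write: set one cell, creating the row if it is new."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key, column):
        """Random read: fetch one cell by key, or None if it is absent."""
        return self._rows.get(row_key, {}).get(column)

    def scan(self, start, stop):
        """Yield (row_key, columns) for start <= row_key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])


if __name__ == "__main__":
    table = ToyTable()
    table.put("user#002", "info:name", "bob")
    table.put("user#001", "info:name", "alice")
    table.put("user#001", "info:name", "alicia")  # overwrite in place
    print(table.get("user#001", "info:name"))
    print(list(table.scan("user#001", "user#003")))
```

Because rows are found by key rather than by reading whole files, a single cell can be read or changed without touching the rest of the data.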

You can use Hadoop as a static data warehouse and HBase as the store for data that will be modified by later operations.

Pig VS Hive

Hive is more suitable for data warehousing tasks. It is mainly used for data with a static structure and for work that requires frequent analysis. Hive's similarity to SQL makes it an ideal meeting point between Hadoop and other BI tools.

Pig gives developers more flexibility when working with large data sets and lets them write concise scripts that transform data flows and can be embedded in larger applications.

Pig is relatively lightweight compared to Hive, and its main advantage is that it can greatly reduce the amount of code compared to directly using Hadoop Java APIs. Because of this, Pig still attracts a large number of software developers.

Both Hive and Pig can be used in combination with HBase. They also provide high-level language support for HBase, which makes statistical processing of HBase data very simple.

Hive VS HBase

Hive is a batch system built on top of Hadoop to reduce the work of writing MapReduce jobs, while HBase is a supporting project that makes up for Hadoop's shortcomings in real-time operations.

Imagine you are working with a relational database (RDBMS): if the operation is a full table scan, use Hive + Hadoop; if it is an indexed lookup, use HBase + Hadoop.
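The difference between the two access patterns can be shown with plain Python data structures. This is only an analogy under made-up data: the list stands in for a table that must be scanned row by row, and the dict stands in for keyed access. Neither Hive nor HBase is involved, and the order records and function names are hypothetical.

```python
# Hypothetical order records used only for this illustration.
ORDERS = [
    {"order_id": "o-1", "amount": 25},
    {"order_id": "o-2", "amount": 40},
    {"order_id": "o-3", "amount": 15},
]


def full_scan_total(rows):
    """Touch every row to compute an aggregate (the full table scan case)."""
    return sum(row["amount"] for row in rows)


def build_index(rows):
    """Build a key -> row mapping once so later lookups avoid scanning."""
    return {row["order_id"]: row for row in rows}


def indexed_lookup(index, order_id):
    """Fetch a single row by its key (the index access case)."""
    return index.get(order_id)


if __name__ == "__main__":
    print(full_scan_total(ORDERS))
    print(indexed_lookup(build_index(ORDERS), "o-2"))
```

An aggregate over everything has to read every row no matter what, so batch processing fits it well. Fetching one row by key should not pay that cost, which is the case a keyed store is built for.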

A Hive query runs as MapReduce jobs, which can take anywhere from 5 minutes to several hours. HBase is very efficient, and definitely much faster than Hive.

This article comes from the Linux commune website (www.linuxidc.com). Original link: http://www.linuxidc.com/Linux/2014-03/98978.htm
