First, let's explain what Hadoop is.
Hadoop has two core components. The first is HDFS (Hadoop Distributed File System), a distributed file storage system.
The second is MapReduce, a distributed computing framework for offline (batch) processing.
Together, these two components solve the two fundamental big data problems: storing the data and computing over it.
Almost everything else in the ecosystem is a tool derived from, and built on top of, these two.
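To make the storage half concrete, here is a minimal sketch of writing and reading a file through the HDFS Java API. The NameNode address (fs.defaultFS) and the file path are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: round-trip a small file through HDFS.
public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address
        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/tmp/hello.txt"); // assumed path
            // Write: HDFS splits the file into blocks and replicates them across DataNodes.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            }
            // Read it back as a stream.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
                System.out.println(in.readLine());
            }
        }
    }
}
```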
Ways to write MapReduce programs:
1. Java (the native API and the most basic way; see the sketch after this list)
2. Hadoop Streaming (supports any language that reads stdin and writes stdout)
3. Hadoop Pipes (for C and C++)
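As referenced in option 1, here is a minimal sketch using the native Java API: the classic word count, with a mapper that emits (word, 1) pairs and a reducer that sums them. Class names and the command-line input/output paths are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        // Emit (word, 1) for every token in the input line.
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Sum all the counts emitted for each word.
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output are HDFS paths passed on the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```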
Mahout: provides machine-learning algorithms for Hadoop: classification, clustering, frequent pattern mining, vector similarity calculation, recommendation engines, dimensionality reduction, evolutionary algorithms, regression analysis, and so on.
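As one concrete example of the recommendation-engine side, here is a hedged sketch using Mahout's Taste API. The ratings.csv file (one userID,itemID,preference triple per line), the neighborhood size, and the user ID are assumptions.

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserBasedRecommend {
    public static void main(String[] args) throws Exception {
        // Load userID,itemID,preference triples (assumed file).
        DataModel model = new FileDataModel(new File("ratings.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Consider the 10 most similar users (assumed neighborhood size).
        UserNeighborhood neighborhood =
                new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender =
                new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Top 3 item recommendations for user 1.
        List<RecommendedItem> items = recommender.recommend(1L, 3);
        for (RecommendedItem item : items) {
            System.out.println(item.getItemID() + " -> " + item.getValue());
        }
    }
}
```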
Hive: a data warehouse built on top of Hadoop for statistical analysis of massive unstructured log data. Its structured query language, HQL, is similar to SQL but not identical.
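A hedged sketch of running an HQL aggregation over log data from Java, through HiveServer2's JDBC interface; the host, port, and the access_log table are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveLogStats {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Assumed HiveServer2 host/port and database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
            // HQL looks like SQL, but Hive compiles it into MapReduce jobs.
            ResultSet rs = stmt.executeQuery(
                    "SELECT status, COUNT(*) AS hits FROM access_log GROUP BY status");
            while (rs.next()) {
                System.out.println(rs.getString("status") + "\t" + rs.getLong("hits"));
            }
        }
    }
}
```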
Pig: a Hadoop-based data-flow execution engine that uses MapReduce to process data in parallel; data flows are described in the Pig Latin language.
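A hedged sketch of the same word count as a Pig Latin data flow, embedded in Java via PigServer; local mode and the input/output paths are assumptions (on a cluster you would use ExecType.MAPREDUCE).

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigWordCount {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL); // assumed local mode
        // Each statement adds one step to the data flow.
        pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
        pig.registerQuery(
            "words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
        pig.registerQuery("grouped = GROUP words BY word;");
        pig.registerQuery(
            "counts = FOREACH grouped GENERATE group, COUNT(words);");
        // store() triggers execution of the whole flow.
        pig.store("counts", "wordcount-out");
        pig.shutdown();
    }
}
```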
Hive2 (Stinger): the next generation of Hive, in which the underlying execution engine, MapReduce, is replaced by Tez (a DAG computation framework).
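Assuming a Hive 0.13+ installation with Tez deployed, the engine swap is a per-session setting; a minimal sketch, with connection details as in the Hive example above:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveOnTez {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
            // Per-session setting: later queries in this session compile
            // to Tez DAGs instead of chains of MapReduce jobs.
            stmt.execute("SET hive.execution.engine=tez");
            stmt.execute("SELECT COUNT(*) FROM access_log"); // assumed table
        }
    }
}
```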
Impala: processes data stored directly on HDFS, without writing intermediate results back to HDFS multiple times the way MapReduce does; it has good scalability and fault tolerance and is designed for fast interactive queries.
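A hedged sketch of an interactive Impala query from Java. Whether your cluster exposes Impala through the Hive JDBC driver, and the port 21050 / auth=noSasl settings, are assumptions that hold only for unsecured setups.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaInteractive {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Assumed Impala daemon port and unsecured-cluster auth setting.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:21050/;auth=noSasl");
             Statement stmt = conn.createStatement()) {
            // Impala executes this directly over files on HDFS,
            // without launching MapReduce jobs.
            ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM access_log");
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}
```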
Oozie: a workflow scheduler for Hadoop that chains MapReduce, Pig, and Hive jobs together into DAGs of actions.