Hadoop-related concepts and their strengths and weaknesses

Hadoop advantages

[Figure: Hadoop advantages]

HDFS definition

[Figure: HDFS definition]

HDFS advantages

[Figures: HDFS advantages]

What is Hive

Hive: open-sourced by Facebook to compute statistics over massive structured logs.
Hive is a data warehouse tool built on Hadoop. It maps structured data files to tables and provides SQL-like queries.
In essence: HQL is converted into MapReduce programs.
1) The data that Hive processes is stored in HDFS.
2) The underlying implementation of Hive's data analysis is MapReduce.
3) The resulting programs run on YARN.
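To make "HQL is converted into MapReduce programs" more concrete, here is a minimal conceptual sketch in plain Python of how a query such as SELECT level, COUNT(*) FROM logs GROUP BY level could be broken into map, shuffle, and reduce steps. The table name logs, its columns, and the sample rows are assumptions made up for illustration. This is not the code Hive generates, only the general shape of the computation.

```python
from collections import defaultdict

# Hypothetical rows of a table logs(level, message); not from the original post.
rows = [
    ("ERROR", "disk full"),
    ("INFO", "started"),
    ("ERROR", "timeout"),
    ("WARN", "slow response"),
    ("INFO", "stopped"),
]


def map_phase(records):
    """Map: emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for level, _message in records:
        yield level, 1


def shuffle(pairs):
    """Shuffle: collect all values that share a key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate each key's values; COUNT(*) becomes a sum of ones."""
    return {key: sum(values) for key, values in groups.items()}


result = reduce_phase(shuffle(map_phase(rows)))
print(result)
```

In a real cluster the map and reduce steps would run as distributed tasks over data stored in HDFS; here everything runs in a single process so the data flow is easy to follow.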

Hive advantages and disadvantages

Advantages
1) The interface uses SQL-like syntax, which enables rapid development (simple, easy to use).
2) It avoids writing MapReduce by hand, which reduces the learning cost for developers.
3) Hive's execution latency is relatively high, so it is commonly used for data analysis where real-time requirements are low.
4) Hive's strength is processing large data sets. It has no advantage for small data sets, because its execution latency is relatively high.
5) Hive supports user-defined functions, so users can implement their own functions as needed.
Shortcomings
1. HQL's expressive power is limited
(1) Iterative algorithms cannot be expressed
(2) It is not well suited to data mining: because of the constraints of the MapReduce data processing flow, more efficient algorithms cannot be implemented.
2. Hive's efficiency is relatively low
(1) The MapReduce jobs that Hive generates automatically are usually not intelligent enough
(2) Hive is difficult to tune, and tuning is coarse-grained

MapReduce definition

[Figure: MapReduce definition]

MapReduce advantages and disadvantages

[Figures: MapReduce advantages and disadvantages]

Flume definition

Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transporting massive amounts of log data. Flume is based on a streaming architecture and is flexible and simple.

Flume advantages

1. It can be integrated with arbitrary storage processes.
2. When the input data rate is greater than the write rate of the destination storage, Flume buffers the data, which reduces the pressure on HDFS.
3. Flume's transactions are channel-based and use two transaction models (sender + receiver) to ensure that messages are delivered reliably.
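The buffering and two-sided transaction ideas above can be sketched with a toy in-memory channel. The class ToyChannel and its methods put_batch and take_batch are hypothetical names invented for this illustration. They are not Flume's API, and a real channel has more states than this sketch models.

```python
from collections import deque


class ToyChannel:
    """In-memory stand-in for a channel (hypothetical, not Flume's API)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put_batch(self, batch):
        """Sender-side transaction: the whole batch is accepted or none of it."""
        if len(self.events) + len(batch) > self.capacity:
            return False  # rollback: the sender keeps the batch and retries later
        self.events.extend(batch)
        return True  # commit

    def take_batch(self, size, deliver):
        """Receiver-side transaction: events leave the channel only after
        the deliver callback finishes without raising."""
        batch = [self.events[i] for i in range(min(size, len(self.events)))]
        try:
            deliver(batch)
        except Exception:
            return False  # rollback: events stay buffered in the channel
        for _ in batch:
            self.events.popleft()
        return True  # commit


def failing_sink(batch):
    # Simulates a slow or unavailable destination such as an HDFS write error.
    raise IOError("destination unavailable")


written = []
channel = ToyChannel(capacity=3)
first_put = channel.put_batch(["e1", "e2"])       # fits, committed
second_put = channel.put_batch(["e3", "e4"])      # would overflow, rolled back
failed_take = channel.take_batch(2, failing_sink)  # delivery fails, nothing lost
good_take = channel.take_batch(2, written.extend)  # delivery succeeds
```

Because events are removed only after a successful delivery, a failing destination leaves them in the buffer for a later retry, which is the property the sender and receiver transactions are meant to provide.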

ZooKeeper definition

ZooKeeper is an open-source distributed coordination service framework. It is mainly used to solve consistency problems of application systems in a distributed cluster, for example how to avoid dirty reads caused by operating on the same data at the same time. In essence, ZooKeeper is a distributed small-file storage system. It stores data in a tree structure similar to a file system and can manage the nodes in the tree effectively. It maintains and monitors changes in the state of the stored data, and by watching those state changes it enables data-based cluster management, such as a unified naming service, distributed configuration management, distributed message queues, distributed locks, and distributed coordination.
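As a rough illustration of the tree-structured storage and change monitoring described above, here is a toy single-process model in Python. The class ToyZNodeTree and its methods are hypothetical names for this sketch. It is not the ZooKeeper client API and it has no replication or consistency protocol. Watches are modeled as one-shot callbacks, which is an assumption of the sketch.

```python
class ToyZNodeTree:
    """Toy tree of path -> data with one-shot watches (not ZooKeeper's API)."""

    def __init__(self):
        self.data = {"/": b""}  # the root node always exists
        self.watches = {}       # path -> list of pending callbacks

    def create(self, path, value=b""):
        """Create a node; its parent must already exist, as in a file system."""
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.data:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.data:
            raise KeyError(f"{path} already exists")
        self.data[path] = value

    def get(self, path, watch=None):
        """Read a node and optionally register a callback for its next change."""
        value = self.data[path]
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return value

    def set(self, path, value):
        """Update a node and fire, then discard, the watches registered on it."""
        if path not in self.data:
            raise KeyError(f"{path} does not exist")
        self.data[path] = value
        for callback in self.watches.pop(path, []):
            callback(path, value)

    def children(self, path):
        """List the direct children of a node."""
        prefix = path.rstrip("/") + "/"
        return sorted(
            p for p in self.data
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


# Distributed-configuration style usage: a client watches a config node.
seen = []
tree = ToyZNodeTree()
tree.create("/config")
tree.create("/config/db", b"host-a")
tree.get("/config/db", watch=lambda path, value: seen.append((path, value)))
tree.set("/config/db", b"host-b")  # triggers the watch once
tree.set("/config/db", b"host-c")  # no watch is registered any more
```

A client that wants to keep following a node would register a new watch each time it is notified, which is how the configuration management and naming use cases listed above can be built on top of change notifications.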

ZooKeeper features

[Figure: ZooKeeper features]


Origin blog.csdn.net/CH_Axiaobai/article/details/103423809