B05 - 002, Hadoop acquaintance

0, this chapter outlines the learning catalog - Hadoop acquaintance

Beginner consuming: 0.5h

Note: CSDN end of the phone does not support chapter jumps within the chain, but the chain is available, also requested a better experience on the PC side.

A, Hadoop describes
  what 1.1 Hadoop that?
  1.2 Hadoop version.
  Hadoop 1.3 on the narrow sense - core components.
  Hadoop 1.4 on generalized - ecosystem.

Second, the brief history of the development of Hadoop

Three, Hadoop characteristic advantages
  3.1 expansion capability (the Scalable)
  3.2 Low cost (as Economical)
  3.3 High Efficiency (Efficient)
  3.4 Reliability (Rellable)

Four, Hadoop application at home and abroad
    4.1 abroad.
    4.2 Domestic.



Beverage giant comfort zone Akzo  ||  ♂ ♀ tired feel no love





A, Hadoop Introduction

  1.1 ~ Hadoop What is that?

  • Apache Hadoop is an open source software framework's implementation using java language, is a development and operation of large-scale data processing software platform.
  • It allows the use of a simple programming model for distributed processing of large data sets on a large number of computers in the cluster.

  1.2 ~ Hadoop version.

alt

Hadoop3.0 version yet mature ...

  Hadoop on 1.3 to narrowly - core components.

    1.3.1 HDFS -. Distributed File System.
  • Solve mass data storage.
    . 1.3.2 YARN - frame job scheduling and cluster resource management.
  • Resolve resource scheduling.
    . 1.3.3 MAPREDUCE - distributed computing programming framework.
  • Solve massive data computing.

  1.4 ~ Hadoop in a broad sense - ecosystem.

alt

Moment Hadoop has grown into a huge system, with the growth of the ecosystem, emerging more and more projects, including some non-Apache in charge of projects, which is a good complement to HADOOP or higher level of abstraction .
...
For example:

    1.4.1 HDFS -. Distributed File System.
  • Solve mass data storage.
    . 1.4.2 MAPREDUCE - distributed computing application development framework.
  • Solve massive data computing.
    . 1.4.3 YARN - frame job scheduling and cluster resource management.
  • Resolve resource scheduling.
    1.4.4 HIVE -. HADOOP is based on distributed data warehouse.
  • SQL-based query data manipulation.
    . 1.4.5 HBASE - based HADOOP massive distributed database.
    1.4.6 ZOOKEEPER -. Distributed Coordination service infrastructure components.
    1.4.7 Mahout -. Based on machine learning algorithm library mapreduce / spark / flink and other distributed computing framework.
    1.4.8 Oozie -. Workflow scheduling frame.
    1.4.9 Sqoop:. Data import and export tools.
  • For example between mysql and HDFS.
    . 1.4.10 Flume - log data acquisition frame.
    1.4.11 Impala -. Hadoop-based real-time analysis.
    ......


Determine the direction of development, adhere to business accumulation.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -


Second, the brief history of the development of Hadoop

  • Apache Lucene Hadoop founder Doug Cutting is created.
  • Originated in Nutch, Lucene subproject it is.
  • Nutch design goal is to build a large-scale network-wide search engines, including web crawling, indexing, query and other functions, but with the increase in the number of pages crawled, encountered serious scalability issues: how to solve the billions of web pages storage and indexing issues.
  • In 2003 Google published a paper providing a viable solution to this problem.
  • Paper describes Google's product architecture, which is called: Google Distributed File System (GFS), can solve the storage needs of large files they generate on the page crawling and indexing process.
  • In 2004 Google introduced the world to the published version of the Google MapReduce system.
  • Same period, Nutch developers to complete the corresponding open source implementation of HDFS and MAPREDUCE, and stripped become independent from Nutch project HADOOP, to January 2008, HADOOP become a top-level Apache project, is celebrating its rapid development.
  • In 2006 Google published a thesis on BigTable, which prompted the development of Hbase later.
  • Therefore, the development of Hadoop and its ecosystem can not do without Google's contribution.


Determine the direction of development, adhere to business accumulation.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -




Three, Hadoop characteristic advantages

  3.1 to capacity expansion (Scalable)

  • Hadoop is allocated among the data available in the computer cluster and complete computing tasks, these clusters can be used to facilitate the expansion of thousands of nodes.

  3.2 ~ low cost (as Economical)

  • Hadoop by ordinary inexpensive machine consisting of a cluster of servers to distribute and process the data that is very low cost.

  3.3 - High Efficiency (Efficient)

  • By concurrent data, the Hadoop dynamic parallel can move data between nodes, such that very fast.

  3.4 - Reliability (Rellable)

  • Can automatically maintain multiple copies of data, and can automatically re-deploy (redeploy) after a failure computing tasks.
  • So Hadoop basis of the capability to store and process data bits worthy of trust.


Determine the direction of development, adhere to business accumulation.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -




Four, Hadoop applications at home and abroad

Whether domestic or foreign, industry Hadoop is the most popular Internet field, we can say the Internet is the main use of force hadoop.

  4.1 ~ abroad.

  • Yahoo's Hadoop application in support of the advertising system, user behavior analysis, support for searching the Web.
  • Facebook main internal storage using Hadoop logs and multi-dimensional data, and as reporting, analysis and machine learning data sources.

  4.2 to country.

  • BAT leading Internet company Hadoop user is doing my defenders.
  • For example, Ali ladder (14 years, China's largest Hadoop cluster), Baidu platform log analysis, recommendation engine systems.

alt

  • Other non-domestic Internet sector also has applications in many hadoop,
  • such as:
  • Financial Sector: Personal Credit Analysis
  • Securities Industry: Investment Model
  • Transportation industry: vehicles, traffic monitoring and analysis
  • Telecommunications Industry: Internet user behavior analysis
  • In short: hadoop not linked with certain specific industry or a specific business, it is only a tool used for the processing of massive data analysis.


Determine the direction of development, adhere to business accumulation.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

^ At this point, Hadoop acquaintance completed.


- - - - - - - - - - - - - - - - - - - - - - - - - - - -


※ worldly temptations so great that the firm always moved.

Decimal value of the binary representation is 1.5625?


A、101.1001
B、0.001
C、101.111
D、1.1001

D
alt



Determine the direction of development, adhere to business accumulation.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -


Note: CSDN end of the phone does not support chapter jumps within the chain, but the chain is available, also requested a better experience on the PC side.

I know my weakness, I know what you are picky, but I just I do not like fireworks, thank you for pointing, creating a piece of me :)!



Determine the direction of development, adhere to business accumulation.


Guess you like

Origin blog.csdn.net/weixin_42464054/article/details/91529138