1.1 Hadoop's four commonly used modules and their configuration files

1.1.1 core-site.xml (tools module)

Contains Hadoop's commonly used tools; it was renamed from the original Hadoop Core. It includes the system configuration tool Configuration, the remote procedure call (RPC) mechanism, the serialization mechanism, and the Hadoop file system abstraction FileSystem, among others. These provide the basic services for building a cloud computing environment on commodity hardware, and provide the APIs needed to develop software that runs on the platform.
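As a minimal sketch of how the Common module's Configuration tool and FileSystem abstraction are typically used from Java (the fs.defaultFS value is hypothetical; in practice it comes from core-site.xml on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CommonModuleSketch {
    public static void main(String[] args) throws Exception {
        // Configuration loads core-site.xml / hdfs-site.xml found on the classpath.
        Configuration conf = new Configuration();
        // Hypothetical NameNode address, shown only for illustration.
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

        // FileSystem is the abstraction mentioned in the text: the same API works
        // for HDFS, the local file system, and other implementations.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Working with: " + fs.getUri());
        fs.close();
    }
}
```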

   

1.1.2 hdfs-site.xml (data storage module)

A distributed file system that provides high-throughput, highly scalable, and highly fault-tolerant access to application data. It is the foundation of data storage management in the Hadoop system. It is a highly fault-tolerant system that can detect and respond to hardware failures, designed to run on low-cost commodity hardware. HDFS simplifies the file consistency model and, through streaming data access, provides high-throughput access to application data for applications with large data sets.

namenode + datanode + secondarynamenode
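As an illustration of the streaming access model described above, here is a hedged sketch (path and content are made up) that writes a file to HDFS through the client API; the NameNode records the metadata while the DataNodes receive the actual bytes:

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Streaming write: bytes are pushed to the DataNode pipeline as we write,
        // while the NameNode only records the file's metadata and block locations.
        Path file = new Path("/demo/hello.txt");   // hypothetical path
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```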

   

1.1.3 mapred-site.xml (data processing module)

A system for parallel processing of large data sets, based on YARN. It is a computing model for processing large amounts of data. Hadoop's MapReduce implementation, together with Common and HDFS, made up the three components of early Hadoop development. MapReduce divides an application into two steps, Map and Reduce: Map performs a specified operation on individual elements of the data set and produces intermediate results as key-value pairs; Reduce then combines ("reduces") all the values that share the same intermediate key to obtain the final result. This division of functionality makes MapReduce well suited to data processing in a distributed, parallel environment made up of a large number of computers.
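To make the Map / Reduce split concrete, here is a sketch of the classic word-count pattern in the Hadoop MapReduce Java API (class names are my own, for illustration only): Map emits a (word, 1) pair per word, and Reduce sums all the values that share the same key.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: operate on individual records and emit intermediate (word, 1) pairs.
class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reduce: all values with the same intermediate key arrive together and are summed.
class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```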

   

1.1.4 yarn-site.xml (job scheduling + resource management platform)

   Task scheduling and cluster resource management

         resourcemanager + nodemanager
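A hedged sketch of a driver that submits the mapper and reducer from the word-count sketch above as a job; when mapreduce.framework.name is set to yarn, the ResourceManager schedules it and NodeManagers run its containers (paths are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Usually configured in mapred-site.xml; set here only for illustration.
        conf.set("mapreduce.framework.name", "yarn");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);   // from the sketch above
        job.setReducerClass(IntSumReducer.class);    // from the sketch above
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path("/demo/input"));    // hypothetical
        FileOutputFormat.setOutputPath(job, new Path("/demo/output")); // hypothetical
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```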

   

1.2 Hadoop's five nodes:

1.2.1 NameNode (management node)

The NameNode manages the namespace of the file system (Namespace). It maintains the file system tree (FileSystemTree) and the metadata of all files and folders in that tree (metadata), including the edit log (edits) and the image file (fsimage). There are two files that hold this management information: the namespace image file (fsimage) and the edit log file (edits). The edit log mainly records the modifications made to HDFS, while the image file mainly records the file tree structure of HDFS. This information is cached in RAM, and of course both files are also persisted to the local hard disk. The NameNode also records, for each file, which DataNodes hold each of its blocks, but it does not persist this information, because it is reconstructed from the DataNodes when the system starts up.
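The block-to-DataNode mapping that the NameNode keeps in memory can be observed from a client. Here is a minimal sketch (the file path is hypothetical) that asks for the block locations of a file:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/demo/hello.txt")); // hypothetical

        // The NameNode answers this from its in-memory block map, which is rebuilt
        // from DataNode block reports at startup rather than persisted to disk.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```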

   

1.2.2 DataNode (worker node)

DataNodes are the worker nodes of the file system. They store and retrieve data blocks as scheduled by clients or by the NameNode, and they periodically send the NameNode a list of the blocks they are storing.

Without the NameNode, the file system cannot be used. In fact, if the server running the NameNode service fails, all the files on the file system would be lost, because we would not know how to reconstruct the files from the blocks on the DataNodes. Therefore, providing fault-tolerance and redundancy mechanisms for the NameNode is very important.

Each slave server node in the cluster runs a DataNode daemon, which is responsible for reading and writing HDFS data blocks to the local file system. When a client needs to read or write data, the NameNode first tells the client which DataNode the specific read/write should go to; the client then communicates directly with the DataNode daemon on that server and performs the read/write of the relevant data blocks.
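The read path described above (ask the NameNode for block locations, then stream the bytes directly from the DataNodes) is hidden behind the client API. This hedged sketch simply opens and prints a hypothetical file:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // open() contacts the NameNode for block locations; the bytes themselves
        // are then read directly from the DataNodes holding each block.
        try (FSDataInputStream in = fs.open(new Path("/demo/hello.txt"))) { // hypothetical
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        fs.close();
    }
}
```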

   

1.2.3 Secondary NameNode (comparable to the slave node in MySQL master-slave replication)

The Secondary NameNode is an auxiliary background process used to monitor the state of HDFS. Like the NameNode, each cluster has one Secondary NameNode, and it is deployed on a separate server. Unlike the NameNode, it does not accept or record any real-time data changes; instead, it communicates with the NameNode in order to periodically save snapshots of the HDFS metadata. Since the NameNode is a single point, the Secondary NameNode's snapshot capability can minimize the NameNode's downtime and data loss. At the same time, if the NameNode runs into problems, the Secondary NameNode can promptly serve as a backup NameNode.
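The checkpoint ("snapshot") interval is configurable. A small sketch reading the relevant properties (property names as used in Hadoop 2.x and later; the fallback defaults shown are my assumption of the usual out-of-the-box values):

```java
import org.apache.hadoop.conf.Configuration;

public class CheckpointConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // How often (in seconds) the Secondary NameNode merges edits into a new fsimage,
        // and after how many uncheckpointed transactions a checkpoint is forced.
        long periodSeconds = conf.getLong("dfs.namenode.checkpoint.period", 3600);
        long txnThreshold  = conf.getLong("dfs.namenode.checkpoint.txns", 1000000);

        System.out.println("checkpoint period (s): " + periodSeconds);
        System.out.println("checkpoint txn threshold: " + txnThreshold);
    }
}
```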

   

1.2.4 ResourceManager

ResourceManager means resource management. In YARN, the ResourceManager is responsible for the unified management and allocation of all resources in the cluster. It receives resource reports from each node (NodeManager) and allocates those resources to the various applications (in practice, to each application's ApplicationMaster) according to a certain policy.

The RM consists of the Scheduler and the ApplicationsManager (application manager). The Scheduler is responsible for allocating resources to applications; it does not monitor applications or track their state, and it does not guarantee restarting applications that fail because of application errors or hardware faults. The ApplicationsManager is responsible for accepting new jobs, and for coordinating and providing the restart of an ApplicationMaster container when it fails. Each application's ApplicationMaster (AM) is responsible for requesting resources from the Scheduler, tracking the use of those resources, and monitoring task progress.
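A hedged sketch that talks to the ResourceManager through the YarnClient API to list the applications it is currently managing (assuming a yarn-site.xml on the classpath that points at the RM):

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListApplicationsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // YarnClient talks to the ResourceManager, which tracks every application
        // (and its ApplicationMaster) running in the cluster.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        List<ApplicationReport> apps = yarnClient.getApplications();
        for (ApplicationReport app : apps) {
            System.out.println(app.getApplicationId() + "  "
                    + app.getName() + "  " + app.getYarnApplicationState());
        }
        yarnClient.stop();
    }
}
```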

   

1.2.5 NodeManager

The NM is the ResourceManager's agent on each slave machine. It is responsible for managing containers, monitoring their resource usage, and reporting resource usage to the ResourceManager/Scheduler.

HDFS file storage mechanism:

  

An HDFS cluster has two main roles: NameNode and DataNode (plus the Secondary NameNode)

The NameNode is responsible for managing the metadata of the entire file system

The DataNode is responsible for managing the user's file data blocks

Files are cut into blocks of a fixed size and stored, in a distributed manner, across a number of DataNodes

Each file block can have multiple replicas, which are stored on different DataNodes (see the sketch after this list)

DataNodes periodically report the block information they hold to the NameNode, while the NameNode is responsible for maintaining the replica count of each file

The internal working mechanism of HDFS is transparent to clients; client requests to access HDFS all go through the NameNode
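As a concrete example of the block-and-replica layout described above, this sketch creates a file with an explicit replication factor and block size (the values and path are chosen only for illustration; cluster defaults normally come from dfs.replication and dfs.blocksize in hdfs-site.xml):

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockAndReplicaSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        Path file = new Path("/demo/blocks.txt");  // hypothetical
        short replication = 3;                     // each block stored on 3 DataNodes
        long blockSize = 128L * 1024 * 1024;       // file split into 128 MB blocks

        try (FSDataOutputStream out =
                fs.create(file, true, 4096, replication, blockSize)) {
            out.write("data split into fixed-size blocks\n"
                    .getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```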



Origin blog.51cto.com/14479068/2432973