"Hadoop: The Definitive Guide, 4th Edition" - Chapter 3, The Hadoop Distributed Filesystem - Design Concepts / Streams / Java Interface

3.1 The Design of HDFS

  • HDFS is built to store very large files and to serve them with a streaming access pattern (write once, read many times).
  • HDFS access latency is relatively high; for low-latency access, HBase is a better choice.
  • Large numbers of small files are a problem: the namenode keeps the filesystem metadata in memory, and each file, directory, and block takes about 150 bytes, so once there are millions or tens of millions of files the physical memory of the namenode machine has to be taken into account.
  • HDFS supports only a single writer per file, and data can only be appended at the end of the file.
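To get a feel for the small-files limit, the 150-byte figure from the notes above can be turned into a rough estimate. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes one namenode object per file, per block, and per directory, each costing about 150 bytes. Real namenode heap usage depends on the Hadoop version and configuration.

```python
# Rule of thumb from the notes: roughly 150 bytes of namenode memory per object.
BYTES_PER_OBJECT = 150


def estimate_namenode_memory(num_files: int,
                             blocks_per_file: int = 1,
                             num_dirs: int = 0) -> int:
    """Return an approximate number of bytes of namenode memory.

    Assumption: every file, every block, and every directory is one
    in-memory object of about BYTES_PER_OBJECT bytes.
    """
    objects = num_files + num_files * blocks_per_file + num_dirs
    return objects * BYTES_PER_OBJECT


def to_megabytes(num_bytes: int) -> float:
    """Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return num_bytes / 1_000_000


# One million single-block files: 2,000,000 objects * 150 bytes.
print(f"1M files:  {to_megabytes(estimate_namenode_memory(1_000_000)):.0f} MB")
# Ten million single-block files.
print(f"10M files: {to_megabytes(estimate_namenode_memory(10_000_000)):.0f} MB")
```

Under these assumptions one million small files cost about 300 MB of namenode memory and ten million cost about 3 GB, which is why the file count, not the data volume, is the limit here.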

3.2 HDFS Concepts

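  • Blocks: an HDFS block is many times larger than a disk block. A map task in MapReduce normally processes one HDFS block at a time, so the block size should not be set too large, otherwise there are too few map tasks to keep the cluster busy (the right size depends on the cluster).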
  • Namenode and datanode
    • The namenode is the management node. It manages the filesystem namespace, maintains the filesystem tree with the metadata of every file and directory in it, and records which datanodes hold the blocks of each file (the mapping between files and blocks).
    • A datanode is a data node: it stores the blocks and serves them when asked to.
  • Client: accesses the whole filesystem by interacting with the namenode and the datanodes.
  • Block caching: frequently accessed blocks can be explicitly cached in a datanode's memory.
  • HDFS federation: suited to large clusters, where namenode memory can become the bottleneck. Federation lets the cluster scale by adding namenodes, each of which manages only part of the filesystem namespace; for example, different namenodes can be responsible for different directories.
  • HDFS high availability
    • The namenode is a single point of failure; Hadoop 2 added an active-standby pair of namenodes.
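The relationship between file size, blocks, and map tasks can be sketched with a little arithmetic. The helper below is hypothetical and is not part of any Hadoop API. It assumes the Hadoop 2 default block size of 128 MB and the usual case where one map task handles one block.

```python
from typing import List

MB = 1024 * 1024
# Assumption: 128 MB, the default HDFS block size in Hadoop 2.
DEFAULT_BLOCK_SIZE = 128 * MB


def split_into_blocks(file_size: int,
                      block_size: int = DEFAULT_BLOCK_SIZE) -> List[int]:
    """Return the sizes, in bytes, of the blocks a file would be split into.

    The last block only holds the remaining bytes, so a file smaller than
    one block does not take up a full block of storage.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


# A 1 GB file becomes 8 full blocks, so typically 8 map tasks.
print(len(split_into_blocks(1024 * MB)))
# A 300 MB file becomes two full blocks plus a 44 MB tail block.
print([size // MB for size in split_into_blocks(300 * MB)])
```

With a much larger block size the same 1 GB file would yield only one or two blocks, and therefore only one or two map tasks, which is the reason the notes warn against making blocks too large.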

3.3 The Command-Line Interface

  • Run hadoop fs -help to list all available commands.
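If the shell commands need to be driven from a script, a thin wrapper is usually enough. The sketch below assumes the hadoop launcher is on the PATH; both function names are made up for illustration. It only builds and runs the same command line you would type by hand.

```python
import shutil
import subprocess
from typing import List, Optional


def hadoop_fs_command(*args: str) -> List[str]:
    """Build the argument list for a `hadoop fs` invocation."""
    return ["hadoop", "fs", *args]


def run_hadoop_fs(*args: str) -> Optional[str]:
    """Run `hadoop fs <args>` and return its combined output.

    Returns None when no `hadoop` executable is found on the PATH,
    so the helper is safe to call on a machine without Hadoop.
    """
    if shutil.which("hadoop") is None:
        return None
    result = subprocess.run(hadoop_fs_command(*args),
                            capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


# The command from the notes, as an argument list.
print(hadoop_fs_command("-help"))
# Listing the root directory would be built the same way.
print(hadoop_fs_command("-ls", "/"))
```

Calling run_hadoop_fs("-help") on a machine with Hadoop installed returns the same help text as typing the command in a terminal.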

Origin www.cnblogs.com/zhazhaacmer/p/12133377.html