Architecture of HDFS in Hadoop

1. Switch statement

Syntax rules:
① The selector expression can be of type byte, short, int, or char (or their wrapper types); enum types are supported from Java SE 5, and String from Java SE 7.
② Without a break, execution falls through into the case statements that follow.
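Both rules can be seen in a short sketch. The class and method names below (SwitchDemo, withoutBreak, dayNumber) are made up for illustration only.

```java
// Hypothetical demo class; the names are illustrative only.
class SwitchDemo {
    // No break statements: a matching case also runs every case after it.
    static String withoutBreak(int n) {
        StringBuilder sb = new StringBuilder();
        switch (n) {
            case 1:
                sb.append("one ");
            case 2:
                sb.append("two ");
            default:
                sb.append("other");
        }
        return sb.toString();
    }

    // String selector (Java SE 7 and later), with break after each case.
    static int dayNumber(String day) {
        int result;
        switch (day) {
            case "MON":
                result = 1;
                break;
            case "TUE":
                result = 2;
                break;
            default:
                result = -1;
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(withoutBreak(1));  // one two other
        System.out.println(withoutBreak(2));  // two other
        System.out.println(dayNumber("TUE")); // 2
    }
}
```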

2. Modifiers

Access modifiers

In Java, access modifiers control access to classes, variables, methods, and constructors. Java supports 4 access levels.

default (no modifier written): Visible within the same package. Applies to: classes, interfaces, variables, methods.

private : Visible only within the same class. Applies to: variables, methods. Note: cannot be applied to top-level (outer) classes.

public : Visible to all classes. Applies to: classes, interfaces, variables, methods.

protected : Visible to classes in the same package and to all subclasses. Applies to: variables, methods. Note: cannot be applied to top-level (outer) classes.
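The four levels can be illustrated with a small sketch. The class AccessDemo and its field names are hypothetical; levelOf uses the standard java.lang.reflect.Modifier helpers to report how each field was declared.

```java
import java.lang.reflect.Modifier;

// Hypothetical class with one field per access level.
class AccessDemo {
    private int secret = 1;      // same class only
    int packageLevel = 2;        // default: same package
    protected int inherited = 3; // same package, plus subclasses
    public int open = 4;         // every class

    // A private field is still reachable from inside its own class.
    int readSecret() {
        return secret;
    }

    // Reports the declared access level of a field of AccessDemo.
    static String levelOf(String fieldName) {
        try {
            int m = AccessDemo.class.getDeclaredField(fieldName).getModifiers();
            if (Modifier.isPrivate(m)) return "private";
            if (Modifier.isProtected(m)) return "protected";
            if (Modifier.isPublic(m)) return "public";
            return "default";
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(levelOf("secret"));       // private
        System.out.println(levelOf("packageLevel")); // default
    }
}
```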

Non-access modifiers

static : static members belong to the class and can be accessed as ClassName.variableName and ClassName.methodName
final : a final method is inherited by subclasses but cannot be overridden by them
abstract : an abstract class cannot be instantiated; it exists to be extended by subclasses
synchronized : a synchronized method can be executed by only one thread at a time on the same object
transient : when an object is serialized, the JVM skips any variable marked transient.
(Meaning of the word "transient": fleeting, short-lived, temporary)

Persistence : Persistence is the mechanism for converting program data between a persistent state and a transient state. In other words, it means saving data (such as objects in memory) to a storage device where it can be kept permanently.

Therefore, variables marked transient are not persisted.
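A minimal sketch of this behavior, using standard Java serialization to an in-memory byte array. The class names TransientDemo and User and the field names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Hypothetical serializable type with one transient field.
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        transient String password; // skipped during serialization

        User(String name, String password) {
            this.name = name;
            this.password = password;
        }
    }

    // Serializes the user to bytes and reads it back as a new object.
    static User roundTrip(User user) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(user);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (User) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        User copy = roundTrip(new User("alice", "secret"));
        System.out.println(copy.name);     // alice
        System.out.println(copy.password); // null
    }
}
```

After the round trip the transient field holds its default value (null for a reference type), because it was never written to the stream.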

volatile : the word means changeable, unstable; (of a liquid or solid) easily evaporated; (of computer memory) volatile.

A member variable marked volatile is re-read from shared (main) memory every time a thread accesses it, and every change to it is written back to shared memory right away. As a result, a value written by one thread is visible to other threads on their next read. Note that volatile guarantees visibility only; it does not make compound operations such as count++ atomic.
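The sketch below contrasts the two modifiers under simple assumptions: a volatile flag is used to stop a worker thread, and a synchronized method protects a shared counter. All names (VolatileDemo, running, counter) are hypothetical.

```java
class VolatileDemo {
    // volatile: the write made by the main thread becomes visible to the worker.
    private static volatile boolean running = true;
    private static int counter = 0;

    // synchronized: only one thread at a time runs this method,
    // so the read-modify-write of counter is never interleaved.
    private static synchronized void increment() {
        counter++;
    }

    // Starts a busy-waiting worker, clears the flag, and reports whether it stopped.
    static boolean workerStopsAfterFlagCleared() {
        running = true;
        Thread worker = new Thread(() -> {
            while (running) {
                // spin until another thread clears the flag
            }
        });
        worker.setDaemon(true);
        worker.start();
        running = false;
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    // Runs several threads that each call increment() perThread times.
    static int countWithThreads(int threads, int perThread) {
        counter = 0;
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    increment();
                }
            });
            pool[i].start();
        }
        try {
            for (Thread t : pool) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(workerStopsAfterFlagCleared()); // true
        System.out.println(countWithThreads(4, 10000));    // 40000
    }
}
```

Without synchronized on increment(), the final count could come out lower than expected even if counter were volatile, since counter++ is a read followed by a write.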

3. HDFS

HDFS (Hadoop Distributed File System) is a distributed file system. Once a file has been written it cannot be modified in place, which makes HDFS suitable for write-once, read-many scenarios.

1 Advantages and disadvantages

Advantages

  1. High fault tolerance: when a replica is lost, it is restored automatically.
  2. Suitable for big data: it can handle both very large files and very large numbers of files.
  3. It can be built on cheap machines, with reliability provided by the multi-replica mechanism.

Disadvantages

  1. Not suitable for low-latency data access: it cannot serve reads and writes at millisecond latency.
  2. It cannot store large numbers of small files efficiently: each file's directory entry and block information occupies NameNode memory, so many small files consume a large amount of it. For small files the seek time can also exceed the read time, which goes against the design goals of HDFS.
  3. Concurrent writes and random modification are not supported: a file can have only one writer at a time, and only appending data is supported.
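The small-files cost in point 2 can be estimated with a rough sketch. It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode memory per file or block object; that figure is an assumption and does not come from this article, and the class and method names are hypothetical.

```java
// Rough estimate only; BYTES_PER_OBJECT is an assumed rule of thumb.
class SmallFilesEstimate {
    static final long BYTES_PER_OBJECT = 150L; // assumption, not a measured value

    // Each file contributes one file object plus one object per block.
    static long nameNodeBytes(long fileCount, long blocksPerFile) {
        long objects = fileCount + fileCount * blocksPerFile;
        return objects * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // 1 GB stored as a single file of eight 128 MB blocks
        System.out.println(nameNodeBytes(1, 8));         // 1350
        // the same 1 GB stored as 1,048,576 files of 1 KB each
        System.out.println(nameNodeBytes(1_048_576, 1)); // 314572800
    }
}
```

Under these assumptions, the same amount of data needs roughly 300 MB of NameNode memory as 1 KB files, compared with about 1 KB as a single large file.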

2 HDFS architecture

① NameNode: the master, which acts as the manager of the cluster
(1) manages the HDFS namespace;
(2) configures the replication policy;
(3) manages the data block (Block) mapping information;
(4) handles client read and write requests.
② DataNode: the slave. The NameNode issues commands, and the DataNode performs the actual operations.
(1) stores the actual data blocks;
(2) performs read/write operations on data blocks.
③ Client
(1) File splitting: when uploading, the client splits the file into blocks according to the configured block size, which defaults to 128 MB in Hadoop 2.x/3.x and 64 MB in 1.x;
(2) interacts with the NameNode to obtain file location information;
(3) interacts with DataNodes to read or write data;
(4) provides commands to manage HDFS, such as creating, deleting, modifying, and querying files.

④ Secondary NameNode: not a hot standby for the NameNode. When the NameNode goes down, it cannot immediately take over and provide service.
(1) assists the NameNode and shares its workload, for example by periodically merging the Fsimage and Edits files and pushing the result back to the NameNode;
(2) in an emergency, it can help recover the NameNode.
HDFS architecture diagram
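The file splitting done by the Client can be sketched as plain arithmetic. This is not Hadoop's own code: the class BlockSplit and the method blockSizes are hypothetical, and the sketch only shows how a file length maps to block lengths under the 128 MB default.

```java
// Illustrative arithmetic only; not part of the Hadoop API.
class BlockSplit {
    static final long MB = 1024L * 1024L;
    static final long DEFAULT_BLOCK = 128 * MB; // Hadoop 2.x/3.x default

    // Returns the length of each block a file of fileBytes would be split into.
    static long[] blockSizes(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        int count = (int) ((fileBytes + blockBytes - 1) / blockBytes);
        long[] sizes = new long[count];
        long remaining = fileBytes;
        for (int i = 0; i < count; i++) {
            // every block is full except possibly the last one
            sizes[i] = Math.min(blockBytes, remaining);
            remaining -= sizes[i];
        }
        return sizes;
    }

    public static void main(String[] args) {
        long[] sizes = blockSizes(300 * MB, DEFAULT_BLOCK);
        System.out.println(sizes.length);  // 3
        System.out.println(sizes[2] / MB); // 44
    }
}
```

A 300 MB file therefore becomes two full 128 MB blocks plus one 44 MB block; the last block holds only the remaining data.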

3 File blocks

The block size is considered optimal when the seek (addressing) time is about 1% of the transfer time. A block size of 128 MB is recommended for mechanical hard disks and 256 MB for solid-state drives.

  • If the block is too small, seek overhead increases: the program spends much of its time locating the start of blocks.
  • If the block is too large, the time to transfer the data from disk becomes much longer than the time needed to locate the block, so processing that block of data becomes very slow.
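The 1% rule can be worked through as a small calculation. The inputs are assumptions chosen for illustration, not figures from this article: a seek time of 10 ms, and transfer rates of 100 MB/s for a mechanical disk and 200 MB/s for a solid-state drive. The class and method names are hypothetical.

```java
// Worked example of the 1% rule under assumed disk characteristics.
class BlockSizeEstimate {
    static final double SEEK_FRACTION = 0.01; // seek time is 1% of transfer time

    // Block size in MB at which seek time is 1% of transfer time.
    static double idealBlockMb(double seekMs, double transferMbPerSec) {
        double transferSeconds = (seekMs / 1000.0) / SEEK_FRACTION;
        return transferMbPerSec * transferSeconds;
    }

    // Rounds up to the next power of two, as block sizes are usually configured.
    static long roundUpToPowerOfTwo(double mb) {
        long p = 1;
        while (p < mb) {
            p <<= 1;
        }
        return p;
    }

    public static void main(String[] args) {
        // assumed: 10 ms seek, 100 MB/s (mechanical) and 200 MB/s (solid-state)
        System.out.println(roundUpToPowerOfTwo(idealBlockMb(10, 100))); // 128
        System.out.println(roundUpToPowerOfTwo(idealBlockMb(10, 200))); // 256
    }
}
```

With a 10 ms seek, the transfer should take about 1 second, which gives roughly 100 MB at 100 MB/s; rounding up to a power of two yields the 128 MB figure.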


Origin blog.csdn.net/qq_44273739/article/details/131863988