Architecture
- NameNode
- The HDFS master node; acts as the administrator
- Receives client requests (from the command line or Java programs): create directories, upload, download, and delete data
- Manages and maintains the HDFS logs and metadata
- Log file (edits file)
- Binary files that record every client operation and therefore reflect the latest state of HDFS
- $HADOOP_HOME/tmp/dfs/name/current
- Edits viewer (offline edits viewer): converts an edits file into text (XML) format
- hdfs oev -i edits_inprogress_0000000000000000107 -o ~/a.xml
- Meta-information (fsimage file)
- Records data block position information and block redundancy (replication) information; it does not reflect the latest state of HDFS
- $HADOOP_HOME/tmp/dfs/name/current
- Image viewer (hdfs oiv): converts the fsimage file into text or XML
- DataNode
- The data node
- Stores the data blocks (block size: 64 MB in 1.x, 128 MB in 2.x)
- /root/training/hadoop-2.7.3/tmp/dfs/data/current/BP-419062579-192.168.157.111-1535553141546/current/finalized/subdir0/subdir0
- Rule of thumb for the block replication factor: generally equal to the number of data nodes, but no more than 3
- Since Hadoop 3.x, HDFS supports erasure coding, which greatly reduces storage space (saving about half)
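The "saves about half" figure can be checked with simple arithmetic. The sketch below assumes the RS-6-3 Reed-Solomon layout (6 data units plus 3 parity units) as the erasure coding policy and compares it with 3x replication. `StorageEstimate` and its methods are hypothetical helpers written for this note, not Hadoop classes, and the erasure coding figure ignores partially filled stripes.

```java
// Illustrative arithmetic only; StorageEstimate is a hypothetical helper, not part of Hadoop.
class StorageEstimate {
    static final long MB = 1024L * 1024L;

    // Number of blocks needed for a file (the last block may be only partly filled).
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw bytes consumed when every byte is stored `replication` times.
    static long replicatedBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    // Raw bytes consumed with erasure coding: data plus parity.
    // Approximation: assumes the file fills whole stripes.
    static long erasureCodedBytes(long fileBytes, int dataUnits, int parityUnits) {
        return fileBytes * (dataUnits + parityUnits) / dataUnits;
    }

    public static void main(String[] args) {
        long file = 768 * MB;
        System.out.println("128 MB blocks:  " + blockCount(file, 128 * MB));            // 6
        System.out.println("3x replication: " + replicatedBytes(file, 3) / MB + " MB");      // 2304 MB
        System.out.println("RS-6-3 coding:  " + erasureCodedBytes(file, 6, 3) / MB + " MB"); // 1152 MB
    }
}
```

Under these assumptions the raw footprint drops from 3.0x to 1.5x of the file size, which is where the "half" comes from.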
- SecondaryNameNode
- The secondary name node
- Merges the edit log into the fsimage
- The edits file records the most recent changes, so it keeps growing as more operations are performed
- The latest changes recorded in edits are written into fsimage
- The edits file can then be emptied
- Usually deployed on the same machine as the NameNode, which speeds up downloading the edits and fsimage files
- When does the merge happen? When HDFS issues a checkpoint
- HDFS produces a checkpoint every 60 minutes (fs.checkpoint.period)
- Or when the edits file reaches 64 MB (fs.checkpoint.size)
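The two checkpoint conditions above combine with a logical OR: whichever threshold is reached first triggers the merge. A minimal sketch of that decision, using the 60-minute and 64 MB values quoted in the list, follows. `CheckpointPolicy` is a hypothetical class written for illustration; it is not the actual SecondaryNameNode code, whose details vary between Hadoop versions.

```java
// A sketch of the checkpoint decision; CheckpointPolicy is a hypothetical class.
class CheckpointPolicy {
    private final long periodMillis;   // maximum time between checkpoints
    private final long maxEditsBytes;  // maximum size of the edits file

    CheckpointPolicy(long periodMillis, long maxEditsBytes) {
        this.periodMillis = periodMillis;
        this.maxEditsBytes = maxEditsBytes;
    }

    // A checkpoint is due as soon as either threshold is reached.
    boolean shouldCheckpoint(long millisSinceLastCheckpoint, long editsBytes) {
        return millisSinceLastCheckpoint >= periodMillis
                || editsBytes >= maxEditsBytes;
    }

    public static void main(String[] args) {
        CheckpointPolicy policy =
                new CheckpointPolicy(60L * 60L * 1000L, 64L * 1024L * 1024L);
        // 10 minutes elapsed, 1 MB of edits: nothing to do yet.
        System.out.println(policy.shouldCheckpoint(10L * 60L * 1000L, 1024L * 1024L));
        // 10 minutes elapsed, but edits has grown to 64 MB: checkpoint now.
        System.out.println(policy.shouldCheckpoint(10L * 60L * 1000L, 64L * 1024L * 1024L));
    }
}
```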
Data transmission
- Data upload
- The client requests the upload through DistributedFileSystem.java
- A DFSClient is created (DFSClient.java)
- Establish RPC communication
- Get the NameNode proxy object through NameNodeProxies (HA)
- Request creation of the file metadata
- The NameNode creates the file metadata
- The metadata is returned to DistributedFileSystem
- Create an output stream, FSDataOutputStream
- Upload the data to a DataNode
- Based on the metadata, the blocks are replicated horizontally across DataNodes
- Download Data
- Request the download
- Create the DFSClient
- Establish RPC communication
- Request the metadata
- Look up the metadata (check the cache first, then fsimage)
- Return the metadata
- Create an input stream
- Download the data blocks
- Assemble the downloaded blocks into the file
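Both directions of the transfer rest on the same idea: a file is cut into fixed-size blocks on upload and stitched back together in order on download. The sketch below models only that step in memory. `BlockModel` is a hypothetical helper, the 4-byte block size is chosen purely to keep the example small, and no RPC, NameNode, or DataNode is involved.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory toy model of block splitting and reassembly; not HDFS code.
class BlockModel {
    // Upload side: cut the data into blocks of at most blockSize bytes.
    static List<byte[]> split(byte[] data, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            int end = Math.min(off + blockSize, data.length);
            blocks.add(Arrays.copyOfRange(data, off, end));
        }
        return blocks;
    }

    // Download side: concatenate the blocks in order to rebuild the file.
    static byte[] assemble(List<byte[]> blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] block : blocks) {
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        List<byte[]> blocks = split(data, 4);
        System.out.println(blocks.size() + " blocks");
        System.out.println(new String(assemble(blocks), StandardCharsets.UTF_8));
    }
}
```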
Advanced Features
- Safe Mode
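- Read-only; normal (write) operations are disabled
- A self-protection mechanism of HDFS that checks the replica ratio of the data blocks
- If the replica ratio is below the configured threshold (for example, because DataNodes have failed), blocks are replicated horizontally to restore it
- The replica ratio threshold is defined in hdfs-default.xml (dfs.namenode.safemode.threshold-pct)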
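The safe mode check boils down to comparing a ratio against a threshold. The sketch below assumes a threshold of 0.999, which to my knowledge is the stock default, and uses a hypothetical `SafeModeCheck` class; the real NameNode bookkeeping is considerably more involved.

```java
// A sketch of the safe mode ratio check; SafeModeCheck is a hypothetical class.
class SafeModeCheck {
    // Fraction of blocks that currently have enough replicas.
    // An empty file system counts as fully healthy.
    static double replicaRatio(long sufficientlyReplicated, long totalBlocks) {
        return totalBlocks == 0 ? 1.0 : (double) sufficientlyReplicated / totalBlocks;
    }

    // HDFS may leave safe mode once the ratio reaches the threshold.
    static boolean canLeaveSafeMode(long sufficientlyReplicated, long totalBlocks,
                                    double threshold) {
        return replicaRatio(sufficientlyReplicated, totalBlocks) >= threshold;
    }

    public static void main(String[] args) {
        double threshold = 0.999; // assumed default value
        // All blocks reported: safe mode can end.
        System.out.println(canLeaveSafeMode(1000, 1000, threshold));
        // 1% of blocks under-replicated: stay in safe mode and re-replicate.
        System.out.println(canLeaveSafeMode(990, 1000, threshold));
    }
}
```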
- Snapshot
- A point-in-time image of the whole file system or of a single directory; disabled by default
- Enable snapshots on a directory: hdfs dfsadmin -allowSnapshot <directory>
- Create a snapshot: hdfs dfs -createSnapshot <directory> [<snapshot name>]
- Suitable for the following scenarios
- Protection against user errors / backup / testing / disaster recovery
- Generally not recommended: HDFS already stores redundant replicas, and snapshots add further redundancy, which wastes space
- Quota
- The amount of space HDFS allocates to each directory
- Name quota
- Limits the number of files and directories that can be stored under the directory
- Space quota
- Limits the maximum total size of the files that can be stored in the directory
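The two quota types are independent limits, and a new file must fit within both. The sketch below models that with a hypothetical `DirectoryQuota` class. It is a simplification: it counts only files, and it charges the plain file size, whereas real HDFS also counts directories against the name quota and charges replicated bytes against the space quota. On a cluster the limits themselves are set with `hdfs dfsadmin -setQuota` and `hdfs dfsadmin -setSpaceQuota`.

```java
// A simplified model of name and space quotas; DirectoryQuota is a hypothetical class.
class DirectoryQuota {
    private final long nameQuota;   // maximum number of entries
    private final long spaceQuota;  // maximum number of bytes
    private long names = 0;
    private long bytes = 0;

    DirectoryQuota(long nameQuota, long spaceQuota) {
        this.nameQuota = nameQuota;
        this.spaceQuota = spaceQuota;
    }

    // Accepts the file only if both quotas still have room for it.
    boolean tryAddFile(long fileBytes) {
        if (names + 1 > nameQuota) {
            return false; // name quota exceeded
        }
        if (bytes + fileBytes > spaceQuota) {
            return false; // space quota exceeded
        }
        names++;
        bytes += fileBytes;
        return true;
    }

    long usedNames() { return names; }
    long usedBytes() { return bytes; }

    public static void main(String[] args) {
        DirectoryQuota quota = new DirectoryQuota(2, 100);
        System.out.println(quota.tryAddFile(60)); // fits
        System.out.println(quota.tryAddFile(50)); // rejected: would need 110 bytes
        System.out.println(quota.tryAddFile(40)); // fits, both quotas now full
        System.out.println(quota.tryAddFile(0));  // rejected: name quota reached
    }
}
```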
- Trash
- Disabled by default
- Deleted files are moved to the trash
- Files in the trash can be restored quickly
- A retention time can be set (fs.trash.interval); files are deleted automatically once it has passed
- User permission management
- The built-in functionality is weak
- Kerberos is recommended for securing Hadoop
Startup process
- Web UI -> Startup Progress
- Loading fsimage
- Loading edits
- Saving checkpoint
- Safe mode
Underlying principles
- RPC (remote procedure call)
- Remote procedure call (a protocol)
- The client invokes code that runs on the server side
- A framework inside which the communication between caller and callee takes place
- The code is invoked remotely; the framework implements the connection and communication between caller and callee
- A synchronous form of inter-process communication based on the client/server model
- The client is the caller that requests a service; the server is the callee that executes the client's request
- Implementing RPC in Java with Hadoop
- Client
- Server
MyRPCClient.java

    package rpc.client;

    import java.io.IOException;
    import java.net.InetSocketAddress;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.ipc.RPC;

    import rpc.server.MyInterface;

    public class MyRPCClient {

        public static void main(String[] args) throws IOException {
            // Use the Hadoop RPC framework to call the server-side program.
            // Get a proxy for the object deployed on the server.
            MyInterface proxy = RPC.getProxy(MyInterface.class,
                    MyInterface.versionID,
                    new InetSocketAddress("localhost", 7788),
                    new Configuration());

            // Use the proxy object to call the server program.
            String result = proxy.sayHello("Tom");
            System.out.println(result);
        }
    }

MyInterface.java

    package rpc.server;

    import org.apache.hadoop.ipc.VersionedProtocol;

    public interface MyInterface extends VersionedProtocol {
        // Version number; it serves as the signature of the protocol.
        public static long versionID = 1;

        // Business method.
        public String sayHello(String name);
    }

MyInterfaceImpl.java

    package rpc.server;

    import java.io.IOException;

    import org.apache.hadoop.ipc.ProtocolSignature;

    public class MyInterfaceImpl implements MyInterface {

        @Override
        public ProtocolSignature getProtocolSignature(
                String protocol, long clientVersion, int clientMethodsHash)
                throws IOException {
            // Build the signature information from the version number.
            return new ProtocolSignature(MyInterface.versionID, null);
        }

        @Override
        public long getProtocolVersion(String protocol, long clientVersion)
                throws IOException {
            // Return the version number.
            return MyInterface.versionID;
        }

        @Override
        public String sayHello(String name) {
            System.out.println("**********Calling the server side**********");
            return "Hello " + name;
        }
    }

MyRPCServer.java

    package rpc.server;

    import java.io.IOException;

    import org.apache.hadoop.HadoopIllegalArgumentException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.ipc.RPC;
    import org.apache.hadoop.ipc.RPC.Server;

    public class MyRPCServer {
        public static void main(String[] args) throws HadoopIllegalArgumentException, IOException {
            // Implement an RPC server with Hadoop's RPC framework.

            // Construct the server through RPC.Builder.
            RPC.Builder builder = new RPC.Builder(new Configuration());

            // Define the server parameters.
            builder.setBindAddress("localhost");
            builder.setPort(7788);

            // Deploy the program: the protocol interface and its implementation.
            builder.setProtocol(MyInterface.class);
            builder.setInstance(new MyInterfaceImpl());

            // Create the RPC server and start it.
            Server server = builder.build();
            server.start();
        }
    }

- Java dynamic proxy object
- A $ in a class name (for example $Proxy0) usually indicates a proxy object
- It is a wrapper (decorator-style) design pattern
- It can enhance the functionality of a class
- Typical application: database connection pools
- Parameters of Proxy.newProxyInstance
- ClassLoader: the class loader
- Class<?>[]: the interfaces implemented by the real object
- InvocationHandler: an implementation of this interface that handles the client's calls
MyBusiness.java

    package proxy;

    public interface MyBusiness {
        public void method1();
        public void method2();
    }

MyBusinessImpl.java

    package proxy;

    public class MyBusinessImpl implements MyBusiness {

        @Override
        public void method1() {
            System.out.println("*********method1*********");
        }

        @Override
        public void method2() {
            System.out.println("*********method2*********");
        }
    }

TestMain.java

    package proxy;

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    public class TestMain {

        public static void main(String[] args) {
            // Create the real object.
            MyBusiness obj = new MyBusinessImpl();

            // Create the proxy object.
            MyBusiness proxy = (MyBusiness) Proxy.newProxyInstance(
                    TestMain.class.getClassLoader(),
                    obj.getClass().getInterfaces(),
                    new InvocationHandler() {
                        @Override
                        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                            if (method.getName().equals("method1")) {
                                // Override method1 in the proxy.
                                System.out.println("*************method1 in the proxy object*************");
                                return null;
                            } else {
                                // Delegate all other methods to the real object.
                                return method.invoke(obj, args);
                            }
                        }
                    });

            // Call the real object through the proxy object.
            proxy.method1();
            proxy.method2();
        }
    }
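
The connection pool use case mentioned in the list can be sketched with the same mechanism: the pool hands out a proxy whose close() returns the underlying object to the pool instead of really closing it. Everything below is hypothetical and written for illustration. `PooledResource` stands in for a real connection type such as java.sql.Connection, and the pool has no thread safety or protection against closing the same proxy twice.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a real connection interface.
interface PooledResource {
    String query(String sql);
    void close();
}

// The "real object" that would normally hold an expensive connection.
class RealResource implements PooledResource {
    @Override
    public String query(String sql) {
        return "result of " + sql;
    }

    @Override
    public void close() {
        // A real implementation would release the connection here.
    }
}

class SimplePool {
    private final Deque<RealResource> idle = new ArrayDeque<>();

    SimplePool(int size) {
        for (int i = 0; i < size; i++) {
            idle.push(new RealResource());
        }
    }

    int idleCount() {
        return idle.size();
    }

    // Hands out a proxy: close() is intercepted and returns the real object
    // to the pool; every other method is delegated unchanged.
    PooledResource borrow() {
        final RealResource real = idle.pop();
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("close")) {
                idle.push(real);
                return null;
            }
            return method.invoke(real, args);
        };
        return (PooledResource) Proxy.newProxyInstance(
                PooledResource.class.getClassLoader(),
                new Class<?>[] { PooledResource.class },
                handler);
    }
}
```

A caller uses the proxy exactly like the real object: `pool.borrow()` takes one resource out of the pool, and calling `close()` on the result puts it back, which is the "enhance the functionality of a class" point in practice.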