[BD] HDFS

Architecture

  • NameNode
    • The HDFS master node (the administrator)
    • Receives client requests (from the command line or Java programs): creating directories, uploading, downloading, and deleting data
    • Manages and maintains the HDFS edit log and metadata
      • Edit log (edits files)
        • Binary files that record every operation issued by clients, so they reflect the latest state of HDFS
        • $HADOOP_HOME/tmp/dfs/name/current
        • Offline edits viewer (oev): converts an edits file into text (XML) format
        • hdfs oev -i edits_inprogress_0000000000000000107 -o ~/a.xml
      • Metadata (fsimage files)
        • Records the block information of each file (which blocks make up the file) and their redundancy (replication) information; an fsimage does not reflect the latest state of HDFS
        • $HADOOP_HOME/tmp/dfs/name/current
        • Offline image viewer (oiv): converts an fsimage file into text or XML
  • DataNode
    • Data node: stores the actual data
    • Stores data blocks (default block size: 64 MB in 1.x, 128 MB in 2.x)
    • Example block directory: /root/training/hadoop-2.7.3/tmp/dfs/data/current/BP-419062579-192.168.157.111-1535553141546/current/finalized/subdir0/subdir0
    • Choosing the replication factor: generally the same as the number of DataNodes, but no more than three
    • Since Hadoop 3.x, HDFS supports erasure coding, which greatly reduces storage overhead (roughly half compared with 3-way replication)
  • SecondaryNameNode
    • Secondary NameNode
    • Merges the edit log into the fsimage
    • The edits file records the latest changes, so it keeps growing as more operations are performed
    • The latest information in edits is written into the fsimage
    • After the merge, the old edits can be emptied
    • Usually deployed on the same machine as the NameNode, which speeds up downloading the fsimage and edits files
    • When does the merge happen? When HDFS issues a checkpoint:
      • Every 60 minutes (fs.checkpoint.period in 1.x; dfs.namenode.checkpoint.period in 2.x)
      • When the edits file reaches 64 MB (fs.checkpoint.size in 1.x; 2.x counts transactions instead, via dfs.namenode.checkpoint.txns)
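The checkpoint rule and the merge described above can be sketched as a toy model. This is an illustration under stated assumptions, not Hadoop's code: the class CheckpointSketch, the string-encoded edit operations such as "mkdir /data", and the set-of-paths "fsimage" are all hypothetical; only the 60-minute and 64 MB thresholds come from the notes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model (hypothetical, not Hadoop's API) of a SecondaryNameNode checkpoint.
public class CheckpointSketch {
    static final long PERIOD_SECONDS = 60 * 60;              // checkpoint every 60 minutes
    static final long EDITS_LIMIT_BYTES = 64L * 1024 * 1024; // or when edits reach 64 MB

    // fsimage: simplified to the set of paths that existed at the last checkpoint
    final Set<String> fsimage = new LinkedHashSet<>();
    // edits: hypothetical operations recorded since the last checkpoint, e.g. "mkdir /a"
    final List<String> edits = new ArrayList<>();

    // Either condition triggers a checkpoint.
    static boolean shouldCheckpoint(long secondsSinceLast, long editsBytes) {
        return secondsSinceLast >= PERIOD_SECONDS || editsBytes >= EDITS_LIMIT_BYTES;
    }

    // Merge: replay every edit on top of the fsimage, then empty the edit log.
    void checkpoint() {
        for (String op : edits) {
            String[] parts = op.split(" ", 2);
            if (parts[0].equals("delete")) {
                fsimage.remove(parts[1]);
            } else {
                fsimage.add(parts[1]);
            }
        }
        edits.clear();
    }

    public static void main(String[] args) {
        CheckpointSketch nn = new CheckpointSketch();
        nn.edits.add("mkdir /data");
        nn.edits.add("create /data/a.txt");
        nn.edits.add("delete /data/a.txt");
        if (shouldCheckpoint(PERIOD_SECONDS, 0)) {
            nn.checkpoint();
        }
        System.out.println(nn.fsimage + " edits=" + nn.edits.size()); // [/data] edits=0
    }
}
```

The point of the model is the ordering: the fsimage alone is stale, and only fsimage plus replayed edits gives the current namespace, which is why the edits can be emptied once they are merged.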

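Data transmission

  • Uploading data
    • The client requests an upload through DistributedFileSystem (DistributedFileSystem.java)
    • DistributedFileSystem creates a DFSClient (DFSClient.java)
    • DFSClient establishes RPC communication with the NameNode
    • It obtains a NameNode proxy object through NameNodeProxies (which also handles HA)
    • The client sends a request to create the file's metadata
    • The NameNode creates the file's metadata
    • The metadata is returned to DistributedFileSystem
    • DistributedFileSystem creates an output stream (FSDataOutputStream)
    • The data is uploaded to a DataNode
    • Based on the metadata, the DataNodes replicate the block among themselves (horizontal replication)
  • Downloading data
    • The client requests a download through DistributedFileSystem
    • DistributedFileSystem creates a DFSClient
    • DFSClient establishes RPC communication with the NameNode
    • The client requests the file's metadata
    • The NameNode looks up the metadata (it checks the in-memory cache first, then the fsimage)
    • The NameNode returns the metadata
    • DistributedFileSystem creates an input stream (FSDataInputStream)
    • The client downloads the data blocks from the DataNodes
    • The downloaded blocks are combined into the complete file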
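The upload and download flows above boil down to "split into blocks, replicate each block, record the block list as metadata" and "look up the block list, fetch each block, concatenate". The sketch below models that in memory only. Everything in it is hypothetical (BlockTransferSketch, the 4-byte block size, the round-robin placement); it is not the DFSClient API, and real HDFS uses a rack-aware placement policy and a DataNode write pipeline.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of HDFS upload/download (hypothetical names, not Hadoop's API).
public class BlockTransferSketch {
    static final int BLOCK_SIZE = 4;    // tiny block size for the demo (real default: 128 MB in 2.x)
    static final int REPLICATION = 3;   // replication factor
    static final int NUM_DATANODES = 4;

    // "NameNode" metadata: file name -> list of block IDs
    final Map<String, List<Integer>> metadata = new HashMap<>();
    // "DataNodes": each one maps block ID -> block bytes
    final List<Map<Integer, byte[]>> dataNodes = new ArrayList<>();
    int nextBlockId = 0;

    BlockTransferSketch() {
        for (int i = 0; i < NUM_DATANODES; i++) {
            dataNodes.add(new HashMap<>());
        }
    }

    // Upload: split the data into blocks, store each block on REPLICATION nodes, record metadata.
    void upload(String name, byte[] data) {
        List<Integer> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += BLOCK_SIZE) {
            byte[] block = Arrays.copyOfRange(data, off, Math.min(off + BLOCK_SIZE, data.length));
            int id = nextBlockId++;
            // Simple round-robin placement (an assumption for the demo).
            for (int r = 0; r < REPLICATION; r++) {
                dataNodes.get((id + r) % NUM_DATANODES).put(id, block);
            }
            blocks.add(id);
        }
        metadata.put(name, blocks);
    }

    // Download: look up the block list, read each block from the first node holding it, concatenate.
    byte[] download(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int id : metadata.get(name)) {
            for (Map<Integer, byte[]> node : dataNodes) {
                byte[] block = node.get(id);
                if (block != null) {
                    out.write(block, 0, block.length);
                    break;
                }
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        BlockTransferSketch hdfs = new BlockTransferSketch();
        hdfs.upload("a.txt", "hello hdfs!".getBytes(StandardCharsets.UTF_8));
        System.out.println(hdfs.metadata.get("a.txt")); // [0, 1, 2]
        System.out.println(new String(hdfs.download("a.txt"), StandardCharsets.UTF_8)); // hello hdfs!
    }
}
```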

 

 

Advanced Features

  • Safe Mode
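    • HDFS is read-only; normal (write) operations are disabled
    • A self-protection mechanism of HDFS that checks the replication rate of data blocks
    • If the ratio of blocks with enough replicas is below the configured rate (for example because DataNodes have failed), HDFS replicates the missing copies between DataNodes
    • The ratio's default is defined in hdfs-default.xml (dfs.namenode.safemode.threshold-pct, default 0.999f) and can be overridden in hdfs-site.xml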
  • Snapshot
    • An image of the whole file system or of a directory at a point in time; disabled by default
    • Enable snapshots on a directory first: hdfs dfsadmin -allowSnapshot <directory>
    • Create a snapshot: hdfs dfs -createSnapshot <directory> <snapshot name>
    • Use cases:
      • Preventing user errors / backup / testing / disaster recovery
    • Generally not recommended: the data is already stored redundantly, and snapshots add further redundancy, which wastes space
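  • Quota
    • Limits that HDFS assigns to each directory
    • Name quota
      • Limits the number of files and directories stored under the directory (hdfs dfsadmin -setQuota <N> <directory>)
    • Space quota
      • Limits the total size of the files stored under the directory, with every replica counted (hdfs dfsadmin -setSpaceQuota <N> <directory>)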
  • Trash
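    • Disabled by default
    • Deleted files are moved into the trash instead of being removed immediately
    • Files in the trash can be restored quickly
    • A retention time can be set (fs.trash.interval in core-site.xml, in minutes); files kept longer than that are deleted automatically
  • User permission management
    • Its functionality is weak
    • Kerberos is recommended for securing Hadoop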
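The two quota kinds above can be sketched as a toy check. This is a simplified model, not Hadoop's implementation: QuotaSketch and its methods are hypothetical, it only handles new file names in a single flat directory, and it ignores the fact that real HDFS reserves a full block against the space quota when a block is allocated. It does follow two documented rules: the directory counts against its own name quota, and every replica counts against the space quota.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two HDFS directory quotas (hypothetical class, not Hadoop's API).
public class QuotaSketch {
    final long nameQuota;   // max number of names: files and directories, the directory itself included
    final long spaceQuota;  // max bytes, where every replica of a file counts
    final Map<String, Long> files = new HashMap<>(); // file name -> logical size in bytes
    long usedBytes = 0;

    QuotaSketch(long nameQuota, long spaceQuota) {
        this.nameQuota = nameQuota;
        this.spaceQuota = spaceQuota;
    }

    // The directory counts against its own name quota.
    long nameCount() {
        return 1 + files.size();
    }

    // Adds a new file unless that would exceed either quota; returns whether it was added.
    // Assumes the name is not already present in the directory.
    boolean addFile(String name, long size, int replication) {
        long needed = size * replication;
        if (nameCount() + 1 > nameQuota || usedBytes + needed > spaceQuota) {
            return false;
        }
        files.put(name, size);
        usedBytes += needed;
        return true;
    }

    public static void main(String[] args) {
        QuotaSketch dir = new QuotaSketch(3, 1000);
        System.out.println(dir.addFile("a", 100, 3)); // true: 2 names, 300 bytes used
        System.out.println(dir.addFile("b", 300, 3)); // false: 300 + 900 > 1000 bytes
        System.out.println(dir.addFile("c", 10, 3));  // true: 3 names, 330 bytes used
        System.out.println(dir.addFile("d", 10, 3));  // false: a 4th name exceeds the name quota
    }
}
```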

Boot process

  • The NameNode web page shows these stages under Startup Progress:
  • Loading fsimage
  • Loading edits
  • Saving checkpoint
  • Safe mode
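The last startup stage can be sketched as a threshold check: the NameNode stays in safe mode until enough blocks have been reported by DataNodes with at least the minimum number of replicas. The sketch is hypothetical (SafeModeSketch is not Hadoop code, and it ignores the extra waiting period HDFS applies after the threshold is reached); the constants mirror the 2.x defaults of dfs.namenode.safemode.threshold-pct (0.999) and dfs.namenode.replication.min (1).

```java
import java.util.Arrays;

// Toy model of leaving safe mode at the end of NameNode startup (hypothetical, not Hadoop's API).
public class SafeModeSketch {
    static final double THRESHOLD_PCT = 0.999; // mirrors the default of dfs.namenode.safemode.threshold-pct
    static final int MIN_REPLICATION = 1;      // mirrors the default of dfs.namenode.replication.min

    // reportedReplicas[i] = replicas of block i reported by DataNodes so far
    static boolean canLeaveSafeMode(int[] reportedReplicas) {
        if (reportedReplicas.length == 0) {
            return true; // nothing to wait for
        }
        int safeBlocks = 0;
        for (int replicas : reportedReplicas) {
            if (replicas >= MIN_REPLICATION) {
                safeBlocks++;
            }
        }
        return (double) safeBlocks / reportedReplicas.length >= THRESHOLD_PCT;
    }

    public static void main(String[] args) {
        int[] blocks = new int[1000];
        Arrays.fill(blocks, 3);
        blocks[0] = 0; // one block not reported yet
        System.out.println(canLeaveSafeMode(blocks)); // true: 999/1000 reaches 0.999
        blocks[1] = 0; // a second block missing
        System.out.println(canLeaveSafeMode(blocks)); // false: 998/1000
    }
}
```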

Underlying principles

  • RPC (Remote Procedure Call)
    • A remote procedure call protocol
      • The client calls a program on the server side
      • It is a framework in which the communication between the caller and the callee is carried out
      • Code is invoked remotely; the framework implements the connection and communication between caller and callee
      • A synchronous, request/response form of communication between a client process and a server process
      • The client is the caller that requests the service; the server is the callee that executes the client's request
    • Implementing RPC in Java with Hadoop
      • Client side
      • Server side
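Stripped of any framework, the synchronous client/server exchange described above is: the client sends a request naming a method and its argument, blocks, and the server executes the method and sends back the result. The sketch below shows that with plain java.net sockets. It is not how Hadoop RPC is implemented; the class SocketRpcSketch and its wire format (method name plus one String argument, written with writeUTF) are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal synchronous request/response "RPC" over TCP (hypothetical wire format, stdlib only).
public class SocketRpcSketch {

    // Server side: the callee that executes the requested method.
    static String dispatch(String method, String arg) {
        if (method.equals("sayHello")) {
            return "Hello " + arg;
        }
        return "ERROR: unknown method " + method;
    }

    // Serve exactly one call on the given server socket, then return.
    static void serveOnce(ServerSocket server) throws IOException {
        try (Socket s = server.accept();
             DataInputStream in = new DataInputStream(s.getInputStream());
             DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
            String method = in.readUTF();
            String arg = in.readUTF();
            out.writeUTF(dispatch(method, arg));
            out.flush();
        }
    }

    // Client side: the caller blocks in readUTF() until the reply arrives (synchronous call).
    static String call(int port, String method, String arg) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             DataOutputStream out = new DataOutputStream(s.getOutputStream());
             DataInputStream in = new DataInputStream(s.getInputStream())) {
            out.writeUTF(method);
            out.writeUTF(arg);
            out.flush();
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0: pick any free port
            Thread t = new Thread(() -> {
                try {
                    serveOnce(server);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            t.start();
            System.out.println(call(server.getLocalPort(), "sayHello", "Tom")); // Hello Tom
            t.join();
        }
    }
}
```

Hadoop's RPC framework takes care of the parts this sketch hard-codes: serialization, connection management, versioning, and server-side handler threads.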

MyRPCClient.java

package rpc.client;

import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

import rpc.server.MyInterface;

public class MyRPCClient {

    public static void main(String[] args) throws IOException {
        // Use the Hadoop RPC framework to call the server-side program
        // Get a proxy object for the object deployed on the server
        MyInterface proxy = RPC.getProxy(MyInterface.class,
                MyInterface.versionID,
                new InetSocketAddress("localhost", 7788),
                new Configuration());
        // Use the proxy object to call the server program
        String result = proxy.sayHello("Tom");
        System.out.println(result);
        // Release the client-side connection resources
        RPC.stopProxy(proxy);
    }
}

MyInterface.java

package rpc.server;

import org.apache.hadoop.ipc.VersionedProtocol;

public interface MyInterface extends VersionedProtocol {
    // Define the version number
    // (the version number acts as the protocol signature)
    public static final long versionID = 1L;

    // Define the business method
    public String sayHello(String name);
}

MyInterfaceImpl.java

package rpc.server;

import java.io.IOException;

import org.apache.hadoop.ipc.ProtocolSignature;

public class MyInterfaceImpl implements MyInterface {

    @Override
    public ProtocolSignature getProtocolSignature(String protocol, long clientVersion, int clientMethodsHash)
            throws IOException {
        // Build the signature information from the version number
        return new ProtocolSignature(MyInterface.versionID, null);
    }

    @Override
    public long getProtocolVersion(String protocol, long clientVersion)
            throws IOException {
        // Return the version number
        return MyInterface.versionID;
    }

    @Override
    public String sayHello(String name) {
        System.out.println("**********Server side called**********");
        return "Hello " + name;
    }
}

MyRPCServer.java

package rpc.server;

import java.io.IOException;

import org.apache.hadoop.HadoopIllegalArgumentException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.RPC.Server;

public class MyRPCServer {
    public static void main(String[] args)
            throws HadoopIllegalArgumentException, IOException, InterruptedException {
        // Implement an RPC server with the Hadoop RPC framework

        // Use RPC.Builder to construct the server
        RPC.Builder builder = new RPC.Builder(new Configuration());

        // Define the server parameters
        builder.setBindAddress("localhost");
        builder.setPort(7788);

        // Deploy the program: the protocol interface and its implementation
        builder.setProtocol(MyInterface.class);
        builder.setInstance(new MyInterfaceImpl());

        // Create the RPC server
        Server server = builder.build();

        // Start the server and block until it is stopped
        server.start();
        server.join();
    }
}

 

 

    

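  • Java dynamic proxy objects
    • A generated proxy class has a name containing $ (for example $Proxy0), which indicates a proxy object
    • It is a form of the wrapper (proxy) design pattern
    • It can enhance the behavior of a class without modifying the class
    • Application: database connection pools (for example, intercepting close() so the connection is returned to the pool)
    • Parameters of Proxy.newProxyInstance
      • ClassLoader: the class loader
      • Class<?>[]: the interfaces implemented by the real object
      • InvocationHandler: an implementation of this interface that handles the calls made on the proxy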
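The connection-pool application mentioned above can be sketched with Proxy.newProxyInstance: the pool hands out a proxy whose close() puts the real connection back into the pool, while every other call goes to the real object. The interface Conn and the class RealConn are hypothetical stand-ins for java.sql.Connection and a driver's implementation, chosen so the example runs without a database.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the connection-pool use case (Conn and RealConn are hypothetical stand-ins).
public class PoolProxySketch {

    interface Conn {
        String query(String sql);
        void close();
    }

    static class RealConn implements Conn {
        public String query(String sql) { return "result of " + sql; }
        public void close() { System.out.println("physically closed"); }
    }

    static final Deque<Conn> pool = new ArrayDeque<>();

    // Hand out a proxy whose close() returns the real connection to the pool instead of closing it.
    static Conn getConnection() {
        Conn real = pool.isEmpty() ? new RealConn() : pool.pop();
        return (Conn) Proxy.newProxyInstance(
                PoolProxySketch.class.getClassLoader(),
                new Class<?>[] {Conn.class},
                new InvocationHandler() {
                    @Override
                    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                        if (method.getName().equals("close")) {
                            pool.push(real); // enhanced behavior: recycle instead of closing
                            return null;
                        }
                        return method.invoke(real, args); // all other calls go to the real object
                    }
                });
    }

    public static void main(String[] args) {
        Conn c = getConnection();
        System.out.println(c.getClass().getSimpleName().startsWith("$Proxy")); // true
        System.out.println(c.query("select 1")); // result of select 1
        c.close();                               // does not print "physically closed"
        System.out.println(pool.size());         // 1
    }
}
```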

MyBusiness.java

package proxy;

public interface MyBusiness {
    public void method1();
    public void method2();
}

MyBusinessImpl.java

package proxy;

public class MyBusinessImpl implements MyBusiness {

    @Override
    public void method1() {
        System.out.println("*********method1*********");
    }

    @Override
    public void method2() {
        System.out.println("*********method2*********");
    }
}

TestMain.java

package proxy;

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class TestMain {

    public static void main(String[] args) {
        // Create the real object
        MyBusiness obj = new MyBusinessImpl();
        // Create a proxy object
        MyBusiness proxy = (MyBusiness) Proxy.newProxyInstance(TestMain.class.getClassLoader(),
                obj.getClass().getInterfaces(),
                new InvocationHandler() {
                    @Override
                    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                        if (method.getName().equals("method1")) {
                            // Override: replace the behavior of method1
                            System.out.println("*************method1 in the proxy object*************");
                            return null;
                        } else {
                            // Other methods: forward to the real object
                            return method.invoke(obj, args);
                        }
                    }
                });
        // Call the real object through the proxy object
        proxy.method1();
        proxy.method2();
    }
}
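The two halves of these notes meet here: Hadoop's RPC engines build the object returned by RPC.getProxy on java.lang.reflect.Proxy, with an InvocationHandler that turns each method call into a request for the server. The sketch below imitates that shape inside one JVM, replacing the network with a direct call to a reflective dispatcher. RpcStubSketch, Greeter and GreeterImpl are hypothetical names; no serialization or networking is modeled.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Sketch: an RPC-style client stub built from a dynamic proxy (all names hypothetical).
// The "network" is replaced by a direct call to a server-side dispatcher in the same JVM.
public class RpcStubSketch {

    public interface Greeter {
        String sayHello(String name);
    }

    static class GreeterImpl implements Greeter {
        public String sayHello(String name) { return "Hello " + name; }
    }

    // "Server": receives a method name plus arguments and invokes the matching method by reflection.
    static Object handleRequest(Object impl, String methodName, Class<?>[] paramTypes, Object[] args)
            throws Exception {
        Method m = impl.getClass().getMethod(methodName, paramTypes);
        return m.invoke(impl, args);
    }

    // "Client": every call on the returned proxy becomes a request (method name + arguments).
    static <T> T getProxy(Class<T> protocol, Object serverImpl) {
        Object stub = Proxy.newProxyInstance(
                protocol.getClassLoader(),
                new Class<?>[] {protocol},
                (proxy, method, args) -> {
                    // A real RPC engine would serialize these values and send them over the network.
                    return handleRequest(serverImpl, method.getName(), method.getParameterTypes(), args);
                });
        return protocol.cast(stub);
    }

    public static void main(String[] args) {
        Greeter greeter = getProxy(Greeter.class, new GreeterImpl());
        System.out.println(greeter.sayHello("Tom")); // Hello Tom
    }
}
```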

 

Source: www.cnblogs.com/cxc1357/p/12584402.html