HDFS memory configuration

The figure below shows the HDFS architecture:

[Figure: HDFS architecture]

As the figure shows, HDFS has three roles: NameNode, DataNode, and Client. When HDFS HA is not configured, there is a fourth role, SecondaryNameNode. All four roles run as JVM (Java) processes, so we can adjust the amount of memory each of them uses. Below we look in detail at how to configure the memory of each HDFS role.

The memory configuration discussed here refers mainly to the JVM heap memory.

 

The default memory configuration

NameNode

On the HDFS cluster we built in an earlier lesson, we can check the heap memory used by the NameNode and SecondaryNameNode processes by running the following command on the master:

## executed on the master machine 
ps -ef | grep NameNode

The results are as follows:

[Screenshots: ps output showing the NameNode and SecondaryNameNode JVM arguments]

In the first screenshot, -Xmx1000m indicates that the NameNode heap memory is 1000M.

In the second screenshot, -Xmx1000m indicates that the SecondaryNameNode heap memory is 1000M.
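If the full ps output is hard to read, the heap flags can be pulled out directly. A minimal sketch (assumption: a grep with -o support, as on GNU/Linux; the pattern matches both the NameNode and SecondaryNameNode lines):

## print only the -Xms/-Xmx flags of the NameNode processes
ps -ef | grep NameNode | grep -o '\-Xm[sx][0-9]*[mMgG]'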

DataNode

We can check the heap memory used by the DataNode processes on slave1 and slave2 with the following command:

## executed on slave1 and slave2
ps -ef | grep DataNode

The results are as follows:

[Screenshots: ps output showing the DataNode JVM arguments on slave1 and slave2]

As the screenshots show, the DataNode heap memory on both slaves is 1000M.

Client

When we execute the following command:

hadoop fs -ls /

it actually starts a Java process named FsShell, as shown below:

[Screenshot: ps output showing the FsShell process]

FsShell is a Client process, and the default heap memory of this Client process is 512M.
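FsShell exits as soon as the command finishes, so it can be hard to catch with ps. One way to inspect it is jps -v, which prints JVM arguments, run from a second terminal while the command is still executing (a hedged sketch; jps ships with the JDK):

## run in another terminal while `hadoop fs -ls /` is in flight
jps -v | grep FsShell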

Conclusion

  • The default heap size of the HDFS cluster roles (NameNode, SecondaryNameNode, DataNode) is 1000M
  • The default heap size of Client processes is 512M

How to configure memory

To learn how to configure the memory of each role, we first need to understand where the default configuration above comes from.

These defaults live in the file hadoop-env.sh in the configuration directory under the Hadoop installation directory, i.e. /home/hadoop-twq/bigdata/hadoop-2.7.5/etc/hadoop/hadoop-env.sh.

hadoop-env.sh contains several settings related to memory:

[Screenshot: memory-related settings in hadoop-env.sh]

Let's look at them one by one, from top to bottom.

# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

  

HADOOP_HEAPSIZE: the maximum heap memory, in MB, for all HDFS roles; the default is 1000M. This is where the 1000M default heap size of our HDFS role processes comes from.

HADOOP_NAMENODE_INIT_HEAPSIZE: the initial heap memory size of the NameNode; the default is 1000M.
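For context, libexec/hadoop-config.sh (Hadoop 2.7.x) is what turns HADOOP_HEAPSIZE into an actual JVM flag. The snippet below is a paraphrase of that logic, not a verbatim quote:

# default maximum heap, overridden when HADOOP_HEAPSIZE is set
JAVA_HEAP_MAX=-Xmx1000m
if [ "$HADOOP_HEAPSIZE" != "" ]; then
  # an "m" suffix is appended automatically, so HADOOP_HEAPSIZE is a bare number of MB
  JAVA_HEAP_MAX="-Xmx""$HADOOP_HEAPSIZE""m"
fi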

# Extra Java runtime options.  Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

HADOOP_OPTS: JVM options applied to all HDFS roles. Generic JVM arguments that every role should use can be set here; apart from the IPv4 flag above, it is empty by default.
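For example, to enable GC logging for every HDFS role, a flag can be appended here. A hedged sketch using standard HotSpot options (not something hadoop-env.sh ships with):

## applies to the NameNode, SecondaryNameNode, DataNode and client commands alike
export HADOOP_OPTS="$HADOOP_OPTS -verbose:gc -XX:+PrintGCDetails"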

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"

HADOOP_NAMENODE_OPTS: JVM options specific to the NameNode. By default it only sets two log-level parameters, hadoop.security.logger and hdfs.audit.logger.

export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

  

HADOOP_DATANODE_OPTS: JVM options specific to the DataNode. By default it only sets the hadoop.security.logger log-level parameter.

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

HADOOP_SECONDARYNAMENODE_OPTS: JVM options specific to the SecondaryNameNode. By default it only sets the same two log-level parameters, hadoop.security.logger and hdfs.audit.logger.

export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

HADOOP_PORTMAP_OPTS: JVM options for the portmap service used by the HDFS NFS gateway; its default heap memory is 512M.

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"

  

HADOOP_CLIENT_OPTS: JVM options for HDFS client commands (fs, dfs, fsck, distcp, etc.). The default heap memory configured here is 512M, which is why FsShell ran with -Xmx512m above.

Heap memory configuration for the NameNode, DataNode, and Client processes

The heap memory of the NameNode, DataNode, and Client processes is configured in hadoop-env.sh through HADOOP_NAMENODE_OPTS, HADOOP_DATANODE_OPTS, and HADOOP_CLIENT_OPTS respectively.

So if we want to configure the NameNode heap memory, there are two ways:

## The first way
export HADOOP_NAMENODE_INIT_HEAPSIZE="20480M"

## The second way
export HADOOP_NAMENODE_OPTS="-Xms20480M -Xmx20480M -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
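After editing hadoop-env.sh, the NameNode has to be restarted for the new heap size to take effect. A minimal sketch, assuming $HADOOP_HOME points at the installation directory (/home/hadoop-twq/bigdata/hadoop-2.7.5 here):

## run on the master
$HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode

## verify the new -Xms/-Xmx values
ps -ef | grep NameNode | grep -o '\-Xm[sx][0-9]*[mMgG]'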

  

If we want to configure the DataNode heap memory, there are also two ways:

## The first way: HADOOP_HEAPSIZE is a bare number of MB (the scripts append the "m"),
## and note that it affects all HDFS roles, not only the DataNode
export HADOOP_HEAPSIZE=2048

## The second way, which overrides the first one
export HADOOP_DATANODE_OPTS="-Xms2048M -Xmx2048M -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
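Note that hadoop-env.sh is read locally on each host, so the change must be made on every DataNode machine (slave1 and slave2 here) and the DataNode restarted. Again a sketch assuming $HADOOP_HOME is set:

## run on each slave after editing its hadoop-env.sh
$HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode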

If we want to configure the Client heap memory, we can do it as follows:

export HADOOP_CLIENT_OPTS="-Xmx1024m $HADOOP_CLIENT_OPTS"
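Because hadoop-env.sh appends the existing value of $HADOOP_CLIENT_OPTS after its own -Xmx flag, and HotSpot lets a later -Xmx override an earlier one, the heap of a single client command can also be raised inline without editing any file. A hedged one-off example:

## raise the heap for just this one command
HADOOP_CLIENT_OPTS="-Xmx1024m" hadoop fs -ls /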

  

 
