The figure below shows the HDFS architecture:
As the figure shows, HDFS involves three roles: the NameNode, the DataNode, and the Client. When HDFS HA is not configured there is a fourth role, the SecondaryNameNode. All four roles run as JVM-based Java processes, so we can adjust the amount of memory each of them uses. Next we will look in detail at how to configure the memory of each HDFS role.
The memory configuration discussed here refers mainly to the JVM heap memory.
The default memory configuration
NameNode
On the HDFS cluster we built in the earlier lesson, we can check the heap memory used by the NameNode and SecondaryNameNode processes by running the following command on the master:
```
## executed on the master machine
ps -ef | grep NameNode
```
The result looks like this:

The `-Xmx1000m` at mark 1 in the figure above shows that the NameNode heap is 1000M, and the `-Xmx1000m` at mark 2 shows that the SecondaryNameNode heap is 1000M.
DataNode
We can check the heap memory used by the DataNode processes on slave1 and slave2 with the following command:
```
ps -ef | grep DataNode
```
The result looks like this:

As the figure shows, the DataNode heap on both slave1 and slave2 is 1000M.
Client
When we execute the following command:

```
hadoop fs -ls /
```

it actually starts a Java process named FsShell, as shown below:

FsShell is a Client process, and the default heap memory of this Client process is 512M.
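As a quick sketch, the `-Xmx` flag can be pulled out of `ps` output with a small shell helper instead of reading it by eye. The sample process line below is made up for illustration; on a real cluster you would feed in a line from `ps -ef | grep FsShell` (or NameNode / DataNode):

```shell
# Extract the first -Xmx flag from a process command line.
extract_xmx() {
  printf '%s\n' "$1" | grep -o -- '-Xmx[0-9]*[mMgG]' | head -n 1
}

# hypothetical sample line, shaped like real `ps -ef` output for FsShell
sample='java -Xmx512m org.apache.hadoop.fs.FsShell -ls /'
extract_xmx "$sample"   # prints: -Xmx512m
```

The `--` keeps `grep` from treating the leading `-` of the pattern as an option.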
Conclusion
- The default heap size of the HDFS cluster role processes (NameNode, SecondaryNameNode, DataNode) is 1000M
- The default heap size of a Client process is 512M
How to configure memory
To understand how to configure the memory of each role, we first need to know where the default values above are configured.
These defaults live in the file hadoop-env.sh in the Hadoop configuration directory of the installation, i.e. /home/hadoop-twq/bigdata/hadoop-2.7.5/etc/hadoop/hadoop-env.sh. This file contains several memory-related configurations; let's go through them in order, from top to bottom.
```
# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""
```
`HADOOP_HEAPSIZE`: the maximum heap memory of all HDFS roles, 1000M by default. This is why every HDFS role process defaults to a 1000M heap.

`HADOOP_NAMENODE_INIT_HEAPSIZE`: the initial heap memory size of the NameNode, also 1000M by default.
```
# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
```
`HADOOP_OPTS`: JVM parameters applied to all HDFS roles; any generic JVM argument meant for every HDFS role can be set here. It is empty by default, apart from the `-Djava.net.preferIPv4Stack=true` setting shown above.
```
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
```
`HADOOP_NAMENODE_OPTS`: JVM parameters specific to the NameNode. By default it only sets the two log-level parameters `hadoop.security.logger` and `hdfs.audit.logger`.
```
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
```
`HADOOP_DATANODE_OPTS`: JVM parameters specific to the DataNode. By default it only sets the log-level parameter `hadoop.security.logger`.
```
export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
```
`HADOOP_SECONDARYNAMENODE_OPTS`: JVM parameters specific to the SecondaryNameNode. By default it only sets the two log-level parameters `hadoop.security.logger` and `hdfs.audit.logger`.
```
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"
```
`HADOOP_PORTMAP_OPTS`: the JVM configuration for the `hadoop portmap` service (part of the HDFS NFS gateway), with a default heap of 512M.
```
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
```
`HADOOP_CLIENT_OPTS`: the JVM parameters used when starting HDFS client commands; here the JVM heap is configured to 512M. This setting applies to client commands such as fs, dfs, fsck, and distcp.
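Because the default line appends the environment value of `HADOOP_CLIENT_OPTS` after its own `-Xmx512m` (and a later `-Xmx` wins in JVM argument ordering), the client heap can also be overridden for a single command without touching hadoop-env.sh. The 2048m value below is just an illustration:

```shell
# one-off override for a single client command; hadoop-env.sh is unchanged
HADOOP_CLIENT_OPTS="-Xmx2048m" hadoop fsck /
```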
NameNode, DataNode and Client process heap memory configuration
The heap memory of the NameNode, DataNode, and Client processes is configured in hadoop-env.sh through `HADOOP_NAMENODE_OPTS`, `HADOOP_DATANODE_OPTS`, and `HADOOP_CLIENT_OPTS` respectively.
So if we want to configure the NameNode heap memory, there are two ways:
```
## the first way
export HADOOP_NAMENODE_INIT_HEAPSIZE="20480M"

## the second way
export HADOOP_NAMENODE_OPTS="-Xms20480M -Xmx20480M -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
```
If we want to configure the DataNode heap memory, there are also two ways:
```
## the first way (HADOOP_HEAPSIZE is a plain number of MB)
export HADOOP_HEAPSIZE=2048

## the second way, which overrides the first
export HADOOP_DATANODE_OPTS="-Xms2048M -Xmx2048M -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
```
If we want to configure the Client heap memory, we can do it like this:
```
export HADOOP_CLIENT_OPTS="-Xmx1024m $HADOOP_CLIENT_OPTS"
```
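Edits to hadoop-env.sh only take effect for daemons once they are restarted. A sketch of restarting and double-checking, assuming `$HADOOP_HOME` points at the installation (e.g. /home/hadoop-twq/bigdata/hadoop-2.7.5):

```shell
# restart HDFS so the new heap settings are picked up
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh

# then confirm the running processes show the expected -Xms/-Xmx values
ps -ef | grep -v grep | grep DataNode | grep -o -- '-Xm[sx][0-9]*[mMgG]'
```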