Reposted from: http://www.cnblogs.com/yinghun/p/6230436.html
Hadoop can run in secure mode or non-secure mode. This article describes the key parameters in the main configuration files for non-secure mode and what each one does. The Hadoop version used is 2.6.4.
etc/hadoop/core-site.xml
Parameter | Value | Notes |
fs.defaultFS | NameNode URI | hdfs://host:port/ |
io.file.buffer.size | 131072 | Size, in bytes, of the read/write buffer used in SequenceFiles |
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.1.100:9000</value>
<description>192.168.1.100 is the server IP address; a hostname can be used instead</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>The value is in bytes; 131072 bytes is 128 KB (the default is 4096 bytes)</description>
</property>
</configuration>
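To double-check values like these, the file can be read the same way Hadoop lays it out: a list of `property` elements, each with a `name` and a `value`. The following is a minimal sketch in Python, assuming a small sample document; the hostname `nn.example.com` and the helper `load_properties` are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Sample core-site.xml content. The NameNode hostname is a hypothetical placeholder.
CORE_SITE = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn.example.com:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return the name/value pairs of a Hadoop configuration document as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        props[name] = value
    return props


props = load_properties(CORE_SITE)
buffer_bytes = int(props["io.file.buffer.size"])
print(props["fs.defaultFS"])       # hdfs://nn.example.com:9000
print(buffer_bytes // 1024, "KB")  # 128 KB
```

The conversion at the end shows that 131072 is a byte count equal to 128 KB.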
etc/hadoop/hdfs-site.xml
- Configure the NameNode
Parameter | Value | Notes |
dfs.namenode.name.dir | Path on the local filesystem where the NameNode persistently stores the namespace and transaction logs | If this is a comma-separated list of directories, the name table is replicated in all of them for redundancy. |
dfs.namenode.hosts / dfs.namenode.hosts.exclude | List of permitted/excluded DataNodes | If necessary, use these files to control the list of allowed DataNodes. |
dfs.blocksize | 268435456 | HDFS block size of 256 MB for large file systems |
dfs.namenode.handler.count | 100 | More NameNode server threads to handle the high volume of RPC requests from DataNodes |
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Number of replicas kept for each block; set it to 1 for pseudo-distributed mode</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/namenode</value>
<description>Path on the local file system where the namespace and transaction logs are stored persistently</description>
</property>
<property>
<name>dfs.namenode.hosts</name>
<value>datanode1, datanode2</value>
<description>datanode1 and datanode2 are the hostnames of the servers where the DataNodes run</description>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
<description>HDFS block size of 256 MB for large file systems; the default in Hadoop 2.x is 128 MB</description>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
<description>More NameNode server threads to handle RPCs from DataNodes</description>
</property>
</configuration>
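The effect of dfs.blocksize and dfs.replication on a stored file can be worked out with simple arithmetic. The sketch below is illustrative only: the helper names `block_count` and `raw_storage_bytes` are hypothetical, and it assumes that the last block of a file occupies only as much disk space as the data it holds.

```python
MB = 1024 * 1024


def block_count(file_bytes, block_bytes):
    """Number of HDFS blocks needed for a file (ceiling division)."""
    return -(-file_bytes // block_bytes)


def raw_storage_bytes(file_bytes, replication):
    """Raw disk space used across the cluster, counting every replica."""
    return file_bytes * replication


block_bytes = 268435456           # the dfs.blocksize value from the table
print(block_bytes // MB)          # 256
print(block_count(1000 * MB, block_bytes))     # 4
print(raw_storage_bytes(1000 * MB, 3) // MB)   # 3000
```

A 1000 MB file therefore needs four 256 MB blocks, and with three replicas it takes up about 3000 MB of raw disk space; with dfs.replication set to 1, as in the pseudo-distributed setup, it takes 1000 MB.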
- Configure the DataNode
Parameter | Value | Notes |
dfs.datanode.data.dir | Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks | If this is a comma-separated list of directories, data is stored in all named directories, typically on different devices. |
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/datanode</value>
<description>Path on the local file system where the DataNode stores its blocks</description>
</property>
</configuration>
etc/hadoop/yarn-site.xml
- Configure the ResourceManager
Parameter | Value | Notes |
yarn.resourcemanager.address | ResourceManager host:port for clients to submit jobs | host:port |
yarn.resourcemanager.scheduler.address | ResourceManager host:port for ApplicationMasters to talk to the scheduler to obtain resources | host:port |
yarn.resourcemanager.resource-tracker.address | ResourceManager host:port for NodeManagers | host:port |
yarn.resourcemanager.admin.address | ResourceManager host:port for administrative commands | host:port |
yarn.resourcemanager.webapp.address | ResourceManager web UI host:port | host:port |
yarn.resourcemanager.scheduler.class | ResourceManager scheduler class | CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler |
yarn.scheduler.minimum-allocation-mb | Minimum amount of memory the ResourceManager allocates to each container request | In MB |
yarn.scheduler.maximum-allocation-mb | Maximum amount of memory the ResourceManager allocates to each container request | In MB |
yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path | List of permitted/excluded NodeManagers | If necessary, use these files to control the list of allowed NodeManagers. |
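The two allocation limits bound every container memory request. As a simplified sketch of that behavior, assuming a request below the minimum is raised to the minimum and a request above the maximum is rejected: the function `normalize_request` and the limits 1024 and 8192 are illustrative choices, and the real schedulers may apply additional rounding that is not modeled here.

```python
def normalize_request(request_mb, min_mb, max_mb):
    """Apply the minimum/maximum allocation limits to a container memory request.

    Simplified model: requests above the maximum are rejected, and requests
    below the minimum are raised to the minimum.
    """
    if request_mb > max_mb:
        raise ValueError(
            "request of %d MB exceeds the maximum allocation of %d MB"
            % (request_mb, max_mb)
        )
    return max(request_mb, min_mb)


# Illustrative limits, not taken from the configuration in this article.
print(normalize_request(512, 1024, 8192))   # 1024
print(normalize_request(2048, 1024, 8192))  # 2048
```

When choosing these limits, make sure the per-task memory settings in mapred-site.xml fall within them; otherwise the container requests cannot be satisfied.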
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>192.168.1.100:8081</value>
<description>The IP address 192.168.1.100 can be replaced with a hostname</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>192.168.1.100:8082</value>
<description>The IP address 192.168.1.100 can be replaced with a hostname</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>192.168.1.100:8083</value>
<description>The IP address 192.168.1.100 can be replaced with a hostname</description>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>192.168.1.100:8084</value>
<description>The IP address 192.168.1.100 can be replaced with a hostname</description>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.1.100:8085</value>
<description>The IP address 192.168.1.100 can be replaced with a hostname</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
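<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
<description>Common classes: CapacityScheduler, FairScheduler, or FifoScheduler; the value must be the fully qualified class name</description>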
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>100</value>
<description>Unit: MB</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>256</value>
<description>Unit: MB</description>
</property>
<property>
<name>yarn.resourcemanager.nodes.include-path</name>
<value>nodeManager1, nodeManager2</value>
<description>nodeManager1 and nodeManager2 are the hostnames of the corresponding servers</description>
</property>
</configuration>
- Configure the NodeManager
Parameter | Value | Notes |
yarn.nodemanager.resource.memory-mb | Available physical memory, in MB, for the given NodeManager | Defines the total resources on the NodeManager that are made available to running containers |
yarn.nodemanager.vmem-pmem-ratio | Maximum ratio by which the virtual memory usage of tasks may exceed physical memory | The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio. |
yarn.nodemanager.local-dirs | Comma-separated list of paths on the local filesystem where intermediate data is written | Multiple paths help spread disk I/O |
yarn.nodemanager.log-dirs | Comma-separated list of paths on the local filesystem where logs are written | Multiple paths help spread disk I/O |
yarn.nodemanager.log.retain-seconds | 10800 | Default time (in seconds) to retain log files on the NodeManager. Only applicable if log aggregation is disabled. |
yarn.nodemanager.remote-app-log-dir | /logs | HDFS directory to which application logs are moved when an application completes. Appropriate permissions must be set. Only applicable if log aggregation is enabled. |
yarn.nodemanager.remote-app-log-dir-suffix | logs | Suffix appended to the remote log directory. Logs are aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log aggregation is enabled. |
yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for MapReduce applications |
<configuration>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>256</value>
<description>Unit: MB</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>90</value>
<description>Ratio of virtual memory to physical memory (a multiplier, not a percentage)</description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/usr/local/hadoop/tmp/nodemanager</value>
<description>Comma-separated list</description>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/usr/local/hadoop/tmp/nodemanager/logs</value>
<description>Comma-separated list</description>
</property>
<property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>10800</value>
<description>Unit: seconds</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Shuffle service that needs to be set for MapReduce applications</description>
</property>
</configuration>
etc/hadoop/mapred-site.xml
- Configure MapReduce
Parameter | Value | Notes |
mapreduce.framework.name | yarn | Execution framework set to Hadoop YARN |
mapreduce.map.memory.mb | 1536 | Larger resource limit for maps |
mapreduce.map.java.opts | -Xmx1024M | Larger heap size for the child JVMs of maps |
mapreduce.reduce.memory.mb | 3072 | Larger resource limit for reduces |
mapreduce.reduce.java.opts | -Xmx2560M | Larger heap size for the child JVMs of reduces |
mapreduce.task.io.sort.mb | 512 | Higher memory limit while sorting data, for efficiency |
mapreduce.task.io.sort.factor | 100 | More streams merged at once while sorting files |
mapreduce.reduce.shuffle.parallelcopies | 50 | Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps |
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Execution framework set to Hadoop YARN</description>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
<description>Larger resource limit for maps</description>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
<description>Larger heap size for the child JVMs of maps</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
<description>Larger resource limit for reduces</description>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
<description>Larger heap size for the child JVMs of reduces</description>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
<description>Higher memory limit while sorting data, for efficiency</description>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
<description>More streams merged at once while sorting files</description>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>50</value>
<description>Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps</description>
</property>
</configuration>
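Each memory.mb setting above is paired with a java.opts heap size. The heap should stay below the container size, because the JVM also needs memory outside the heap and a task that outgrows its container may be killed by the NodeManager. The following is a rough consistency check written as a Python sketch; the helpers `parse_xmx_mb` and `heap_fits` are hypothetical names, and the second call uses a deliberately oversized heap that is not taken from this article.

```python
import re


def parse_xmx_mb(java_opts):
    """Extract the -Xmx heap size from a JVM options string, in MB.

    Returns None when no -Xmx flag with a k/m/g suffix is present.
    """
    match = re.search(r"-Xmx(\d+)([kKmMgG])", java_opts)
    if match is None:
        return None
    amount = int(match.group(1))
    unit = match.group(2).lower()
    if unit == "k":
        return amount // 1024
    if unit == "g":
        return amount * 1024
    return amount


def heap_fits(container_mb, java_opts):
    """True when the JVM heap is strictly smaller than the container limit."""
    heap_mb = parse_xmx_mb(java_opts)
    return heap_mb is not None and heap_mb < container_mb


# Reduce settings from the table: 3072 MB container, 2560 MB heap.
print(heap_fits(3072, "-Xmx2560M"))  # True
# Hypothetical misconfiguration: heap larger than the container.
print(heap_fits(1536, "-Xmx2048M"))  # False
```

How much headroom to leave between heap and container size depends on the workload; this check only catches the case where the heap alone already exceeds the container.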
- Configure the MapReduce JobHistory Server
Parameter | Value | Notes |
mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020 |
mapreduce.jobhistory.webapp.address | MapReduce JobHistory Server web UI host:port | Default port is 19888 |
mapreduce.jobhistory.intermediate-done-dir | /mrhistory/tmp | Directory where history files are written by MapReduce jobs |
mapreduce.jobhistory.done-dir | /mrhistory/done | Directory where history files are managed by the MR JobHistory Server |
<configuration>
<property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.1.100:10020</value>
<description>The IP address 192.168.1.100 can be replaced with a hostname</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>192.168.1.100:19888</value>
<description>The IP address 192.168.1.100 can be replaced with a hostname</description>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/usr/local/hadoop/mrhistory/tmp</value>
<description>Directory where history files are written by MapReduce jobs</description>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/usr/local/hadoop/mrhistory/done</value>
<description>Directory where history files are managed by the MR JobHistory Server</description>
</property>
</configuration>
Web Interface
Daemon | Web Interface | Notes |
NameNode | http://nn_host:port/ | Default HTTP port is 50070 |
ResourceManager | http://rm_host:port/ | Default HTTP port is 8088 |
MapReduce JobHistory Server | http://jhs_host:port/ | Default HTTP port is 19888 |