Detailed explanation of Hadoop 2.0 configuration files

 

Reposted from: http://www.cnblogs.com/yinghun/p/6230436.html

Hadoop can run in either secure or non-secure mode. This article describes the important parameters of the main configuration files in non-secure mode. The Hadoop version used here is 2.6.4.

etc/hadoop/core-site.xml

Parameter | Value | Explanation
fs.defaultFS | NameNode URI | hdfs://host:port/
io.file.buffer.size | 131072 | Read/write buffer size used in SequenceFiles

 

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.1.100:9000</value>
        <description>192.168.1.100 is the server IP address; a host name can be used instead</description>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
        <description>The buffer size in bytes used for reads and writes; 131072 bytes is 128 KB</description>
    </property>
</configuration>
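For a quick sanity check that client code actually picks up these values, the following minimal Java sketch loads the configuration and prints the resulting default filesystem. It assumes the Hadoop client jars and the etc/hadoop configuration directory are on the classpath; the class name is arbitrary.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ShowDefaultFs {
    public static void main(String[] args) throws IOException {
        // Picks up core-site.xml (and the other *-site.xml files) from the classpath.
        Configuration conf = new Configuration();
        // fs.defaultFS decides which filesystem FileSystem.get() returns.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Default filesystem: " + fs.getUri());
        // io.file.buffer.size is read as plain bytes (4096 if unset).
        System.out.println("I/O buffer size: "
                + conf.getInt("io.file.buffer.size", 4096) + " bytes");
    }
}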

 

 

etc/hadoop/hdfs-site.xml

  • Configure NameNode
Parameter | Value | Explanation
dfs.namenode.name.dir | Path on the local filesystem where the NameNode persistently stores the namespace and transaction logs | If this is a comma-separated list of directories, the name table is replicated to all of them, for redundancy
dfs.namenode.hosts / dfs.namenode.hosts.exclude | List of permitted/excluded DataNodes | If necessary, use these files to control the list of allowed DataNodes
dfs.blocksize | 268435456 | HDFS block size of 256 MB for large file systems
dfs.namenode.handler.count | 100 | More NameNode server threads to handle a large volume of RPCs from DataNodes

 

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Number of replicas; set to 1 for pseudo-distributed mode</description>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/namenode</value>
        <description>Path in the local filesystem where the namespace and transaction logs are persistently stored</description>
    </property>
    <property>
        <name>dfs.namenode.hosts</name>
        <value>datanode1, datanode2</value>
        <description>datanode1 and datanode2 are the host names of the servers running DataNodes</description>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>268435456</value>
        <description>Block size of 256 MB for large file systems; the default in Hadoop 2.x is 128 MB</description>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
        <description>More NameNode server threads to handle RPCs from DataNodes</description>
    </property>
</configuration>
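dfs.replication and dfs.blocksize only set the cluster-wide defaults; an HDFS client may also override them per file at create time. The following is a minimal Java sketch of that (the path is just a hypothetical example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // dfs.replication and dfs.blocksize from hdfs-site.xml are defaults;
        // this overload of create() overrides them for a single file.
        try (FSDataOutputStream out = fs.create(
                new Path("/tmp/blocksize-demo.txt"),       // hypothetical test path
                true,                                      // overwrite if it exists
                conf.getInt("io.file.buffer.size", 4096),  // buffer size
                (short) 1,                                 // replication, as in the example above
                256L * 1024 * 1024)) {                     // 256 MB block size (268435456 bytes)
            out.writeUTF("hello hdfs");
        }
    }
}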

 

 

  • Configure DataNode
Parameter | Value | Explanation
dfs.datanode.data.dir | Comma-separated list of paths on the local filesystem where the DataNode stores its blocks | If this is a comma-separated list of directories, data is stored in all of them, typically on different devices

 

<configuration>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/datanode</value>
        <description>The path where the DataNode stores blocks in the local file system</description>
    </property>
</configuration>
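Because dfs.datanode.data.dir is a comma-separated list, each entry becomes a separate storage directory. A small Java sketch showing how such a list is read back from the configuration (the default passed here mirrors the example value above):

import org.apache.hadoop.conf.Configuration;

public class ShowDataDirs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // A comma-separated value yields one storage directory per entry;
        // the DataNode spreads its blocks across all of them.
        for (String dir : conf.getTrimmedStrings(
                "dfs.datanode.data.dir", "file:/usr/local/hadoop/tmp/datanode")) {
            System.out.println("DataNode storage dir: " + dir);
        }
    }
}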

 

 

etc/hadoop/yarn-site.xml

  • Configure ResourceManager and NodeManager:
Parameter | Value | Explanation
yarn.resourcemanager.address | ResourceManager host:port for clients to submit jobs | host:port
yarn.resourcemanager.scheduler.address | ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources | host:port
yarn.resourcemanager.resource-tracker.address | ResourceManager host:port for NodeManagers | host:port
yarn.resourcemanager.admin.address | ResourceManager host:port for administrative commands | host:port
yarn.resourcemanager.webapp.address | ResourceManager web UI host:port | host:port
yarn.resourcemanager.scheduler.class | ResourceManager Scheduler class | CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler
yarn.scheduler.minimum-allocation-mb | Minimum memory allocation for every container request at the ResourceManager | in MB
yarn.scheduler.maximum-allocation-mb | Maximum memory allocation for every container request at the ResourceManager | in MB
yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path | List of permitted/excluded NodeManagers | If necessary, use these files to control the list of allowed NodeManagers

<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.1.100:8081</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.1.100:8082</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.1.100:8083</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.1.100:8084</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.1.100:8085</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
        <description>Common choices: CapacityScheduler, FairScheduler, or FifoScheduler (fully qualified class name)</description>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>100</value>
        <description>Unit: MB</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>256</value>
        <description>Unit: MB</description>
    </property>
    <property>
        <name>yarn.resourcemanager.nodes.include-path</name>
        <value>nodeManager1, nodeManager2</value>
        <description>nodeManager1 and nodeManager2 are the host names of the NodeManager servers</description>
    </property>
</configuration>
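The *.address settings above are what YARN clients use to reach the ResourceManager. As a rough illustration, the following Java sketch (assuming yarn-site.xml is on the classpath) connects through yarn.resourcemanager.address and lists the NodeManagers currently registered:

import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListYarnNodes {
    public static void main(String[] args) throws Exception {
        // YarnConfiguration picks up yarn-site.xml from the classpath, so the
        // client connects to the yarn.resourcemanager.address configured above.
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(conf);
        yarn.start();
        List<NodeReport> nodes = yarn.getNodeReports();
        for (NodeReport node : nodes) {
            System.out.println(node.getNodeId() + " -> " + node.getCapability());
        }
        yarn.stop();
    }
}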

  • Configure NodeManager
Parameter | Value | Explanation
yarn.nodemanager.resource.memory-mb | Physical memory, in MB, available to the given NodeManager | Defines the total resources on the node available to running containers
yarn.nodemanager.vmem-pmem-ratio | Maximum ratio by which the virtual memory usage of a task may exceed its physical memory | The virtual memory usage of each task may exceed its physical memory limit by this ratio, and the total virtual memory used by tasks on a NodeManager may exceed its physical memory usage by this ratio
yarn.nodemanager.local-dirs | Comma-separated list of local filesystem paths where intermediate data is written | Multiple storage paths improve disk read/write throughput
yarn.nodemanager.log-dirs | Comma-separated list of local filesystem paths where logs are written | Multiple storage paths improve disk read/write throughput
yarn.nodemanager.log.retain-seconds | 10800 | Default time (in seconds) to retain log files on the NodeManager; applies only when log aggregation is disabled
yarn.nodemanager.remote-app-log-dir | logs | HDFS directory to which application logs are moved on application completion; the appropriate permissions must be set; applies only when log aggregation is enabled
yarn.nodemanager.remote-app-log-dir-suffix | logs | Suffix appended to the remote log directory; logs are aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}; applies only when log aggregation is enabled
yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for MapReduce applications

<configuration>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>256</value>
        <description>Unit: MB</description>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>90</value>
        <description>Ratio of allowed virtual memory to physical memory</description>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/usr/local/hadoop/tmp/nodemanager</value>
        <description>Comma-separated list of paths</description>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/usr/local/hadoop/tmp/nodemanager/logs</value>
        <description>Comma-separated list of paths</description>
    </property>
    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>10800</value>
        <description>Unit: seconds</description>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>Shuffle service that needs to be set for MapReduce applications</description>
    </property>
</configuration>
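As a worked example of how these numbers interact: with yarn.nodemanager.resource.memory-mb = 256 and yarn.scheduler.minimum-allocation-mb = 100 (from the ResourceManager settings above), one NodeManager can host at most two minimum-size containers, because 2 × 100 MB = 200 MB fits within 256 MB while a third container would not. On real hardware both values are normally set far higher; the small numbers here are only for demonstration.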

etc/hadoop/mapred-site.xml

  • Configure MapReduce
Parameter | Value | Explanation
mapreduce.framework.name | yarn | Sets the execution framework to Hadoop YARN
mapreduce.map.memory.mb | 1536 | Larger resource limit for maps
mapreduce.map.java.opts | -Xmx1024M | Larger heap size for the map JVM child processes
mapreduce.reduce.memory.mb | 3072 | Larger resource limit for reduces
mapreduce.reduce.java.opts | -Xmx2560M | Larger heap size for the reduce JVM child processes
mapreduce.task.io.sort.mb | 512 | Higher memory limit for sorting data more efficiently
mapreduce.task.io.sort.factor | 100 | More streams merged at once while sorting files
mapreduce.reduce.shuffle.parallelcopies | 50 | More parallel copies run by reduces to fetch outputs from a large number of maps

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Sets the execution framework to Hadoop YARN</description>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1536</value>
        <description>Larger resource limit for maps</description>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
        <description>Larger heap size for the map JVM child processes</description>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>3072</value>
        <description>Larger resource limit for reduces</description>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2560M</value>
        <description>Larger heap size for the reduce JVM child processes</description>
    </property>
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>512</value>
        <description>Higher memory limit for sorting data more efficiently</description>
    </property>
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>100</value>
        <description>More streams merged at once while sorting files</description>
    </property>
    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>50</value>
        <description>More parallel copies run by reduces to fetch outputs from a large number of maps</description>
    </property>
</configuration>

  • Configure the MapReduce JobHistory Server
Parameter | Value | Explanation
mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port 10020
mapreduce.jobhistory.webapp.address | MapReduce JobHistory Server web UI host:port | Default port 19888
mapreduce.jobhistory.intermediate-done-dir | /mr-history/tmp | Directory where history files are written by MapReduce jobs
mapreduce.jobhistory.done-dir | /mr-history/done | Directory where history files are managed by the MR JobHistory Server
<configuration>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.1.100:10020</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.1.100:19888</value>
        <description>The IP address 192.168.1.100 can be replaced with a host name</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/usr/local/hadoop/mr-history/tmp</value>
        <description>Directory where history files are written by MapReduce jobs</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/usr/local/hadoop/mr-history/done</value>
        <description>Directory where history files are managed by the MR JobHistory Server</description>
    </property>
</configuration>
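To tie the mapred-site.xml settings together: with mapreduce.framework.name set to yarn, an ordinary MapReduce job is submitted to the ResourceManager and later shows up in the JobHistory Server. Below is a minimal word-count sketch using only built-in mapper and reducer classes; input and output paths are passed as arguments, and the class would be packaged into a jar and launched with the standard hadoop jar command.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class TinyWordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // With mapreduce.framework.name=yarn in mapred-site.xml, this job is
        // submitted to the ResourceManager instead of running locally.
        Job job = Job.getInstance(conf, "tiny-wordcount");
        job.setJarByClass(TinyWordCount.class);
        job.setMapperClass(TokenCounterMapper.class);  // built-in tokenizing mapper
        job.setCombinerClass(IntSumReducer.class);     // built-in summing reducer
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}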

 

Web Interface

Daemon | Web Interface | Notes
NameNode | http://nn_host:port/ | Default port 50070
ResourceManager | http://rm_host:port/ | Default port 8088
MapReduce JobHistory Server | http://jhs_host:port/ | Default port 19888
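As a simple reachability check, the sketch below issues an HTTP GET against each of these interfaces and prints the response code (plain JDK, no Hadoop dependency; the host is the example address used in this article, and the ports should match your actual webapp settings):

import java.net.HttpURLConnection;
import java.net.URL;

public class CheckWebUIs {
    public static void main(String[] args) throws Exception {
        // Default ports from the table above; adjust the ResourceManager port if
        // yarn.resourcemanager.webapp.address was changed (the yarn-site.xml
        // example earlier uses 8085 instead of 8088).
        String[] urls = {
                "http://192.168.1.100:50070/",  // NameNode web UI
                "http://192.168.1.100:8088/",   // ResourceManager web UI
                "http://192.168.1.100:19888/"   // JobHistory Server web UI
        };
        for (String u : urls) {
            HttpURLConnection conn = (HttpURLConnection) new URL(u).openConnection();
            conn.setConnectTimeout(3000);
            System.out.println(u + " -> HTTP " + conn.getResponseCode());
            conn.disconnect();
        }
    }
}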