Configuration files of Hadoop and Yarn

  The principle of cluster parameter configuration is simple: whatever you set in the site files overrides the built-in defaults, and anything you leave unset keeps its default value. The following summarizes the commonly used Hadoop configuration file parameters. The common configuration files are core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml, covering the two main components, HDFS and Yarn: one is responsible for storage and the other is a resource management framework, i.e. storage versus computing. Some companies run separate compute nodes and storage nodes; others do not, depending on their requirements.
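  To make the override rule concrete: a property set in a site file replaces the built-in default from the corresponding *-default.xml file, and marking it final (as several entries below do) additionally stops later configuration resources, such as per-job configuration files, from overriding it again. A minimal sketch, reusing fs.trash.interval from the core-site.xml listing below:

<!-- core-site.xml: overrides fs.trash.interval, whose built-in default
     in core-default.xml is 0 (trash disabled). -->
<property>
    <name>fs.trash.interval</name>
    <value>1440</value>
    <!-- final stops later configuration resources (e.g. per-job configs)
         from overriding this value. -->
    <final>true</final>
</property>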
  
  1. The core-site.xml file is Hadoop's core configuration file. It mainly sets cluster-wide properties such as the default filesystem, the ZooKeeper quorum for HA, and the temporary-file directory; although several of these relate to the NameNode, the file is read by all Hadoop daemons and clients, not only by the NameNode.
   

<configuration>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://HadoopHhy</value>
</property>
<property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1:2015,zk2:2015,zk3:2015</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/bigdata/hadoop/tmp</value>
    <final>true</final>
</property>

<property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <final>true</final>
</property>
<property>
    <name>fs.trash.interval</name>
    <value>1440</value>
</property>

<property>
        <name>hadoop.security.authorization</name>
        <value>true</value>
</property>

</configuration>

   
Parameter configuration and explanation:

- fs.defaultFS: the default filesystem URI for all Hadoop clients. With HDFS HA this is the logical nameservice name (hdfs://HadoopHhy here) rather than a single NameNode address; note that the hdfs-site.xml below uses a different nameservice name (test), and in a real cluster the two must match.
- ha.zookeeper.quorum: the ZooKeeper ensemble used by the ZKFailoverController for automatic NameNode failover.
- hadoop.tmp.dir: the base directory for Hadoop's temporary files; many other paths default to subdirectories of it.
- io.file.buffer.size: the buffer size, in bytes, used for I/O on sequence files and other streams (131072 bytes = 128 KB).
- fs.trash.interval: how many minutes deleted files stay in the trash before being permanently removed; 1440 minutes = 24 hours, and 0 disables the trash entirely.
- hadoop.security.authorization: enables service-level authorization, with the per-protocol ACLs defined in hadoop-policy.xml.

2. The hdfs-site.xml file

This is the core configuration file of HDFS. It configures the HDFS-specific properties of the NameNode and DataNode, and it takes effect on both.

<configuration>
<property>
    <name>dfs.nameservices</name>
    <value>test</value>
</property>

<property>
    <name>dfs.ha.namenodes.test</name>
    <value>nn1,nn2</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.test.nn1</name>
    <value>host1:9000</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.test.nn2</name>
    <value>host2:9000</value>
</property>
<property>
    <name>dfs.namenode.http-address.test.nn1</name>
    <value>host1:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.test.nn2</name>
    <value>host2:50070</value>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://host1:8485;host2:8485;host3:8485/test</value>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.test</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/journal</value>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
<property>
    <name>dfs.block.size</name>
    <value>134217728</value>
    <final>true</final>
</property>

<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.name.dir</name>
    <value>/data/namenode</value>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/data/datanode</value>
    <final>true</final>
</property>

<property>
    <name>dfs.permissions.enabled</name>
    <value>true</value>
</property>

<property>
    <name>dfs.namenode.acls.enabled</name>
    <value>true</value>
</property>
<property>
    <name>dfs.image.transfer.bandwidthPerSec</name>
    <value>314572800</value>
</property>
<property>
    <name>dfs.image.transfer.timeout</name>
    <value>120000</value>
</property>
<property>
    <name>dfs.namenode.checkpoint.txns</name>
    <value>5000000</value>
</property>
<property>
    <name>dfs.namenode.edits.dir</name>
    <value>/data/editlog</value>
</property>

<property>
    <name>dfs.hosts.exclude</name>
    <value>/etc/hadoop/hosts-exclude</value>
</property>
<property>
    <name>dfs.datanode.balance.bandwidthPerSec</name>
    <value>20971520</value>
</property>
<property>
    <name>dfs.namenode.accesstime.precision</name>
    <value>0</value>
</property>
<property>
    <name>dfs.namenode.decommission.interval</name>
    <value>30</value>
</property>
</configuration>

Parameter configuration and explanation:

- dfs.nameservices: the logical name of the HA nameservice (test here).
- dfs.ha.namenodes.test: the IDs of the NameNodes making up the nameservice (nn1, nn2).
- dfs.namenode.rpc-address.* / dfs.namenode.http-address.*: the RPC and web-UI addresses of each NameNode.
- dfs.namenode.shared.edits.dir: the JournalNode quorum URI through which the active NameNode shares its edit log with the standby.
- dfs.client.failover.proxy.provider.test: the class HDFS clients use to locate the currently active NameNode.
- dfs.ha.fencing.methods: the fencing method run during failover; shell(/bin/true) is a no-op (see the note after this list).
- dfs.journalnode.edits.dir: the local directory in which each JournalNode stores edits.
- dfs.ha.automatic-failover.enabled: turns on ZKFC-driven automatic failover.
- dfs.block.size: the HDFS block size in bytes (134217728 bytes = 128 MB); this is the deprecated name for dfs.blocksize.
- dfs.replication: the number of replicas kept for each block.
- dfs.name.dir / dfs.data.dir: where the NameNode keeps its metadata and where DataNodes store blocks (deprecated names for dfs.namenode.name.dir and dfs.datanode.data.dir).
- dfs.permissions.enabled / dfs.namenode.acls.enabled: enable permission checking and POSIX-style ACLs on HDFS.
- dfs.image.transfer.bandwidthPerSec / dfs.image.transfer.timeout: the bandwidth cap in bytes per second (314572800 = 300 MB/s) and the timeout in milliseconds for fsimage transfers during checkpointing.
- dfs.namenode.checkpoint.txns: the number of namespace transactions after which a checkpoint is forced.
- dfs.namenode.edits.dir: the directory in which the NameNode writes its edit log.
- dfs.hosts.exclude: the file listing DataNodes to be decommissioned.
- dfs.datanode.balance.bandwidthPerSec: the bandwidth cap, in bytes per second, that the balancer may use (20971520 = 20 MB/s).
- dfs.namenode.accesstime.precision: the precision of file access times in milliseconds; 0 disables access-time updates, reducing NameNode load.
- dfs.namenode.decommission.interval: how often, in seconds, the NameNode checks decommissioning progress.
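Note that shell(/bin/true) is a no-op fencing method: it always reports success, effectively relying on the JournalNode quorum (which only ever accepts writes from one NameNode) rather than actively fencing the old active node. A stricter, commonly used alternative is sshfence; the sketch below is illustrative, and the private-key path is an example value:

<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<!-- sshfence logs into the old active NameNode and kills its process.
     It requires passphraseless SSH; the key path below is an example. -->
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/bigdata/.ssh/id_rsa</value>
</property>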


3. The yarn-site.xml file

This file is the core configuration file of the Yarn resource management framework; all Yarn settings are made in this file. Parameter configuration and explanation follow after the listing.

<configuration>

    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>xxx</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>host1</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>host2</value>
    </property>

    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>host1:2015,host2:2015,host3:2015</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>

    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/data/yarn</value>
    </property>

    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/data/logs</value>
    </property>

    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>host1:port</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>host1:port</value>
        <final>true</final>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>host1:8088</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>host1:port</value>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>host1:port</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.admin.address.rm1</name>
        <value>host1:port</value>
    </property>

    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>host2:port</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>host2:port</value>
        <final>true</final>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>host2:8088</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>host2:port</value>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>host2:port</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.admin.address.rm2</name>
        <value>host2:port</value>
    </property>

    <property> 
        <name>yarn.client.failover-proxy-provider</name> 
         <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value> 
    </property>

    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>false</value>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>172800</value>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>21600</value>
    </property>

    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>2048</value>
    </property>

    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>24576</value>
    </property>

    <property>
        <name>yarn.log.server.url</name>
        <value>http://host2:port/jobhistory/logs</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>25600</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>6</value>
    </property>

    <property>
        <name>yarn.resourcemanager.nodemanager-connect-retries</name>
        <value>10</value>
    </property>
</configuration>

Parameter configuration and explanation:

- yarn.resourcemanager.ha.enabled, yarn.resourcemanager.cluster-id, yarn.resourcemanager.ha.rm-ids, and yarn.resourcemanager.hostname.rm1/rm2: enable ResourceManager HA and name the cluster and its two RM instances.
- yarn.resourcemanager.zk-address: the ZooKeeper ensemble used for RM leader election and state storage.
- yarn.nodemanager.aux-services and the matching *.class entries: auxiliary services run inside every NodeManager; mapreduce_shuffle serves MapReduce shuffle output, and spark_shuffle runs Spark's external shuffle service (the latter requires the Spark Yarn shuffle jar on the NodeManager classpath).
- yarn.nodemanager.local-dirs / yarn.nodemanager.log-dirs: the local working and log directories of each NodeManager.
- yarn.resourcemanager.*.address.rm1/rm2: the client RPC, scheduler, web-UI, resource-tracker, and admin addresses for each RM instance.
- yarn.client.failover-proxy-provider: the class clients use to fail over between the two RMs.
- yarn.resourcemanager.recovery.enabled / yarn.resourcemanager.store.class: let a restarted RM recover running applications from state persisted in ZooKeeper (ZKRMStateStore).
- yarn.log-aggregation-enable and the two retain settings: whether container logs are aggregated to HDFS (disabled here); when enabled, logs are kept for 172800 seconds = 48 hours and the cleaner runs every 21600 seconds = 6 hours.
- yarn.scheduler.minimum-allocation-mb / yarn.scheduler.maximum-allocation-mb: the smallest and largest container the scheduler will grant (2 GB and 24 GB here).
- yarn.log.server.url: the URL to which the web UI redirects for logs of finished applications, typically the JobHistory server.
- yarn.nodemanager.resource.memory-mb / yarn.nodemanager.resource.cpu-vcores: the resources each NodeManager offers to Yarn (25600 MB and 6 vcores here). With a 2048 MB minimum allocation, that is at most 25600 / 2048 = 12 minimum-size containers per node.
- yarn.resourcemanager.nodemanager-connect-retries: the number of retries used when connecting to a NodeManager.
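In this cluster yarn.log-aggregation-enable is false, so container logs stay under yarn.nodemanager.log-dirs on each node. To collect them into HDFS instead (which also makes the retain settings above take effect), aggregation can be switched on; a minimal sketch, where the HDFS path is an example value (it happens to match the built-in default):

<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- HDFS directory that aggregated container logs are written under;
     /tmp/logs is the built-in default, shown here as an example. -->
<property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
</property>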

  4. The mapred-site.xml file

    This is the MapReduce configuration file. Among other things it configures the JobHistory server, which records the complete information of finished MapReduce jobs in an HDFS directory.

     Parameter configuration and explanation:

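    A minimal mapred-site.xml sketch for running MapReduce on Yarn. The history-server host (host3) and the HDFS history paths are example values, not taken from the original cluster; 10020 and 19888 are the usual JobHistory defaults:

<configuration>
<!-- Run MapReduce jobs on Yarn rather than in local mode. -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<!-- JobHistory server RPC and web-UI addresses; host3 is an example host. -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>host3:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>host3:19888</value>
</property>
<!-- HDFS directories where in-flight and finished job histories are recorded; example paths. -->
<property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
</property>
<property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
</property>
</configuration>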


Origin blog.51cto.com/13836096/2532831