Each configuration file corresponds to a specific function, as follows:
Modify the core-site.xml file
<property>
<name>fs.defaultFS</name>
<!-- The address of the HDFS filesystem; the NameNode starts on whichever machine is named here -->
<value>hdfs://hdp-qm-01:8020</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.6.0/hadoopdata/tmp</value>
</property>
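Every setting above follows the same name/value schema. As a minimal sketch (the wrapping `<configuration>` root element is assumed, since Hadoop's *-site.xml files require one), the values can be read back with Python's standard XML parser:

```python
import xml.etree.ElementTree as ET

# The core-site.xml fragment from above, wrapped in the
# required <configuration> root element.
core_site = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdp-qm-01:8020</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-2.6.0/hadoopdata/tmp</value>
  </property>
</configuration>
"""

# Build a {name: value} dict from the <property> blocks.
root = ET.fromstring(core_site)
settings = {p.findtext("name"): p.findtext("value")
            for p in root.findall("property")}

print(settings["fs.defaultFS"])   # hdfs://hdp-qm-01:8020
```

This kind of check is handy for catching a malformed edit (a missing closing tag, for example) before restarting the cluster.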
Modify the hdfs-site.xml file
<property>
<!-- Number of replicas: 3 -->
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<!-- The default block size in Hadoop 2.x is 128 MB -->
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<!-- The directory where the NameNode stores its metadata -->
<value>file:///home/hadoop/hadoop-2.6.0/hadoopdata/dfs/name</value>
</property>
<property>
<!-- The directory where DataNodes store data blocks -->
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hadoop-2.6.0/hadoopdata/dfs/data</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:///home/hadoop/hadoop-2.6.0/hadoopdata/checkpoint/dfs/cname</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>file:///home/hadoop/hadoop-2.6.0/hadoopdata/checkpoint/dfs/cname</value>
</property>
<property>
<!-- The web address of the HDFS system; 50070 is the HDFS web port -->
<name>dfs.http.address</name>
<value>hdp-qm-01:50070</value>
</property>
<property>
<!-- Host 2 acts as the secondary (auxiliary) NameNode for host 1 -->
<name>dfs.secondary.http.address</name>
<value>hdp-qm-02:50090</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
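The numeric values in hdfs-site.xml are easy to sanity-check: 134217728 bytes is exactly 128 MiB, and with a replication factor of 3 every block is stored three times. A small illustrative calculation (the 1 GiB file size is a made-up example, not from the configuration):

```python
block_size = 134_217_728   # dfs.blocksize in bytes
replication = 3            # dfs.replication

# 134217728 bytes == 128 MiB, the Hadoop 2.x default.
assert block_size == 128 * 1024 * 1024

# Example: a hypothetical 1 GiB file splits into 8 blocks,
# and replication stores 3 copies of each block.
file_size = 1024 * 1024 * 1024
blocks = -(-file_size // block_size)   # ceiling division
print(blocks)                          # 8
print(blocks * replication)            # 24 block replicas cluster-wide
```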
Modify the mapred-site.xml file
The commands are as follows:
# mv mapred-site.xml.template mapred-site.xml
# vi mapred-site.xml
<property>
<!-- Configure MapReduce jobs to run on the YARN resource scheduler -->
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<!-- The address of the MapReduce job history service (internal port) -->
<name>mapreduce.jobhistory.address</name>
<value>hdp-qm-01:10020</value>
</property>
<property>
<!-- The web address of the MapReduce job history service; 19888 is the history server's external web port -->
<name>mapreduce.jobhistory.webapp.address</name>
<value>hdp-qm-01:19888</value>
</property>
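Because the same name/value pattern repeats in every file, the property blocks can also be generated rather than typed by hand. A minimal sketch (the `make_property` helper name is illustrative, not part of Hadoop) that emits one property element, including the optional `<final>` flag used above:

```python
import xml.etree.ElementTree as ET

def make_property(name: str, value: str, final: bool = False) -> ET.Element:
    """Build one Hadoop <property> element (helper name is illustrative)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    if final:
        # <final>true</final> prevents jobs from overriding the setting.
        ET.SubElement(prop, "final").text = "true"
    return prop

prop = make_property("mapreduce.framework.name", "yarn", final=True)
print(ET.tostring(prop, encoding="unicode"))
# <property><name>mapreduce.framework.name</name><value>yarn</value><final>true</final></property>
```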
Modify the yarn-site.xml file to configure the YARN cluster
<property>
<!-- Configure the address of the resourcemanager service -->
<name>yarn.resourcemanager.hostname</name>
<value>hdp-qm-01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<!-- Configure MapReduce's shuffle service; the shuffle phase starts once map processing begins and runs before aggregation (reduce) -->
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<!-- Internal port used for communication with the ResourceManager -->
<value>hdp-qm-01:8032</value>
</property>
<property>
<!-- Port used when requesting and scheduling resources -->
<name>yarn.resourcemanager.scheduler.address</name>
<value>hdp-qm-01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<!-- Port used by NodeManagers to report (track) their resources -->
<value>hdp-qm-01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<!-- Port used for administrative (management) commands -->
<value>hdp-qm-01:8033</value>
</property>
<property>
<!-- The web access address of the ResourceManager; 8088 is YARN's web port -->
<name>yarn.resourcemanager.webapp.address</name>
<value>hdp-qm-01:8088</value>
</property>
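For reference, the ports configured across the four files can be collected in one place. A small sketch (hostnames and port numbers are the ones set above) that also verifies no two daemons are assigned the same port:

```python
# Ports configured across the four *-site.xml files above.
ports = {
    "fs.defaultFS (NameNode RPC)":                   8020,
    "dfs.http.address (HDFS web UI)":                50070,
    "dfs.secondary.http.address":                    50090,
    "mapreduce.jobhistory.address":                  10020,
    "mapreduce.jobhistory.webapp.address":           19888,
    "yarn.resourcemanager.address":                  8032,
    "yarn.resourcemanager.scheduler.address":        8030,
    "yarn.resourcemanager.resource-tracker.address": 8031,
    "yarn.resourcemanager.admin.address":            8033,
    "yarn.resourcemanager.webapp.address (YARN UI)": 8088,
}

# Every port must be distinct so the daemons can bind without conflicts.
assert len(set(ports.values())) == len(ports)

for setting, port in sorted(ports.items(), key=lambda kv: kv[1]):
    print(f"{port:5d}  {setting}")
```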