Hadoop 3.x configuration

Hadoop 3.x configuration files

First, let's look at which nodes are going to be configured.

The following node layout is used in this walkthrough.

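Every property used below is documented in the default-configuration pages linked at the bottom of the official Hadoop documentation menu (core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml).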
In those pages, the second column is the default value Hadoop uses when you do not set the property yourself, and the third column is its description.
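As a rough illustration of that rule, the Python sketch below models a property lookup: a value set in a *-site.xml file wins, otherwise the default from the matching *-default.xml page applies. The three defaults are the ones listed in core-default.xml; CORE_SITE deliberately holds only two of the overrides used later in this article so that the third key shows the fallback. The helper name effective_value is hypothetical, and this is not how Hadoop's own Configuration class is implemented.

```python
from typing import Optional

# A few defaults as listed in core-default.xml.
CORE_DEFAULTS = {
    "fs.defaultFS": "file:///",
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "hadoop.http.staticuser.user": "dr.who",
}

# Two of the values this article sets in core-site.xml (staticuser left out on purpose).
CORE_SITE = {
    "fs.defaultFS": "hdfs://liu:8020",
    "hadoop.tmp.dir": "/liu/hadoop/linshi",
}


def effective_value(name: str) -> Optional[str]:
    """Return the site value if set, else the documented default, else None."""
    if name in CORE_SITE:
        return CORE_SITE[name]
    return CORE_DEFAULTS.get(name)


for key in CORE_DEFAULTS:
    print(f"{key} = {effective_value(key)}")
```

Here fs.defaultFS and hadoop.tmp.dir come from the site file, while hadoop.http.staticuser.user falls back to its default.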

Configure the HDFS cluster (core-site.xml)


On the master host, insert the following inside the <configuration> node of /liu/hadoop/etc/hadoop/core-site.xml.

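<!-- Decides the HDFS run mode: putting the NameNode's address here (instead of the default file:///) switches HDFS to cluster mode -->
<property>
        <name>fs.defaultFS</name>
        <value>hdfs://liu:8020</value> <!-- address of the master NameNode (DataNodes do not need a port configured here) -->
        <!-- the host name liu must already be mapped in the host's hosts file -->
</property>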

<!-- Hadoop temporary (base) directory; by default NameNode metadata and DataNode blocks are stored under it -->
<property>
        <name>hadoop.tmp.dir</name>
        <value>/liu/hadoop/linshi</value>
</property>

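<!-- Static user for the web UI -->
<property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
        <!-- The user the web UI acts as. It must exist on the master host (not just on the worker hosts) and needs enough permissions to browse HDFS; root is used here, so permissions are not a concern -->
</property>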

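<!-- Proxy-user settings exposed by HDFS, needed by Hive -->
<!-- The user in these property names (root) must be the same as the staticuser above -->
<property>
        <name>hadoop.proxyuser.root.hosts</name><!-- hosts from which root may act on behalf of other users -->
        <value>*</value>
</property>

<property>
        <name>hadoop.proxyuser.root.groups</name><!-- root may act on behalf of users in any group -->
        <value>*</value>
</property>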
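To catch a mismatch between the staticuser name and the proxyuser property names before starting the cluster, a small Python check can read the <property> entries and confirm that hadoop.proxyuser.<user>.hosts and .groups exist for the staticuser. This is a sketch of this article's convention (one user for both), not a rule Hadoop itself enforces. The function names are hypothetical; for the real file you would use ET.parse("/liu/hadoop/etc/hadoop/core-site.xml").getroot() instead of the inline sample.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def read_properties(xml_text: str) -> Dict[str, str]:
    """Collect name -> value pairs from the <property> elements of a *-site.xml document."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_proxyuser(props: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the proxyuser keys match staticuser."""
    user = props.get("hadoop.http.staticuser.user")
    if not user:
        return ["hadoop.http.staticuser.user is not set"]
    problems = []
    for suffix in ("hosts", "groups"):
        key = f"hadoop.proxyuser.{user}.{suffix}"
        if key not in props:
            problems.append(f"missing {key}")
    return problems


# Trimmed copy of the core-site.xml entries above (comments omitted).
SAMPLE = """<configuration>
  <property><name>hadoop.http.staticuser.user</name><value>root</value></property>
  <property><name>hadoop.proxyuser.root.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.root.groups</name><value>*</value></property>
</configuration>"""

problems = check_proxyuser(read_properties(SAMPLE))
print(problems or "proxyuser settings match staticuser")
```

If staticuser were changed to another user without renaming the proxyuser properties, the check would report both missing keys.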

Configure the HDFS web consoles (hdfs-site.xml)


On the master host, insert the following inside the <configuration> node of /liu/hadoop/etc/hadoop/hdfs-site.xml.

<!-- NameNode web console address -->
<property>
    <name>dfs.namenode.http-address</name>
    <value>liu:9870</value>
</property>

<!-- Secondary NameNode web console address -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>liu2:9868</value>
</property>

After saving the file, start the NameNode and DataNodes, turn off the firewall (or open port 9870), and open http://<public IP>:9870 in a browser to reach the NameNode web console.
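Because dfs.namenode.http-address uses the host name liu, which a browser outside the cluster usually cannot resolve, the URL to open keeps the configured port but swaps in the public IP. The sketch below derives that URL from the host:port value and can optionally probe it with urllib from the standard library. The helper names are hypothetical, and 203.0.113.10 is a reserved documentation address standing in for your real public IP.

```python
import urllib.request
from typing import Optional


def web_ui_url(http_address: str, public_host: Optional[str] = None) -> str:
    """Turn a host:port value such as 'liu:9870' into a browser URL.

    public_host, if given, replaces the configured host name (e.g. the server's public IP).
    """
    host, sep, port = http_address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {http_address!r}")
    return f"http://{public_host or host}:{port}/"


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to url ends in a 200 response (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, refused connections and timeouts are OSError subclasses
        return False


print(web_ui_url("liu:9870"))                  # NameNode UI as configured
print(web_ui_url("liu2:9868"))                 # Secondary NameNode UI
print(web_ui_url("liu:9870", "203.0.113.10"))  # placeholder public IP
```

Once the NameNode is running and the firewall is off, calling is_reachable on the public-IP URL from your own machine should return True.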

Configure MapReduce (mapred-site.xml)


On the master host, insert the following inside the <configuration> node of /liu/hadoop/etc/hadoop/mapred-site.xml.

<property> <!-- run MapReduce jobs on the YARN cluster -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>

<!-- JobHistory server host and RPC port -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>liu2:10020</value>
</property>

<!-- JobHistory server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>liu2:19888</value>
</property>
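mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address belong to the same JobHistory server process, so they should name the same machine (liu2 here) with different ports. A quick Python check of the two values, using a hypothetical split_address helper:

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, int]:
    """Split 'host:port' into (host, port), rejecting malformed values."""
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


# Values from the mapred-site.xml snippet above.
MAPRED_SITE = {
    "mapreduce.jobhistory.address": "liu2:10020",
    "mapreduce.jobhistory.webapp.address": "liu2:19888",
}

rpc_host, rpc_port = split_address(MAPRED_SITE["mapreduce.jobhistory.address"])
web_host, web_port = split_address(MAPRED_SITE["mapreduce.jobhistory.webapp.address"])

# One JobHistory server serves both endpoints: same host, distinct ports.
assert rpc_host == web_host, "JobHistory RPC and web UI addresses name different hosts"
assert rpc_port != web_port, "JobHistory RPC and web UI cannot share a port"
print(f"JobHistory server on {rpc_host}: RPC port {rpc_port}, web UI port {web_port}")
```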

Configure YARN (yarn-site.xml)


On the master host, insert the following inside the <configuration> node of /liu/hadoop/etc/hadoop/yarn-site.xml.

<!-- Have MapReduce use the shuffle auxiliary service on each NodeManager -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

<!-- ResourceManager host -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>liu2</value>
</property>

<!-- Environment variables that containers inherit from the NodeManager -->
<property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>

<!-- Minimum and maximum memory one YARN container may be allocated (e.g. for each task of an MR job) -->
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
</property>
<property><!-- total memory this NodeManager offers to the ResourceManager for containers -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
</property>
<property><!-- physical memory check: if true, a container that uses more physical memory than it was allocated is killed -->
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>
<property><!-- virtual memory check -->
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value> <!-- true enables log aggregation -->
</property>

<!-- URL of the aggregated-log server (the JobHistory web UI) -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://liu2:19888/jobhistory/logs</value>
</property>

<!-- Keep aggregated logs for 7 days (604800 seconds) -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
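The numbers in yarn-site.xml depend on each other: a container request is at least minimum-allocation-mb and at most maximum-allocation-mb, and one NodeManager can hand out only resource.memory-mb in total. The Python sketch below (hypothetical names, values copied from the snippets above) checks those relations, estimates how many minimum-size containers fit on one NodeManager by memory alone (vcores ignored), converts 604800 seconds to days, and checks that yarn.log.server.url points at the JobHistory web UI address from mapred-site.xml.

```python
from typing import List

# Values from the yarn-site.xml snippet above.
YARN_SITE = {
    "yarn.scheduler.minimum-allocation-mb": 1024,
    "yarn.scheduler.maximum-allocation-mb": 4096,
    "yarn.nodemanager.resource.memory-mb": 4096,
    "yarn.log-aggregation.retain-seconds": 604800,
    "yarn.log.server.url": "http://liu2:19888/jobhistory/logs",
}
# mapreduce.jobhistory.webapp.address from mapred-site.xml.
JOBHISTORY_WEBAPP = "liu2:19888"
SECONDS_PER_DAY = 24 * 60 * 60


def check_yarn(conf: dict, jobhistory_webapp: str) -> List[str]:
    """Return human-readable problems; an empty list means the values are consistent."""
    problems = []
    min_mb = conf["yarn.scheduler.minimum-allocation-mb"]
    max_mb = conf["yarn.scheduler.maximum-allocation-mb"]
    node_mb = conf["yarn.nodemanager.resource.memory-mb"]
    if not 0 < min_mb <= max_mb:
        problems.append("minimum-allocation-mb must be positive and not above maximum-allocation-mb")
    if max_mb > node_mb:
        problems.append("maximum-allocation-mb exceeds one NodeManager's resource.memory-mb")
    if not conf["yarn.log.server.url"].startswith(f"http://{jobhistory_webapp}/"):
        problems.append("yarn.log.server.url does not point at the JobHistory web UI")
    return problems


def max_containers_per_node(conf: dict) -> int:
    """Upper bound on containers per NodeManager by memory alone, if each uses the minimum."""
    return conf["yarn.nodemanager.resource.memory-mb"] // conf["yarn.scheduler.minimum-allocation-mb"]


print(check_yarn(YARN_SITE, JOBHISTORY_WEBAPP) or "yarn-site.xml values are consistent")
print("max containers per node:", max_containers_per_node(YARN_SITE))
print("log retention in days:", YARN_SITE["yarn.log-aggregation.retain-seconds"] / SECONDS_PER_DAY)
```

With these values a NodeManager fits at most 4 minimum-size containers (4096 // 1024), and 604800 seconds is exactly 7 days. The container count is only an upper bound; actual container sizes depend on what each job requests.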


Source: blog.csdn.net/web13618542420/article/details/126665147