Install Hadoop (Linux CentOS)

1. Preliminary environment preparation

1.1. Install common utilities

sudo yum install -y epel-release
sudo yum install -y psmisc nc net-tools rsync vim lrzsz ntp libzstd openssl-static

1.2. Set the hostname

sudo hostnamectl set-hostname hadoop101

1.3. Add hostname mappings

sudo vim /etc/hosts
192.168.1.100 hadoop100
192.168.1.101 hadoop101
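
The cluster configuration later in this guide references hadoop102, hadoop103, and hadoop104, so those hosts need mappings too. A sketch assuming they sit on the same subnet (the addresses below are placeholders; use your actual IPs):

192.168.1.102 hadoop102
192.168.1.103 hadoop103
192.168.1.104 hadoop104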

1.4. Turn off the firewall

sudo systemctl stop firewalld
sudo systemctl disable firewalld
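
Confirm the firewall is down:

sudo systemctl status firewalld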

1.5. Create the atguigu user

sudo useradd atguigu
sudo passwd atguigu

1.6. Reboot

reboot

1.7. Give the atguigu user root privileges

visudo
# Edit /etc/sudoers (around line 91): add the atguigu entry below the root entry
root       ALL=(ALL)         ALL
atguigu      ALL=(ALL)         NOPASSWD:ALL
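
A quick check that the rule works (run as the new user; `sudo whoami` should print root without prompting for a password):

su - atguigu
sudo whoami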

1.8. Create folders in the /opt directory

# Create the module and software folders under /opt
sudo mkdir /opt/module /opt/software

# Change the owner of the module and software folders
sudo chown atguigu:atguigu /opt/module /opt/software
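
Verify that both directories are now owned by atguigu:

ls -ld /opt/module /opt/software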

2. Install JDK

Detailed JDK installation steps: https://blog.csdn.net/Asia1752/article/details/104505189
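
A minimal sketch of the JDK install, assuming a JDK 8 tarball has been uploaded to /opt/software (the filename and resulting directory below are placeholders for whichever build you downloaded):

# Extract the JDK into /opt/module
tar -zxvf /opt/software/jdk-8u212-linux-x64.tar.gz -C /opt/module

# Expose JAVA_HOME via the same profile script used for Hadoop below
sudo vim /etc/profile.d/my_env.sh
# JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin

# Verify
java -version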

3. Install Hadoop
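
This section assumes Hadoop 3.1.3 has been unpacked to /opt/module (the path used by HADOOP_HOME below); a sketch, assuming the tarball was uploaded to /opt/software:

tar -zxvf /opt/software/hadoop-3.1.3.tar.gz -C /opt/module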

sudo vim /etc/profile.d/my_env.sh

# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
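
Reload the profile (or log out and back in) so the new variables take effect:

source /etc/profile.d/my_env.sh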

Verify:
hadoop version
hadoop checknative

4. Hadoop cluster configuration

4.0. Files to configure: cd $HADOOP_HOME/etc/hadoop

core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
workers

4.1. vi core-site.xml

<configuration>
    <!-- NameNode address (default filesystem) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop102:8020</value>
    </property>
    <!-- Base directory for Hadoop data; referenced by hdfs-site.xml below -->
    <property>
        <name>hadoop.data.dir</name>
        <value>/opt/module/hadoop-3.1.3/data</value>
    </property>
    <!-- Hosts from which the atguigu proxy user may impersonate others -->
    <property>
        <name>hadoop.proxyuser.atguigu.hosts</name>
        <value>*</value>
    </property>
    <!-- Groups that the atguigu proxy user may impersonate -->
    <property>
        <name>hadoop.proxyuser.atguigu.groups</name>
        <value>*</value>
    </property>
</configuration>

4.2. vi hdfs-site.xml

<configuration>
    <!-- NameNode metadata directory -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.data.dir}/name</value>
    </property>
    <!-- DataNode block storage directory -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.data.dir}/data</value>
    </property>
    <!-- SecondaryNameNode checkpoint directory -->
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file://${hadoop.data.dir}/namesecondary</value>
    </property>
    <!-- Client timeout while a DataNode is restarting -->
    <property>
        <name>dfs.client.datanode-restart.timeout</name>
        <value>30</value>
    </property>
    <!-- SecondaryNameNode web address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop104:9868</value>
    </property>
</configuration>

4.3. vi mapred-site.xml

<configuration>
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

4.4. vi yarn-site.xml

<configuration>
    <!-- Shuffle service required by MapReduce -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager host -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop103</value>
    </property>
    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

4.5. vi workers

# List every worker (DataNode/NodeManager) host
hadoop102
hadoop103
hadoop104
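
The start scripts in 4.7 rely on passwordless SSH, and every node needs identical configuration files. A sketch of both steps, run as atguigu from the node where the files were edited (hadoop102 here; rsync was installed in step 1.1):

# Passwordless SSH to every node (including this one)
ssh-keygen -t rsa
ssh-copy-id hadoop102
ssh-copy-id hadoop103
ssh-copy-id hadoop104

# Push the whole Hadoop installation to the other nodes
rsync -av /opt/module/hadoop-3.1.3/ atguigu@hadoop103:/opt/module/hadoop-3.1.3/
rsync -av /opt/module/hadoop-3.1.3/ atguigu@hadoop104:/opt/module/hadoop-3.1.3/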

4.6. Format the NameNode (first startup only)

# Run once, on the NameNode host (hadoop102)
hdfs namenode -format

4.7. Start

# Run on the NameNode host (hadoop102); starts namenodes, datanodes, then secondary namenodes
start-dfs.sh

# Run on the ResourceManager host (hadoop103); starts resourcemanager and nodemanagers
start-yarn.sh

# Check which daemons came up
jps
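
With this configuration, jps on each node should show roughly these daemons (a sketch derived from the config files above, not captured output):

# hadoop102: NameNode, DataNode, NodeManager
# hadoop103: ResourceManager, DataNode, NodeManager
# hadoop104: SecondaryNameNode, DataNode, NodeManager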

4.8. Stop

stop-dfs.sh
stop-yarn.sh

5. Additional Hadoop cluster configuration

5.1. Configure the history server

Run it on the same node that the log-aggregation settings in 5.2 point to (hadoop102 in this guide).
vi mapred-site.xml

<!-- History server RPC address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop102:10020</value>
</property>

<!-- History server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop102:19888</value>
</property>

Start the history server:

mapred --daemon start historyserver

5.2. Configure log aggregation

Point the log server URL at the history server node from 5.1 (hadoop102).
vi yarn-site.xml

<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>

<!-- Log server URL (the history server web UI) -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop102:19888/jobhistory/logs</value>
</property>

<!-- Retain logs for one week (604800 seconds) -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
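
yarn-site.xml changes take effect only after the YARN daemons restart; a sketch using the commands from sections 4.7 and 5.1:

# Restart YARN (on hadoop103) and the history server (on hadoop102)
stop-yarn.sh
mapred --daemon stop historyserver
start-yarn.sh
mapred --daemon start historyserver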

Origin: blog.csdn.net/Asia1752/article/details/111871065