Install Hadoop (Linux CentOS)
1. Preliminary environment preparation
1.1. Install basic utilities
sudo yum install -y epel-release
sudo yum install -y psmisc nc net-tools rsync vim lrzsz ntp libzstd openssl-static
1.2. Set the hostname
sudo hostnamectl set-hostname hadoop101
1.3. Add hostname mappings
sudo vim /etc/hosts
192.168.1.100 hadoop100
192.168.1.101 hadoop101
192.168.1.102 hadoop102
192.168.1.103 hadoop103
192.168.1.104 hadoop104
1.4. Turn off the firewall
sudo systemctl stop firewalld
sudo systemctl disable firewalld
1.5. Create the atguigu user
sudo useradd atguigu
sudo passwd atguigu
1.6. Reboot
reboot
1.7. Give the atguigu user root privileges
sudo visudo
# Edit /etc/sudoers: around line 91, add the atguigu entry below the root line
root ALL=(ALL) ALL
atguigu ALL=(ALL) NOPASSWD:ALL
1.8. Create folders under /opt
# Create the module and software folders in /opt
sudo mkdir /opt/module /opt/software
# Make atguigu the owner of both folders
sudo chown atguigu:atguigu /opt/module /opt/software
2. Install the JDK
JDK installation details: https://blog.csdn.net/Asia1752/article/details/104505189
3. Install Hadoop
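The guide jumps straight to the environment variables, so here is a minimal unpacking sketch first; the tarball name and paths are assumptions based on the folders created in step 1.8 and the HADOOP_HOME set below:

```shell
# Unpack the Hadoop tarball into /opt/module (filename assumed; upload it
# to /opt/software first, e.g. with rz from the lrzsz package in step 1.1).
tarball=/opt/software/hadoop-3.1.3.tar.gz
if [ -f "$tarball" ]; then
  tar -zxf "$tarball" -C /opt/module/
fi
```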
sudo vim /etc/profile.d/my_env.sh
## HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
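Variables in /etc/profile.d/my_env.sh only apply to new login shells; to pick them up in the current session:

```shell
# Load the new variables into the current shell
# (otherwise log out and back in first).
if [ -f /etc/profile.d/my_env.sh ]; then
  . /etc/profile.d/my_env.sh
fi
```

After this, the verification commands below should find the hadoop binaries on the PATH.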
Verify:
hadoop version
hadoop checknative
4. Hadoop cluster configuration
4.0. Files to configure (all under $HADOOP_HOME/etc/hadoop):
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
workers
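Before editing, it can be worth keeping a pristine copy of each file; a minimal backup sketch, assuming the install path from section 3:

```shell
# Back up each of the five config files listed above before editing.
# Adjust HADOOP_HOME if your install path differs.
conf_dir="${HADOOP_HOME:-/opt/module/hadoop-3.1.3}/etc/hadoop"
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml workers; do
  if [ -f "$conf_dir/$f" ]; then
    cp "$conf_dir/$f" "$conf_dir/$f.bak"
  fi
done
```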
4.1. vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:8020</value>
</property>
<property>
<name>hadoop.data.dir</name>
<value>/opt/module/hadoop-3.1.3/data</value>
</property>
<property>
<name>hadoop.proxyuser.atguigu.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.atguigu.groups</name>
<value>*</value>
</property>
</configuration>
4.2. vi hdfs-site.xml (the ${hadoop.data.dir} references below resolve to the custom property set in core-site.xml above)
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file://${hadoop.data.dir}/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file://${hadoop.data.dir}/data</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file://${hadoop.data.dir}/namesecondary</value>
</property>
<property>
<name>dfs.client.datanode-restart.timeout</name>
<value>30</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop104:9868</value>
</property>
</configuration>
4.3. vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
4.4. vi yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
4.5. vi workers
# List all worker nodes, one hostname per line
hadoop102
hadoop103
hadoop104
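These config files must be identical on every node, which the guide does not spell out. A hypothetical helper (not part of the original guide) that pushes the whole config directory to the other nodes with the rsync installed in step 1.1; hostnames and paths are assumptions from the configs above:

```shell
# Hypothetical helper: push the edited config directory from this node
# to the remaining cluster nodes (run from hadoop102).
xsync_conf() {
  src=/opt/module/hadoop-3.1.3/etc/hadoop/
  for host in hadoop103 hadoop104; do
    rsync -av "$src" "$host:$src"
  done
}
```

Run `xsync_conf` once all five files are edited, before formatting or starting anything.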
4.6. Format the NameNode (only once, before the first start)
# Reformatting an existing cluster destroys HDFS metadata
hdfs namenode -format
4.7. Start
start-dfs.sh
# Startup order: namenodes, datanodes, secondary namenodes
# Start the ResourceManager (and NodeManagers)
start-yarn.sh
# Check the running daemons
jps
Stop:
stop-dfs.sh
stop-yarn.sh
5. Hadoop cluster extended configuration
5.1. Configure the history server
It is recommended to run it on the same node as log aggregation.
vi mapred-site.xml
<!-- History server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop102:10020</value>
</property>
<!-- History server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop102:19888</value>
</property>
Start the history server:
mapred --daemon start historyserver
5.2. Configure log aggregation
It is recommended to run it on the same node as the history server.
vi yarn-site.xml
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop102:19888/jobhistory/logs</value>
</property>
<!-- Keep logs for one week -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
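A change to yarn-site.xml only takes effect after YARN is restarted (stop-yarn.sh, then start-yarn.sh). As a quick sanity check, the retention value above is one week expressed in seconds:

```shell
# 7 days x 24 hours x 60 minutes x 60 seconds = 604800,
# matching yarn.log-aggregation.retain-seconds above.
echo $(( 7 * 24 * 60 * 60 ))
```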