# Revision history (2020-01-12): fixed pitfalls in yarn-site.xml and mapred-site.xml, solved Spark failing to run on Hadoop, configured and started the JobHistoryServer
The previous article covered the environment preparation, so we can finally begin installing Hadoop.
Note: switch back to the root user for the steps below.
Step 1: Download
Find the version you want to install at http://www.apache.org/dyn/closer.cgi/hadoop/common, pick one of the recommended mirrors, and copy the download link. Here I chose version 2.10.0:
$ curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
Download the binary tarball, the largest file, a little over 300 MB.
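Optionally, verify the download by comparing its SHA-512 digest against the one published by Apache (the checksum URL below is my assumption based on the usual archive layout; adjust it if your mirror differs):
$ sha512sum hadoop-2.10.0.tar.gz
Compare the result with the contents of https://archive.apache.org/dist/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz.sha512.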
Step 2: Extract
Since Hadoop will run as a service, the /srv directory is an appropriate home for it:
# Extract
$ tar -xzf hadoop-2.10.0.tar.gz
# Move it to /srv
$ sudo mv hadoop-2.10.0 /srv/
# Change the owner to hadoop
$ sudo chown -R hadoop:hadoop /srv/hadoop-2.10.0
# Give the group write permission
$ sudo chmod g+w -R /srv/hadoop-2.10.0
# Create a symlink
$ sudo ln -s /srv/hadoop-2.10.0 /srv/hadoop
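You can confirm the layout and ownership with:
$ ls -ld /srv/hadoop /srv/hadoop-2.10.0
The symlink should point at /srv/hadoop-2.10.0, and the directory should be owned by hadoop:hadoop with group write permission.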
Step 3: Configure environment variables
Note that here we configure the hadoop user's environment variables. The root user can edit other users' files, so there is no need to switch users (although you can if you prefer).
$ sudo vim /home/hadoop/.bashrc
Add the following content to the user's environment variables hadoop
export HADOOP_HOME=/srv/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
# Set JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
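If you are unsure of the exact OpenJDK 8 path on your machine (the one above assumes the java-8-openjdk-amd64 package from the preparation article), you can locate it with:
$ readlink -f $(which java)
and drop the trailing /jre/bin/java (or /bin/java) to get the value for JAVA_HOME.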
Then set the student user's environment variables as well, by creating a new file, .bash_aliases:
$ sudo vim /home/student/.bash_aliases
Add the following to the file which
export HADOOP_HOME=/srv/hadoop
export HADOOP_STREAMING=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.10.0.jar
export PATH=$PATH:$HADOOP_HOME/bin
# Set JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# Useful aliases
alias ..="cd .."
alias ...="cd ../.."
alias hfs="hadoop fs"
alias hls="hfs -ls"
Once the configuration is in place, reload the files (or simply open a new shell):
source /home/student/.bash_aliases
source /home/hadoop/.bashrc
Check that the configuration took effect by running:
$ hadoop version
If it prints version information without errors, you are good.
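For reference, with the 2.10.0 tarball used above the output should begin with a line like:
Hadoop 2.10.0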
Step 4: Configure Hadoop
- Edit hadoop-env.sh
$ sudo vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Change the JAVA_HOME line to point at the actual JDK:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
- Edit core-site.xml
$ sudo vim $HADOOP_HOME/etc/hadoop/core-site.xml
Replace the empty <configuration></configuration> block with:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/app/hadoop/data</value>
</property>
</configuration>
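Note that fs.default.name is the older alias of fs.defaultFS; either key works in Hadoop 2.x. As a quick sanity check that the file is being read, you can ask Hadoop to print the value back (run as a user that has hadoop on the PATH):
$ hdfs getconf -confKey fs.defaultFS
It should print the hdfs://localhost:9000/ URI configured above.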
- Edit mapred-site.xml
$ sudo cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
$ sudo vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
Replace the empty <configuration></configuration> block with:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- Added 2020-01-12: JobHistoryServer and memory settings for Spark and MapReduce -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>localhost:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/tmp/hadoop-yarn/staging</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1500</value>
<description>Physical memory limit for each Map task</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3000</value>
<description>Physical memory limit for each Reduce task</description>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1200m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2600m</value>
</property>
</configuration>
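A note on the memory values: each Map container is allotted 1500 MB with a 1200 MB JVM heap (-Xmx1200m), and each Reduce container 3000 MB with a 2600 MB heap, leaving 300 MB and 400 MB of headroom respectively for non-heap JVM overhead. The heap must stay comfortably below the container size, otherwise YARN will kill the container for exceeding its physical memory limit.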
- Edit hdfs-site.xml
$ sudo vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Replace the empty <configuration></configuration> block with:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
- Edit yarn-site.xml
$ sudo vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
Replace the empty <configuration></configuration> block with:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8050</value>
</property>
<!-- Added 2020-01-12: memory settings -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>22528</value>
<description>Available memory per node, in MB</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1500</value>
<description>Minimum memory a single container may request; the default is 1024 MB</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>16384</value>
<description>Maximum memory a single container may request; the default is 8192 MB</description>
</property>
</configuration>
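With these values each NodeManager offers 22528 MB (22 GB) to YARN, the smallest container it will grant is 1500 MB and the largest 16384 MB, so a single node can run at most 15 minimum-size containers (22528 / 1500 ≈ 15).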
At this point, the Hadoop pseudo-distributed configuration is complete.
Step 5: Format the NameNode
Create the directory where the NameNode will store its files, then initialize it:
$ sudo mkdir -p /var/app/hadoop/data
$ sudo chown hadoop:hadoop -R /var/app/hadoop
$ sudo su hadoop
$ hadoop namenode -format
If no errors are reported, the format succeeded.
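As an optional check (assuming the default dfs.namenode.name.dir, which lives under hadoop.tmp.dir), the freshly formatted metadata directory should now exist:
$ ls /var/app/hadoop/data/dfs/name/current
It should contain files such as VERSION and fsimage_*.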
Step 6: Start Hadoop
$ $HADOOP_HOME/sbin/start-dfs.sh
$ $HADOOP_HOME/sbin/start-yarn.sh
This starts the HDFS and YARN daemons. If SSH prompts you to confirm a host key, answer yes.
Use the jps command to view the running processes:
$ jps
At this point you should see a list of processes like the following (if the jps command is not found, install the newer JDK it suggests; the configuration files written above do not need to change):
Jps
ResourceManager
SecondaryNameNode
NodeManager
NameNode
DataNode
Hadoop cluster management page (YARN ResourceManager): http://localhost:8088
NameNode management page: http://localhost:50070
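The 2020-01-12 revision also configures the MapReduce JobHistoryServer in mapred-site.xml, but it is not started by start-dfs.sh or start-yarn.sh. In Hadoop 2.x it can be started separately, as the hadoop user, with:
$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
It then appears in jps as JobHistoryServer, and its web UI is at http://localhost:19888 (the address configured above).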
Finally, prepare a space for the student account on HDFS:
$ hadoop fs -mkdir -p /user/student
$ hadoop fs -chown student:student /user/student
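Optionally, as an end-to-end check, run one of the example jobs bundled with the release (the jar path below assumes the 2.10.0 tarball laid out as above):
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.0.jar pi 2 10
If the job completes and prints an estimate of Pi, MapReduce on YARN is working.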
With that, the pseudo-distributed Hadoop environment is built. Next we will put applications on top of it.