Ubuntu Hadoop configuration - (B) Installing and configuring Hadoop

# Revision history: 2020-01-12 - fixed pitfalls in yarn-site.xml and mapred-site.xml, solved the problem of Spark not running on Hadoop, configured and started the JobHistoryServer
The previous article already prepared the environment, so you can finally begin installing Hadoop.
Note: at this point, switch back to the root user.
Step 1: Download
Find the version you want to install at this URL: http://www.apache.org/dyn/closer.cgi/hadoop/common
Pick one of the recommended download mirrors, then choose a download address.
Here I chose version 2.10.0:

$ curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz

Choose the largest file (the binary tarball); it is a bit over 300 MB.
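
Optionally, verify the download before extracting it. Apache publishes a .sha512 checksum file for every release (whether this particular mirror carries it is an assumption); compute the local checksum and compare it against the published value:

$ sha512sum hadoop-2.10.0.tar.gz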

Step 2: Extract the archive
Since Hadoop runs as a service, /srv is a more appropriate place for it. Run the following commands:

# Extract
$ tar -xzf hadoop-2.10.0.tar.gz
# Move it into place
$ sudo mv hadoop-2.10.0 /srv/
# Change the owner to hadoop
$ sudo chown -R hadoop:hadoop /srv/hadoop-2.10.0
# Give the group write permission
$ sudo chmod -R g+w /srv/hadoop-2.10.0
# Create a symlink
$ sudo ln -s /srv/hadoop-2.10.0 /srv/hadoop
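
As an optional sanity check, confirm the symlink and the ownership look right:

$ ls -ld /srv/hadoop /srv/hadoop-2.10.0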

Step 3: Configure environment variables
Note that here we are configuring the hadoop user's environment variables. Since root can edit other users' environment files, you do not have to switch users (though you can if you prefer).

$ sudo vim /home/hadoop/.bashrc

Add the following to the hadoop user's environment file:

export HADOOP_HOME=/srv/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

# Set JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Then set the student user's environment variables; for this you can create a new .bash_aliases file:

$ sudo vim /home/student/.bash_aliases

Add the following to that file:

export HADOOP_HOME=/srv/hadoop
export HADOOP_STREAMING=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.10.0.jar
export PATH=$PATH:$HADOOP_HOME/bin

# Set JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

# Useful aliases
alias ..="cd .."
alias ...="cd ../.."
alias hfs="hadoop fs"
alias hls="hfs -ls"
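
Once the environment is loaded and HDFS is running (later steps), these aliases save some typing; for example, the command below is equivalent to hadoop fs -ls /user:

$ hls /user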

Once the configuration is done, reload the files (or open a new shell):

source /home/student/.bash_aliases
source /home/hadoop/.bashrc

To check whether the configuration succeeded, run the following command; if it completes without errors, you are set:

$ hadoop version
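
If everything is wired up correctly, the first line of output should look roughly like the following (the build details printed after it will differ):

Hadoop 2.10.0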

Step 4: Configure Hadoop

  1. Edit hadoop-env.sh
$ sudo vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Change the JAVA_HOME setting to:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
  2. Edit core-site.xml
$ sudo vim $HADOOP_HOME/etc/hadoop/core-site.xml

Replace the empty <configuration></configuration> block with:

<configuration>
	<property>
		<name>fs.default.name</name>
		<value>hdfs://localhost:9000/</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/var/app/hadoop/data</value>
	</property>
</configuration>
  3. Edit mapred-site.xml
$ sudo cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
$ sudo vim $HADOOP_HOME/etc/hadoop/mapred-site.xml

Replace the empty <configuration></configuration> block with:

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <!-- added 2020-01-12 for the JobHistoryServer and Spark on MapReduce -->
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>localhost:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>localhost:19888</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.done-dir</name>
                <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.intermediate-done-dir</name>
                <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
        </property>
        <property>
                <name>yarn.app.mapreduce.am.staging-dir</name>
                <value>/tmp/hadoop-yarn/staging</value>
        </property>
        <property>
                <name>mapreduce.map.memory.mb</name>
                <value>1500</value>
                <description>Physical memory limit for each Map task</description>
        </property>

        <property>
                <name>mapreduce.reduce.memory.mb</name>
                <value>3000</value>
                <description>Physical memory limit for each Reduce task</description>
        </property>

        <property>
                <name>mapreduce.map.java.opts</name>
                <value>-Xmx1200m</value>
        </property>

        <property>
                <name>mapreduce.reduce.java.opts</name>
                <value>-Xmx2600m</value>
        </property>
</configuration>
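
A common rule of thumb (my reading of the values above, not something stated in the original) is to set each task's JVM heap to roughly 80% of its container memory, leaving headroom for non-heap usage:

mapreduce.map.memory.mb    = 1500 MB  ->  -Xmx1200m  (1500 x 0.8 = 1200)
mapreduce.reduce.memory.mb = 3000 MB  ->  -Xmx2600m  (just under 0.9 x 3000)

If you change the container sizes, scale the -Xmx values accordingly.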



  4. Edit hdfs-site.xml
$ sudo vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Replace the empty <configuration></configuration> block with:

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>
  5. Edit yarn-site.xml
$ sudo vim $HADOOP_HOME/etc/hadoop/yarn-site.xml

Replace the empty <configuration></configuration> block with:

<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address</name>
		<value>localhost:8025</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address</name>
		<value>localhost:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.address</name>
		<value>localhost:8050</value>
	</property>
	<!-- added 2020-01-12 -->
	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>22528</value>
		<description>Memory available to containers on this node, in MB</description>
	</property>

	<property>
		<name>yarn.scheduler.minimum-allocation-mb</name>
		<value>1500</value>
		<description>Minimum memory a single task may request; default is 1024 MB</description>
	</property>

	<property>
		<name>yarn.scheduler.maximum-allocation-mb</name>
		<value>16384</value>
		<description>Maximum memory a single task may request; default is 8192 MB</description>
	</property>
</configuration>
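
Before moving on, it is worth checking that the edited files are still well-formed XML, since a stray character is a common cause of startup failures. One way, assuming the xmllint tool (from the libxml2-utils package) is installed:

$ xmllint --noout $HADOOP_HOME/etc/hadoop/core-site.xml \
    $HADOOP_HOME/etc/hadoop/hdfs-site.xml \
    $HADOOP_HOME/etc/hadoop/mapred-site.xml \
    $HADOOP_HOME/etc/hadoop/yarn-site.xml

No output means all four files parse cleanly. Also note that yarn.scheduler.maximum-allocation-mb (16384) must not exceed yarn.nodemanager.resource.memory-mb (22528), which the values above satisfy.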

At this point, the Hadoop pseudo-distributed configuration is complete.

Step 5: Format the NameNode
Create the directory where the NameNode will store its files, then initialize it:

$ sudo mkdir -p /var/app/hadoop/data
$ sudo chown hadoop:hadoop -R /var/app/hadoop
$ sudo su hadoop
$ hadoop namenode -format

If there are no errors, the format succeeded.
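
On success, the format log typically ends with a line similar to the one below (the path follows hadoop.tmp.dir; this is an illustration rather than verbatim output from the original article):

Storage directory /var/app/hadoop/data/dfs/name has been successfully formatted.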

Step 6: Start Hadoop

$ $HADOOP_HOME/sbin/start-dfs.sh
$ $HADOOP_HOME/sbin/start-yarn.sh

This starts the HDFS and YARN daemons. If SSH asks you to confirm the connection during startup, answer yes.
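If the start scripts ask for a password instead, passwordless SSH to localhost for the hadoop user may not be set up yet (the preparation article presumably covers this). A minimal sketch, run as the hadoop user:

$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
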
Use the jps command to view the running processes:

$ jps

At this point you should see a list of processes like the one below.
(If jps is not found, follow the prompt to install a newer Java version; the configuration files above do not need to change.)

Jps
ResourceManager
SecondaryNameNode
NodeManager
NameNode

Hadoop cluster management page (YARN ResourceManager): http://localhost:8088
HDFS management page (NameNode): http://localhost:50070
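
The revision note at the top mentions starting the JobHistoryServer. Since mapred-site.xml above configures its addresses, it presumably needs to be started as well; a sketch using the standard Hadoop 2.x daemon script, run as the hadoop user:

$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

Its web UI should then be reachable at http://localhost:19888, matching mapreduce.jobhistory.webapp.address above.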

Finally, prepare a home directory for the student account on HDFS:

$ hadoop fs -mkdir -p /user/student
$ hadoop fs -chown student:student /user/student
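
As a quick optional check that the directory exists with the expected owner:

$ hadoop fs -ls /user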

With that, the pseudo-distributed Hadoop environment is built. Next we will put applications on top of it.
