Building a Hadoop Development Environment on Linux

Hadoop environment setup, installation, and configuration:
[1]. Download the hadoop-2.7.5 installation package from the official website: hadoop-2.7.5/hadoop-2.7.5.tar.gz
[2]. Upload the hadoop-2.7.5 installation package to /usr/local/hadoop using the Xftp 5 tool
[3]. Log in to the Linux server with Xshell 5 and change to the installation directory: cd /usr/local/hadoop
[root@marklin hadoop]# cd /usr/local/hadoop
[root@marklin hadoop]#
Then decompress the archive with tar: tar -xvf hadoop-2.7.5.tar.gz
[root@marklin hadoop]# tar -xvf hadoop-2.7.5.tar.gz
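To confirm the archive unpacked correctly, list the extracted directory (a quick sanity check; contents shown abbreviated):
[root@marklin hadoop]# ls hadoop-2.7.5
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share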
[4]. Configure Hadoop environment variables, enter: vim /etc/profile
     #Setting HADOOP_HOME PATH
    export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.5
    export PATH=${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HADOOP_HOME}/lib
    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export YARN_HOME=${HADOOP_HOME}
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
Save the configuration and apply it, enter: source /etc/profile
[root@marklin ~]# source /etc/profile
[root@marklin ~]#
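To verify that the variables took effect, print HADOOP_HOME and ask Hadoop for its version (this also requires JAVA_HOME to be set, which step [6] below covers; version output abbreviated):
[root@marklin ~]# echo $HADOOP_HOME
/usr/local/hadoop/hadoop-2.7.5
[root@marklin ~]# hadoop version
Hadoop 2.7.5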
PS: Two points are especially important here (see the example below):
[1] Modify the hostname: vim /etc/hostname
[2] Modify the mapping between the hostname and the IP address: vim /etc/hosts
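For example, with the hostname and IP used throughout this article (the IP 192.168.3.4 appears in the browser tests of step [8]; substitute your own address):
[root@marklin ~]# cat /etc/hostname
marklin.com
[root@marklin ~]# cat /etc/hosts
127.0.0.1    localhost
192.168.3.4  marklin.com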
[5]. Modify the Hadoop configuration files:
core-site.xml: Hadoop core configuration, including the tmp working directory and the HDFS access address (default port 9000)
mapred-site.xml: configuration of the MapReduce processing framework
yarn-site.xml: configuration of YARN resource management and job scheduling
hdfs-site.xml: HDFS configuration, including the replication factor and the name/data directory locations
 
(1) Configure core-site.xml in the Hadoop configuration directory [/usr/local/hadoop/hadoop-2.7.5/etc/hadoop]
[root@marklin ~]# cd /usr/local/hadoop/hadoop-2.7.5/etc/hadoop
 
[root@marklin hadoop]#
Input: vim core-site.xml
and configure:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://marklin.com:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/repository/hdfs/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>  
       <name>hadoop.proxyuser.hadoop.hosts</name>  
       <value>*</value>  
    </property>  
    <property>  
       <name>hadoop.proxyuser.hadoop.groups</name>  
       <value>*</value>  
    </property>  
</configuration>
 
At the same time, create a tmp directory under the path /usr/local/hadoop/repository/hdfs: mkdir tmp
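Since the parent directories may not exist yet, mkdir -p creates the whole path in one step:
[root@marklin ~]# mkdir -p /usr/local/hadoop/repository/hdfs/tmp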
(2) Modify hdfs-site.xml and configure: vim hdfs-site.xml
[root@marklin hadoop]# vim hdfs-site.xml
[root@marklin hadoop]# 
<configuration>
    <!-- dfs.namenode.name.dir defines the NameNode storage path -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/repository/hdfs/name</value>
        <final>true</final>
    </property>
    <!-- dfs.datanode.data.dir defines the DataNode storage path -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/repository/hdfs/data</value>
        <final>true</final>
    </property>
    <!-- dfs.permissions toggles permission checking -->
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <!-- dfs.replication defines the number of copies of each block -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- dfs.namenode.http-address defines the NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>marklin.com:50070</value>
    </property>
    <!-- dfs.namenode.secondary.http-address defines the SecondaryNameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>marklin.com:50090</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
Likewise, create the name and data directories under /usr/local/hadoop/repository/hdfs: mkdir name and mkdir data
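Again, mkdir -p creates both directories in one command even if the parents are missing:
[root@marklin ~]# mkdir -p /usr/local/hadoop/repository/hdfs/name /usr/local/hadoop/repository/hdfs/data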
(3) Create a mapred-site.xml file, enter: cp mapred-site.xml.template mapred-site.xml
[root@marklin hadoop]# cp mapred-site.xml.template mapred-site.xml
Edit the mapred-site.xml file and configure:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>hdfs://marklin.com:8021/</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>marklin.com:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>marklin.com:19888</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xms2000m -Xmx4600m</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>5120</value>
    </property>
    <property>
        <name>mapreduce.reduce.input.buffer.percent</name>
        <value>0.5</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>2</value>
    </property>
    <property>
        <name>mapred.system.dir</name>
        <value>/usr/local/hadoop/repository/mapreduce/system</value>
        <final>true</final>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>/usr/local/hadoop/repository/mapreduce/local</value>
        <final>true</final>
    </property>
</configuration>
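One consistency point worth checking in this file: on YARN, a task's JVM heap (mapreduce.reduce.java.opts) must fit inside its container size (mapreduce.reduce.memory.mb), otherwise the container is killed for exceeding its memory limit. Above, -Xmx4600m exceeds the 2048 MB reduce container. A consistent pair would look like this (illustrative values, not a tuning recommendation; the heap is commonly set to roughly 80% of the container):
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xms2000m -Xmx3276m</value>
    </property>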
 
(4) Modify yarn-site.xml and enter: vim yarn-site.xml
[root@marklin hadoop]# vim yarn-site.xml
[root@marklin hadoop]#
and configure:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>marklin.com</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/usr/local/hadoop/repository/mapreduce/staging</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
    </property>
</configuration>
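Also note that yarn.nodemanager.resource.memory-mb caps the total memory the NodeManager may hand out to containers on this node. The 1024 MB configured above is smaller than the containers requested in mapred-site.xml (5120 MB per map, 2048 MB per reduce), so MapReduce jobs would be stuck waiting for resources. On a node with enough RAM, set it to at least the largest container size, for example (illustrative value):
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>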
 
 
[6] In the Hadoop configuration directory [/usr/local/hadoop/hadoop-2.7.5/etc/hadoop],
set JAVA_HOME in the corresponding hadoop-env.sh, mapred-env.sh, and yarn-env.sh files: export JAVA_HOME=/usr/local/java/jdk1.8.0_162
Enter: vim hadoop-env.sh :
[root@marklin hadoop]# vim hadoop-env.sh
[root@marklin hadoop]#
export JAVA_HOME=/usr/local/java/jdk1.8.0_162
Enter: vim mapred-env.sh
[root@marklin hadoop]# vim mapred-env.sh
[root@marklin hadoop]#
export JAVA_HOME=/usr/local/java/jdk1.8.0_162
Enter: vim yarn-env.sh
[root@marklin hadoop]# vim yarn-env.sh
[root@marklin hadoop]#
export JAVA_HOME=/usr/local/java/jdk1.8.0_162
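A quick way to confirm that all three files picked up the setting (assuming the default file layout; grep prefixes each match with its file name):
[root@marklin hadoop]# grep "^export JAVA_HOME" hadoop-env.sh mapred-env.sh yarn-env.sh
hadoop-env.sh:export JAVA_HOME=/usr/local/java/jdk1.8.0_162
mapred-env.sh:export JAVA_HOME=/usr/local/java/jdk1.8.0_162
yarn-env.sh:export JAVA_HOME=/usr/local/java/jdk1.8.0_162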
 
[7] Open port 50070:
(1) Start the firewall: systemctl start firewalld.service
[root@marklin ~]# systemctl start firewalld.service
[root@marklin ~]#
(2) Open the port permanently: firewall-cmd --zone=public --add-port=50070/tcp --permanent
[root@marklin ~]# firewall-cmd --zone=public --add-port=50070/tcp --permanent
[root@marklin ~]#
(3) Reload the firewall rules: firewall-cmd --reload
[root@marklin ~]# firewall-cmd --reload
[root@marklin ~]# 
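The browser tests in step [8] also use port 8088 (the YARN web UI), which must be opened the same way; afterwards you can list the open ports to verify:
[root@marklin ~]# firewall-cmd --zone=public --add-port=8088/tcp --permanent
[root@marklin ~]# firewall-cmd --reload
[root@marklin ~]# firewall-cmd --zone=public --list-ports
50070/tcp 8088/tcp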
(4) Format the NameNode (first start only): hdfs namenode -format
[root@marklin ~]# hdfs namenode -format
[root@marklin ~]#
(5) Startup script: start-all.sh (deprecated in Hadoop 2.x; it simply invokes start-dfs.sh and start-yarn.sh, shown below)
[root@marklin ~]# start-all.sh
[root@marklin ~]#
 
[root@marklin ~]# start-dfs.sh
Starting namenodes on [marklin.com]
marklin.com: starting namenode, logging to /usr/local/hadoop/hadoop-2.7.5/logs/hadoop-root-namenode-marklin.com.out
marklin.com: starting datanode, logging to /usr/local/hadoop/hadoop-2.7.5/logs/hadoop-root-datanode-marklin.com.out
Starting secondary namenodes [marklin.com]
marklin.com: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-2.7.5/logs/hadoop-root-secondarynamenode-marklin.com.out
 
 
[root@marklin ~]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/hadoop-2.7.5/logs/yarn-root-resourcemanager-marklin.com.out
marklin.com: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.7.5/logs/yarn-root-nodemanager-marklin.com.out
 
 
[root@marklin ~]# jps
1122 QuorumPeerMain
6034 Jps
1043 QuorumPeerMain
5413 SecondaryNameNode
5580 ResourceManager
5085 NameNode
5709 NodeManager
5230 DataNode
1119 QuorumPeerMain
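NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager are the daemons started above (the QuorumPeerMain processes are ZooKeeper instances already running on this host, unrelated to this setup). As a final smoke test, write a directory into HDFS and list it (illustrative path):
[root@marklin ~]# hdfs dfs -mkdir /test
[root@marklin ~]# hdfs dfs -ls /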
 
[8] Enter the test addresses:
[1] Browser input: http://192.168.3.4:50070/dfshealth.html#tab-overview
[2] Browser input: http://192.168.3.4:8088/cluster
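If a page does not load, checking from the server itself helps separate Hadoop problems from firewall problems; for example, with curl (headers only):
[root@marklin ~]# curl -I http://192.168.3.4:50070/dfshealth.html
[root@marklin ~]# curl -I http://192.168.3.4:8088/cluster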
