JDK installation
Set hostname
[root@bigdata111 ~]# vi /etc/hostname
Configure the hosts file
[root@bigdata111 ~]# vi /etc/hosts
192.168.1.111 bigdata111
192.168.1.112 bigdata112
192.168.1.113 bigdata113
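The same three mappings can also be appended from a script instead of being typed into vi. The following is a minimal sketch, not part of the original procedure: the function name add_host_entry is hypothetical, and the target file is passed as an argument so the helper can be tried on a scratch file before it touches /etc/hosts.

```shell
# Hypothetical helper: append "IP hostname" to a hosts-style file
# only if that hostname is not already present in the file.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # -w matches the hostname as a whole word, -F treats it as a fixed string
  if ! grep -qwF -- "$name" "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Usage (as root):
#   add_host_entry /etc/hosts 192.168.1.111 bigdata111
#   add_host_entry /etc/hosts 192.168.1.112 bigdata112
#   add_host_entry /etc/hosts 192.168.1.113 bigdata113
```

Running the helper twice with the same arguments leaves the file unchanged, which makes it safe to rerun on cloned machines.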
Create directories for the JDK
[root@bigdata111 /]# cd /opt
[root@bigdata111 opt]# ll
total 0
drwxr-xr-x. 2 root root 6 Mar 26 2015 rh
[root@bigdata111 opt]# mkdir module
[root@bigdata111 opt]# mkdir soft
[root@bigdata111 opt]# ls
module rh soft
Upload the JDK package
Open WinSCP and upload the Java JDK archive to the /opt/soft folder on the Linux machine:
[root@bigdata111 opt]# cd soft
[root@bigdata111 soft]# ls
jdk-8u144-linux-x64.tar.gz
Extract the JDK
Extract the archive into /opt/module with the following commands:
[root@bigdata111 opt]# cd soft
[root@bigdata111 soft]# tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/
[root@bigdata111 soft]# cd /opt/module
[root@bigdata111 module]# ls
jdk1.8.0_144
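Before extracting, it can help to confirm which directory an archive will create, since that name is what JAVA_HOME must point to later. The sketch below is an illustration only; archive_top_dir is a hypothetical helper name, and it assumes the archive has a single top-level directory, as the ls output above suggests for the JDK package.

```shell
# Hypothetical helper: print the top-level directory inside a .tar.gz
# archive without extracting it.
archive_top_dir() {
  # -t lists the entries, -z handles gzip; the first entry is the top directory
  tar -tzf "$1" | head -n 1 | cut -d/ -f1
}

# Usage:
#   archive_top_dir /opt/soft/jdk-8u144-linux-x64.tar.gz
```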
Set the JDK environment variables
[root@bigdata111 module]# vi /etc/profile
Add the JDK environment variables at the end of the file, then save and exit:
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
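If the profile is edited by script rather than in vi, repeated runs should not stack duplicate export lines. A minimal sketch follows; append_once is a hypothetical helper name and the file path is an argument, so it can be tested on a scratch file first.

```shell
# Hypothetical helper: append a line to a file unless an identical line
# is already there (-x whole-line match, -F fixed string).
append_once() {
  local file="$1" line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root). Single quotes keep $PATH and $JAVA_HOME unexpanded,
# so they are written literally and evaluated when the profile is sourced:
#   append_once /etc/profile 'export JAVA_HOME=/opt/module/jdk1.8.0_144'
#   append_once /etc/profile 'export PATH=$PATH:$JAVA_HOME/bin'
```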
Refresh the environment variables
[root@bigdata111 module]# source /etc/profile
Check that the JDK was installed successfully
[root@bigdata111 module]# java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
Build Hadoop in local mode
Local mode is a stand-alone Hadoop installation.
Install Hadoop
Upload the Hadoop package
Use WinSCP to upload the Hadoop package to the /opt/soft/ folder:
[root@bigdata111 soft]# ls
hadoop-2.8.4.tar.gz jdk-8u144-linux-x64.tar.gz
Extract Hadoop
Extract Hadoop into /opt/module/:
[root@bigdata111 soft]# tar -zvxf hadoop-2.8.4.tar.gz -C /opt/module/
[root@bigdata111 soft]# cd /opt/module/
[root@bigdata111 module]# ls
hadoop-2.8.4 jdk1.8.0_144
Set the Hadoop environment variables
[root@bigdata111 module]# vi /etc/profile
Add the following configuration at the end, save and exit:
export HADOOP_HOME=/opt/module/hadoop-2.8.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Refresh the profile
[root@bigdata111 module]# source /etc/profile
Check that Hadoop was installed successfully
[root@bigdata111 module]# hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
Test Hadoop with an example
Create a test file
Create a file named testdoc in the module directory and enter some text:
[root@bigdata111 module]# cd /opt/module
[root@bigdata111 module]# touch testdoc
[root@bigdata111 module]# vi testdoc
[root@bigdata111 module]# cat testdoc
this is a test page!
chinese is the best country
this is a ceshi page!
i love china
listen to the music
and son on
Switch to the jar package directory
Change to the Hadoop directory that holds the MapReduce example jars:
[root@bigdata111 module]# cd /opt/module/hadoop-2.8.4/share/hadoop/mapreduce/
[root@bigdata111 mapreduce]# ls
hadoop-mapreduce-client-app-2.8.4.jar hadoop-mapreduce-client-core-2.8.4.jar hadoop-mapreduce-client-hs-plugins-2.8.4.jar hadoop-mapreduce-client-jobclient-2.8.4-tests.jar hadoop-mapreduce-examples-2.8.4.jar lib sources
hadoop-mapreduce-client-common-2.8.4.jar hadoop-mapreduce-client-hs-2.8.4.jar hadoop-mapreduce-client-jobclient-2.8.4.jar hadoop-mapreduce-client-shuffle-2.8.4.jar jdiff lib-examples
Run the wordcount program
[root@bigdata111 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.8.4.jar wordcount /opt/module/testdoc /opt/module/out
[root@bigdata111 mapreduce]# ls /opt/module/out
part-r-00000 _SUCCESS
[root@bigdata111 mapreduce]# cat /opt/module/out/part-r-00000
a 2
and 1
best 1
ceshi 1
china 1
chinese 1
country 1
i 1
is 3
listen 1
love 1
music 1
on 1
page! 2
son 1
test 1
the 2
this 2
to 1
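The result above can be cross-checked without Hadoop using standard text tools. This is only a sketch for comparison: count_words is a hypothetical helper name, and it assumes that words are separated by whitespace, which matches how the output above treats "page!" as a single token.

```shell
# Hypothetical helper: count whitespace-separated tokens in a file and
# print one "word<TAB>count" line per distinct token, sorted by byte order.
count_words() {
  LC_ALL=C tr -s '[:space:]' '\n' < "$1" \
    | sed '/^$/d' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }'
}

# Usage:
#   count_words /opt/module/testdoc
```

For the testdoc file shown earlier this yields the same 19 word/count pairs as part-r-00000.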
Build pseudo-distributed Hadoop
Pseudo-distributed mode runs the distributed daemons on a single machine.
View the Hadoop executable files
[root@bigdata111 mapreduce]# cd /opt/module/hadoop-2.8.4/
[root@bigdata111 hadoop-2.8.4]# ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin share
[root@bigdata111 hadoop-2.8.4]# cd bin
[root@bigdata111 bin]# ls
container-executor hadoop hadoop.cmd hdfs hdfs.cmd mapred mapred.cmd rcc test-container-executor yarn yarn.cmd
[root@bigdata111 bin]# cd ..
[root@bigdata111 hadoop-2.8.4]# cd sbin
[root@bigdata111 sbin]# ls
distribute-exclude.sh hadoop-daemons.sh hdfs-config.sh kms.sh refresh-namenodes.sh start-all.cmd start-balancer.sh start-dfs.sh start-yarn.cmd stop-all.cmd stop-balancer.sh stop-dfs.sh stop-yarn.cmd yarn-daemon.sh
hadoop-daemon.sh hdfs-config.cmd httpfs.sh mr-jobhistory-daemon.sh slaves.sh start-all.sh start-dfs.cmd start-secure-dns.sh start-yarn.sh stop-all.sh stop-dfs.cmd stop-secure-dns.sh stop-yarn.sh yarn-daemons.sh
Switch to the configuration directory
Change to the Hadoop configuration directory /opt/module/hadoop-2.8.4/etc/hadoop/:
[root@bigdata111 sbin]# cd /opt/module/hadoop-2.8.4/etc/hadoop/
[root@bigdata111 hadoop]# ls
capacity-scheduler.xml core-site.xml hadoop-metrics2.properties hdfs-site.xml httpfs-signature.secret kms-env.sh log4j.properties mapred-queues.xml.template ssl-client.xml.example yarn-env.sh
configuration.xsl hadoop-env.cmd hadoop-metrics.properties httpfs-env.sh httpfs-site.xml kms-log4j.properties mapred-env.cmd mapred-site.xml.template ssl-server.xml.example yarn-site.xml
container-executor.cfg hadoop-env.sh hadoop-policy.xml httpfs-log4j.properties kms-acls.xml kms-site.xml mapred-env.sh slaves yarn-env.cmd
Configure core-site.xml
[root@bigdata111 hadoop]# vi core-site.xml
<configuration>
  <!-- Address of the NameNode in HDFS -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://bigdata111:9000</value>
  </property>
  <!-- Directory for files Hadoop generates at runtime -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.8.4/data/tmp</value>
  </property>
</configuration>
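After editing, it is worth reading a value back to catch typos in property names. Hadoop ships its own tool for this (hdfs getconf -confKey fs.defaultFS). The sketch below is a rough stand-in that needs no running Hadoop: get_prop is a hypothetical helper, and it assumes each name and value tag sits on its own line, as in the file above.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop-style XML configuration file.
get_prop() {
  # $1 = XML file, $2 = property name
  awk -v key="$2" '
    /<name>/ {
      gsub(/.*<name>|<\/name>.*/, "")   # strip the tags, keep the name
      found = ($0 == key)
      next
    }
    /<value>/ && found {
      gsub(/.*<value>|<\/value>.*/, "") # strip the tags, keep the value
      print
      exit
    }
  ' "$1"
}

# Usage:
#   get_prop /opt/module/hadoop-2.8.4/etc/hadoop/core-site.xml fs.defaultFS
```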
Configure hdfs-site.xml
[root@bigdata111 hadoop]# vi hdfs-site.xml
<configuration>
  <!-- Number of data replicas -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Configure yarn-site.xml
[root@bigdata111 hadoop]# vi yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- How reducers fetch data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Address of the YARN ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>bigdata111</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Retain aggregated logs for 7 days (in seconds) -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
Configure mapred-site.xml
Rename mapred-site.xml.template to mapred-site.xml, then edit its contents:
[root@bigdata111 hadoop]# mv mapred-site.xml.template mapred-site.xml
[root@bigdata111 hadoop]# ls
capacity-scheduler.xml core-site.xml hadoop-metrics2.properties hdfs-site.xml httpfs-signature.secret kms-env.sh log4j.properties mapred-queues.xml.template ssl-client.xml.example yarn-env.sh
configuration.xsl hadoop-env.cmd hadoop-metrics.properties httpfs-env.sh httpfs-site.xml kms-log4j.properties mapred-env.cmd mapred-site.xml ssl-server.xml.example yarn-site.xml
container-executor.cfg hadoop-env.sh hadoop-policy.xml httpfs-log4j.properties kms-acls.xml kms-site.xml mapred-env.sh slaves yarn-env.cmd
[root@bigdata111 hadoop]# vi mapred-site.xml
<configuration>
  <!-- Run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Address of the job history server -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>bigdata111:10020</value>
  </property>
  <!-- Address of the job history server web page -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>bigdata111:19888</value>
  </property>
</configuration>
Configure hadoop-env.sh
Change JAVA_HOME to an absolute path, then save and exit:
[root@bigdata111 hadoop]# vi hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
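Format the NameNode
Once configuration is complete, format the NameNode (only before the first start). In Hadoop 2.x the hadoop namenode form used here still works but is deprecated in favor of hdfs namenode -format: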
[root@bigdata111 hadoop]# hadoop namenode -format
Why format?
The NameNode manages the metadata of the entire distributed file system namespace (in practice, its directories and files). To keep that data reliable it also maintains an operation log, so the NameNode persists its data to the local file system. Before HDFS is started for the first time, the -format command must be run so that the NameNode service can start normally.
What does formatting do?
On the NameNode there are two especially important paths, one storing the metadata and one storing the operation log. Both come from the configuration file, through the properties dfs.name.dir and dfs.name.edits.dir (named dfs.namenode.name.dir and dfs.namenode.edits.dir in Hadoop 2.x), and by default both resolve to the dfs/name directory under hadoop.tmp.dir. When formatting, the NameNode clears all files in the two directories and then creates new files in dfs.name.dir.
Because hadoop.tmp.dir is set in core-site.xml, the files for dfs.name.dir and dfs.name.edits.dir are both generated under that one directory.
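Since formatting wipes the metadata directory, a script that automates this step may want to refuse to format twice. A minimal sketch follows, relying on the fact that a formatted NameNode directory contains a current/VERSION file. The helper name namenode_unformatted is hypothetical, and the path in the usage comment assumes the hadoop.tmp.dir value configured earlier.

```shell
# Hypothetical guard: succeeds (exit status 0) only when the given
# NameNode directory has not been formatted yet.
namenode_unformatted() {
  # A formatted NameNode directory contains current/VERSION
  [ ! -f "$1/current/VERSION" ]
}

# Usage, assuming hadoop.tmp.dir=/opt/module/hadoop-2.8.4/data/tmp:
#   namenode_unformatted /opt/module/hadoop-2.8.4/data/tmp/dfs/name \
#     && hadoop namenode -format
```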
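Start the HDFS and YARN services
When the NameNode and ResourceManager are on the same machine, use the following command (start-all.sh still works in Hadoop 2.x but prints a deprecation notice recommending start-dfs.sh and start-yarn.sh):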
[root@bigdata111 hadoop]# start-all.sh
When the two are on different machines, use the following commands:
[root@bigdata111 hadoop]# start-dfs.sh
[root@bigdata111 hadoop]# start-yarn.sh
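The choice between the two variants can be written down as a small decision helper. This sketch only prints which script should run on which host and starts nothing; start_plan and its two arguments (NameNode host, ResourceManager host) are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: print "host: script" lines describing which start
# script to run on which machine.
start_plan() {
  local nn_host="$1" rm_host="$2"
  if [ "$nn_host" = "$rm_host" ]; then
    # Both daemons on one machine: a single script is enough
    echo "$nn_host: start-all.sh"
  else
    # HDFS starts from the NameNode host, YARN from the ResourceManager host
    echo "$nn_host: start-dfs.sh"
    echo "$rm_host: start-yarn.sh"
  fi
}

# Usage:
#   start_plan bigdata111 bigdata111
#   start_plan bigdata111 bigdata112
```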
Access the HDFS web page
Default Port: 50070
http://192.168.1.111:50070
Access the YARN web page
Default Port: 8088
http://192.168.1.111:8088
Set up a Hadoop cluster
Using VMware's clone feature, clone two additional machines with machine 111 as the template.
Modify the hostname and IP address
Modify the hostname and IP address of the two cloned machines so that Xshell can connect to them:
[root@bigdata112 ~]# vi /etc/hostname
[root@bigdata112 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eno16777736
[root@bigdata112 ~]# service network restart
[root@bigdata112 ~]# ip addr
The edited ifcfg-eno16777736 file contains:
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=eno16777736
UUID=24bbe130-f59a-4b25-9df6-cf5857c89699
DEVICE=eno16777736
ONBOOT=yes
IPADDR=192.168.1.112
GATEWAY=192.168.1.2
DNS1=8.8.8.8
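On a clone, usually only a few keys in this file change (IPADDR above all). The edit can be scripted with a small key/value setter, sketched below. set_kv is a hypothetical helper, it assumes plain KEY=value lines as in the file above, and it assumes the value contains no | or & characters, because it is passed straight into a sed replacement.

```shell
# Hypothetical helper: set KEY=value in an ifcfg-style file, replacing an
# existing KEY line or appending a new one.
set_kv() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    # Rewrite through a temporary copy, then write back into the same file
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "${file}.tmp" \
      && cat "${file}.tmp" > "$file" \
      && rm -f "${file}.tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage (as root, on the clone that becomes bigdata113):
#   set_kv /etc/sysconfig/network-scripts/ifcfg-eno16777736 IPADDR 192.168.1.113
```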
Delete the data directory
Delete the data directory under /opt/module/hadoop-2.8.4 so that the pseudo-distributed data does not carry over into the distributed cluster configuration.
[root@bigdata111 hadoop-2.8.4]# cd /opt/module/hadoop-2.8.4/
[root@bigdata111 hadoop-2.8.4]# rm -rf data/
Configure hosts
Configure the mapping between hostnames and IP addresses in the hosts file:
[root@bigdata111 hadoop-2.8.4]# vi /etc/hosts
192.168.1.111 bigdata111
192.168.1.112 bigdata112
192.168.1.113 bigdata113
Send the hosts file to the other machines with scp
Send the hosts file configured on the first machine to the other two machines:
[root@bigdata111 hadoop-2.8.4]# scp /etc/hosts root@bigdata112:/etc/
[root@bigdata111 hadoop-2.8.4]# scp /etc/hosts root@bigdata113:/etc/
Configure passwordless SSH login
- Use Xshell's "send key input to all sessions" function to generate keys on all three machines:
[root@bigdata111 hadoop-2.8.4]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
cc:47:37:5a:93:0f:77:38:53:af:a3:57:47:55:27:59 root@bigdata111
The key's randomart image is:
+--[ RSA 2048]----+
| .oE|
| ..++|
| . B = +|
| o . + * * |
| S o + o|
| . . o.|
| . . |
| . |
| |
+-----------------+
- Use the same Xshell function to add each machine's key to the authorized keys of every machine in the cluster:
[root@bigdata111 hadoop-2.8.4]# ssh-copy-id bigdata111
[root@bigdata111 hadoop-2.8.4]# ssh-copy-id bigdata112
[root@bigdata111 hadoop-2.8.4]# ssh-copy-id bigdata113
- Check that the authorized keys file exists:
[root@bigdata111 .ssh]# cd /root/.ssh
[root@bigdata111 .ssh]# ls
authorized_keys id_rsa id_rsa.pub known_hosts
[root@bigdata111 .ssh]# cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7cSXZDdNJ0Cg+1wyVoCn4pWEAxy/13/ekg//YVkGwEsR6HO4XaYxxstVBij5JoTEEjSDNmz2HifTZDB098py3x882ZLVHJllJWzXYX4gVof/tmdmk5AJbhIlX3SoauTrrrzFiMtuXKdu6slvzhs9IbDp68xCUNiVI06OnWFSuhQc8Td+tekwlFPfm+v3W/PqUUgQAd+OAqOUC2vEjjnACQNw/wgGvF/lqrXDv5ZIFmYCBlB7YxwP9RykOvAzEe7w2W7TOt0K8V8oKKTui4aZuahWDbsGwlD7TAQRkilXkG59XG48AWOQoU/XFxph+XECqJzjmdxYedzY8inYW/Lfx root@bigdata111
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYyMVfLaL9w9sGz5hQG96ksUN5ih2RHdwsiXBpL/ZRG7LasKS+OQcszmc61TJfV0Vjad7kuL9wlg2YqlVIQJvaIUQCw4+5BrO0vCy4JBrz/FiDjzxKx0Ba+ILziuMxl35RxDCVGph17i2jpYfy6jGLejYK9kpJH4ueIj8mm+4LTKabRZTcjdNNI0kYM+Tr08wEIuQ45adqVU9MpZc/j6i1FIr4R/RabyuO1FhEh0+Oc5Xbm3jSAYH0MgEvK1cuG9wmX7SaB/opO00Ts+nW/P4umeZQUy51IQSRdUF6BlMrshnCSlKHnuLv2eSCx9yv3QuQMWHnL/SOXUgTnIuzbrv9 root@bigdata112
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDoBOAT/n1QCnaVJtRS1Q9GeoP665gIayWxpSWbjEFus4DL4as5S9jAIhBQWrTnvZzm+Skb4dxGPgdPYLaMFX9tdDYPPsnnRR92sLpRw9gwvG5ROL5XPpV2X+Yxl6yACmlMT0JP1uk+Ekm623n6wtBSBP1BDtJ/fhXkRX6bo2kuXs4BvmP76cikdGBDygKNIEMPTcs6p2lfOnuVdQLSCGm+Q9NswKSBVElNyywNl5J9L/5kIzGXnoGtwhQtdrOjZ+c1tyiwhCz42I3c4z0Sb/zH3OFtHCvRG7cF72uDFxe1QwVJ4h1hJ1dmtwVCckNMbmmgK72PsN8Zg4Y8XtBXgX8n root@bigdata113
- Verify that passwordless SSH login is configured correctly:
[root@bigdata111 .ssh]# ssh bigdata112
Last login: Mon Aug 5 09:23:11 2019 from bigdata112
[root@bigdata112 ~]# ssh bigdata111
Last login: Mon Aug 5 09:09:23 2019 from 192.168.1.1
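The manual login test can be turned into a loop over all hosts. This is a sketch, not the author's procedure. It uses the standard ssh options BatchMode=yes (fail instead of prompting for a password) and ConnectTimeout. The function name and the SSH_CMD override variable are hypothetical, and the override exists only so the loop can be exercised without a network.

```shell
# Hypothetical helper: report, for each host, whether a non-interactive
# SSH login succeeds. Returns non-zero if any host fails.
check_passwordless_ssh() {
  local ssh_cmd="${SSH_CMD:-ssh}"
  local host failed=0
  for host in "$@"; do
    # BatchMode=yes makes ssh fail rather than ask for a password
    if "$ssh_cmd" -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_passwordless_ssh bigdata111 bigdata112 bigdata113
```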
Deploy the JDK and Hadoop
- Uncheck "send key input to all sessions", then copy the module folder from bigdata111 to the /opt/ folder on the other two machines:
[root@bigdata111 module]# scp -r /opt/module/ root@bigdata112:/opt/
[root@bigdata111 module]# scp -r /opt/module/ root@bigdata113:/opt/
- Send the environment variable file /etc/profile to the other two machines:
[root@bigdata111 module]# scp -r /etc/profile root@bigdata112:/etc/
[root@bigdata111 module]# scp -r /etc/profile root@bigdata113:/etc/
- Switch to the other two machines and refresh the environment variables:
[root@bigdata112 module]# source /etc/profile
[root@bigdata112 module]# jps
2775 Jps
[root@bigdata113 module]# source /etc/profile
[root@bigdata113 module]# jps
2820 Jps
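The copy steps above repeat the same scp command for each target host, so they lend themselves to a loop. The sketch below assumes passwordless SSH for root is already in place. push_to_hosts and the DRY_RUN switch are hypothetical names, and with DRY_RUN=1 the commands are printed instead of executed, so the plan can be reviewed first.

```shell
# Hypothetical helper: copy one path into the same parent directory on
# each listed host. With DRY_RUN=1 it only prints the scp commands.
push_to_hosts() {
  local src="$1"; shift
  local dest host
  dest="$(dirname "$src")/"
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "scp -r $src root@$host:$dest"
    else
      scp -r "$src" "root@$host:$dest"
    fi
  done
}

# Usage:
#   DRY_RUN=1 push_to_hosts /opt/module bigdata112 bigdata113
#   push_to_hosts /etc/profile bigdata112 bigdata113
```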
Configure the cluster XML files
Check "send key input to all sessions" again and configure the hdfs-site.xml, yarn-site.xml, and mapred-site.xml files:
- hdfs-site.xml is configured as follows (the SecondaryNameNode is placed on bigdata113):
<configuration>
  <!-- Number of data replicas -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Address of the SecondaryNameNode -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>bigdata113:50090</value>
  </property>
  <!-- Disable permission checking -->
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
- yarn-site.xml is configured as follows (YARN is placed on bigdata112):
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- How reducers fetch data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Address of the YARN ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>bigdata112</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Retain aggregated logs for 7 days (in seconds) -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
- mapred-site.xml is configured as follows:
<configuration>
  <!-- Run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Address of the job history server -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>bigdata112:10020</value>
  </property>
  <!-- Address of the job history server web page -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>bigdata112:19888</value>
  </property>
</configuration>
Configure the slaves file for the DataNodes
[root@bigdata111 ~]# cd /opt/module/hadoop-2.8.4/etc/hadoop/
[root@bigdata111 hadoop]# ls
capacity-scheduler.xml core-site.xml hadoop-metrics2.properties hdfs-site.xml httpfs-signature.secret kms-env.sh log4j.properties mapred-queues.xml.template ssl-client.xml.example yarn-env.sh
configuration.xsl hadoop-env.cmd hadoop-metrics.properties httpfs-env.sh httpfs-site.xml kms-log4j.properties mapred-env.cmd mapred-site.xml ssl-server.xml.example yarn-site.xml
container-executor.cfg hadoop-env.sh hadoop-policy.xml httpfs-log4j.properties kms-acls.xml kms-site.xml mapred-env.sh slaves yarn-env.cmd
[root@bigdata111 hadoop]# vi slaves
bigdata111
bigdata112
bigdata113
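Every name listed in slaves has to resolve on each machine, otherwise the start scripts cannot reach that node. A quick consistency check against the hosts file might look like the sketch below. unresolved_slaves is a hypothetical helper, and both file paths are arguments so it can be tried on copies.

```shell
# Hypothetical helper: print every hostname from a slaves file that has
# no entry in the given hosts file.
unresolved_slaves() {
  local slaves="$1" hosts="$2" name
  while IFS= read -r name || [ -n "$name" ]; do
    [ -z "$name" ] && continue
    # In a hosts file, field 1 is the IP and fields 2..NF are its names
    awk -v n="$name" '
      $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }
    ' "$hosts" || echo "$name"
  done < "$slaves"
}

# Usage:
#   unresolved_slaves /opt/module/hadoop-2.8.4/etc/hadoop/slaves /etc/hosts
```

No output means every slave name has a matching hosts entry.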
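Format the NameNode
Use Xshell's "send key input to all sessions" function to run the format command on all three machines. Strictly speaking, only the NameNode host (bigdata111) needs to be formatted.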
[root@bigdata111 hadoop]# hadoop namenode -format
[root@bigdata112 hadoop]# hadoop namenode -format
[root@bigdata113 hadoop]# hadoop namenode -format
Start HDFS on bigdata111
[root@bigdata111 hadoop]# start-dfs.sh
Start YARN on bigdata112
[root@bigdata112 hadoop]# start-yarn.sh
jps output on the three machines
[root@bigdata111 hadoop]# jps
2512 DataNode
2758 NodeManager
2377 NameNode
2894 Jps
[root@bigdata112 ~]# jps
2528 NodeManager
2850 Jps
2294 DataNode
2413 ResourceManager
[root@bigdata113 ~]# jps
2465 NodeManager
2598 Jps
2296 DataNode
2398 SecondaryNameNode
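Rather than reading the jps listings by eye, the expected daemons per machine can be checked mechanically. A minimal sketch follows: missing_daemons is a hypothetical helper that reads jps output on standard input and prints each expected daemon that is absent. It compares the class name exactly, so SecondaryNameNode is not mistaken for NameNode.

```shell
# Hypothetical helper: read jps output on stdin and print every expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
  local jps_out name
  jps_out="$(cat)"
  for name in "$@"; do
    # jps prints "<pid> <MainClass>"; compare the second field exactly
    printf '%s\n' "$jps_out" \
      | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' \
      || echo "$name"
  done
}

# Usage, matching the layout above:
#   on bigdata111: jps | missing_daemons NameNode DataNode NodeManager
#   on bigdata112: jps | missing_daemons ResourceManager DataNode NodeManager
#   on bigdata113: jps | missing_daemons SecondaryNameNode DataNode NodeManager
```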