Preparation
- Install a virtual machine and install Linux (omitted)
- Configure the network address (NAT) (omitted)
[root@hadoopNode1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33
Note: set an IP address in the correct network segment.
- Modify the hostname configuration (omitted)
[root@hadoopNode1 ~]# vi /etc/hostname
Note: use English letters for the hostname, and do not change it once it is set.
- Modify the hosts mapping configuration
For example: 192.168.138.100 hadoopNode1
- Turn off the firewall (omitted)
Shut down the firewall and disable it at boot:
[root@hadoopNode1 ~]# systemctl stop firewalld
[root@hadoopNode1 ~]# systemctl disable firewalld
Restart the operating system for the change to take effect.
- Create the user ambow with password ambow (omitted)
[root@hadoopNode1 ~]# useradd ambow
[root@hadoopNode1 ~]# passwd ambow
- Give the ambow user root privileges via sudo
As the root user, edit the /etc/sudoers file, find the root entry, and add a line for ambow below it, as follows:
[root@hadoopNode1 ~]# vi /etc/sudoers
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
ambow ALL=(ALL) ALL
After this change, you can log in with the ambow account and use sudo to run commands with root privileges.
- Install the JDK
Unpack the tar archive:
[ambow@hadoopNode1 ~]$ pwd
/home/ambow
[ambow@hadoopNode1 ~]$ mkdir soft
[ambow@hadoopNode1 ~]$ mkdir app
[ambow@hadoopNode1 ~]$ ls
app  soft
[ambow@hadoopNode1 ~]$ tree .
.
├── app
└── soft
    ├── hadoop-2.7.3.tar.gz
    ├── jdk-8u121-linux-x64.tar.gz
    └── zookeeper-3.4.6.tar.gz

2 directories, 3 files
[ambow@hadoopNode1 ~]$ tar -zxvf ./soft/jdk-8u121-linux-x64.tar.gz -C ./app/
Configure the JDK:
[ambow@hadoopNode1 jdk1.8.0_121]$ vi ~/.bash_profile
[ambow@hadoopNode1 jdk1.8.0_121]$ cat ~/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
JAVA_HOME=/home/ambow/app/jdk1.8.0_121
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin
export PATH
export JAVA_HOME
[ambow@hadoopNode1 jdk1.8.0_121]$
[ambow@hadoopNode1 jdk1.8.0_121]$ source ~/.bash_profile
Run source ~/.bash_profile to make the configuration take effect.
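A quick way to confirm the variables took effect is a small check like the following (a minimal sketch; the JDK path is the one assumed above, so adjust it to your layout):

```shell
# Sanity check after `source ~/.bash_profile` (sketch; the JDK path is
# an assumption from this guide -- adjust it to your layout).
JAVA_HOME="${JAVA_HOME:-/home/ambow/app/jdk1.8.0_121}"
echo "JAVA_HOME=$JAVA_HOME"
case ":$PATH:" in
    *":$JAVA_HOME/bin:"*) echo "PATH contains JAVA_HOME/bin" ;;
    *) echo "PATH is missing JAVA_HOME/bin" ;;
esac
```

If the second line reports that PATH is missing the JDK, re-check ~/.bash_profile and source it again.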
- Restart the OS
reboot
Hadoop has three installation modes:
1. Local (standalone) mode: for development and debugging.
2. Pseudo-distributed mode: one host simulates a small cluster, starting the NameNode, DataNode, ResourceManager, and NodeManager on a single machine.
3. Cluster mode (production environment): multiple hosts act as NameNode, DataNode, and so on.

Hadoop Local Mode Installation
- Unpack the Hadoop archive
[ambow@hadoopNode1 sbin]$ tar -zxvf ~/soft/hadoop-2.7.3.tar.gz -C ~/app/
- Configure the Hadoop environment variables
```shell
[ambow@hadoopNode1 hadoop-2.7.3]$ vi ~/.bash_profile
[ambow@hadoopNode1 hadoop-2.7.3]$ cat ~/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
JAVA_HOME=/home/ambow/app/jdk1.8.0_121
HADOOP_HOME=/home/ambow/app/hadoop-2.7.3
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH
export JAVA_HOME
export HADOOP_HOME
```
- Make the environment variables take effect
[ambow@hadoopNode1 hadoop-2.7.3]$ source ~/.bash_profile
- Test
Create a test data file: ~/data/mydata.txt
Test syntax:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar <class name> <input dir> <output dir>
[ambow@hadoopNode1 mydata.out]$ hadoop jar ~/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount ~/data/mydata.txt ~/data/mydata.out
2. Pseudo-Distributed Mode Configuration:
- Install the JDK
- Install Hadoop
- Configure Hadoop's $HADOOP_HOME/etc/hadoop/core-site.xml
Key properties: fs.defaultFS, hadoop.tmp.dir
[ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
    <!-- Default FS. Default port: 9820 in Hadoop 3.x, 8020 in Hadoop 2.x, 9000 in Hadoop 1.x; pseudo-distributed setups usually use localhost:8020 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:8020</value>
    </property>
    <!-- Directory for files generated at Hadoop runtime; created automatically. Do not leave it at the default -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/ambow/hdfs/data</value>
    </property>
</configuration>
- Configure hdfs-site.xml
Set the number of block replicas with dfs.replication (default 3); pseudo-distributed mode can only use 1.
[ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <!-- Number of replicas per block; default 3, use 1 on a single node. Too many replicas lowers performance rather than improving it -->
        <name>dfs.replication</name>
        <!-- Pseudo-distributed mode can only use 1 replica -->
        <value>1</value>
    </property>
</configuration>
- Format the NameNode
[ambow@hadoopNode1 ~]$ hadoop namenode -format
In general, format only once. If you must format again, delete the data directory on every DataNode first, to prevent mismatched cluster IDs between the DataNodes and the NameNode that keep the cluster from starting.
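One way to clear the old data before a second format is a small cleanup run on each node; a sketch, assuming the hadoop.tmp.dir location used in core-site.xml above:

```shell
# Sketch: wipe the HDFS data directory before a second format, so the
# DataNode and NameNode cluster IDs cannot get out of sync.
# DATA_DIR follows the hadoop.tmp.dir value used in this guide.
DATA_DIR="${DATA_DIR:-$HOME/hdfs/data}"
if [ -d "$DATA_DIR" ]; then
    rm -rf "$DATA_DIR"
    echo "removed $DATA_DIR"
else
    echo "nothing to remove at $DATA_DIR"
fi
```

Run it on every node, then format the NameNode again.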
6. Start and stop the daemons
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
hadoop-daemon.sh stop namenode
hadoop-daemon.sh stop datanode
7. Check the running processes
jps
- Log files
$HADOOP_HOME/logs
9. View the web UI:
http://192.168.100.100:50070/
- To run MapReduce on YARN, configure two files
Configure mapred-site.xml:
[ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <!-- Use the YARN resource-management framework for MapReduce -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
- Configure yarn-site.xml
Key properties: yarn.resourcemanager.hostname, yarn.nodemanager.aux-services
<configuration>
<property>
<!-- Hostname of the node that runs the ResourceManager -->
<name>yarn.resourcemanager.hostname</name>
<value>hadoopNode1</value>
</property>
<property>
<!-- Use the mapreduce_shuffle service -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
- Start YARN
[ambow@hadoopNode1 data]$ yarn-daemon.sh start resourcemanager
[ambow@hadoopNode1 data]$ yarn-daemon.sh start nodemanager
- Run an MR test
Upload the local file ~/data/mydata.txt to the /user/ambow directory in HDFS:
[ambow@hadoopNode1 data]$ hadoop dfs -put ~/data/mydata.txt /user/ambow
Run wordcount on the HDFS file via YARN:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/ambow/mydata.txt /user/ambow/output/wc/
Distributed Cluster Installation
1. Modify /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.200 hadoopNode1
192.168.100.201 hadoopNode2
192.168.100.202 hadoopNode3
192.168.100.203 hadoopNode4
2. Clone two VMs from the pseudo-distributed VM
3. Configure each virtual machine node: IP address, hostname, and hosts mapping
[root@hadoopNode2 ~]# vi /etc/hostname
[root@hadoopNode2 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33
[root@hadoopNode2 ~]# vi /etc/hosts
4. Verify the configuration
[root@hadoopNode2 ~]# ping hadoopNode1
PING hadoopNode1 (192.168.100.200) 56(84) bytes of data.
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=1 ttl=64 time=0.190 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=2 ttl=64 time=0.230 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=3 ttl=64 time=0.263 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=4 ttl=64 time=0.227 ms
^C64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=5 ttl=64 time=0.195 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=6 ttl=64 time=0.268 ms
^C
--- hadoopNode1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 0.190/0.228/0.268/0.035 ms
[root@hadoopNode2 ~]# ping hadoopNode2
PING hadoopNode2 (192.168.100.201) 56(84) bytes of data.
64 bytes from hadoopNode2 (192.168.100.201): icmp_seq=1 ttl=64 time=0.011 ms
64 bytes from hadoopNode2 (192.168.100.201): icmp_seq=2 ttl=64 time=0.022 ms
^C
--- hadoopNode2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.011/0.016/0.022/0.006 ms
[root@hadoopNode2 ~]# ping hadoopNode3
PING hadoopNode3 (192.168.100.202) 56(84) bytes of data.
64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=1 ttl=64 time=0.246 ms
64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=2 ttl=64 time=0.218 ms
64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=3 ttl=64 time=0.218 ms
^C64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=4 ttl=64 time=0.227 ms
^C
--- hadoopNode3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3001ms
rtt min/avg/max/mdev = 0.218/0.227/0.246/0.015 ms
Master/slave architecture
- Configure passwordless SSH login on the master node
1) Generate a public/private key pair on the master node
ssh-keygen -t rsa
2) Distribute the public key
ssh-copy-id localhost
ssh-copy-id hadoopNode1
ssh-copy-id hadoopNode2
ssh-copy-id hadoopNode3
3) From the master node, verify that logging in to each node no longer requires a password
ssh hadoopNode2
ssh hadoopNode3
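The verification step can also be scripted; a sketch assuming the hostnames used in this cluster (BatchMode makes ssh fail instead of prompting for a password):

```shell
# Sketch: check passwordless login to every node in this cluster.
# Hostnames are the ones assumed in this guide.
NODES="hadoopNode1 hadoopNode2 hadoopNode3"
for host in $NODES; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "$host: passwordless login OK"
    else
        echo "$host: password still required (or host unreachable)"
    fi
done
```

Any node reported as still requiring a password needs ssh-copy-id run again for it.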
6. Configure the core file core-site.xml
[ambow@hadoopNode1 hadoop]$ vi core-site.xml
<configuration>
<!-- Default FS. Default port: 9820 in Hadoop 3.x, 8020 in Hadoop 2.x, 9000 in Hadoop 1.x; pseudo-distributed setups usually use localhost:8020 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoopNode1:8020</value>
</property>
<!-- Directory for files generated at Hadoop runtime; created automatically. Do not leave it at the default -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ambow/hdfs/data</value>
</property>
</configuration>
- hdfs-site.xml
<configuration>
<property>
<!-- Number of replicas per block; default 3, use 1 on a single node. Too many replicas lowers performance rather than improving it -->
<name>dfs.replication</name>
<!-- Pseudo-distributed mode can only use 1 replica -->
<value>3</value>
</property>
<property>
<!-- Secondary NameNode (2NN) -->
<name>dfs.namenode.secondary.http-address</name>
<value>hadoopNode2:50090</value>
</property>
<property>
<!-- Checkpoint directory -->
<name>dfs.namenode.checkpoint.dir</name>
<value>/home/ambow/hdfs/namesecondary</value>
</property>
</configuration>
8. mapred-site.xml
<configuration>
<property>
<!-- Use the YARN resource-management framework for MapReduce -->
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<!-- Hostname of the node that runs the ResourceManager -->
<name>yarn.resourcemanager.hostname</name>
<value>hadoopNode1</value>
</property>
<property>
<!-- Use the mapreduce_shuffle service -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
10. Modify the slaves file to specify which nodes in the cluster are DataNodes: add each node's hostname to the slaves file
[ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/slaves
hadoopNode1
hadoopNode2
hadoopNode3
11. Distribute the files to the other nodes
Note: stop all services before distributing
Network copy syntax: scp -r <source dir> <user>@<host>:<target path>
-r copies recursively
[ambow@hadoopNode1 hadoop]$ scp -r $HADOOP_HOME/etc/hadoop ambow@hadoopNode2:$HADOOP_HOME/etc/
[ambow@hadoopNode1 hadoop]$ scp -r $HADOOP_HOME/etc/hadoop ambow@hadoopNode3:$HADOOP_HOME/etc/
Note: be sure to stop all services before distributing, and format after distributing.
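The per-node scp commands above can be generated in a loop; a sketch that prints the commands for review rather than running them (hostnames and paths are the ones assumed in this guide):

```shell
# Sketch: build the scp distribution commands for every worker node and
# print them for review; run them only after stopping all services.
HADOOP_HOME="${HADOOP_HOME:-$HOME/app/hadoop-2.7.3}"
WORKERS="hadoopNode2 hadoopNode3"
CMDS=""
for host in $WORKERS; do
    CMDS="$CMDS
scp -r $HADOOP_HOME/etc/hadoop ambow@$host:$HADOOP_HOME/etc/"
done
echo "$CMDS"
```

Adding a new worker then only means appending its hostname to WORKERS (and to the slaves file).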
12. Test
start-all.sh (start)
stop-all.sh (stop)