View current images
spc@spc-virtual-machine:~$ sudo docker images    # the images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest e38bc07ac18e 5 days ago 1.85kB
spc@spc-virtual-machine:~$ sudo docker ps -a     # the containers
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
95f86eafda4e hello-world "/hello" 2 minutes ago Exited (0) About a minute ago epic_cori
spc@spc-virtual-machine:~$
Install Ubuntu in Docker
spc@spc-virtual-machine:~$ sudo docker pull ubuntu
Using default tag: latest
latest: Pulling from library/ubuntu
d3938036b19c: Pull complete
a9b30c108bda: Pull complete
67de21feec18: Pull complete
817da545be2b: Pull complete
d967c497ce23: Pull complete
Digest: sha256
Status: Downloaded newer image for ubuntu:latest
Check:
sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu latest c9d990395902 4 days ago 113MB
hello-world latest e38bc07ac18e 5 days ago 1.85kB
Create a shared folder
Used to transfer data between the host and the Ubuntu system inside Docker.
Host path:
/home/spc/build
Enter Ubuntu inside Docker
spc@spc-virtual-machine:~/build$ sudo docker run -it -v /home/spc/build:/root/build --name ubuntu ubuntu
root@122a0d05b89a:/# ll
Update the Ubuntu container and install the required software
apt-get update            # refresh the package lists
apt-get install vim       # install vim
apt-get install ssh       # install SSH so the distributed Hadoop nodes can connect to each other
Put the start command into ~/.bashrc so that the sshd service starts automatically every time you log in to the Ubuntu system:
vim ~/.bashrc
Append as the last line:
/etc/init.d/ssh start
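The same edit can be scripted instead of done in vim; a minimal sketch, assuming a stock Ubuntu image where ~/.bashrc exists:

```shell
# Append the sshd start line to ~/.bashrc only if it is not already there,
# so repeated runs do not duplicate it
LINE='/etc/init.d/ssh start'
touch ~/.bashrc
grep -qxF "$LINE" ~/.bashrc || printf '%s\n' "$LINE" >> ~/.bashrc
```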
Configure passwordless SSH to localhost
ssh-keygen -t rsa
root@122a0d05b89a:~/.ssh# ll
total 20
drwx------ 2 root root 4096 Apr 17 09:10 ./
drwx------ 1 root root 4096 Apr 17 09:08 ../
-rw-r--r-- 1 root root 0 Apr 17 09:10 authorized_keys
-rw------- 1 root root 1679 Apr 17 09:11 id_rsa
-rw-r--r-- 1 root root 399 Apr 17 09:11 id_rsa.pub
root@122a0d05b89a:~/.ssh# cat id_rsa.pub >> authorized_keys
root@122a0d05b89a:~/.ssh#
Note: for cat id_rsa.pub >> authorized_keys, some tutorials use a DSA key instead of RSA, and it was unclear at the time whether RSA was correct. (Either works for Hadoop's purposes, though OpenSSH 7.0 and later disable DSA by default, so RSA is the better choice today.) A DSA key was generated afterwards anyway:
ssh-keygen -t dsa
root@122a0d05b89a:~/.ssh# ll
total 32
drwx------ 2 root root 4096 Apr 17 09:14 ./
drwx------ 1 root root 4096 Apr 17 09:08 ../
-rw-r--r-- 1 root root 399 Apr 17 09:11 authorized_keys
-rw------- 1 root root 668 Apr 17 09:14 id_dsa
-rw-r--r-- 1 root root 607 Apr 17 09:14 id_dsa.pub
-rw------- 1 root root 1679 Apr 17 09:11 id_rsa
-rw-r--r-- 1 root root 399 Apr 17 09:11 id_rsa.pub
root@122a0d05b89a:~/.ssh# cat id_dsa.pub >> authorized_keys
root@122a0d05b89a:~/.ssh#
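The whole key setup above can be done non-interactively; a sketch using RSA only (since modern OpenSSH disables DSA by default):

```shell
# Generate an RSA key pair with an empty passphrase (skipped if one exists)
# and authorize it for passwordless SSH back into the same machine
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```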
Install the JDK
apt-get install default-jdk
Paths after installation:
/usr/lib/jvm/java-1.8.0-openjdk-amd64
/usr/lib/jvm/java-8-openjdk-amd64
Add environment variables to ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
source ~/.bashrc
Save the current container as an image
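The transcript does not show the commit command itself; presumably it mirrored the docker commit used later for the hadoop image. In the sketch below, the container ID 122a0d05b89a comes from the prompt above and the image name first is inferred from the run command below, so treat both as assumptions:

```shell
# Save the configured container as an image named "first"
# (guarded so the sketch is a no-op on machines without Docker)
if command -v docker >/dev/null 2>&1; then
    sudo docker commit 122a0d05b89a first
fi
```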
Start the saved image
$ sudo docker run -it -v /home/spc/build:/root/build --name hadooptext first
* Starting OpenBSD Secure Shell server sshd [ OK ]
root@d83136b16ff1:/#
Check the running state
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d83136b16ff1 first "/bin/bash" 56 seconds ago Up 54 seconds hadooptext
Download Hadoop and prepare to install it
Put the downloaded Hadoop tar.gz archive into the shared folder
Install Hadoop inside the running container
Go to /root/build
Extract it: tar -zxvf hadoop-…… -C /usr/local/
Check the version to verify the installation
root@d83136b16ff1:/usr/local/hadoop-2.7.5# ./bin/hadoop version
Hadoop 2.7.5
Subversion https://[email protected]/repos/asf/hadoop.git -r 18065c2b6806ed4aa6a3187d77cbe21bb3dba075
Compiled by kshvachk on 2017-12-16T01:06Z
Compiled with protoc 2.5.0
From source with checksum 9f118f95f47043332d51891e37f736e9
This command was run using /usr/local/hadoop-2.7.5/share/hadoop/common/hadoop-common-2.7.5.jar
root@d83136b16ff1:/usr/local/hadoop-2.7.5#
Modify the configuration
In hadoop-env.sh, change:
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64/   # replace with your own Java path
In core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
Edit hdfs-site.xml (note: dfs.replication is set to 3 below, but this cluster ends up with only two DataNodes, so blocks will stay under-replicated; a value of 2 would match the cluster):
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/namenode_dir</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/datanode_dir</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
Edit mapred-site.xml (copy mapred-site.xml.template and rename it):
# cp mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
</configuration>
Save the configured container as an image
$ sudo docker commit d83136b16ff1 hadoop
Current images and containers:
$ sudo docker images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
hadoop latest 9dad1900a237 42 seconds ago 925MB
first latest 05633ba1567b 2 hours ago 576MB
ubuntu latest c9d990395902 5 days ago 113MB
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d83136b16ff1 first "/bin/bash" About an hour ago Up About an hour hadooptext
Launch a Hadoop cluster from the configured image
First open three terminals and start one container in each:
$ sudo docker run -it -h master --name master hadoop
[sudo] password for spc:
* Starting OpenBSD Secure Shell server sshd [ OK ]
root@master:/#
$ sudo docker run -it -h slave01 --name slave01 hadoop
[sudo] password for spc:
* Starting OpenBSD Secure Shell server sshd [ OK ]
root@slave01:/#
$ sudo docker run -it -h slave02 --name slave02 hadoop
[sudo] password for spc:
* Starting OpenBSD Secure Shell server sshd [ OK ]
root@slave02:/#
Edit /etc/hosts (shown on master; slave01 and slave02 need the same entries so they can resolve master)
root@master:/# vim /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2 master
172.17.0.3 slave01
172.17.0.4 slave02
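Those three entries can be appended in one heredoc; a sketch (the IPs are whatever Docker assigned in container start order, so verify them first with docker inspect -f '{{.NetworkSettings.IPAddress}}' <name>):

```shell
# Append the cluster hostnames; HOSTS_FILE is a stand-in name here --
# inside each container the target would be /etc/hosts
HOSTS_FILE=hosts.demo
cat >> "$HOSTS_FILE" <<'EOF'
172.17.0.2 master
172.17.0.3 slave01
172.17.0.4 slave02
EOF
```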
Use ssh to check that master can connect to slave01 and slave02
root@master:/# ssh slave01
The authenticity of host 'slave01 (172.17.0.3)' can't be established.
ECDSA key fingerprint is SHA256:
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave01,172.17.0.3' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
* Starting OpenBSD Secure Shell server sshd [ OK ]
root@slave01:~#
Edit the slaves file on master
Go to /usr/local/hadoop-2.7.5/etc/hadoop
root@master:/usr/local/hadoop-2.7.5/etc/hadoop# vim slaves
root@master:/usr/local/hadoop-2.7.5/etc/hadoop# cat slaves
slave01
slave02
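The slaves file can equally be written in one command from /usr/local/hadoop-2.7.5/etc/hadoop (the sketch below writes to the current directory):

```shell
# Write the two DataNode hostnames, one per line, replacing any old content
printf 'slave01\nslave02\n' > slaves
```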
Start the cluster (note in the transcript below: hdfs is not on PATH yet, so even inside bin/ the bare name fails with "command not found"; it has to be invoked as bin/hdfs from the install root)
root@master:/usr/local/hadoop-2.7.5# cd bin/
root@master:/usr/local/hadoop-2.7.5/bin# ll
total 336
drwxr-xr-x 2 20415 systemd-journal 4096 Dec 16 01:12 ./
drwxr-xr-x 1 20415 systemd-journal 4096 Dec 16 01:12 ../
-rwxr-xr-x 1 20415 systemd-journal 111795 Dec 16 01:12 container-executor*
-rwxr-xr-x 1 20415 systemd-journal 6488 Dec 16 01:12 hadoop*
-rwxr-xr-x 1 20415 systemd-journal 8514 Dec 16 01:12 hadoop.cmd*
-rwxr-xr-x 1 20415 systemd-journal 12223 Dec 16 01:12 hdfs*
-rwxr-xr-x 1 20415 systemd-journal 7238 Dec 16 01:12 hdfs.cmd*
-rwxr-xr-x 1 20415 systemd-journal 5953 Dec 16 01:12 mapred*
-rwxr-xr-x 1 20415 systemd-journal 6094 Dec 16 01:12 mapred.cmd*
-rwxr-xr-x 1 20415 systemd-journal 1776 Dec 16 01:12 rcc*
-rwxr-xr-x 1 20415 systemd-journal 125103 Dec 16 01:12 test-container-executor*
-rwxr-xr-x 1 20415 systemd-journal 13352 Dec 16 01:12 yarn*
-rwxr-xr-x 1 20415 systemd-journal 11054 Dec 16 01:12 yarn.cmd*
root@master:/usr/local/hadoop-2.7.5/bin# hdfs namenode -format
-bash: hdfs: command not found
root@master:/usr/local/hadoop-2.7.5/bin# cd ..
root@master:/usr/local/hadoop-2.7.5# bin/hdfs namenode -format
root@master:/usr/local/hadoop-2.7.5/sbin# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
The authenticity of host 'master (172.17.0.2)' can't be established.
ECDSA key fingerprint is SHA256:4Kcq9xSIgyvYhnCk7mptUYNRY1qJlguKDWlA2TtGw6Y.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master,172.17.0.2' (ECDSA) to the list of known hosts.
master: starting namenode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-root-namenode-master.out
slave01: starting datanode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-root-datanode-slave01.out
slave02: starting datanode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-root-datanode-slave02.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:4Kcq9xSIgyvYhnCk7mptUYNRY1qJlguKDWlA2TtGw6Y.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.7.5/logs/yarn-root-resourcemanager-master.out
slave02: starting nodemanager, logging to /usr/local/hadoop-2.7.5/logs/yarn-root-nodemanager-slave02.out
slave01: starting nodemanager, logging to /usr/local/hadoop-2.7.5/logs/yarn-root-nodemanager-slave01.out
Check the results with jps on master, slave01, and slave02
root@master:/usr/local/hadoop-2.7.5/sbin# jps
852 Jps
261 NameNode
445 SecondaryNameNode
589 ResourceManager
root@master:/usr/local/hadoop-2.7.5/sbin#
root@slave01:/# jps
91 DataNode
270 Jps
191 NodeManager
root@slave01:/#
root@slave02:/# jps
245 Jps
91 DataNode
191 NodeManager
A problem came up (dfs by itself is not a command; the correct form is hdfs dfs -ls <path>, and hdfs was not yet on PATH either):
root@master:/usr/local/hadoop-2.7.5/bin# dfs ls /user/hadoop/input
-bash: dfs: command not found
Fix it by editing ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64/
export HADOOP_HOME=/usr/local/hadoop-2.7.5/bin
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME
Then run source ~/.bashrc
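The fix above works because the bin directory lands on PATH, but it leaves HADOOP_HOME pointing at bin rather than the install root. The conventional layout (an assumption here, not what the transcript uses) would be:

```shell
# HADOOP_HOME points at the install root; PATH picks up both bin/ and sbin/,
# which also makes scripts like start-dfs.sh resolvable by name
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop-2.7.5
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```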
After that the command works:
root@master:/usr/local/hadoop-2.7.5/bin# hdfs dfs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
root@master:/usr/local/hadoop-2.7.5/bin#
The commands work normally; the "File exists" messages below only mean these files were already uploaded in an earlier run:
root@master:/usr/local/hadoop-2.7.5/bin# hdfs dfs -put ../etc/hadoop/*.xml /user/hadoop/input
put: `/user/hadoop/input/capacity-scheduler.xml': File exists
put: `/user/hadoop/input/core-site.xml': File exists
put: `/user/hadoop/input/hadoop-policy.xml': File exists
put: `/user/hadoop/input/hdfs-site.xml': File exists
put: `/user/hadoop/input/httpfs-site.xml': File exists
put: `/user/hadoop/input/kms-acls.xml': File exists
put: `/user/hadoop/input/kms-site.xml': File exists
put: `/user/hadoop/input/mapred-site.xml': File exists
put: `/user/hadoop/input/yarn-site.xml': File exists
root@master:/usr/local/hadoop-2.7.5/bin#
Stop the Hadoop cluster
root@master:/usr/local/hadoop-2.7.5/sbin# ./stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
slave02: stopping datanode
slave01: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave01: stopping nodemanager
slave02: stopping nodemanager
slave01: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
slave02: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop