Building a distributed Hadoop cluster with Docker

Check the current images

 spc@spc-virtual-machine:~$ sudo docker images    // these are the images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
hello-world         latest              e38bc07ac18e        5 days ago          1.85kB
spc@spc-virtual-machine:~$ sudo docker ps -a     // these are the containers
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                          PORTS               NAMES
95f86eafda4e        hello-world         "/hello"            2 minutes ago       Exited (0) About a minute ago                       epic_cori
spc@spc-virtual-machine:~$ 

Install Ubuntu in Docker

 spc@spc-virtual-machine:~$ sudo docker pull ubuntu
Using default tag: latest
latest: Pulling from library/ubuntu
d3938036b19c: Pull complete 
a9b30c108bda: Pull complete 
67de21feec18: Pull complete 
817da545be2b: Pull complete 
d967c497ce23: Pull complete 
Digest: sha256
Status: Downloaded newer image for ubuntu:latest

Check the images again

 sudo docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
ubuntu              latest              c9d990395902        4 days ago          113MB
hello-world         latest              e38bc07ac18e        5 days ago          1.85kB

Create a shared folder

 Used to transfer data between the host and the Ubuntu system inside Docker.
Host path
    /home/spc/build
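
The folder just needs to exist on the host before it is mounted; for example:
    mkdir -p /home/spc/build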

Enter the Ubuntu container

 spc@spc-virtual-machine:~/build$ sudo docker run -it -v /home/spc/build:/root/build --name ubuntu ubuntu
root@122a0d05b89a:/# ll

Update Ubuntu inside Docker and install the required software

 apt-get update — refresh the package lists
apt-get install vim — install vim
apt-get install ssh — install ssh so the distributed Hadoop nodes can connect to one another
Write the startup command into the ~/.bashrc file, so the sshd service starts automatically every time you log in to the Ubuntu system:
    vim ~/.bashrc
    add on the last line
    /etc/init.d/ssh start
Configure passwordless SSH to the local host
    ssh-keygen -t rsa
    root@122a0d05b89a:~/.ssh# ll
    total 20
    drwx------ 2 root root 4096 Apr 17 09:10 ./
    drwx------ 1 root root 4096 Apr 17 09:08 ../
    -rw-r--r-- 1 root root    0 Apr 17 09:10 authorized_keys
    -rw------- 1 root root 1679 Apr 17 09:11 id_rsa
    -rw-r--r-- 1 root root  399 Apr 17 09:11 id_rsa.pub
    root@122a0d05b89a:~/.ssh# cat id_rsa.pub >> authorized_keys
    root@122a0d05b89a:~/.ssh# 

    Note: for cat id_rsa.pub >> authorized_keys, some guides append a dsa key here instead, and it was unclear at the time whether rsa would work. (rsa and dsa are simply different key algorithms; either is accepted for public-key authentication, though newer OpenSSH releases deprecate DSA.)

    A dsa key was generated as well, just to be safe
    ssh-keygen -t dsa
    root@122a0d05b89a:~/.ssh# ll
    total 32
    drwx------ 2 root root 4096 Apr 17 09:14 ./
    drwx------ 1 root root 4096 Apr 17 09:08 ../
    -rw-r--r-- 1 root root  399 Apr 17 09:11 authorized_keys
    -rw------- 1 root root  668 Apr 17 09:14 id_dsa
    -rw-r--r-- 1 root root  607 Apr 17 09:14 id_dsa.pub
    -rw------- 1 root root 1679 Apr 17 09:11 id_rsa
    -rw-r--r-- 1 root root  399 Apr 17 09:11 id_rsa.pub
    root@122a0d05b89a:~/.ssh# cat id_dsa.pub >> authorized_keys
    root@122a0d05b89a:~/.ssh# 
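
    To confirm that passwordless login now works (a quick check, not in the original log):
        ssh localhost    # should log in without prompting for a password
        exit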

Install the JDK

 apt-get install default-jdk
    Paths after installation (the first is a symlink to the second)
        /usr/lib/jvm/java-1.8.0-openjdk-amd64
        /usr/lib/jvm/java-8-openjdk-amd64
Add the environment variables (append to ~/.bashrc)
        export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
        export PATH=$PATH:$JAVA_HOME/bin
        source ~/.bashrc
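
A quick way to confirm the JDK and the variables took effect (not part of the original log):
        echo $JAVA_HOME    # should print /usr/lib/jvm/java-1.8.0-openjdk-amd64
        java -version      # should report openjdk version "1.8.0_xxx"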

Save the current container as an image
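
 The commit command itself is not shown in the original log; given the container ID above and the image name "first" used below, it was presumably:
    sudo docker commit 122a0d05b89a first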

Start the saved image

 $ sudo docker run -it -v /home/spc/build:/root/build --name hadooptext first
 * Starting OpenBSD Secure Shell server sshd                                                                                                [ OK ] 
root@d83136b16ff1:/# 

Check the running status

 $ sudo docker  ps 
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
d83136b16ff1        first               "/bin/bash"         56 seconds ago      Up 54 seconds                           hadooptext

Download Hadoop in preparation for installation

 Put the downloaded Hadoop tar.gz file into the shared folder
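
For example, the 2.7.5 tarball used below can be fetched from the Apache archive (the mirror URL is an assumption; any mirror works) and moved into the shared folder:
    wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
    mv hadoop-2.7.5.tar.gz /home/spc/build/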

Install Hadoop in the running container

 Enter /root/build
Extract: tar -zxvf hadoop-…… -C /usr/local/
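
With the hadoop-2.7.5 tarball used in the rest of this walkthrough, the full sequence inside the container would be:
    cd /root/build
    tar -zxvf hadoop-2.7.5.tar.gz -C /usr/local/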

Check the version to verify the installation
    root@d83136b16ff1:/usr/local/hadoop-2.7.5# ./bin/hadoop version
    Hadoop 2.7.5
    Subversion https://[email protected]/repos/asf/hadoop.git -r 18065c2b6806ed4aa6a3187d77cbe21bb3dba075
    Compiled by kshvachk on 2017-12-16T01:06Z
    Compiled with protoc 2.5.0
    From source with checksum 9f118f95f47043332d51891e37f736e9
    This command was run using /usr/local/hadoop-2.7.5/share/hadoop/common/hadoop-common-2.7.5.jar
    root@d83136b16ff1:/usr/local/hadoop-2.7.5# 

Modify the configuration

 Edit hadoop-env.sh (all of the following config files live in /usr/local/hadoop-2.7.5/etc/hadoop)
    # The java implementation to use.
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64/    # replace with your own Java path

 Edit core-site.xml
    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/usr/local/hadoop/tmp</value>
            <description>A base for other temporary directories.</description>
        </property>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master:9000</value>
        </property>
    </configuration>
Edit hdfs-site.xml (note: with only two datanodes in this cluster, dfs.replication=3 will leave blocks flagged as under-replicated; a value of 2 would match the cluster)
    <configuration>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/usr/local/hadoop/namenode_dir</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/usr/local/hadoop/datanode_dir</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
    </configuration>
Edit mapred-site.xml (copy mapred-site.xml.template to create it; the property goes inside the <configuration> element)
    # cp mapred-site.xml.template mapred-site.xml
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
Edit yarn-site.xml
    <configuration>

    <!-- Site specific YARN configuration properties -->
          <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
          </property>
          <property>
              <name>yarn.resourcemanager.hostname</name>
              <value>master</value>
          </property>
    </configuration>

Save the configured container

 $ sudo docker commit d83136b16ff1 hadoop
Current image and container status
    $ sudo docker images -a
    REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
    hadoop              latest              9dad1900a237        42 seconds ago      925MB
    first               latest              05633ba1567b        2 hours ago         576MB
    ubuntu              latest              c9d990395902        5 days ago          113MB
    $ sudo docker ps
    CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
    d83136b16ff1        first               "/bin/bash"         About an hour ago   Up About an hour                        hadooptext

Launch the Hadoop cluster from the configured image

 First open three terminals and start one container in each
    $ sudo docker run -it -h master --name master hadoop
    [sudo] password for spc: 
     * Starting OpenBSD Secure Shell server sshd                                                                                                [ OK ] 
    root@master:/#  

    $ sudo docker run -it -h slave01 --name slave01 hadoop
    [sudo] password for spc: 
     * Starting OpenBSD Secure Shell server sshd                             [ OK ] 
    root@slave01:/# 

    $ sudo docker run -it -h slave02 --name slave02 hadoop
    [sudo] password for spc: 
     * Starting OpenBSD Secure Shell server sshd                             [ OK ] 
    root@slave02:/# 

Edit /etc/hosts (do this on all three containers so every node can resolve the others)
root@master:/# vim /etc/hosts
127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2  master
172.17.0.3  slave01
172.17.0.4  slave02
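The container IPs need not be guessed; on the default bridge Docker assigns them in launch order, and they can be read from the host (a convenience check, not in the original):
    sudo docker inspect -f '{{.NetworkSettings.IPAddress}}' master    # prints 172.17.0.2 here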
Use ssh to check that master can reach slave01 and slave02
    root@master:/# ssh slave01
    The authenticity of host 'slave01 (172.17.0.3)' can't be established.
    ECDSA key fingerprint is SHA256:
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'slave01,172.17.0.3' (ECDSA) to the list of known hosts.
    Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic x86_64)

     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage

    The programs included with the Ubuntu system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
    applicable law.

     * Starting OpenBSD Secure Shell server sshd                                   [ OK ] 
    root@slave01:~# 
Edit the slaves file on master
    Enter /usr/local/hadoop-2.7.5/etc/hadoop
    root@master:/usr/local/hadoop-2.7.5/etc/hadoop# vim slaves 
    root@master:/usr/local/hadoop-2.7.5/etc/hadoop# cat slaves 
    slave01
    slave02
Start the cluster (format the NameNode first, then run start-all.sh from sbin)
    root@master:/usr/local/hadoop-2.7.5# cd bin/
    root@master:/usr/local/hadoop-2.7.5/bin# ll
    total 336
    drwxr-xr-x 2 20415 systemd-journal   4096 Dec 16 01:12 ./
    drwxr-xr-x 1 20415 systemd-journal   4096 Dec 16 01:12 ../
    -rwxr-xr-x 1 20415 systemd-journal 111795 Dec 16 01:12 container-executor*
    -rwxr-xr-x 1 20415 systemd-journal   6488 Dec 16 01:12 hadoop*
    -rwxr-xr-x 1 20415 systemd-journal   8514 Dec 16 01:12 hadoop.cmd*
    -rwxr-xr-x 1 20415 systemd-journal  12223 Dec 16 01:12 hdfs*
    -rwxr-xr-x 1 20415 systemd-journal   7238 Dec 16 01:12 hdfs.cmd*
    -rwxr-xr-x 1 20415 systemd-journal   5953 Dec 16 01:12 mapred*
    -rwxr-xr-x 1 20415 systemd-journal   6094 Dec 16 01:12 mapred.cmd*
    -rwxr-xr-x 1 20415 systemd-journal   1776 Dec 16 01:12 rcc*
    -rwxr-xr-x 1 20415 systemd-journal 125103 Dec 16 01:12 test-container-executor*
    -rwxr-xr-x 1 20415 systemd-journal  13352 Dec 16 01:12 yarn*
    -rwxr-xr-x 1 20415 systemd-journal  11054 Dec 16 01:12 yarn.cmd*
    root@master:/usr/local/hadoop-2.7.5/bin# hdfs namenode -format
    -bash: hdfs: command not found
    root@master:/usr/local/hadoop-2.7.5/bin# cd ..
    root@master:/usr/local/hadoop-2.7.5# bin/hdfs namenode -format

    root@master:/usr/local/hadoop-2.7.5/sbin# ./start-all.sh 
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [master]
    The authenticity of host 'master (172.17.0.2)' can't be established.
    ECDSA key fingerprint is SHA256:4Kcq9xSIgyvYhnCk7mptUYNRY1qJlguKDWlA2TtGw6Y.
    Are you sure you want to continue connecting (yes/no)? yes
    master: Warning: Permanently added 'master,172.17.0.2' (ECDSA) to the list of known hosts.
    master: starting namenode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-root-namenode-master.out
    slave01: starting datanode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-root-datanode-slave01.out
    slave02: starting datanode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-root-datanode-slave02.out
    Starting secondary namenodes [0.0.0.0]
    The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
    ECDSA key fingerprint is SHA256:4Kcq9xSIgyvYhnCk7mptUYNRY1qJlguKDWlA2TtGw6Y.
    Are you sure you want to continue connecting (yes/no)? yes
    0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
    0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-root-secondarynamenode-master.out
    starting yarn daemons
    starting resourcemanager, logging to /usr/local/hadoop-2.7.5/logs/yarn-root-resourcemanager-master.out
    slave02: starting nodemanager, logging to /usr/local/hadoop-2.7.5/logs/yarn-root-nodemanager-slave02.out
    slave01: starting nodemanager, logging to /usr/local/hadoop-2.7.5/logs/yarn-root-nodemanager-slave01.out

Use jps on master, slave01, and slave02 to check the result
    root@master:/usr/local/hadoop-2.7.5/sbin# jps
    852 Jps
    261 NameNode
    445 SecondaryNameNode
    589 ResourceManager
    root@master:/usr/local/hadoop-2.7.5/sbin# 

    root@slave01:/# jps
    91 DataNode
    270 Jps
    191 NodeManager
    root@slave01:/# 

    root@slave02:/# jps
    245 Jps
    91 DataNode
    191 NodeManager
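
As a further check (not in the original log), the NameNode can report the registered datanodes, and the Hadoop 2.x web UIs are served on master:50070 (HDFS) and master:8088 (YARN):
    bin/hdfs dfsadmin -report    # run from /usr/local/hadoop-2.7.5; should list slave01 and slave02 as live datanodes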

A problem encountered

 root@master:/usr/local/hadoop-2.7.5/bin# dfs ls /user/hadoop/input
-bash: dfs: command not found

Edit ~/.bashrc (the intended command is hdfs dfs -ls …; hdfs simply was not on the PATH yet). Note that HADOOP_HOME should point at the installation root, not at bin/, so other Hadoop scripts can locate it; the PATH then picks up both bin and sbin:
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64/
    export HADOOP_HOME=/usr/local/hadoop-2.7.5
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Then run source ~/.bashrc

After that, the command can be used
    root@master:/usr/local/hadoop-2.7.5/bin# hdfs dfs
    Usage: hadoop fs [generic options]
        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
        [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] [-h] <path> ...]
        [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] <path> ...]
        [-expunge]
        [-find <path> ... <expression> ...]
        [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getfattr [-R] {-n name | -d} [-e en] <path>]
        [-getmerge [-nl] <src> <localdst>]
        [-help [cmd ...]]
        [-ls [-d] [-h] [-R] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] [-l] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setfattr {-n name [-v value] | -x name} <path>]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touchz <path> ...]
        [-truncate [-w] <length> <path> ...]
        [-usage [cmd ...]]

    Generic options supported are
    -conf <configuration file>     specify an application configuration file
    -D <property=value>            use value for given property
    -fs <local|namenode:port>      specify a namenode
    -jt <local|resourcemanager:port>    specify a ResourceManager
    -files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
    -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

    The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]

    root@master:/usr/local/hadoop-2.7.5/bin# 
Commands now work normally (the "File exists" messages below just mean these files were already uploaded in an earlier attempt)
    root@master:/usr/local/hadoop-2.7.5/bin# hdfs dfs -put ../etc/hadoop/*.xml /user/hadoop/input
    put: `/user/hadoop/input/capacity-scheduler.xml': File exists
    put: `/user/hadoop/input/core-site.xml': File exists
    put: `/user/hadoop/input/hadoop-policy.xml': File exists
    put: `/user/hadoop/input/hdfs-site.xml': File exists
    put: `/user/hadoop/input/httpfs-site.xml': File exists
    put: `/user/hadoop/input/kms-acls.xml': File exists
    put: `/user/hadoop/input/kms-site.xml': File exists
    put: `/user/hadoop/input/mapred-site.xml': File exists
    put: `/user/hadoop/input/yarn-site.xml': File exists
    root@master:/usr/local/hadoop-2.7.5/bin# 
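
The /user/hadoop/input directory itself must have been created in an earlier session (that step is not shown). Rebuilt from scratch, with one of the bundled example jobs as a smoke test (the job choice is illustrative, not from the original), the sequence run from /usr/local/hadoop-2.7.5 would be:
    bin/hdfs dfs -mkdir -p /user/hadoop/input
    bin/hdfs dfs -put etc/hadoop/*.xml /user/hadoop/input
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar grep /user/hadoop/input output 'dfs[a-z.]+'
    bin/hdfs dfs -cat output/*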

Shut down the Hadoop cluster

 root@master:/usr/local/hadoop-2.7.5/sbin# ./stop-all.sh 
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
slave02: stopping datanode
slave01: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave01: stopping nodemanager
slave02: stopping nodemanager
slave01: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
slave02: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
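
Once Hadoop is stopped, the containers themselves can be stopped and resumed later (a convenience not covered in the original):
    sudo docker stop master slave01 slave02
    sudo docker start -ia master    # reattach later; repeat for the slaves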

Reposted from blog.csdn.net/superce/article/details/80856666