First, host planning

3 hosts: one master and two slaves/workers.

IP addresses (Docker's default allocation):

master:

Host name: hadoop2, ip address: 172.17.0.2

slave1:

Host name: hadoop3, ip address: 172.17.0.3

slave2:

Host name: hadoop4, ip address: 172.17.0.4

Second, software installation

1. Install the CentOS image in Docker and start a CentOS container with SSH installed - see the article "Installing a CentOS image on Docker".

2. Connect to the CentOS container over SSH and install JDK 1.8 and Hadoop 3.0.

These can be installed the conventional Linux way: upload the JDK and Hadoop tar packages to the host and install from them.

Get the CentOS 7 image

$ docker pull centos

The image is a little over 70 MB; with the Aliyun Docker accelerator it downloads quickly, after which it appears in the image list.

Command to view the image list:

$ docker images

Install SSH

Build a CentOS image with SSH enabled, based on the centos image.

$ vi Dockerfile

Content:

FROM centos
MAINTAINER [email protected]

RUN yum install -y openssh-server sudo
RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
RUN yum  install -y openssh-clients

RUN echo "root:abc123" | chpasswd
RUN echo "root   ALL=(ALL)       ALL" >> /etc/sudoers
RUN ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key
RUN ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key

RUN mkdir /var/run/sshd
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]

The effect of this Dockerfile is: based on the centos image, install the SSH-related packages, set the root password to abc123, and start the SSH service.

Run the image build command; the new image is named centos7-ssh:

$ docker build -t centos7-ssh .

After the build completes, the image appears in the image list:

$ docker images
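
Optionally, you can sanity-check the new image before moving on. The container name ssh-test and host port 10022 below are arbitrary examples, not part of the cluster setup:

$ docker run -d --name ssh-test -p 10022:22 centos7-ssh
$ ssh root@localhost -p 10022      # password: abc123
$ docker rm -f ssh-test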

Build the Hadoop image

The approach above would mean running three CentOS containers and installing the Hadoop environment in each one separately. Instead, just as we built the SSH image, we can build a Hadoop image and then run three Hadoop containers from it, which is simpler.

$ vi Dockerfile

Content:

FROM centos7-ssh
ADD jdk-8u151-linux-x64.tar.gz /usr/local/
RUN mv /usr/local/jdk1.8.0_151 /usr/local/jdk1.8
ENV JAVA_HOME /usr/local/jdk1.8
ENV PATH $JAVA_HOME/bin:$PATH

ADD hadoop-3.1.0.tar.gz /usr/local
RUN mv /usr/local/hadoop-3.1.0 /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ENV PATH $HADOOP_HOME/bin:$PATH

RUN yum install -y which sudo

This Dockerfile is based on the centos7-ssh image and configures the Java and Hadoop environments.

Prerequisite: have jdk-8u151-linux-x64.tar.gz and hadoop-3.1.0.tar.gz (the archives referenced by the ADD instructions above) ready in the same directory as the Dockerfile.

Run the build command; the new image is named hadoop:

$ docker build -t hadoop .
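
As an optional quick check, the JDK and Hadoop should already be on the PATH set by the Dockerfile; running them in throwaway containers prints their versions:

$ docker run --rm hadoop java -version
$ docker run --rm hadoop hadoop version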

Add the host name to IP address mappings of the 3 hosts to the /etc/hosts file:

172.17.0.2      hadoop2
172.17.0.3      hadoop3
172.17.0.4      hadoop4

If the /etc/hosts file of a Docker container is edited directly, the changes are reset when the container restarts. Instead, pass the host name to IP address mappings to docker run with the --add-host option; they are written into the container's hosts file at startup. For example:

docker run --name hadoop2 --add-host hadoop2:172.17.0.2 --add-host hadoop3:172.17.0.3 --add-host hadoop4:172.17.0.4 hadoop

docker exec -it hadoop2 bash

Inside the container, generate an SSH key pair and authorize it for passwordless root login:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
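
To confirm the key works, a passwordless login to the container itself should now succeed (the -o StrictHostKeyChecking=no option only suppresses the first-connection prompt):

$ ssh -o StrictHostKeyChecking=no localhost hostname

Because this container is later committed to the image that all three cluster nodes are started from, every node ends up with the same key pair and authorized_keys, so the master can also reach the workers without a password.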

Hadoop deployment

1. Define the worker nodes in the workers file

Create a workers file in the etc/hadoop directory under the Hadoop root directory and add the worker node host names (a sketch of creating it follows the example below).

Following the host planning above, the worker nodes are the two hosts hadoop3 and hadoop4. For example:

[root@9e4ede92e7db ~]# cat /usr/local/hadoop/etc/hadoop/workers
hadoop3
hadoop4
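
One way to create the file is with a heredoc (a sketch; the path follows the image built above):

cat > /usr/local/hadoop/etc/hadoop/workers <<EOF
hadoop3
hadoop4
EOF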

2. Edit the configuration files

a. Add JAVA_HOME to hadoop-env.sh (one way to append it is sketched after the listing below)

[root@9e4ede92e7db ~]# cat /usr/local/hadoop/etc/hadoop/hadoop-env.sh |grep JAVA_HOME
#  JAVA_HOME=/usr/java/testing hdfs dfs -ls
# Technically, the only required environment variable is JAVA_HOME.
# export JAVA_HOME=
JAVA_HOME=/usr/local/jdk1.8
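
One way to append the variable (a sketch; the export form matches the commented-out template line shown above):

echo "export JAVA_HOME=/usr/local/jdk1.8" >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh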

b. core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop2:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
</configuration>

c. hdfs-site.xml

<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop2:9001</value>
<description>View HDFS status through the web UI</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Each block has two replicas</description>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>

d. yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop2:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop2:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop2:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop2:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop2:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
</configuration>

e. mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop2:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop2:19888</value>
</property>
</configuration>

f. Prepare the start/stop scripts in advance to avoid a known pitfall (Hadoop 3 refuses to start the daemons as root unless these user variables are set)

Edit start-dfs.sh and stop-dfs.sh in the sbin directory (vi start-dfs.sh, vi stop-dfs.sh) and add the following near the top of each script:

HDFS_DATANODE_USER=root
#HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs

Similarly, edit start-yarn.sh and stop-yarn.sh (vi start-yarn.sh, vi stop-yarn.sh) and add:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
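
An equivalent alternative, not used in the original steps, is to define these variables once in hadoop-env.sh, which the start/stop scripts source, instead of editing each script:

cat >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh <<'EOF'
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF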

Note:

After the steps above are complete, stop the current container and save it as a new image with the docker commit command. Then restart the cluster from that new image, so every machine in the cluster has the same accounts, configuration, and software without having to be configured again. For example:

a. Stop the container

docker stop hadoop2

b. Save the container as a new image

docker commit hadoop2 hadoop_me:v1.0
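
If the stopped hadoop2 container still exists, remove it (and any other old containers) before reusing the same name in the test steps below, otherwise docker run will report a name conflict:

docker rm hadoop2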

Test

1. Port mapping

After the cluster is up, we need to check its state through the web UIs, so the relevant container ports must be mapped to ports on the host machine. This is done with the -p option of the docker run command. For example:

Map the YARN web UI port 8088 in the container to port 8088 on the host:

docker run -it -p 8088:8088 hadoop_me:v1.0

2. Start three containers from the new image

docker run --name hadoop2 --add-host hadoop2:172.17.0.2 --add-host hadoop3:172.17.0.3 --add-host hadoop4:172.17.0.4 -d -p 5002:22 -p 9870:9870 -p 8088:8088 -p 19888:19888 hadoop_me:v1.0

docker run --name hadoop3 --add-host hadoop2:172.17.0.2 --add-host hadoop3:172.17.0.3 --add-host hadoop4:172.17.0.4 -d -p 5003:22 hadoop_me:v1.0 

docker run --name hadoop4 --add-host hadoop2:172.17.0.2 --add-host hadoop3:172.17.0.3 --add-host hadoop4:172.17.0.4 -d -p 5004:22 hadoop_me:v1.0
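
On Docker's default bridge network, addresses are normally handed out in container start order, so starting hadoop2, hadoop3 and hadoop4 in that order on an otherwise idle bridge should give the planned 172.17.0.2-4; since that is an assumption about your environment, it is worth verifying:

docker ps
docker inspect -f '{{.NetworkSettings.IPAddress}}' hadoop2 hadoop3 hadoop4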

3. Format the NameNode
Go into the /usr/local/hadoop directory and run the format command:

bin/hdfs namenode -format

On hadoop2, edit the Hadoop configuration file etc/hadoop/workers (the file was named slaves in Hadoop 2).
Delete all of the original content and change it to:

hadoop3
hadoop4

Execute the following commands on hadoop2 to copy the configured Hadoop directory to the workers:

scp  -rq /usr/local/hadoop   hadoop3:/usr/local
scp  -rq /usr/local/hadoop   hadoop4:/usr/local

4. Start the cluster by running the start-all.sh script from the /usr/local/hadoop directory on the master host: sbin/start-all.sh
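
After the scripts finish, the running daemons can be checked with jps (shipped with the JDK installed earlier); with this configuration the master should show NameNode, SecondaryNameNode and ResourceManager, and each worker DataNode and NodeManager. The full jps path is used over SSH because the Docker ENV settings are not inherited by SSH sessions:

jps
ssh hadoop3 /usr/local/jdk1.8/bin/jps
ssh hadoop4 /usr/local/jdk1.8/bin/jps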

5. Access the web UIs
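
Given the port mappings used when starting hadoop2 above, the UIs are reachable from the Docker host (shown here as localhost):

http://localhost:9870      # HDFS NameNode web UI (Hadoop 3 default port)
http://localhost:8088      # YARN ResourceManager web UI
http://localhost:19888     # MapReduce JobHistory UI (its server must be started separately)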
