This guide builds a Hadoop cluster with Docker, using a prebuilt Hadoop image pulled from Aliyun's registry. It draws on several posts found online, most of which had problems of one kind or another; every step below has been tested by me and runs successfully.
1. Install the Hadoop image
1) Pull the image
Pull the Hadoop image from Aliyun's registry:
docker pull registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop
2) View the downloaded image:
docker images
3) Create the Hadoop containers
(1) Create the master node:
docker run --name master -d -h master registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop
Parameters:
-h: set the container's hostname
--name: set the container's name
-d: run in the background
(2) Create the slave1 and slave2 nodes in the same way:
docker run --name slave1 -d -h slave1 registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop
docker run --name slave2 -d -h slave2 registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop
(3) View the running containers:
docker ps -s
(4) Enter the container and check the JDK:
docker exec -it master bash
The JDK already comes bundled with the image.
(5) Generate SSH keys; every node must be configured
After entering each container:
Start the SSH service:
/etc/init.d/ssh start
Generate a key pair:
ssh-keygen -t rsa
(6) Copy every node's public key into the authorized_keys of all the other nodes, so that each node's authorized_keys file ends up containing the same three public keys.
Copy the file from each container to the host (substitute the container id or name):
docker cp <container id or name>:/root/.ssh/authorized_keys /home/hadoop/authorized_keys_master
Merge the three files into one:
cd /home/hadoop/
cat authorized_keys_master authorized_keys_slave1 authorized_keys_slave2 > authorized_keys
cat authorized_keys
Copy the merged file back into each container:
docker cp /home/hadoop/authorized_keys <container id or name>:/root/.ssh/authorized_keys
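The merge step can be exercised locally without Docker. This sketch uses three hypothetical placeholder key lines (real files would contain the full ssh-rsa public keys exported from the containers) to show that concatenation produces one authorized_keys holding all three entries:

```shell
#!/bin/sh
# Work in a scratch directory; the sample key contents below are placeholders
mkdir -p /tmp/keymerge && cd /tmp/keymerge
echo "ssh-rsa AAAA...master root@master" > authorized_keys_master
echo "ssh-rsa AAAA...slave1 root@slave1" > authorized_keys_slave1
echo "ssh-rsa AAAA...slave2 root@slave2" > authorized_keys_slave2

# Merge the three exported files into one, as done on the host above
cat authorized_keys_master authorized_keys_slave1 authorized_keys_slave2 > authorized_keys

# The merged file should contain three lines, one public key per node
cat authorized_keys
```

The same merged file is then copied back into all three containers, which is why each node ends up trusting every other node.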
(7) Configure each node's IP address
Inside a container, you can view its IP address directly with the ip addr command.
Add the addresses of all three containers to each container's hosts file: vi /etc/hosts
Then test with ssh master; the login should succeed.
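For example, /etc/hosts on every node might look like the following (the 172.17.0.x addresses are assumptions based on Docker's default bridge network; use the addresses that ip addr actually reports):

```
172.17.0.2 master
172.17.0.3 slave1
172.17.0.4 slave2
```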
2. Configure Hadoop (the configuration files usually live under /opt/tools/hadoop-2.7.2/etc/hadoop/)
1) Configure hadoop-env.sh: set the JDK
(1) Inside the container, find where hadoop-env.sh is stored:
find / -name hadoop-env.sh
(2) Edit hadoop-env.sh and set JAVA_HOME:
export JAVA_HOME=/opt/tools/jdk1.8.0_77
2) Configure core-site.xml: set the HDFS address and port
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
</property>
3) Configure hdfs-site.xml: set the HDFS replication count and the NameNode/DataNode data paths
Create the /hadoop/data and /hadoop/name folders in advance:
mkdir -p /hadoop/data
mkdir -p /hadoop/name
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hadoop/name</value>
</property>
</configuration>
The number of slave nodes must be greater than or equal to the replication count, otherwise HDFS will report errors.
4) Configure mapred-site.xml: run MapReduce on YARN and set the JobTracker address and port.
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5) Configure yarn-site.xml: set the ResourceManager addresses and the shuffle service
Configuration parameters:
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8089</value>
</property>
6) Send these configuration files to the other nodes:
scp /opt/tools/hadoop-2.7.2/etc/hadoop/yarn-site.xml slave1:/opt/tools/hadoop-2.7.2/etc/hadoop/
Send core-site.xml, hadoop-env.sh, hdfs-site.xml, mapred-site.xml, and yarn-site.xml to the slave1 and slave2 nodes in the same way.
3. Run Hadoop
1) Configure the slaves file
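The slaves file sits alongside the other configuration files (here presumably /opt/tools/hadoop-2.7.2/etc/hadoop/slaves) and simply lists the worker hostnames, one per line, so that start-all.sh knows where to launch DataNode and NodeManager processes:

```
slave1
slave2
```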
2) Format the NameNode on the master:
hadoop namenode -format
3) Start the cluster on the master:
cd /opt/tools/hadoop/sbin/
./start-all.sh
After startup, you can run jps on each node to verify: the master should show processes such as NameNode, SecondaryNameNode, and ResourceManager, and the slaves should show DataNode and NodeManager.