Playing with big data: deploying HBase on a cluster of 14 servers

I. Environment introduction

Platform: physical machines
Operating system: CentOS 6.5
Software versions: hadoop-2.5.2, hbase-1.1.2-bin, jdk-7u79-linux-x64, protobuf-2.5.0, snappy-1.1.1, zookeeper-3.4.6, hadoop-snappy-0.0.1-SNAPSHOT
Deployment user: hadoop
Software packages: /opt/soft
Installation location: /opt/server
Data location: /opt/data
Log location: /opt/var/logs

Hostname IP address Hadoop processes
INVOICE-GL-01 10.162.16.6 QuorumPeerMain, HMaster
INVOICE-GL-02 10.162.16.7 QuorumPeerMain, HMaster
INVOICE-GL-03 10.162.16.8 QuorumPeerMain, HMaster
INVOICE-23 10.162.16.227 NameNode, DFSZKFailoverController
INVOICE-24 10.162.16.228 NameNode, DFSZKFailoverController
INVOICE-25 10.162.16.229 JournalNode, DataNode, HRegionServer
INVOICE-26 10.162.16.230 JournalNode, DataNode, HRegionServer
INVOICE-27 10.162.16.231 JournalNode, DataNode, HRegionServer
INVOICE-DN-01 10.162.16.232 DataNode, HRegionServer
INVOICE-DN-02 10.162.16.233 DataNode, HRegionServer
INVOICE-DN-03 10.162.16.234 DataNode, HRegionServer
INVOICE-DN-04 10.162.16.235 DataNode, HRegionServer
INVOICE-DN-05 10.162.16.236 DataNode, HRegionServer
INVOICE-DN-06 10.162.16.237 DataNode, HRegionServer

II. Installation steps

1. Turn off the firewall and SELinux
##sed -i '/SELINUX/s/enforcing/disabled/' /etc/selinux/config
##setenforce 0
##chkconfig iptables off
##/etc/init.d/iptables stop
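A quick verification (my addition, not in the original) that both are really off on CentOS 6:
##getenforce
###should print Permissive now (Disabled after a reboot)
##service iptables status
###should report that the firewall is not running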

2. Configure the time synchronization server
##vim /etc/ntp.conf
server <your NTP server>
driftfile /var/lib/ntp/drift
logfile /var/log/ntp
##/etc/init.d/ntpd start
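To confirm synchronization is working (my addition), query the local ntpd peers:
##ntpq -p
###a peer marked with * means ntpd is synchronized to it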

3. Modify /etc/hosts and the hostname on all machines (a sketch follows below)
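The original gives no commands for this step; a minimal sketch based on the node table above (run on every machine, adjusting the hostname per node):
##vim /etc/hosts
10.162.16.6 INVOICE-GL-01
10.162.16.7 INVOICE-GL-02
10.162.16.8 INVOICE-GL-03
10.162.16.227 INVOICE-23
10.162.16.228 INVOICE-24
10.162.16.229 INVOICE-25
10.162.16.230 INVOICE-26
10.162.16.231 INVOICE-27
10.162.16.232 INVOICE-DN-01
10.162.16.233 INVOICE-DN-02
10.162.16.234 INVOICE-DN-03
10.162.16.235 INVOICE-DN-04
10.162.16.236 INVOICE-DN-05
10.162.16.237 INVOICE-DN-06
##hostname INVOICE-GL-01
###also set HOSTNAME=INVOICE-GL-01 in /etc/sysconfig/network so the name survives a reboot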
4. Create the hadoop user and the corresponding folders on all machines
##useradd hadoop
##echo "hadoop" | passwd --stdin hadoop
##mkdir -p /opt/server
##mkdir -p /opt/soft
##mkdir -p /opt/data
##mkdir -p /opt/var/logs
##chown -R hadoop:hadoop /opt/server
##chown -R hadoop:hadoop /opt/soft
##chown -R hadoop:hadoop /opt/data
##chown -R hadoop:hadoop /opt/var/logs

5. Configure passwordless SSH login (run on the four management nodes: INVOICE-GL-01, INVOICE-23, INVOICE-24, INVOICE-25):

##su - hadoop
##ssh-keygen
##ssh-copy-id INVOICE-GL-01
##ssh-copy-id INVOICE-GL-02
##ssh-copy-id INVOICE-GL-03
##ssh-copy-id INVOICE-23
##ssh-copy-id INVOICE-24
.....
##ssh-copy-id INVOICE-DN-05
##ssh-copy-id INVOICE-DN-06
Make sure the hadoop user on these four machines can log in to every machine without a password;
for example, ssh INVOICE-23 should log in directly.
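A quick loop (my addition) to verify passwordless login from each of the four management nodes:
##for h in INVOICE-GL-01 INVOICE-GL-02 INVOICE-GL-03 INVOICE-23 INVOICE-24 INVOICE-25 INVOICE-26 INVOICE-27 INVOICE-DN-01 INVOICE-DN-02 INVOICE-DN-03 INVOICE-DN-04 INVOICE-DN-05 INVOICE-DN-06; do ssh -o BatchMode=yes $h hostname; done
###BatchMode=yes makes ssh fail instead of prompting for a password, so any missing key shows up immediately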

#####################################################################
############# The following steps run as the hadoop user ############
#####################################################################
6. Configure environment variables on all machines:
##vim /home/hadoop/.bash_profile
###############set java home##################
export JAVA_HOME=/opt/server/jdk1.7.0_79
export PATH=$JAVA_HOME/bin:$PATH

###############set zk home##################
export ZOOKEEPER_HOME=/opt/server/zookeeper-3.4.6
export PATH=$ZOOKEEPER_HOME/bin:$PATH

############set hadoop home###################
export HADOOP_HOME=/opt/server/hadoop-2.5.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH 

###############set hbase home################
export HBASE_HOME=/opt/server/hbase-1.1.2
export PATH=$HBASE_HOME/bin:$PATH

##source /home/hadoop/.bash_profile

7. JDK deployment:
##cd /opt/soft
##tar -zxvf jdk-7u79-linux-x64.tar.gz
###send to the other machines (this machine is INVOICE-GL-01)
##scp -r jdk1.7.0_79 INVOICE-GL-01:/opt/server/
##scp -r jdk1.7.0_79 INVOICE-GL-02:/opt/server/
##scp -r jdk1.7.0_79 INVOICE-GL-03:/opt/server/
##scp -r jdk1.7.0_79 INVOICE-23:/opt/server/
...
##scp -r jdk1.7.0_79 INVOICE-DN-05:/opt/server/
##scp -r jdk1.7.0_79 INVOICE-DN-06:/opt/server/
##mv jdk1.7.0_79 /opt/server
##java -version

8. ZooKeeper deployment:
##cd /opt/soft
##tar -zxvf zookeeper-3.4.6.tar.gz
##cp zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg

##vim zookeeper-3.4.6/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/data/zookeeper
dataLogDir=/opt/var/logs/zookeeper
clientPort=2181
maxClientCnxns=10000
autopurge.snapRetainCount=3
autopurge.purgeInterval=2
server.1=INVOICE-GL-01:2888:3888
server.2=INVOICE-GL-02:2888:3888
server.3=INVOICE-GL-03:2888:3888

##vim zookeeper-3.4.6/conf/java.env
#!/bin/bash
export JAVA_HOME=/opt/server/jdk1.7.0_79
export JVMFLAGS="-Xms2048m -Xmx10240m $JVMFLAGS"

##vim zookeeper-3.4.6/bin/zkEnv.sh
###(modify the following parameter: the storage path for zookeeper.out)
ZOO_LOG_DIR="/opt/var/logs/zookeeper"

###send to the other machines (this machine is INVOICE-GL-01)
##scp -r zookeeper-3.4.6 INVOICE-GL-02:/opt/server/
##scp -r zookeeper-3.4.6 INVOICE-GL-03:/opt/server/
##mv zookeeper-3.4.6 /opt/server/
###create the corresponding folders on INVOICE-GL-01
###(the number written to myid must match the server.N id in zoo.cfg)
##mkdir -p /opt/data/zookeeper
##mkdir -p /opt/var/logs/zookeeper
##echo "1" > /opt/data/zookeeper/myid
##zkServer.sh start
###create the corresponding folders on INVOICE-GL-02
##mkdir -p /opt/data/zookeeper
##mkdir -p /opt/var/logs/zookeeper
##echo "2" > /opt/data/zookeeper/myid
##zkServer.sh start
###create the corresponding folders on INVOICE-GL-03
##mkdir -p /opt/data/zookeeper
##mkdir -p /opt/var/logs/zookeeper
##echo "3" > /opt/data/zookeeper/myid
##zkServer.sh start
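Once all three nodes are started, each member's role can be checked (one should report leader, the other two follower; the same command is used in the maintenance section below):
##zkServer.sh status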

9. Hadoop deployment:
###install hadoop snappy compression support on all machines
##yum -y install gcc gcc-c++ automake autoconf libtool
##cd /opt/soft/
##tar -zxvf snappy-1.1.1.tar.gz
##cd snappy-1.1.1
##./configure && make && make install
##cd /opt/soft/
##tar -xvf protobuf-2.5.0.tar
##cd protobuf-2.5.0
##./configure && make && make install
##echo "/usr/local/lib" >>/etc/ld.so.conf
##ldconfig
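Before moving on, it is worth confirming (my addition) that both native libraries landed in /usr/local/lib:
##ls /usr/local/lib | grep -E 'snappy|protobuf'
##protoc --version
###protoc should report libprotoc 2.5.0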

###install hadoop on INVOICE-25
##cd /opt/soft
##tar -zxvf hadoop-2.5.2.tar.gz
##vim hadoop-2.5.2/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/server/jdk1.7.0_79
export HADOOP_HOME=/opt/server/hadoop-2.5.2
export HADOOP_LOG_DIR=/opt/var/logs/hadoop
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/

##vim hadoop-2.5.2/etc/hadoop/core-site.xml
<configuration>
<property>
   <name>fs.defaultFS</name>
   <value>hdfs://mycluster</value>
</property>
<property>
   <name>hadoop.tmp.dir</name>
    <value>/opt/data/hadoop</value>
</property>
<property>
   <name>fs.trash.interval</name>
    <value>120</value>
</property>
<property>
   <name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value></property>
</configuration>

##vim hadoop-2.5.2/etc/hadoop/hdfs-site.xml
<configuration>
<property>
   <name>dfs.nameservices</name>
     <value>mycluster</value>
</property>
<property>
   <name>dfs.ha.namenodes.mycluster</name>
     <value>nn1,nn2</value>
</property>
<property>
   <name>dfs.namenode.rpc-address.mycluster.nn1</name>
     <value>INVOICE-23:8020</value>
</property>
<property>
   <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>INVOICE-24:8020</value>
</property>
<property>
   <name>dfs.namenode.http-address.mycluster.nn1</name>
     <value>INVOICE-23:50070</value>
</property>
<property>
   <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>INVOICE-24:50070</value>
</property>
<property>
   <name>dfs.namenode.shared.edits.dir</name>
     <value>qjournal://INVOICE-25:8485;INVOICE-26:8485;INVOICE-27:8485/mycluster</value>
</property>
<property>
   <name>dfs.client.failover.proxy.provider.mycluster</name>
   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
   <name>dfs.ha.fencing.methods</name>
     <value>sshfence</value>
</property>
<property>
   <name>dfs.ha.fencing.ssh.private-key-files</name>
     <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
   <name>dfs.ha.fencing.ssh.connect-timeout</name>
   <value>30000</value>
</property>
<property>
   <name>dfs.journalnode.edits.dir</name>
     <value>/opt/data/journal/local/data</value>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
     <value>true</value>
</property>
<property>
    <name>ha.zookeeper.quorum</name>
      <value>INVOICE-GL-01:2181,INVOICE-GL-02:2181,INVOICE-GL-03:2181</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
     <value>/opt/data/hadoop1,/opt/data/hadoop2</value>
</property>
<property>
   <name>dfs.replication</name>
   <value>2</value>
</property>
<property>
   <name>dfs.namenode.handler.count</name>
   <value>40</value>
</property>
</configuration>
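Once hadoop is unpacked into /opt/server (see the scp/mv steps below), the HA wiring can be sanity-checked with getconf (my addition):
##hdfs getconf -confKey dfs.nameservices
###should print mycluster
##hdfs getconf -confKey dfs.ha.namenodes.mycluster
###should print nn1,nn2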


##vim hadoop-2.5.2/etc/hadoop/slaves
INVOICE-25
INVOICE-26
INVOICE-27
INVOICE-DN-01
INVOICE-DN-02
INVOICE-DN-03
INVOICE-DN-04
INVOICE-DN-05
INVOICE-DN-06

###copy the snappy libs into the hadoop directory
##tar -zxvf hadoop-snappy-0.0.1-SNAPSHOT.tar.gz
##cp -r hadoop-snappy-0.0.1-SNAPSHOT/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar hadoop-2.5.2/lib/
##cp -r hadoop-snappy-0.0.1-SNAPSHOT/lib/native/Linux-amd64-64 hadoop-2.5.2/lib/native/
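Once hadoop is installed under /opt/server, native compression support can be verified with (my addition):
##hadoop checknative -a
###the snappy line should read true and point at the library under lib/native or /usr/local/lib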

###send to the other machines (this machine is INVOICE-25)
##scp -r hadoop-2.5.2 INVOICE-GL-01:/opt/server/
##scp -r hadoop-2.5.2 INVOICE-GL-02:/opt/server/
##scp -r hadoop-2.5.2 INVOICE-GL-03:/opt/server/
##scp -r hadoop-2.5.2 INVOICE-23:/opt/server/
...
##scp -r hadoop-2.5.2 INVOICE-DN-05:/opt/server/
##scp -r hadoop-2.5.2 INVOICE-DN-06:/opt/server/
##mv hadoop-2.5.2 /opt/server/

###create the corresponding folders on all machines
###(the third path matches dfs.journalnode.edits.dir in hdfs-site.xml)
##mkdir -p /opt/data/hadoop
##mkdir -p /opt/var/logs/hadoop
##mkdir -p /opt/data/journal/local/data

###initialize the hadoop cluster
###start journalnode on INVOICE-25, INVOICE-26 and INVOICE-27
##hadoop-daemon.sh start journalnode
###format the namenode on INVOICE-23 and start it
##hdfs namenode -format
##hadoop-daemon.sh start namenode
###bootstrap the standby namenode on INVOICE-24
##hdfs namenode -bootstrapStandby
###format zkfc on INVOICE-23
##hdfs zkfc -formatZK
###on INVOICE-25, stop and then restart the whole cluster
##stop-dfs.sh
##start-dfs.sh
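To confirm the HA pair came up correctly (my addition), query the failover state and the datanode report:
##hdfs haadmin -getServiceState nn1
##hdfs haadmin -getServiceState nn2
###one namenode should report active, the other standby
##hdfs dfsadmin -report
###should list all 9 datanodes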

Add the YARN service
# configure on the master
# su - hadoop
# cd /opt/server/hadoop-2.5.2/etc/hadoop
# cat ./yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
 <property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>nn1,nn2</value>
 </property>
 <property>
  <name>yarn.resourcemanager.hostname.nn1</name>
  <value>INVOICE-23</value>
 </property>
 <property>
  <name>yarn.resourcemanager.hostname.nn2</name>
  <value>INVOICE-24</value>
 </property>
 <property>
  <name>yarn.resourcemanager.ha.id</name>
  <value>nn1</value>  <!-- on the second master, change this value to nn2 -->
 </property>
 <property>
  <name>yarn.resourcemanager.address.nn1</name>
  <value>${yarn.resourcemanager.hostname.nn1}:8032</value>
 </property>
 <property>
  <name>yarn.resourcemanager.scheduler.address.nn1</name>
  <value>${yarn.resourcemanager.hostname.nn1}:8030</value>
 </property>
 <property>
  <name>yarn.resourcemanager.webapp.https.address.nn1</name>
  <value>${yarn.resourcemanager.hostname.nn1}:8089</value>
 </property>
 <property>
  <name>yarn.resourcemanager.webapp.address.nn1</name>
  <value>${yarn.resourcemanager.hostname.nn1}:8088</value>
 </property>
 <property>
  <name>yarn.resourcemanager.resource-tracker.address.nn1</name>
  <value>${yarn.resourcemanager.hostname.nn1}:8025</value>
 </property>
 <property>
  <name>yarn.resourcemanager.admin.address.nn1</name>
  <value>${yarn.resourcemanager.hostname.nn1}:8041</value>
 </property>

 <property>
  <name>yarn.resourcemanager.address.nn2</name>
  <value>${yarn.resourcemanager.hostname.nn2}:8032</value>
 </property>
 <property>
  <name>yarn.resourcemanager.scheduler.address.nn2</name>
  <value>${yarn.resourcemanager.hostname.nn2}:8030</value>
 </property>
 <property>
  <name>yarn.resourcemanager.webapp.https.address.nn2</name>
  <value>${yarn.resourcemanager.hostname.nn2}:8089</value>
 </property>
 <property>
  <name>yarn.resourcemanager.webapp.address.nn2</name>
  <value>${yarn.resourcemanager.hostname.nn2}:8088</value>
 </property>
 <property>
  <name>yarn.resourcemanager.resource-tracker.address.nn2</name>
  <value>${yarn.resourcemanager.hostname.nn2}:8025</value>
 </property>
 <property>
  <name>yarn.resourcemanager.admin.address.nn2</name>
  <value>${yarn.resourcemanager.hostname.nn2}:8041</value>
 </property>

 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
 </property>
 <property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
 </property>
 <property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
 </property>
 <property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/opt/data/hadoop/yarn</value>
 </property>
 <property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/opt/var/logs/hadoop</value>
 </property>
 <property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
 </property>
 <property>
  <name>yarn.resourcemanager.zk-state-store.address</name>
  <value>INVOICE-GL-01:2181,INVOICE-GL-02:2181,INVOICE-GL-03:2181</value>
 </property>
 <property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>INVOICE-GL-01:2181,INVOICE-GL-02:2181,INVOICE-GL-03:2181</value>
 </property>
 <property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
 </property>
 <property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster</value>
 </property>
 <property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
 </property>
 <property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
 </property>
</configuration>

# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property> 
                <name>mapreduce.framework.name</name> 
                <value>yarn</value> 
         </property>
</configuration>

# start-yarn.sh
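A note and check of my own: on Hadoop 2.5, start-yarn.sh only starts the ResourceManager on the local machine, so the second RM is typically started on the other master by hand; the HA state can then be queried:
# yarn-daemon.sh start resourcemanager        (on the nn2 master, INVOICE-24)
# yarn rmadmin -getServiceState nn1
# yarn rmadmin -getServiceState nn2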

10. HBase deployment:
###install hbase on INVOICE-25
##cd /opt/soft
##tar -zxvf hbase-1.1.2-bin.tar.gz
##vim hbase-1.1.2/conf/hbase-env.sh
export JAVA_HOME=/opt/server/jdk1.7.0_79
export HADOOP_HOME=/opt/server/hadoop-2.5.2
export HBASE_HOME=/opt/server/hbase-1.1.2
export HBASE_MANAGES_ZK=false
export HBASE_LOG_DIR=/opt/var/logs/hbase
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/
export HBASE_LIBRARY_PATH=$HBASE_LIBRARY_PATH:$HBASE_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/
export CLASSPATH=$CLASSPATH:$HBASE_LIBRARY_PATH

##vim hbase-1.1.2/conf/hbase-site.xml
<configuration>
<property>
        <name>hbase.rootdir</name>
        <value>hdfs://mycluster/hbase</value>
</property>
<property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
</property>
<property>
        <name>hbase.zookeeper.quorum</name>
        <value>INVOICE-GL-01,INVOICE-GL-02,INVOICE-GL-03</value>
</property>
<property>
 <name>hbase.regionserver.codecs</name>
 <value>snappy</value>
</property>
<property>
       <name>hbase.regionserver.handler.count</name>
        <value>500</value>
</property>
<property>
 <name>dfs.support.append</name>
 <value>true</value>
</property>
<property>
        <name>zookeeper.session.timeout</name>
        <value>60000</value>
</property>
<property>
        <name>hbase.master.distributed.log.splitting</name>
        <value>false</value>
</property>
<property>
       <name>hbase.rpc.timeout</name>
       <value>600000</value>
</property>
<property>
       <name>hbase.client.scanner.timeout.period</name>
       <value>60000</value>
</property>
<property>
       <name>hbase.snapshot.master.timeoutMillis</name>
       <value>600000</value>
</property>
<property>
       <name>hbase.snapshot.region.timeout</name>
       <value>600000</value>
</property>
<property>
        <name>hbase.hregion.max.filesize</name>
        <value>107374182400</value>
</property>
<property>
 <name>hbase.master.maxclockskew</name>
        <value>180000</value>
</property>
<property>
        <name>hbase.zookeeper.property.maxClientCnxns</name>
       <value>10000</value>
</property>
<property>
  <name>hbase.regionserver.region.split.policy</name>
  <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
</property>
</configuration>

##vim hbase-1.1.2/conf/regionservers
INVOICE-25
INVOICE-26
INVOICE-27
INVOICE-DN-01
INVOICE-DN-02
INVOICE-DN-03
INVOICE-DN-04
INVOICE-DN-05
INVOICE-DN-06

##vim hbase-1.1.2/conf/backup-masters
INVOICE-GL-02
INVOICE-GL-03

###link hdfs-site.xml into the hbase conf dir so HBase can resolve the mycluster nameservice
##ln -s /opt/server/hadoop-2.5.2/etc/hadoop/hdfs-site.xml /opt/server/hbase-1.1.2/conf/

###send to the other machines (this machine is INVOICE-25)
##scp -r hbase-1.1.2 INVOICE-GL-01:/opt/server/
##scp -r hbase-1.1.2 INVOICE-GL-02:/opt/server/
##scp -r hbase-1.1.2 INVOICE-GL-03:/opt/server/
##scp -r hbase-1.1.2 INVOICE-23:/opt/server/
...
##scp -r hbase-1.1.2 INVOICE-DN-05:/opt/server/
##scp -r hbase-1.1.2 INVOICE-DN-06:/opt/server/

###create the corresponding folder on all machines
##mkdir -p /opt/var/logs/hbase

###start HBase (this machine is INVOICE-GL-01)
##start-hbase.sh
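A quick status check once the masters and regionservers are up (my addition):
##echo "status" | hbase shell
###expect 9 servers and 0 dead; the backup masters are visible in the HMaster web UI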

11. Basic maintenance commands:
(INVOICE-GL-01, INVOICE-GL-02 and INVOICE-GL-03 handle starting and stopping; on the other machines you only check the state)
zookeeper start, status, stop:
on INVOICE-GL-01, INVOICE-GL-02, INVOICE-GL-03: zkServer.sh start / status / stop
check the zookeeper running state:
one leader and two followers across the three nodes is normal.

hadoop start, stop:
on INVOICE-25:
start: start-dfs.sh
stop: stop-dfs.sh
status: hdfs fsck /
        hdfs dfsadmin -report

hbase start, stop:
on INVOICE-GL-01:
start: start-hbase.sh
stop: stop-hbase.sh
status: hbase shell
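As a convenience (my addition, not part of the original procedure), a small script can compare the Java processes on every node against the process table in section I; it assumes passwordless SSH from the node it runs on and the JDK path used above:

#!/bin/bash
###check_cluster.sh - list the Java processes running on every node
for h in INVOICE-GL-01 INVOICE-GL-02 INVOICE-GL-03 \
         INVOICE-23 INVOICE-24 INVOICE-25 INVOICE-26 INVOICE-27 \
         INVOICE-DN-01 INVOICE-DN-02 INVOICE-DN-03 INVOICE-DN-04 \
         INVOICE-DN-05 INVOICE-DN-06; do
  echo "== $h =="
  ssh -o BatchMode=yes "$h" "/opt/server/jdk1.7.0_79/bin/jps" | grep -v Jps
done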

Source: blog.51cto.com/yw666/2486120