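Each component in the big data ecosystem is sensitive to the versions of the others, and a mismatched combination is likely to cause all kinds of problems. Hadoop 1.x, 2.x and 3.x differ substantially from one another, so when you build a Hive data warehouse or an HBase database on top of Hadoop HDFS, choosing versions comes first. Plenty of articles online cover the commonly used combinations, but ones based on Hadoop 2.10.0 are rare, hence this write-up.

Resources (Hive, Hadoop, ZooKeeper, HBase, the MySQL database driver, and so on):
Link: https://pan.baidu.com/s/1n4wRfi9G5Ff9yfcKlMdVLg Extraction code: s8yx

1: Installing Hadoop 2.10.0

Reference environment:
Laptop with 16 GB of memory (Mac Pro), running Parallels Desktop virtual machines
Guest operating system: CentOS 7
JDK: Sun JDK 1.8
Hadoop 2.10.0 in high-availability (HA) mode, Hive 2.3.7 on a single node, HBase 2.2.4 cluster (no backup master configured), ZooKeeper 3.4.14 (three-node cluster), Hive metadata stored in MySQL
The Hadoop cluster runs both HDFS and YARN across four CentOS virtual machines

Prerequisites:
Install JDK 1.8 on each of the four virtual machines and set JAVA_HOME and PATH in /etc/profile, for example:
export JAVA_HOME=/usr/local/jdk1.8.0_65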
export HADOOP_HOME=/home/hadoop/hadoop-2.10.0
export HIVE_HOME=/home/hadoop/apache-hive-2.3.7-bin
export HBASE_HOME=/home/hadoop/hbase-2.2.4
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

Set the node names: in /etc/hosts, add an entry mapping each of node01, node02, node03 and node04 to its IP address. The file is identical on all four machines, so distribute it with scp. Reference /etc/hosts:
19.211.55.3 node01
19.211.55.4 node02
19.211.55.5 node03
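19.211.55.6 node04

Configure passwordless SSH login on the four virtual machines (required on every machine; done under /root, where .ssh/ holds the resulting…):
ssh-keygen
ssh-copy-id -i /root/.ssh/id_rsa.pub node01   # node01 is the target node name

Synchronize the clocks of the four virtual machines against the Aliyun NTP server:
yum install ntpdate
ntpdate ntp1.aliyun.com

Upload the Hadoop 2.10.0 tar.gz package to /home/hadoop on the virtual machine and extract it. For convenience no dedicated user is created; root is used for startup and all other operations.
Set the /etc/profile environment variables, run source /etc/profile, then distribute the file to the other nodes and repeat the same steps there.
Prepare the ZooKeeper cluster on node02, node03 and node04. A reference /home/hadoop/zookeeper-3.4.14/conf/zoo.cfg follows. On each server, create a myid file under /var/zfg/zookeeper containing 1, 2 or 3, matching that host's server.N entry; ZooKeeper uses it to identify the server.
# The number of milliseconds of each tick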
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/zfg/zookeeper
server.1=node02:2888:3888
server.2=node03:2888:3888
server.3=node04:2888:3888
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

Hadoop configuration files (write the HDFS and YARN configuration together so that everything can be distributed in one pass): add to or modify hdfs-site.xml, mapred-site.xml, core-site.xml, yarn-site.xml and slaves. In hadoop-env.sh, the main setting is the JDK directory (JAVA_HOME).

hdfs-site.xml reference:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node01:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node02:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>node01:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>node02:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node01:8485;node02:8485;node03:8485/mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/var/sxt/hadoop/ha/jn</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>

core-site.xml reference (these properties go inside the file's <configuration> element):
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node02:2181,node03:2181,node04:2181</value>
</property>
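
The ha.zookeeper.quorum value has to name the same hosts as the server.N entries in zoo.cfg, and a mismatch is easy to miss once the files have been copied to several nodes. The following is a minimal sketch of a cross-check, not part of the original setup. The function names zk_servers, quorum_hosts and same_quorum are made up for illustration, and the parsing assumes each <name> and <value> sits on its own line, as in the listings here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; they only read the two files and change nothing.

# Hosts from the server.N=host:peerPort:electionPort lines of zoo.cfg, one per line.
zk_servers() {
  sed -n 's/^server\.[0-9][0-9]*=\([^:]*\):.*/\1/p' "$1" | sort
}

# Hosts from ha.zookeeper.quorum in core-site.xml, one per line.
# Assumes the <value> line follows the <name> line (line-oriented parsing, not a real XML parser).
quorum_hosts() {
  awk '
    /<name>ha\.zookeeper\.quorum<\/name>/ { found = 1; next }
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15); exit
    }
  ' "$1" | tr ',' '\n' | sed 's/:.*//' | sort
}

# Succeeds only when both files name exactly the same set of hosts.
same_quorum() {
  [ "$(zk_servers "$1")" = "$(quorum_hosts "$2")" ]
}

# Example (paths as used in this article):
#   same_quorum /home/hadoop/zookeeper-3.4.14/conf/zoo.cfg "$HADOOP_CONF_DIR/core-site.xml" \
#     && echo "quorum matches" || echo "quorum differs"
```

The check compares host names only; it does not verify that the port in ha.zookeeper.quorum equals clientPort, which would be a natural extension.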
yarn-site.xml reference (these properties go inside the file's <configuration> element):
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node02:2181,node03:2181,node04:2181</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>mashibing</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node03</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node04</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
<property>
<!-- Clients use this address to submit application operations to the RM -->
<name>yarn.resourcemanager.address.rm1</name>
<value>node03:8032</value>
</property>
<property>
<!-- Address the ResourceManager exposes to ApplicationMasters, which use it to request and release resources -->
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>node03:8030</value>
</property>
<property>
<!-- RM HTTP address, used to view cluster information -->
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>node03:8088</value>
</property>
<property>
<!-- NodeManagers exchange information with the RM through this address -->
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>node03:8031</value>
</property>
<property>
<!-- Administrators send management commands to the RM through this address -->
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>node03:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm1</name>
<value>node03:23142</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>node04:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>node04:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>node04:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>node04:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>node04:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm2</name>
<value>node04:23142</value>
</property>
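
With two ResourceManagers and six address properties each, it is easy for one address to point at a host other than the one named in yarn.resourcemanager.hostname.<id>. The following is a rough sketch of a consistency check, offered as an illustration rather than as part of the original procedure. The names site_pairs and check_rm_hosts are hypothetical, and the parsing again assumes one <name> or <value> per line.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; they read yarn-site.xml and change nothing.

# Print "name value" pairs from a Hadoop *-site.xml file.
# Line-oriented parsing: assumes <name> and <value> each sit on their own line.
site_pairs() {
  awk '
    match($0, /<name>[^<]*<\/name>/)   { name = substr($0, RSTART + 6, RLENGTH - 13) }
    match($0, /<value>[^<]*<\/value>/) { print name, substr($0, RSTART + 7, RLENGTH - 15) }
  ' "$1"
}

# Report every *.address.<id> property whose host differs from
# yarn.resourcemanager.hostname.<id>. Returns non-zero if any mismatch is found.
check_rm_hosts() {
  site_pairs "$1" | awk '
    $1 ~ /^yarn\.resourcemanager\.hostname\./ {
      id = $1; sub(/^yarn\.resourcemanager\.hostname\./, "", id)
      host[id] = $2
      next
    }
    $1 ~ /\.address\.[^.]+$/ {
      n = split($1, parts, ".")
      addr_id[$1] = parts[n]                  # trailing component is the RM id
      addr_host[$1] = $2
      sub(/:.*/, "", addr_host[$1])           # strip ":port"
    }
    END {
      bad = 0
      for (k in addr_id) {
        id = addr_id[k]
        if (id in host && addr_host[k] != host[id]) {
          printf "%s: host %s, expected %s\n", k, addr_host[k], host[id]
          bad = 1
        }
      }
      exit bad
    }
  '
}

# Example:
#   check_rm_hosts "$HADOOP_CONF_DIR/yarn-site.xml" && echo "RM addresses consistent"
```

On a running cluster, yarn rmadmin -getServiceState rm1 (and rm2) shows which ResourceManager is currently active, which complements this static check.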
slaves file reference:
node02
node03
node04

mapred-site.xml reference:
<?xml version="1.0"?>
hive-site.xml reference (Hive metastore settings, with the metadata stored in MySQL):
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://46.77.56.200:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
</configuration>
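
Before the metastore is initialized, it can save time to confirm that the MySQL host and port in javax.jdo.option.ConnectionURL are reachable from the Hive node. The sketch below is an illustration under stated assumptions: jdbc_host_port and port_open are hypothetical helper names, the URL is assumed to have the plain jdbc:mysql://host[:port]/db form used above, and port_open relies on bash's /dev/tcp redirection plus the coreutils timeout command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "host port" for a jdbc:mysql:// URL.
jdbc_host_port() {
  local rest host port
  rest=${1#jdbc:mysql://}      # drop the scheme prefix
  rest=${rest%%[/?]*}          # drop "/database" and "?options"
  host=${rest%%:*}
  port=3306                    # MySQL's default port when the URL gives none
  if [ "$rest" != "$host" ]; then
    port=${rest##*:}
  fi
  printf '%s %s\n' "$host" "$port"
}

# Hypothetical helper: succeed if a TCP connection to host:port opens within 3 seconds.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example (not run here):
#   read -r host port < <(jdbc_host_port "jdbc:mysql://46.77.56.200:3306/hive?createDatabaseIfNotExist=true")
#   port_open "$host" "$port" && echo "MySQL port reachable"
```

Once the port answers and the driver jar is in Hive's lib directory, schema initialization for a MySQL-backed metastore is normally run with Hive's schematool (schematool -dbType mysql -initSchema).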
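Configure the /etc/profile environment variables, copy the MySQL database driver into Hive's lib directory, and initialize the Hive metadata in MySQL.

3: Installing the HBase 2.2.4 cluster

Prerequisites:
Upload the HBase 2.2.4 tar.gz package to /home/hadoop and extract it.
Configure hbase-site.xml and hbase-env.sh, and copy hdfs-site.xml from the Hadoop cluster's configuration into HBase's conf directory.

hbase-site.xml reference:
<?xml version="1.0"?>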