HDP 3.1.4.0 Production Installation Guide

1. Background

Hadoop distributions fall into three broad categories: the open-source community edition, cloud-hosted offerings, and vendor distributions. Installing the community edition requires a great deal of manual configuration. Cloud-hosted offerings can only be purchased as managed services and do not provide installable packages; typical examples are AWS, Azure (essentially HDP under the hood), Alibaba Cloud, and Tencent Cloud. Vendor distributions include CDH, HDP, and MapR abroad, and Huawei FusionInsight HD in China, all of which package or modify the open-source components.

Among all these distributions, HDP is the one that is 100% open source with no proprietary code. Cloudera and Hortonworks merged in 2018, after which HDP downloads were redirected to Cloudera's website. In 2020 Cloudera released the post-merger CDP platform and closed the free download channels for the open-source CDH and HDP editions, requiring a paid subscription to download them. This guide installs the fully open-source HDP 3.1.4.0 from previously downloaded repository mirrors.

1.1. Differences from CDH

|  | CDH | HDP |
| --- | --- | --- |
| Cluster installation | Parcel packages; CM manages package dependencies itself | Ambari; relies on the OS package manager (e.g. YUM on CentOS) to install and manage packages |
| Cluster management | Cloudera Manager handles installation and management; CM is a proprietary Cloudera service | Ambari handles installation and management; Ambari is an Apache open-source project |
| Hive / HDFS management UI | Hue as the graphical front end for Hive databases and HDFS | No Hue by default, but ships Zeppelin, which can connect to Hive, Spark and Phoenix; Hue can be integrated |
| Metadata management | Cloudera Navigator (Enterprise license) | Atlas (open source) |
| Access control | Kerberos + Cloudera Navigator Encryption (Enterprise license) | Kerberos + Ranger (open source) |
| Phoenix integration | Enterprise license | Bundled natively (open source) |
| Hive execution engine | MR2 | MR2 + LLAP (noticeably faster Hive queries) |
| Cluster monitoring | - | Ambari Metrics + Grafana |

1.2. Component version comparison

|  | CDH 6.3.2 | HDP 3.1.4 |
| --- | --- | --- |
| Hadoop | 3.0.0 | 3.1.1 |
| HBase | 2.1.4 | 2.0.2 |
| Kafka | 2.2.1 | 2.0.0 |
| Hive | 2.1.1 | 3.1.0 |
| Spark | 2.4.0 | 2.3.2 |
| Sqoop | 1.4.7 | 1.4.7 |

HDP ships a major update of Hive, which we use heavily, so HDP was chosen for this installation.

Reference: CDH 6.3.2 Packaging

1.3. Node planning

  • Ambari needs a dedicated server that is not one of the cluster nodes
  • Ambari can use its embedded PostgreSQL as its database
  • Nginx is installed on the Ambari server to publish the local Ambari and HDP repositories
  • The Ambari server must be able to SSH into every cluster node without a password

1.4. Installation overview

  1. Synchronize server time, disable the firewall, and disable SELinux
  2. Unpack the archives and set up the local repositories
  3. Install the ambari-server package on the Ambari server
  4. Open the Ambari web UI at http://[Ambari IP]:8080/
  5. Install MySQL on a designated node as the backend database for the Hive Metastore; alternatively, choose New MySQL when installing the Hive Metastore, in which case Ambari installs MariaDB on the Hive Metastore node by default and initializes the database with the account and password entered in the configuration screen
  6. Create the HDP cluster, select the HDP version, and select the hosts
  7. Ambari first installs ambari-agent on each cluster node and then installs the HDP components
  8. After installation, configure NameNode HA

2. Preparing the installation packages

2.1. Download Oracle JDK 8

The corresponding version can be downloaded from:

https://www.oracle.com/technetwork/java/javase/downloads/java-archive-javase8-2177648.html

2.2. Download the local Ambari repositories

Cloudera has changed its download policy and new packages now require a commercial subscription. We still have the Ambari 2.7.4.0 packages on hand, including the CentOS 7 repositories for both Ambari and HDP.

At the time of writing, HDP 3.1.5 has been released, mainly to fix the log4j security vulnerability (details below). If you only have HDP 3.1.4 there is no urgent need to upgrade: big-data platforms are normally isolated on an internal network, so this vulnerability can be tolerated.

HORTONWORKS DATA PLATFORM 3.1.4

| Repository | Archive file |
| --- | --- |
| Ambari local repository | ambari/ambari-2.7.4.0-centos7.tar.gz |
| HDP local repository | hdp/HDP-3.1.4.0-centos7-gpl.tar.gz |
| HDP-GPL local repository | hdp/HDP-GPL-3.1.4.0-centos7-gpl.tar.gz |
| HDP-UTILS local repository | hdp/HDP-UTILS-1.1.0.22-centos7.tar.gz |

After extraction, all of these archives are placed under /opt/hdp-repos, which Nginx will serve.
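A minimal extraction sketch (assuming the four archives sit in the current directory; the directory each tarball unpacks into can differ, so adjust the result to match the layout shown in section 2.3.3):

mkdir -p /opt/hdp-repos
# each archive carries its own top-level directory
tar -zxf ambari-2.7.4.0-centos7.tar.gz -C /opt/hdp-repos
tar -zxf HDP-3.1.4.0-centos7-gpl.tar.gz -C /opt/hdp-repos
tar -zxf HDP-GPL-3.1.4.0-centos7-gpl.tar.gz -C /opt/hdp-repos
tar -zxf HDP-UTILS-1.1.0.22-centos7.tar.gz -C /opt/hdp-repos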

2.3. Publishing the local repositories with Nginx on the Ambari server

2.3.1. Prerequisites

  • Assume the IP of the Nginx node is 192.168.7.66
  • We install ambari-server directly on the Nginx node

2.3.2. Install Nginx

yum -y install nginx

2.3.3. Configure Nginx

Edit /etc/nginx/conf.d/default.conf: change the default port 80 to 81, add the autoindex on directive, and adjust the location / block:

server {
    listen       81;
    server_name  localhost;

    autoindex on;

    location / {
        root   /opt/hdp-repos;
        index  index.html index.htm;
    }
}

The /opt/hdp-repos directory is laid out as follows:

/opt/hdp-repos
  |-- ambari
      |-- centos7
          |-- 2.7.4.0-118
              |-- ambari.repo
                  ...
  |-- HDP
      |-- centos7
          |-- 3.1.4.0-315
              |-- hdp.repo
                  ...
  |-- HDP-UTILS
      |-- centos7
          |-- 1.1.0.22
              |-- hdp-utils.repo
                  ...
  |-- HDP-GPL
      |-- centos7
          |-- 3.1.4.0-315
              |-- hdp.gpl.repo
                  ...

2.3.4. Restart Nginx

systemctl restart nginx

Remember to enable the Nginx service so it starts on boot:

systemctl enable nginx
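As a quick sanity check, confirm that Nginx is actually serving the repositories (a sketch using the IP and port assumed in section 2.3.1):

curl -s http://192.168.7.66:81/ | head
# autoindex should list ambari, HDP, HDP-GPL and HDP-UTILS
curl -sI http://192.168.7.66:81/ambari/centos7/2.7.4.0-118/ambari.repo
# expect an HTTP 200 response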

2.3.5. Configure the Ambari YUM repository on every server

Create an ambari.repo file under /etc/yum.repos.d on every server:

[ambari-2.7.4.0]
name=ambari Version - ambari-2.7.4.0
baseurl=http://192.168.7.66:81/ambari/centos7/2.7.4.0-118
gpgcheck=0
enabled=1
priority=1
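Then verify that YUM can see the new repository (sketch):

yum clean all
yum repolist enabled | grep -i ambari
# the ambari-2.7.4.0 repository should be listed with a non-zero package count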

2.4. Repository URLs to enter under Local Repository when installing HDP from Ambari

| Name | Base URL |
| --- | --- |
| HDP-3.1 | http://192.168.7.66:81/HDP/centos7/3.1.4.0-315 |
| HDP-3.1-GPL | http://192.168.7.66:81/HDP-GPL/centos7/3.1.4.0-315 |
| HDP-UTILS-1.1.0.22 | http://192.168.7.66:81/HDP-UTILS/centos7/1.1.0.22 |

3. Server configuration

3.1. Configure the network

Set the hostname on every node, one by one:

[root@master ~]# hostnamectl set-hostname foo-1.prpq

On every node, edit /etc/hosts so that it contains the mappings for all nodes. The /etc/hosts file on the ambari-server host must not map its hostname to 127.0.0.1, otherwise installing ambari-agent on the data nodes fails with an error that the localhost host cannot be reached. A sketch for distributing this file to all nodes follows the host list below.

1.1.1.2  master.prpq
1.1.1.1  foo-1.prpq
2.2.2.2  foo-2.prpq
3.3.3.3  foo-3.prpq
4.4.4.4  foo-4.prpq
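A minimal sketch for pushing the same /etc/hosts to every node from the ambari-server host (assumes root SSH access and the example hostnames above; adjust the host list to your environment):

for h in foo-1.prpq foo-2.prpq foo-3.prpq foo-4.prpq; do
    scp /etc/hosts root@${h}:/etc/hosts
done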

Edit /etc/sysconfig/network on every node:

HOSTNAME=foo-1.prpq

Check that the hostname is configured correctly:

[root@master ~]# uname -a

3.2. Disable SELinux on all nodes

Disable it permanently by editing the config file and rebooting:

[root@master ~]# vim /etc/selinux/config
SELINUX=disabled

[root@master ~]# reboot

Or disable it quickly from the command line:

[root@master ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
[root@master ~]# setenforce 0
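Verify the result (sketch):

getenforce
# expect "Permissive" immediately after setenforce 0, or "Disabled" after a reboot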

3.3. Disable the firewall on all nodes

[root@master ~]# systemctl stop firewalld
[root@master ~]# systemctl disable firewalld
[root@master ~]# iptables -F && iptables -P INPUT ACCEPT && iptables -P FORWARD ACCEPT

3.4. Install JDK 1.8 on all nodes

[root@master ~]# rpm -ivh jdk-8u231-linux-x64.rpm

Select the default Java environment:

[root@master ~]# alternatives --config java
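Verify the JDK (sketch; the Oracle RPM creates the /usr/java/default symlink that is later used as JAVA_HOME):

java -version
# expect java version "1.8.0_231"
ls -l /usr/java/default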

3.5. Configure time synchronization (directly against public NTP servers)

[root@master ~]# yum -y install ntp && systemctl enable ntpd && systemctl start ntpd
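Check that ntpd is actually synchronizing (sketch):

ntpq -p
# at least one peer should eventually be marked with '*' as the selected time source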

3.6. Generate an SSH key pair on the ambari-server node

[root@master ~]# ssh-keygen

Download id_rsa and id_rsa.pub, rename id_rsa.pub to authorized_keys, and place it in the .ssh directory of every data node. On each data node, as root:

[root@master ~]# cd ~ && mkdir .ssh && chmod 700 .ssh
[root@master ~]# rz -bye
[root@master ~]# chmod 600 authorized_keys
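Alternatively, the key can be pushed from the ambari-server node with ssh-copy-id, which appends it to authorized_keys and fixes permissions automatically (a sketch assuming root password login is still allowed and the example hostnames from section 3.1):

for h in foo-1.prpq foo-2.prpq foo-3.prpq foo-4.prpq; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub root@${h}
done
# confirm passwordless login afterwards
ssh root@foo-1.prpq hostname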

3.7. Configure the YUM repository on the master node

Create the ambari.repo configuration file with vim /etc/yum.repos.d/ambari.repo, where 192.168.6.48 is the address of the node running Ambari (and Nginx); adjust the IP, and the port if Nginx listens on a non-default port, to match your environment:

[Updates-Ambari-2.7.4.0]
name=Ambari-2.7.4.0-Updates
baseurl=http://192.168.6.48/ambari/centos7/2.7.4.0-118
gpgcheck=0
enabled=1
priority=1

Refresh the YUM cache:

[root@master ~]# yum makecache

4. Installing the HDP platform

4.1. Install ambari-server on the master node

[root@master ~]# yum install ambari-server

4.2. Initialize ambari-server

Download the PostgreSQL JDBC driver from https://jdbc.postgresql.org/ (mind the driver version). ambari-server is configured in two steps. First, register the database driver:

[root@master ~]# ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-42.3.7.jar 
Using python  /usr/bin/python
Setup ambari-server
Copying /usr/share/java/postgresql-42.3.7.jar to /var/lib/ambari-server/resources/postgresql-42.3.7.jar
Creating symlink /var/lib/ambari-server/resources/postgresql-42.3.7.jar to /var/lib/ambari-server/resources/postgresql-jdbc.jar
If you are updating existing jdbc driver jar for postgres with postgresql-42.3.7.jar. Please remove the old driver jar, from all hosts. Restarting services that need the driver, will automatically copy the new jar to the hosts.
JDBC driver was successfully initialized.
Ambari Server 'setup' completed successfully.

Then configure the Java home:

[root@master ~]# ambari-server setup --java-home=/usr/java/default --enable-lzo-under-gpl-license
Using python  /usr/bin/python
Setup ambari-server
Checking SELinux...
SELinux status is 'disabled'
Customize user account for ambari-server daemon [y/n] (n)? 
Adjusting ambari-server permissions and ownership...
Checking firewall status...
Checking JDK...
WARNING: JAVA_HOME /usr/java/default must be valid on ALL hosts
WARNING: JCE Policy files are required for configuring Kerberos security. If you plan to use Kerberos,please make sure JCE Unlimited Strength Jurisdiction Policy Files are valid on all hosts.
Check JDK version for Ambari Server...
JDK version found: 8
Minimum JDK version is 8 for Ambari. Skipping to setup different JDK for Ambari Server.
Checking GPL software agreement...
GPL License for LZO: https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Enable Ambari Server to download and install GPL Licensed LZO packages [y/n] (n)? 
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)? 
Configuring database...
Default properties detected. Using built-in database.
Configuring ambari database...
Checking PostgreSQL...
Running initdb: This may take up to a minute.
Initializing database ... OK


About to start PostgreSQL
Configuring local database...
Configuring PostgreSQL...
Restarting PostgreSQL
Creating schema and user...
done.
Creating tables...
done.
Extracting system views...
ambari-admin-2.7.4.0.118.jar
....
Ambari repo file doesn't contain latest json url, skipping repoinfos modification
Adjusting ambari-server permissions and ownership...
Ambari Server 'setup' completed successfully.

4.3. Start ambari-server

[root@master ~]# ambari-server start

Check its status:

[root@master ~]# ambari-server status

4.4. Deploy the cluster through the Ambari console

From the master.prpq host itself, open http://master.prpq:8080 in a browser; the default credentials are admin/admin. Follow Chapter 6. Installing, Configuring, and Deploying a Cluster for the overall procedure. The following three points deserve special attention; everything else can follow the reference documentation as-is.

4.5. Configure the HDP repositories in Ambari

When installing the cluster, choose Use Local Repository. For a CentOS 7 operating system, keep the redhat7 OS entry and enter the following three URLs, where 192.168.199.91 is the internal IP of the master.prpq server running Nginx; use the addresses your Nginx actually publishes, the values below are only a reference.

http://192.168.199.91/HDP/centos7/3.1.4.0-315
http://192.168.199.91/HDP-GPL/centos7/3.1.4.0-315
http://192.168.199.91/HDP-UTILS/centos7/1.1.0.22/


4.6. Configure the SSH Private Key on the Host Registration Information page

For the SSH Private Key, upload the id_rsa file downloaded from the ambari-server host.

4.7. Use the PostgreSQL instance on the Ambari node as Hive's database

On the Ambari node, switch to the postgres account, create the hive database and the hive user, and grant privileges:

[root@master ~]# su - postgres

-bash-4.2$ echo "CREATE DATABASE hive;" | psql -U postgres
-bash-4.2$ echo "CREATE USER hive WITH PASSWORD '@Passw0rd';" | psql -U postgres
-bash-4.2$ echo "GRANT ALL PRIVILEGES ON DATABASE hive TO hive;" | psql -U postgres

Switch back to root and edit /var/lib/pgsql/data/pg_hba.conf to allow the hive account to connect from other hosts:

[root@master ~]# vim /var/lib/pgsql/data/pg_hba.conf

local  all  ambari,mapred,hive md5
host  all   ambari,mapred,hive 0.0.0.0/0  md5
host  all   ambari,mapred,hive ::/0 md5

Make sure /var/lib/pgsql/data/postgresql.conf configures PostgreSQL to listen on the internal address rather than only 127.0.0.1:

listen_addresses = '*'

Restart PostgreSQL to apply the changes:

[root@master ~]# systemctl restart postgresql
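A quick connectivity check from any other cluster node (sketch; assumes the postgresql client package is installed there and uses the password created above):

psql -h master.prpq -U hive -d hive -c '\conninfo'
# a successful connection prints the database, user and host it is connected to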

Reference: Using Hive with PostgreSQL

5. Configuring NameNode HA

5.1. Launch the Enable NameNode HA wizard

  1. Open the Ambari web UI and select HDFS
  2. From the Actions menu choose Enable NameNode HA, which starts the HA configuration wizard

5.2. Set the Nameservice ID

This ID replaces the NameNode FQDN; for example, enter kkxcluster.

5.3. Select hosts

Select the nodes for the NameNode HA pair; if a Secondary NameNode was installed previously, the wizard removes it by default.


5.4. Review the configuration


5.5. Create a checkpoint

Complete this step on the NameNode host as the hdfs user.

Following the wizard's prompts, run the commands on the NameNode host to put the NameNode into safe mode; once the commands have completed, the NEXT button becomes clickable.

[hdfs@node1 ~]$ hdfs dfsadmin -safemode enter
Safe mode is ON

Then create the checkpoint:

[hdfs@node1 ~]$ hdfs dfsadmin -saveNamespace
Save namespace successful
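The safe-mode state can be double-checked before moving on (sketch):

[hdfs@node1 ~]$ hdfs dfsadmin -safemode get

It should report that safe mode is ON.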

5.6. Apply the component configuration

After the checkpoint is created, the wizard applies the configuration; all services are stopped during this step.

5.7. Initialize the JournalNodes

Run the command on the NameNode host to initialize the JournalNodes.

The command output looks like this:

[hdfs@node1 ~]$ hdfs namenode -initializeSharedEdits
23/08/30 07:06:03 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = node1/172.18.33.31
STARTUP_MSG:   args = [-initializeSharedEdits]
STARTUP_MSG:   version = 3.1.1.3.1.4.0-315
STARTUP_MSG:   classpath = /usr/hdp/3.1.4.0-315/hadoop/conf:/usr/hdp/3.1.4.0-315/hadoop/lib/java...lib/tez.tar.gz
STARTUP_MSG:   build = [email protected]:hortonworks/hadoop.git -r 58d0fd3d8ce58b10149da3c717c45e5e57a60d14; compiled by 'jenkins' on 2019-08-23T05:15Z
STARTUP_MSG:   java = 1.8.0_231
************************************************************/
23/08/30 07:06:03 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
23/08/30 07:06:03 INFO namenode.NameNode: createNameNode [-initializeSharedEdits]
23/08/30 07:06:04 WARN common.Util: Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
23/08/30 07:06:04 WARN common.Util: Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
23/08/30 07:06:04 WARN namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
23/08/30 07:06:04 WARN namenode.FSNamesystem: Only one namespace edits storage directory (dfs.namenode.edits.dir) configured. Beware of data loss due to lack of redundant storage directories!
23/08/30 07:06:04 WARN common.Util: Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
23/08/30 07:06:04 WARN common.Storage: set restore failed storage to true
23/08/30 07:06:04 INFO namenode.FSEditLog: Edit logging is async:true
23/08/30 07:06:04 INFO namenode.FSNamesystem: KeyProvider: null
23/08/30 07:06:04 INFO namenode.FSNamesystem: Enabling async auditlog
23/08/30 07:06:04 INFO namenode.FSNamesystem: fsLock is fair: false
23/08/30 07:06:04 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
23/08/30 07:06:04 INFO namenode.FSNamesystem: fsOwner             = hdfs (auth:SIMPLE)
23/08/30 07:06:04 INFO namenode.FSNamesystem: supergroup          = hdfs
23/08/30 07:06:04 INFO namenode.FSNamesystem: isPermissionEnabled = true
23/08/30 07:06:04 INFO namenode.FSNamesystem: Determined nameservice ID: kkxcluster
23/08/30 07:06:04 INFO namenode.FSNamesystem: HA Enabled: true
23/08/30 07:06:04 INFO blockmanagement.HeartbeatManager: Setting heartbeat recheck interval to 30000 since dfs.namenode.stale.datanode.interval is less than dfs.namenode.heartbeat.recheck-interval
23/08/30 07:06:04 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
23/08/30 07:06:04 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
23/08/30 07:06:04 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
23/08/30 07:06:04 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:01:00:00.000
23/08/30 07:06:04 INFO blockmanagement.BlockManager: The block deletion will start around 2023 Aug 30 08:06:04
23/08/30 07:06:04 INFO util.GSet: Computing capacity for map BlocksMap
23/08/30 07:06:04 INFO util.GSet: VM type       = 64-bit
23/08/30 07:06:04 INFO util.GSet: 2.0% max memory 1011.3 MB = 20.2 MB
23/08/30 07:06:04 INFO util.GSet: capacity      = 2^21 = 2097152 entries
23/08/30 07:06:04 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = true
23/08/30 07:06:04 INFO blockmanagement.BlockManager: dfs.block.access.key.update.interval=600 min(s), dfs.block.access.token.lifetime=600 min(s), dfs.encrypt.data.transfer.algorithm=null
23/08/30 07:06:04 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9900000095367432
23/08/30 07:06:04 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
23/08/30 07:06:04 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
23/08/30 07:06:04 INFO blockmanagement.BlockManager: defaultReplication         = 3
23/08/30 07:06:04 INFO blockmanagement.BlockManager: maxReplication             = 50
23/08/30 07:06:04 INFO blockmanagement.BlockManager: minReplication             = 1
23/08/30 07:06:04 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
23/08/30 07:06:04 INFO blockmanagement.BlockManager: redundancyRecheckInterval  = 3000ms
23/08/30 07:06:04 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
23/08/30 07:06:04 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
23/08/30 07:06:04 INFO util.GSet: Computing capacity for map INodeMap
23/08/30 07:06:04 INFO util.GSet: VM type       = 64-bit
23/08/30 07:06:04 INFO util.GSet: 1.0% max memory 1011.3 MB = 10.1 MB
23/08/30 07:06:04 INFO util.GSet: capacity      = 2^20 = 1048576 entries
23/08/30 07:06:04 INFO namenode.FSDirectory: ACLs enabled? true
23/08/30 07:06:04 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
23/08/30 07:06:04 INFO namenode.FSDirectory: XAttrs enabled? true
23/08/30 07:06:04 INFO namenode.NameNode: Caching file names occurring more than 10 times
23/08/30 07:06:04 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
23/08/30 07:06:04 INFO snapshot.SnapshotManager: SkipList is disabled
23/08/30 07:06:04 INFO util.GSet: Computing capacity for map cachedBlocks
23/08/30 07:06:04 INFO util.GSet: VM type       = 64-bit
23/08/30 07:06:04 INFO util.GSet: 0.25% max memory 1011.3 MB = 2.5 MB
23/08/30 07:06:04 INFO util.GSet: capacity      = 2^18 = 262144 entries
23/08/30 07:06:04 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
23/08/30 07:06:04 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
23/08/30 07:06:04 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
23/08/30 07:06:04 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
23/08/30 07:06:04 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
23/08/30 07:06:04 INFO util.GSet: Computing capacity for map NameNodeRetryCache
23/08/30 07:06:04 INFO util.GSet: VM type       = 64-bit
23/08/30 07:06:04 INFO util.GSet: 0.029999999329447746% max memory 1011.3 MB = 310.7 KB
23/08/30 07:06:04 INFO util.GSet: capacity      = 2^15 = 32768 entries
23/08/30 07:06:04 INFO common.Storage: Lock on /hadoop/hdfs/namenode/in_use.lock acquired by nodename 11764@node1
23/08/30 07:06:04 INFO namenode.FSImage: No edit log streams selected.
23/08/30 07:06:04 INFO namenode.FSImage: Planning to load image: FSImageFile(file=/hadoop/hdfs/namenode/current/fsimage_0000000000000005579, cpktTxId=0000000000000005579)
23/08/30 07:06:04 INFO namenode.FSImageFormatPBINode: Loading 1178 INodes.
23/08/30 07:06:04 INFO namenode.FSImageFormatProtobuf: Loaded FSImage in 0 seconds.
23/08/30 07:06:04 INFO namenode.FSImage: Loaded image for txid 5579 from /hadoop/hdfs/namenode/current/fsimage_0000000000000005579
23/08/30 07:06:04 INFO namenode.FSNamesystem: Need to save fs image? false (staleImage=false, haEnabled=true, isRollingUpgrade=false)
23/08/30 07:06:04 INFO namenode.NameCache: initialized with 2 entries 237 lookups
23/08/30 07:06:04 INFO namenode.FSNamesystem: Finished loading FSImage in 149 msecs
23/08/30 07:06:04 WARN common.Storage: set restore failed storage to true
23/08/30 07:06:04 INFO namenode.FSEditLog: Edit logging is async:true
23/08/30 07:06:05 INFO namenode.FileJournalManager: Recovering unfinalized segments in /hadoop/hdfs/namenode/current
23/08/30 07:06:05 INFO namenode.FileJournalManager: Finalizing edits file /hadoop/hdfs/namenode/current/edits_inprogress_0000000000000005580 -> /hadoop/hdfs/namenode/current/edits_0000000000000005580-000000000000005580
23/08/30 07:06:05 INFO client.QuorumJournalManager: Starting recovery process for unclosed journal segments...
23/08/30 07:06:05 INFO client.QuorumJournalManager: Successfully started new epoch 1
23/08/30 07:06:05 INFO namenode.RedundantEditLogInputStream: Fast-forwarding stream '/hadoop/hdfs/namenode/current/edits_0000000000000005580-0000000000000005580' to transaction ID 5580
23/08/30 07:06:05 INFO namenode.FSEditLog: Started a new log segment at txid 5580
23/08/30 07:06:05 INFO namenode.FSEditLog: Starting log segment at 5580
23/08/30 07:06:05 INFO namenode.FSEditLog: Ending log segment 5580, 5580
23/08/30 07:06:05 INFO namenode.FSEditLog: logSyncAll toSyncToTxId=5580 lastSyncedTxid=5580 mostRecentTxid=5580
23/08/30 07:06:05 INFO namenode.FSEditLog: Done logSyncAll lastWrittenTxId=5580 lastSyncedTxid=5580 mostRecentTxid=5580
23/08/30 07:06:05 INFO namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 8 
23/08/30 07:06:05 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1/172.18.33.31
************************************************************/

5.8. Start the services


5.9. Initialize the NameNode HA metadata


On the original NameNode, run the following command to initialize the automatic failover configuration in ZooKeeper:

[hdfs@node1 ~]$ hdfs zkfc -formatZK
23/08/30 07:09:08 INFO tools.DFSZKFailoverController: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting DFSZKFailoverController
STARTUP_MSG:   host = node1/172.18.33.31
STARTUP_MSG:   args = [-formatZK]
STARTUP_MSG:   version = 3.1.1.3.1.4.0-315
STARTUP_MSG:   classpath = /usr/hdp/3.1.4.0-315/hadoop/conf:/usr/hdp/3.1.4.0-315/hadoop/lib/java...tez.tar.gz
STARTUP_MSG:   build = [email protected]:hortonworks/hadoop.git -r 58d0fd3d8ce58b10149da3c717c45e5e57a60d14; compiled by 'jenkins' on 2019-08-23T05:15Z
STARTUP_MSG:   java = 1.8.0_231
************************************************************/
23/08/30 07:09:08 INFO tools.DFSZKFailoverController: registered UNIX signal handlers for [TERM, HUP, INT]
23/08/30 07:09:09 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at node1.prpq/172.18.33.31:8020
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-315--1, built on 08/23/2019 04:37 GMT
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:host.name=node1
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_231
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.8.0_231-amd64/jre
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/hdp/3.1.4.0-315/hadoop...tez.tar.gz
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:java.library.path=:/usr/hdp/3.1.4.0-315/hadoop/lib/native/Linux-amd64-64:/usr/hdp/3.1.4.0-315/hadoop/lib/native/Linux-amd64-64:/usr/hdp/3.1.4.0315/hadoop/lib/native
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-1160.95.1.el7.x86_64
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:user.name=hdfs
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hdfs
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hdfs
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=node1.prpq:2181,node2.prpq:2181,node3.prpq:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElectorWatcherWithClientRef@4135c3b
23/08/30 07:09:09 INFO zookeeper.ClientCnxn: Opening socket connection to server node3/172.18.33.32:2181. Will not attempt to authenticate using SASL (unknown error)
23/08/30 07:09:09 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /172.18.33.31:43750, server: node3/172.18.33.32:2181
23/08/30 07:09:09 INFO zookeeper.ClientCnxn: Session establishment complete on server node3/172.18.33.32:2181, sessionid = 0x38a4543d1ce0001, negotiated timeout = 10000
23/08/30 07:09:09 INFO ha.ActiveStandbyElector: Session connected.
23/08/30 07:09:09 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/kkxcluster in ZK.
23/08/30 07:09:09 INFO zookeeper.ZooKeeper: Session: 0x38a4543d1ce0001 closed
23/08/30 07:09:09 INFO zookeeper.ClientCnxn: EventThread shut down
23/08/30 07:09:09 INFO tools.DFSZKFailoverController: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DFSZKFailoverController at node1/172.18.33.31
************************************************************/

On the other NameNode, run the following command to bootstrap its metadata:

[hdfs@node2 ~]$ hdfs namenode -bootstrapStandby
23/08/30 07:09:32 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = node2/172.18.33.23
STARTUP_MSG:   args = [-bootstrapStandby]
STARTUP_MSG:   version = 3.1.1.3.1.4.0-315
STARTUP_MSG:   classpath = /usr/hdp/3.1.4.0-315/hadoop/conf:/usr/hdp/3.1.4.0-315/hadoop/lib/java...tez.tar.gz
STARTUP_MSG:   build = [email protected]:hortonworks/hadoop.git -r 58d0fd3d8ce58b10149da3c717c45e5e57a60d14; compiled by 'jenkins' on 2019-08-23T05:15Z
STARTUP_MSG:   java = 1.8.0_231
************************************************************/
23/08/30 07:09:32 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
23/08/30 07:09:32 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
23/08/30 07:09:32 INFO ha.BootstrapStandby: Found nn: nn1, ipc: node1.prpq/172.18.33.31:8020
23/08/30 07:09:32 WARN common.Util: Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
23/08/30 07:09:32 WARN common.Util: Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: kkxcluster
        Other Namenode ID: nn1
  Other NN's HTTP address: http://node1.prpq:50070
  Other NN's IPC  address: node1.prpq/172.18.33.31:8020
             Namespace ID: 625879089
            Block pool ID: BP-342188911-172.18.33.31-1693375437449
               Cluster ID: CID-1c90d546-cfc7-4c24-8d2f-b2a4d2be9676
           Layout version: -64
       isUpgradeFinalized: true
=====================================================
23/08/30 07:09:33 INFO common.Storage: Storage directory /hadoop/hdfs/namenode has been successfully formatted.
23/08/30 07:09:33 WARN common.Util: Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
23/08/30 07:09:33 WARN common.Util: Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
23/08/30 07:09:33 WARN common.Storage: set restore failed storage to true
23/08/30 07:09:33 INFO namenode.FSEditLog: Edit logging is async:true
23/08/30 07:09:33 INFO namenode.TransferFsImage: Opening connection to http://node1.prpq:50070/imagetransfer?getimage=1&txid=5579&storageInfo=-64:625879089:1693375437449:CID-1c90d546-cfc7-4c24-8d2f-b2a4d2be676&bootstrapstandby=true
23/08/30 07:09:33 INFO common.Util: Combined time for file download and fsync to all disks took 0.01s. The file download took 0.01s at 17200.00 KB/s. Synchronous (fsync) write to disk of /hadoop/hdfs/namenoe/current/fsimage.ckpt_0000000000000005579 took 0.00s.
23/08/30 07:09:33 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000005579 size 88668 bytes.
23/08/30 07:09:33 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node2/172.18.33.23
************************************************************/

5.10. Finalize the HA setup


Confirm to finish.


Note: the Hive configuration change described in the official manual does not need to be performed manually; after completing the steps above, simply confirm that Hive's configuration has already been updated:
[hdfs@node1 ~]$ hive --config /etc/hive/conf/conf.server --service metatool -listFSRoot
+======================================================================+
|                    Error: JAVA_HOME is not set                       |
+----------------------------------------------------------------------+
| Please download the latest Sun JDK from the Sun Java web site        |
|     > http://www.oracle.com/technetwork/java/javase/downloads        |
|                                                                      |
| HBase requires Java 1.8 or later.                                    |
+======================================================================+
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Listing FS Roots..
hdfs://kkxcluster/warehouse/tablespace/managed/hive/sys.db
hdfs://kkxcluster/warehouse/tablespace/managed/hive
hdfs://kkxcluster/warehouse/tablespace/managed/hive/information_schema.db
hdfs://kkxcluster/apps/spark/warehouse

Reference: Enable NameNode high availability

6. If the cluster installation fails: remove the installed packages and reinitialize the HDP cluster

yum erase -y hadoop_* zookeeper* ranger* hbase* ambari-* pig* hive* tez* mysql-* bigtop-* tuned-* apache-maven* postgresql*
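Removing the packages usually leaves data and configuration directories behind. A cleanup sketch follows; the paths are typical HDP/Ambari locations and are assumptions, so confirm on your own hosts that nothing under them is still needed before deleting:

# stop the agent first if it is still running
ambari-agent stop || true
rm -rf /usr/hdp /var/lib/ambari-server /var/lib/ambari-agent /etc/ambari-server /etc/ambari-agent
rm -rf /hadoop/hdfs /var/log/ambari* /var/log/hadoop*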

7. Problems encountered

7.1. Error: 0 status code received on POST method for API: /api/v1/stacks/HDP/versions/3.1/recommendations

The cause is a communication problem between the machine running the Ambari web UI in the browser and the Ambari server, typically a firewall, VPN, or other network issue. When this error appears, Ambari cannot obtain its recommended cluster configuration during cluster creation and the configuration step fails. If you cannot determine what is wrong with your own machine, set up a server with a desktop environment inside the HDP cluster's internal network and complete the initial cluster configuration from there. Reference: Solved: Ranger installation fails with '0 status code received on POST method for API: /api/v1/stacks/HDP/versions/3.1/recommendations

7.2. After starting hive-interactive: Connection failed on host XXX:10500

In HDFS > Configs > Custom core-site.xml, modify the following setting:

hadoop.proxyuser.hive.hosts=*

After the cluster is installed, hive-interactive is not started by default, but the Hive Warehouse Connector (HWC) depends on it, so we need to enable it and set this HDFS parameter.
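Once Ambari has pushed the new configuration and HDFS has been restarted, the setting can be verified on any cluster node (sketch):

grep -A1 'hadoop.proxyuser.hive.hosts' /etc/hadoop/conf/core-site.xml
# the following <value> element should contain *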

7.3. Sqoop import into Hive hangs at Connecting to jdbc:hive2://

See sqoop import hung (–hive-import) HDP-3.0.0 and Using hive-site.xml to automatically connect to HiveServer2. On the node where the Sqoop client runs, create a beeline-hs2-connection.xml file in /etc/hive/conf with the following content:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
   <name>beeline.hs2.connection.user</name>
   <value>hdfs</value>
 </property>
 <property>
   <name>beeline.hs2.connection.password</name>
   <value>hdfs</value>
 </property>
</configuration>

Since we use the hdfs account for sqoop, hive and similar operations by default, the hdfs account is configured here.

7.4. Spark jobs report that they cannot read Hive metadata

Set metastore.catalog.default=hive in the Spark2 configuration.

7.5. As officially recommended, Spark2 accesses Hive 3 through HWC, so Custom spark2-defaults must be configured

spark.datasource.hive.warehouse.metastoreUri = thrift://cdh-n2:9083

spark.hadoop.hive.llap.daemon.service.hosts = @llap0

spark.hadoop.hive.zookeeper.quorum = cdh-m1:2181,cdh-n1:2181,cdh-n2:2181

spark.sql.hive.hiveserver2.jdbc.url = jdbc:hive2://cdh-m1:2181,cdh-n1:2181,cdh-n2:2181/;password=hdfs;serviceDiscoveryMode=zooKeeper;user=hdfs;zooKeeperNamespace=hiveserver2-interactive

Here metastoreUri points at the node hosting the Hive Metastore; llap.daemon.service.hosts defaults to @llap0; zookeeper.quorum lists the same three hosts that appear in the jdbc.url; and the jdbc.url must use the interactive HiveServer2 address, with the hdfs user and password parameters added.
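With those properties in place, HWC can be exercised from spark-shell. A sketch follows; the assembly jar path and version suffix are illustrative and vary by HDP build, and the commented Scala lines only show the documented HiveWarehouseSession entry point:

spark-shell --master yarn \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar
# inside the shell:
#   import com.hortonworks.hwc.HiveWarehouseSession
#   val hive = HiveWarehouseSession.session(spark).build()
#   hive.showDatabases().show()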

Reference: Apache Spark-Apache Hive connection configuration

7.6. Zeppelin %jdbc(hive) reports: ZooKeeperHiveClientException: Unable to read HiveServer2 configs from ZooKeeper

Check the Hive Summary page to see whether HIVESERVER2 INTERACTIVE shows an alert, and whether hive-interactive has been enabled at all; it is disabled by default.

7.7. hive-interactive is not running

Enable it on the Hive configuration page so that Zeppelin's %jdbc(hive) interpreter can use it.

7.8. The database connection test for the Hive metastore fails

Finish installing the cluster with the problematic configuration first. Once all components are installed, go to the Hive configuration page and correct the database settings. If Hive can reach the database and finds no existing Hive schema in it, it creates the tables and initializes the data automatically.

7.9. Zeppelin reports a wrong username or password

The green dot to the left of the Login button turns red, and hovering over it shows Websocket Disconnected. Attempting to log in at this point produces a wrong username/password message, and the browser developer tools show the request returning HTTP 404. Once the network recovers, login works normally again.

7.10. Spark SQL can only see the default database

https://community.cloudera.com/t5/Support-Questions/Spark2-shell-is-not-displaying-all-Hive-databases-only/td-p/193774

spark-sql: set metastore.catalog.default=hive in the Spark2 configuration.

7.11. Spark Thrift Server cannot query Hive data: AccessControlException: Permission denied

Hive 3 enables authorization by default. To allow the query, the spark user must be granted access explicitly; HDFS file permissions alone are not enough.

hadoop fs -setfacl -R -m user:spark:rwx /warehouse/tablespace/managed/hive/ameba.db
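The resulting ACL can be checked afterwards (sketch; ameba.db is the example database from the command above):

hadoop fs -getfacl /warehouse/tablespace/managed/hive/ameba.db
# the output should include an entry like user:spark:rwx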

Reference: https://www.alibabacloud.com/help/zh/doc-detail/62704.htm

7.12. Tables imported by Sqoop read fine from Hive, but Spark SQL reports Not a file: hdfs://

When writing Spark code, add the following two settings:

import org.apache.spark.sql.SparkSession

// reportMonthly is an application-specific suffix used in the job name
val spark = SparkSession
  .builder()
  .appName(getClass.getName + reportMonthly)
  // allow Hive tables whose data sits in subdirectories (as produced by Sqoop) to be read
  .config("spark.hive.mapred.supports.subdirectories", "true")
  .config("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", "true")
  .enableHiveSupport()
  .getOrCreate()

7.13. Spark Thrift Server reports Not a file: hdfs:// when accessing tables imported by Sqoop or partitioned tables written by Spark

Add the following settings in Custom spark2-thrift-sparkconf:

```properties
spark.hive.mapred.supports.subdirectories = true
spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive = true
```

This does make the Hive data readable, but it can also produce duplicate rows in query results; using HWC avoids this class of problem. [Hive Warehouse Connector for accessing Apache Spark data](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html)

7.14. Spark jobs report AtlasServiceException: Metadata service API org.apache.atlas.AtlasClientV2$API_V2@5dd59141 failed with status 401 (Unauthorized)

See [Failed to initialize Atlas client using spark-atlas-connector](https://community.cloudera.com/t5/Support-Questions/Failed-to-initialize-Atlas-client-using-spark-atlas/td-p/241232)

7.15. Cannot get ACID state for ameba.pro_salary from null

Use the Hive Warehouse Connector for the connection; it can write to the Hive database. Open issue: data cannot be written directly into a Hive partitioned table; it has to be written to a temporary table first and then inserted into the partitioned table.

Reference: [Hive Warehouse Connector for accessing Apache Spark data](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html)

7.16. Spark does not support Hive materialized views: scala.MatchError: MATERIALIZED_VIEW (of class org.apache.hadoop.hive.metastore.TableType)

7.17. Column type conversion issues when Spark reads Hive

All fractional columns were designed as decimal in the database schema; when calling hive.executeQuery, use the cast function in the query SQL to convert the affected columns, casting them uniformly to decimal.

7.18. After switching to HWC, date format conversion uses Java patterns rather than MySQL patterns

8. References

  • Apache Ambari 2.7.4.0 Installation

https://docs.cloudera.com/HDPDocuments/Ambari-2.7.4.0/bk_ambari-installation/content/ch_Getting_Ready.html

  • Cloudera Enterprise 6.3.x Getting Started:

https://docs.cloudera.com/documentation/enterprise/latest/topics/introduction.html

  • Submitting a Spark2 job

  cd /usr/hdp/current/spark2-client
  ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
      --master yarn-client \
      --num-executors 1 \
      --driver-memory 512m \
      --executor-memory 512m \
      --executor-cores 1 \
      examples/jars/spark-examples*.jar 10
  • Integrating Spark and Hive through the Hive Warehouse Connector

Integrating Apache Hive with Apache Spark - Hive Warehouse Connector

Hive Warehouse Connector for accessing Apache Spark data

  • Installing Ambari

Apache Ambari Installation

  • Creating an HDP cluster

Installing, Configuring, and Deploying a Cluster

  • Performance comparison of Hive LLAP, Presto, MR3 and others

Spark 2.3.1 < Presto 0.208e < HDP 3.0.1 LLAP < Hive 3.1.0/MR3[^Comparison LLAP Presto SparkSQL Tez MR3] < HDP 3.1.4 LLAP[^LLAP vs MR3]. Conclusion: LLAP delivers the best performance, better than MR3, and running Tez directly does not give the best results. Hive performance improved dramatically from HDP 2 to HDP 3[^HDP 2.6 vs HDP 3.0].

[^Comparison LLAP Presto SparkSQL Tez MR3]: Performance Evaluation of SQL-on-Hadoop Systems using the TPC-DS Benchmark
[^LLAP vs MR3]: Hive Performance: Hive-LLAP in HDP 3.1.4 vs Hive 3/4 on MR3 0.10
[^HDP 2.6 vs HDP 3.0]: 2x Faster BI Interactive queries with HDP 3.0

  • Accessing Phoenix from Spark

Using the Connector with Apache Phoenix

phoenix-spark plugin
