Recovering from a failed Ambari upgrade (database rollback + downgrade + bug fixes)

1. Background

I had been running Ambari 2.2.2.18. During an attempted upgrade I made some operational mistakes and the upgrade failed. After several detours I finally managed to roll back. Information about rolling back a failed Ambari upgrade, and about rolling back its database, is scarce and scattered, so I have summarized my experience in this article. If you run into a failed upgrade, read it carefully; you should be able to find a solution here.

2. Mistakes to avoid

  1. Upgrading from 2.2 to 2.4 requires upgrading ambari-server, ambari-agent, grafana, ambari-metrics-collector and the other components separately, in the order required by the official documentation. Starting ambari-server before the upgrade has been performed causes unpredictable errors.
  2. If a background operation in the ambari-server web UI appears stuck, an error has occurred and the server is waiting for a heartbeat. Do not send a flood of pause or abort requests. Instead, check the relevant logs under /var/log; once the problem is fixed, the operation continues on its own.
  3. Do not run ambari-server reset without backing up the database; otherwise you lose the existing cluster and are returned to the Ambari installation wizard.
  4. Before any upgrade operation, back up the ambari and ambarirca databases, and also back up /etc/ambari-server/conf/ambari.properties. I recommend backing up the entire conf folder.
    1. Back up the ambari database
    pg_dump -U postgres ambari > /usr/local/ambari_bak/ambari_rec.sql
    2. Back up the ambarirca database
    pg_dump -U postgres ambarirca > /usr/local/ambari_bak/ambarirca_rec.sql
    3. Back up the conf directory of ambari-server
    cp -R /etc/ambari-server/conf /usr/local/ambari_bak
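
Before going any further it may be worth confirming that the three backups really exist and are not empty. The following is only a minimal sketch: verify_backup is a hypothetical helper name, and it assumes the backup directory is laid out exactly as the commands above produce it (two .sql dumps plus a conf subdirectory).

```shell
#!/bin/sh
# Sketch: confirm that a backup directory holds everything needed for a rollback.
# verify_backup is a hypothetical helper, not an Ambari command.
verify_backup() {
    dir=$1
    status=0
    for f in ambari_rec.sql ambarirca_rec.sql conf/ambari.properties; do
        # -s is true only when the file exists and is not empty
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            status=1
        fi
    done
    return $status
}

# Example (commented out so that sourcing this file has no side effects):
# verify_backup /usr/local/ambari_bak && echo "backup looks complete"
```

A non-zero return value means at least one file is missing or empty, in which case the upgrade should not be started yet.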

3. Key errors and fixes

  • A failed upgrade or downgrade can leave the server version recorded in the database inconsistent with the version of the server that is actually installed. Before running ambari-server upgrade, check that the two match. The error looks like this:
Current database store version is not compatible with current server version

Analysis: The database version may be either lower or higher than the current server version. Most material online covers the first case, which arises during an upgrade: the database version is lower than the server version, and the fix is to run ambari-server upgrade. What I encountered, however, happened during a downgrade, where the database version is higher than the server version. In that case the only options are to reinstall or reinitialize the database (see section 4).
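
The two cases can be told apart with a small comparison. This is a sketch under assumptions, not something taken from Ambari itself: compare_ambari_versions is a hypothetical helper, it relies on sort -V from GNU coreutils, and comparing only the first three version components is my own simplification. How the two inputs are obtained is also an assumption: on my understanding the schema version sits in the metainfo table of the ambari database, and the package version can be read with rpm, but verify both on your installation.

```shell
#!/bin/sh
# Sketch: classify the database schema version relative to the installed server version.
# compare_ambari_versions is a hypothetical helper; sort -V needs GNU coreutils.
compare_ambari_versions() {
    # Assumption of this sketch: only the first three components are compared.
    db_ver=$(printf '%s\n' "$1" | cut -d. -f1-3)
    server_ver=$(printf '%s\n' "$2" | cut -d. -f1-3)
    if [ "$db_ver" = "$server_ver" ]; then
        echo "same"
        return
    fi
    lowest=$(printf '%s\n%s\n' "$db_ver" "$server_ver" | sort -V | head -n 1)
    if [ "$lowest" = "$db_ver" ]; then
        echo "db-older"     # upgrade case: run ambari-server upgrade
    else
        echo "db-newer"     # downgrade case: restore the database from backup
    fi
}

# Possible inputs (assumptions, check them on your system before relying on them):
#   db=$(psql -U postgres -d ambari -tAc \
#        "SELECT metainfo_value FROM ambari.metainfo WHERE metainfo_key='version'")
#   srv=$(rpm -q --qf '%{VERSION}' ambari-server)
#   compare_ambari_versions "$db" "$srv"
```

A result of db-newer corresponds to the downgrade situation described above.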

  • The metainfo.xml of a service under the common-services folder of the agent node is reported as missing
Caused by: org.apache.ambari.server.AmbariException: Stack Definition Service at '/var/lib/ambari-server/resources/common-services/PXF/3.0.0/metainfo.xml' doesn't contain a metainfo.xml file

Analysis: This error is generally caused by an incorrect reinstallation or an incomplete uninstallation of ambari-agent. The fix is to delete the /var/lib/ambari-agent directory on every agent node and run yum -y remove ambari-agent on all nodes before installing the agent again.
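
To see which service directories are affected, one option is to scan the common-services tree for version directories that lack a metainfo.xml. This is a rough sketch: find_missing_metainfo is a hypothetical helper, and the SERVICE/VERSION/metainfo.xml layout is inferred only from the path in the error message above.

```shell
#!/bin/sh
# Sketch: list service version directories (SERVICE/VERSION) that lack metainfo.xml.
# find_missing_metainfo is a hypothetical helper, not part of Ambari.
find_missing_metainfo() {
    root=$1
    for d in "$root"/*/*/; do
        [ -d "$d" ] || continue                 # the glob matched nothing
        [ -f "${d}metainfo.xml" ] || echo "${d%/}"
    done
}

# Example:
# find_missing_metainfo /var/lib/ambari-server/resources/common-services
```

Every path printed is a directory that would trigger the exception shown above.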

  • After a partial Ambari upgrade the agent cannot start, and the agent log shows the following
INFO 2013-03-06 10:37:42,580 NetUtil.py:58 - Failed to connect to https://localhost:8440/cert/ca due to [Errno 111] Connection refused
INFO 2013-03-06 10:37:42,580 NetUtil.py:77 - Server at https://localhost:8440 is not reachable,

Analysis: There are many suggestions about this error online: some say to upgrade openssl, some say to upgrade the Oracle JDK, and some propose other modifications, but none of them worked for me. The error generally appears after installing ambari-agent manually or after upgrading only some of the Ambari components. The correct fix is to edit the ambari-agent.ini file under /etc/ambari-agent/conf. It can be checked as follows:

1. On every agent node, hostname here must be set to the hostname of the master (ambari-server) node
[server]
hostname=bdp01.szmg.com.cn
url_port=8440
secured_url_port=8441


2. If that does not solve the problem, try the following (add the protocol setting)
[security]
keysdir=/var/lib/ambari-agent/keys
server_crt=ca.crt
passphrase_env_var_name=AMBARI_PASSPHRASE
# The following line is the one that was added
force_https_protocol=PROTOCOL_TLSv1_2
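
The first check can be scripted. The sketch below prints the hostname found in the [server] section of an agent configuration file; agent_server_hostname is a hypothetical helper name, and it assumes the simple key=value layout shown above.

```shell
#!/bin/sh
# Sketch: print the hostname configured in the [server] section of an ambari-agent.ini file.
# agent_server_hostname is a hypothetical helper.
agent_server_hostname() {
    awk -F= '
        /^\[/ { in_server = ($0 ~ /^\[server\]/) }       # track the current section
        in_server && $1 ~ /^[ \t]*hostname[ \t]*$/ {
            gsub(/^[ \t]+|[ \t\r]+$/, "", $2)             # trim surrounding whitespace
            print $2
            exit
        }
    ' "$1"
}

# Example:
# agent_server_hostname /etc/ambari-agent/conf/ambari-agent.ini
```

In the log above the agent is trying to reach https://localhost:8440, so on an affected agent node this would presumably print localhost rather than the hostname of the ambari-server node.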

  • When ambari-server starts, the database check reports mismatched field types, wrong object types, and so on.

Solution: roll back the database (see section 4).

 

4. Database rollback + service downgrade

If the steps above still do not resolve the failed upgrade, the last resort is to roll back the database and downgrade everything. This situation is usually the result of dangerous operations. In my case, after the upgrade failed I upgraded again without checking the database and even ran ambari-server reset. I finally restored service with the following procedure:

  1. Stop ambari-server and all agents, then stop the Ambari metadata database.
  2. Back up ambari.properties.
  3. On all nodes, overwrite ambari.repo with the repo file of the previous version.
  4. Downgrade everything to the pre-upgrade version: ambari-server, ambari-agent, grafana, ambari-metrics-collector, ambari-metrics-monitor and ambari-metrics-hadoop-sink. The packages on the child nodes must be downgraded on every node. After downgrading, check the versions:
Downgrade commands:
yum update    # after replacing the repo
Downgrade:
yum downgrade ambari-server
yum downgrade ambari-agent
yum downgrade grafana
yum downgrade ambari-metrics-monitor ambari-metrics-hadoop-sink ambari-metrics-collector
Master node:
[root@bdp01 package]# rpm -qa|grep ambari
ambari-metrics-collector-2.2.2.18-1.x86_64
ambari-metrics-monitor-2.2.2.18-1.x86_64
ambari-server-2.2.2.18-1.x86_64
ambari-metrics-hadoop-sink-2.2.2.18-1.x86_64
ambari-agent-2.2.2.18-1.x86_64
Child node:
[root@bdp07 conf]# rpm -qa |grep ambari
ambari-metrics-hadoop-sink-2.2.2.18-1.x86_64
ambari-metrics-monitor-2.2.2.18-1.x86_64
ambari-agent-2.2.2.18-1.x86_64
  5. Reinstall the Ambari metadata database
1. Remove the PostgreSQL database
yum -y remove postgresql postgresql-libs
2. Reinstall
yum -y install postgresql
3. Initialize the database and start the service
sudo service postgresql initdb
sudo service postgresql start
4. Roll back: recreate the roles and databases in psql as the postgres user, then restore the dumps from the shell
 CREATE ROLE ambari WITH LOGIN PASSWORD 'bigdata';
 CREATE ROLE mapred WITH LOGIN PASSWORD 'mapred';
 CREATE DATABASE ambari;
 CREATE DATABASE ambarirca;
 psql -U postgres -d ambari < /usr/local/ambari_bak/ambari_rec.sql
 psql -U postgres -d ambarirca < /usr/local/ambari_bak/ambarirca_rec.sql
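
The version check from step 4 is easy to get wrong by eye when there are many nodes. As a sketch only, the helper below reads package names on standard input and reports any ambari package that is not at the expected version; check_ambari_versions is a hypothetical name, and the NAME-VERSION-RELEASE.ARCH format is taken from the rpm -qa listings above.

```shell
#!/bin/sh
# Sketch: read package names on stdin and print those that are not at the expected version.
# check_ambari_versions is a hypothetical helper.
check_ambari_versions() {
    expected=$1
    status=0
    while IFS= read -r pkg; do
        case $pkg in
            ambari-*-"$expected"-*) ;;      # e.g. ambari-agent-2.2.2.18-1.x86_64
            ambari-*) echo "unexpected version: $pkg"; status=1 ;;
        esac
    done
    return $status
}

# Example:
# rpm -qa | grep ambari | check_ambari_versions 2.2.2.18
```

The function prints nothing and returns 0 when every ambari package on the node matches the pre-upgrade version.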

 

Origin blog.csdn.net/weixin_36714575/article/details/84790281