(A) Background
Recently, in a production environment, a developer misused TRUNCATE on an Oracle database and wiped all the data in a table. Realizing the mistake afterwards, the developer contacted the on-duty DBA for emergency data recovery.
Analysis showed that, because the table had been truncated, the usual methods (Flashback Table, Flashback Query, Flashback Transaction and so on) could not bring the data back. Flashback Database or Flashback Data Archive could recover it, but these two features are usually not enabled in production, which leaves RMAN as the only option.
An RMAN recovery can be performed directly on the production environment, or the database can be restored to another machine.
- Restoring directly in production: ① the production database must be stopped; ② the database must stay consistent: if I restore the database to 12:00, every other table is also rolled back to 12:00, so more data may be lost; ③ if anything else goes wrong during recovery, it causes even more trouble and delays production business.
- Restoring to another machine: ① the production database does not need to be stopped; ② only the truncated table's data is involved: if I need the data as of 12:00, I restore the whole database to 12:00 in the test environment and then load the lost table data into production through a database link, Data Pump, etc., leaving the other production tables untouched; ③ if the recovery fails, production is not affected.
So, after some consideration, we decided to restore the database to another machine and then load the truncated table's data back into production.
A colleague performed the recovery. Being unfamiliar with the procedure, the colleague lost some time looking things up (about 20 minutes); the database was restored, but not as quickly as required. I asked myself: if I had to do it, with developers waiting anxiously, could I stay calm and complete the recovery quickly and reliably? Honestly, no. On the one hand, the recovery procedure is rarely practiced; after all, a database recovery comes up only a few times a year. On the other hand, with users and developers pressing, a DBA panics easily, which hurts efficiency. So the best approach is to rehearse in advance and write down the operating procedure. When a failure occurs, follow the document and restore production as fast as possible.
(B) Preparing the Environment
Item | Production environment | Off-host (test) environment
Operating system | RedHat 6.7 | RedHat 6.7
Database version | 11.2.0.4 (RAC, 2 nodes) | 11.2.0.4 (single node)
db_name | prodb | prodb
instance_name | prodb1, prodb2 | prodb
Database installation | GI + database software installed, database created | GI + database software installed (no database created)
Disk group information | OCR: 3*1GB, normal; DATA: 3*5GB, external; ARCH: 1*5GB, external | OCR: 3*1GB, normal; DATA: 3*5GB, external; ARCH: 1*5GB, external
Note: For convenience, the database in the production environment is referred to below as the "production database", and the database in the off-host environment as the "test database".
(C) Test plan
(D) Detailed implementation process
(4.1) Create the test tables
Two test tables are created here, with the following roles:
lijiaman.test01: used for the TRUNCATE test; in the end, test01 must be recovered on the test database.
lijiaman.test02: simulates database transactions; rows are inserted into it continuously so that the database generates a large volume of archived logs.
(Ⅰ) Table test01, 14 rows in total.
SQL> CREATE TABLE test01 AS SELECT * FROM scott.emp;
Table created
SQL> select count(*) from test01;
  COUNT(*)
----------
        14
(Ⅱ) Table test02, which receives a continuous stream of inserts
-- Create table test02
CREATE TABLE test02
(
  col1 NUMBER,
  col2 NUMBER,
  col3 VARCHAR2(30),
  col4 DATE,
  col5 VARCHAR2(100)
);

-- Create a stored procedure that inserts random data
CREATE OR REPLACE PROCEDURE p_insert_test02 IS
BEGIN
  FOR i IN 1 .. 10000 LOOP
    INSERT INTO test02(col1,col2,col3,col4,col5)
    VALUES ((select round(dbms_random.value(1, 100000000)) from dual),
            (select round(dbms_random.value(10000, 100000000)) from dual),
            (select dbms_random.string('a', 25) from dual),
            sysdate,
            (select dbms_random.string('a', 85) from dual));
    COMMIT;
  END LOOP;
END p_insert_test02;
/

-- Create a job that runs the stored procedure above every 30 seconds
DECLARE
  job1 NUMBER;
BEGIN
  sys.dbms_job.submit(job       => job1,
                      what      => 'p_insert_test02;',
                      next_date => SYSDATE,
                      interval  => 'SYSDATE + 30/(1440*60)'); -- every 30 s, insert 10,000 random rows into test02
  COMMIT;
END;
/
(4.2) Take a full backup of the database
rman target /
RMAN> run {
  allocate channel c1 type disk;
  allocate channel c2 type disk;
  sql 'alter system archive log current';
  backup database format '/databaseBackup/full_db_%U';
  sql 'alter system archive log current';
  backup archivelog all format '/databaseBackup/archlog_%U';
  backup current controlfile format '/databaseBackup/controlfile_%U';
  backup spfile format '/databaseBackup/spfile_%U';
  release channel c1;
  release channel c2;
}
The backup produced the following backup pieces:
[oracle@node1 databaseBackup]$ ls -l
total 4136752
-rw-r----- 1 oracle asmadmin 1451128832 Sep 27 19:27 archlog_0iucr7hg_1_1
-rw-r----- 1 oracle asmadmin 1462116352 Sep 27 19:27 archlog_0jucr7hh_1_1
-rw-r----- 1 oracle asmadmin 1406464 Sep 27 19:27 archlog_0kucr7lr_1_1
-rw-r----- 1 oracle asmadmin 18841600 Sep 27 19:28 controlfile_0lucr7m2_1_1
-rw-r----- 1 oracle asmadmin 805953536 Sep 27 19:25 full_db_0eucr7f7_1_1
-rw-r----- 1 oracle asmadmin 477528064 Sep 27 19:25 full_db_0fucr7f7_1_1
-rw-r----- 1 oracle asmadmin 18841600 Sep 27 19:25 full_db_0gucr7h3_1_1
-rw-r----- 1 oracle asmadmin 98304 Sep 27 19:25 full_db_0hucr7ha_1_1
-rw-r----- 1 oracle asmadmin 98304 Sep 27 19:28 spfile_0mucr7m5_1_1
Checking the archived logs shows that this full backup covers archived logs up to sequence 57 on thread 1 and sequence 48 on thread 2.
RMAN> list archivelog all;
List of Archived Log Copies for database with db_unique_name PRODB
=====================================================================
Key Thrd Seq S Low Time
------- ---- ------- - ---------
3 1 6 A 24-SEP-19
Name: +ARCH/prodb/archivelog/2019_09_24/thread_1_seq_6.258.1019832847
......
100 1 57 A 27-SEP-19
Name: +ARCH/prodb/archivelog/2019_09_27/thread_1_seq_57.355.1020108489
1 2 1 A 24-SEP-19
Name: +ARCH/prodb/archivelog/2019_09_24/thread_2_seq_1.256.1019830885
......
80 2 48 A 24-SEP-19
Name: +ARCH/prodb/archivelog/2019_09_24/thread_2_seq_48.335.1019838555
(4.3) Keep the database running to generate many archived logs
Because log sequence numbers only increase (except after the database is opened with RESETLOGS), the highest archived log sequence on each instance can be queried as follows:
SELECT * FROM (SELECT thread#, SEQUENCE#, NAME, ROW_NUMBER() OVER(PARTITION BY thread# ORDER BY SEQUENCE# DESC) rn FROM V$ARCHIVED_LOG) WHERE rn=1;
The result shows that the highest archived log sequence is 67 on thread 1 and 48 on thread 2. (My PC could not cope with too many virtual machines, so only node 1 was started; node 2 generated no logs, which does not affect the accuracy of this experiment.)
(4.4) Simulate truncating table test01 and note the time
SQL> select sysdate from dual;
SYSDATE
-------------------
2019-09-27 19:37:31
SQL>
SQL> truncate table test01;
Table truncated.
(4.5) Keep the database running to generate many archived logs
The last backup covered thread 1 up to log sequence 57. After that backup, another 25 logs were generated (up to sequence 82); they simulate the large number of transactions on a production database.
(4.6) Developers discover that the table was truncated
The developers hit a program error, checked table test01, and found that the data was gone; they confirmed (we assume) that they had deleted it themselves.
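(4.7) DBA performs the recovery on another machine
Outline of the approach:
This recovery must bring table test01 back to its state before the TRUNCATE, which requires a full database backup and archived-log backups taken before the TRUNCATE. The first full backup covered archived logs only up to thread 1 sequence 57 and thread 2 sequence 48, and many more log files were generated after that backup. To recover the database to just before the TRUNCATE (using the recorded time 2019-09-27 19:37:31 as the recovery point), newer logs are needed as well:
thread1: logs 57~67 are definitely needed; logs 67~82 may or may not be needed;
thread2: the node was not started, so no logs from it are needed for the recovery.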
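The log ranges above come down to simple arithmetic on sequence numbers. As a minimal sketch (the helper name `missing_logs` is hypothetical and not part of any Oracle tool), the following shell function lists the sequences generated after the last backup, which the next archived-log backup has to pick up:

```shell
# Hypothetical helper: print the archived-log sequences generated after the
# last backup, i.e. those not yet contained in any backup piece.
# $1 = highest sequence already backed up, $2 = highest archived sequence now.
missing_logs() {
    last_backed_up=$1
    current_max=$2
    seq_no=$((last_backed_up + 1))
    while [ "$seq_no" -le "$current_max" ]; do
        printf '%s\n' "$seq_no"
        seq_no=$((seq_no + 1))
    done
}

# Thread 1 in this test: backed up through 57, highest archived log is 82.
missing_logs 57 82
```

For thread 1 this prints sequences 58 through 82, the 25 logs generated after the full backup. Thread 2 generated nothing after sequence 48, so the function would print nothing for it.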
step1: Copy the production backup sets to the test database host
[oracle@node1 databaseBackup]$ scp * 192.168.10.66:/databaseBackup/
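Backup pieces of several gigabytes are worth checking after the copy, because a damaged piece otherwise only shows up later during the restore. A minimal sketch, assuming the POSIX `cksum` utility is available on both hosts; the function names `dir_manifest` and `same_backup_files` are hypothetical:

```shell
# Print "checksum size name" for each regular file in directory $1.
dir_manifest() {
    (
        cd "$1" || exit 1
        for f in *; do
            if [ -f "$f" ]; then
                cksum "$f"
            fi
        done
    )
}

# Succeed only if both directories hold the same files with equal checksums.
same_backup_files() {
    src=$(dir_manifest "$1") || return 1
    dst=$(dir_manifest "$2") || return 1
    [ "$src" = "$dst" ]
}

# Example (two locally visible directories; the second path is hypothetical):
#   same_backup_files /databaseBackup /mnt/test/databaseBackup && echo "copy OK"
```

Across two hosts, one option is to run `dir_manifest /databaseBackup` on each side, save the output to a file, and compare the two files with `diff`. `cksum` only detects accidental damage in transit; it is not a cryptographic check.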
step2: Back up the archived logs again to obtain the logs missing from the first backup
run {
  allocate channel c1 type disk;
  sql 'alter system archive log current';
  backup archivelog all format '/databaseBackup/archlog_20190927_%U';
  release channel c1;
}
The resulting archived-log backup pieces are:
-rw-r----- 1 oracle asmadmin 1621476864 Sep 27 20:50 archlog_20190927_0nucrcd2_1_1
-rw-r----- 1 oracle asmadmin 1643560960 Sep 27 20:51 archlog_20190927_0oucrcg5_1_1
-rw-r----- 1 oracle asmadmin 1581030912 Sep 27 20:53 archlog_20190927_0pucrcjj_1_1
Copy them to the test database host:
[oracle@node1 databaseBackup]$ scp archlog_20190927_0* 192.168.10.66:/databaseBackup/
step3: Build a pfile for the test database from the production pfile
[oracle@test dbs]$ pwd
/u01/app/oracle/product/11.2.0/db_1/dbs
[oracle@test dbs]$ vim initprodb.ora
# add the following
audit_file_dest='/u01/app/oracle/admin/prodb/adump'
audit_trail='db'
compatible='11.2.0.4.0'
control_files='+DATA/prodb/controlfile/current.260.1019830577'
db_block_size=8192
db_create_file_dest='+DATA'
db_domain=''
db_name='prodb'
diagnostic_dest='/u01/app/oracle'
dispatchers='(PROTOCOL=TCP) (SERVICE=prodbXDB)'
enable_ddl_logging=TRUE
log_archive_dest_1='LOCATION=+arch'
log_archive_format='%t_%s_%r.dbf'
open_cursors=300
pga_aggregate_target=399507456
processes=200
remote_login_passwordfile='exclusive'
sessions=225
sga_target=1199570944
prodb.undo_tablespace='UNDOTBS1'
[oracle@test dbs]$ ls
hc_prodb.dat hc_testdb1.dat hc_testdb.dat init.ora initprodb.ora lkTESTDB
Create the directories referenced in the pfile:
[oracle@test ~]$ mkdir -p /u01/app/oracle/admin/prodb/adump
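To avoid missing a directory when working under pressure, the paths can be read from the pfile itself. A minimal sketch, assuming the pfile stores each value in single quotes on one line, as in the listing above; `pfile_dirs` and `make_pfile_dirs` are hypothetical helper names, and only the two file-system parameters audit_file_dest and diagnostic_dest are considered (ASM locations such as +DATA are deliberately left alone):

```shell
# Print the file-system directories named by audit_file_dest and
# diagnostic_dest in the pfile $1 (an optional "*." prefix is accepted).
pfile_dirs() {
    awk -F"'" '/^(\*\.)?(audit_file_dest|diagnostic_dest)=/ { print $2 }' "$1"
}

# Create every directory printed by pfile_dirs; existing ones are left as-is.
make_pfile_dirs() {
    pfile_dirs "$1" | while IFS= read -r dir; do
        mkdir -p "$dir"
    done
}

# Example:
#   make_pfile_dirs "$ORACLE_HOME/dbs/initprodb.ora"
```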
step4: Start the test database in NOMOUNT state
[oracle@test ~]$ export ORACLE_SID=prodb
[oracle@test ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Fri Sep 27 20:58:15 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup nomount;
ORACLE instance started.
Total System Global Area 1202556928 bytes
Fixed Size 2252704 bytes
Variable Size 402653280 bytes
Database Buffers 788529152 bytes
Redo Buffers 9121792 bytes
SQL>
step5: Register the database with HA so that it can use ASM storage
[oracle@test ~]$ srvctl add database -d prodb -o /u01/app/oracle/product/11.2.0/db_1
step6: Restore the control file, edit the pfile, and restart the database in MOUNT state
RMAN> restore controlfile from "/databaseBackup/controlfile_0lucr7m2_1_1";
Note: there is a catch here. The control file location written into the pfile is the location on the production database. After restoring the control file, the control_files parameter in the pfile must be changed as follows:
-- First, confirm where the control file is in ASM
ASMCMD> pwd
+data/prodb/controlfile
ASMCMD> ls -lt
Type         Redund  Striped  Time             Sys  Name
CONTROLFILE  UNPROT  FINE     SEP 27 21:00:00  Y    current.256.1020114329
-- Next, change the control_files parameter in the pfile
[oracle@test ~]$ cd $ORACLE_HOME/dbs
[oracle@test dbs]$ vim initprodb.ora
# change the control_files location
control_files='+data/prodb/controlfile/current.256.1020114329'
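The same edit can be scripted so that it is repeatable during a drill. A sketch under the assumption that control_files sits on a single line of the pfile, optionally with a `*.` prefix; `set_control_files` is a hypothetical helper, not an Oracle utility:

```shell
# Rewrite the control_files line of pfile $1 to point at $2.
# "|" is used as the sed delimiter because ASM paths contain "/";
# the new path is assumed to contain neither "|" nor "&".
set_control_files() {
    pfile=$1
    new_path=$2
    tmp="$pfile.tmp.$$"
    sed "s|^\(\*\.\)\{0,1\}control_files=.*|\1control_files='$new_path'|" "$pfile" > "$tmp" &&
        mv "$tmp" "$pfile"
}

# Example, using the file name reported by ASMCMD:
#   set_control_files "$ORACLE_HOME/dbs/initprodb.ora" \
#       '+data/prodb/controlfile/current.256.1020114329'
```

All other lines of the pfile are copied unchanged, and the original file is replaced only if sed succeeds.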
-- Restart the database in MOUNT state
[oracle@test ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Fri Sep 27 21:17:26 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
SQL> shutdown immediate
ORA-01507: database not mounted
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Total System Global Area 1202556928 bytes
Fixed Size 2252704 bytes
Variable Size 402653280 bytes
Database Buffers 788529152 bytes
Redo Buffers 9121792 bytes
Database mounted.
SQL>
step7: Register the new archived-log backup pieces in the test database's control file
RMAN> catalog backuppiece "/databaseBackup/archlog_20190927_0nucrcd2_1_1";
RMAN> catalog backuppiece "/databaseBackup/archlog_20190927_0oucrcg5_1_1";
RMAN> catalog backuppiece "/databaseBackup/archlog_20190927_0pucrcjj_1_1";
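Typing each backup piece name by hand is error-prone when there are many pieces. As a rough sketch, the commands can be generated from the directory listing; `catalog_commands` is a hypothetical helper that only prints text and does not call RMAN itself:

```shell
# Print one "catalog backuppiece" command per regular file in directory $1
# whose name starts with prefix $2.
catalog_commands() {
    for piece in "$1"/"$2"*; do
        [ -f "$piece" ] || continue
        printf "catalog backuppiece '%s';\n" "$piece"
    done
}

# Example:
#   catalog_commands /databaseBackup archlog_20190927_ > /tmp/catalog.rman
#   rman target / cmdfile=/tmp/catalog.rman
```

Reviewing the generated file before passing it to RMAN keeps a human check in the loop.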
step8: Recover the database to the point just before the TRUNCATE
RMAN> SQL "ALTER SESSION SET NLS_LANGUAGE=''AMERICAN''";
RMAN> SQL "ALTER SESSION SET NLS_DATE_FORMAT=''YYYY-MM-DD HH24:MI:SS''";
RUN {
  SET UNTIL TIME '2019-09-27 19:37:31';
  RESTORE DATABASE;
  RECOVER DATABASE;
}
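A variant worth rehearsing: writing an explicit `to_date` in SET UNTIL TIME makes the recovery point independent of the session's NLS date settings. The sketch below only prints an RMAN command file; `recover_script` is a hypothetical helper, and the restore and recover steps are the same as those shown above:

```shell
# Print an RMAN run block that recovers to the time given in $1,
# which must look like YYYY-MM-DD HH24:MI:SS.
recover_script() {
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\ [0-9][0-9]:[0-9][0-9]:[0-9][0-9]) ;;
        *) echo "expected YYYY-MM-DD HH24:MI:SS, got: $1" >&2; return 1 ;;
    esac
    cat <<EOF
run {
  set until time "to_date('$1', 'YYYY-MM-DD HH24:MI:SS')";
  restore database;
  recover database;
}
EOF
}

# The recovery point recorded just before the TRUNCATE in this test.
recover_script '2019-09-27 19:37:31'
```

The output can be saved to a file and run with `rman target / cmdfile=<file>`. Rejecting a malformed timestamp before RMAN starts is a cheap guard against recovering to the wrong point.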
step9: Open the database read-only first; if something is wrong, the recovery can still be run again
SQL> alter database open read only;
Database altered.
-- Confirm that the data is back
SQL> select count(*) from lijiaman.test01;
  COUNT(*)
----------
        14
step10: If everything is fine, shut down the database and open it with RESETLOGS
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Total System Global Area 1202556928 bytes
Fixed Size 2252704 bytes
Variable Size 402653280 bytes
Database Buffers 788529152 bytes
Redo Buffers 9121792 bytes
Database mounted.
SQL> alter database open resetlogs;
Database altered.
The restoration is complete.
(4.8) Load the recovered data into the production environment
Use expdp/impdp or a database link to load the data from the test database into the production database.
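For the Data Pump route, the two command lines can be prepared in advance as part of the written procedure. This is only a sketch under several assumptions that the article does not state: the `system` account is used, the default `DATA_PUMP_DIR` directory object exists on both databases, and the dump file is copied between the two hosts by hand. `datapump_commands` is a hypothetical helper that prints the commands without running them:

```shell
# Print an export command (for the test database) and an import command
# (for the production database) for one table.
# $1 = schema.table, $2 = dump file name ending in .dmp
datapump_commands() {
    table=$1
    dump=$2
    base=${dump%.dmp}
    printf 'expdp system tables=%s directory=DATA_PUMP_DIR dumpfile=%s logfile=%s_exp.log\n' "$table" "$dump" "$base"
    printf 'impdp system tables=%s directory=DATA_PUMP_DIR dumpfile=%s logfile=%s_imp.log table_exists_action=append\n' "$table" "$dump" "$base"
}

datapump_commands lijiaman.test01 test01.dmp
```

`table_exists_action=append` is chosen here because the truncated table still exists in production and only its rows are missing; any rows written to it after the TRUNCATE are kept.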
【Finish】