Oracle recovery test on a different machine

(A) Background

Recently, in a production environment, a developer mistakenly truncated a table in an Oracle database, deleting all of its data. Realizing the mistake, the developer contacted the on-duty DBA for emergency data recovery.

After analysis: because the table was truncated, the usual methods (Flashback Table, Flashback Query, Flashback Transaction Query) cannot bring the data back. Flashback Database or Flashback Data Archive could recover it, but these two features are usually not enabled in production, which leaves RMAN as the only option.

RMAN recovery can be performed either directly in the production environment or on another machine.

  • Restoring directly in production: ① the production database must be shut down; ② the whole database must stay consistent: if I restore the database to 12:00, every other table is also rolled back to 12:00, so even more data may be lost; ③ if problems occur during recovery, troubleshooting is difficult and delays production business.
  • Restoring to another machine: ① the production database does not need to be stopped; ② only the truncated table's data is lost: to recover to 12:00, I restore the entire database to 12:00 in the test environment, then move the lost table's data back into production through a DB_LINK, Data Pump, etc., while other production tables are unaffected; ③ if the recovery fails, the production database is not affected.

So, after some consideration, we decided to restore the database to another machine and then import the truncated table's data back into production.

The recovery was done by a colleague. Because he was not familiar with the procedure, looking things up cost some time (about 20 minutes); the database was eventually restored, but not as quickly as required. Thinking it over: if I had done it myself, with developers anxiously waiting, could I have stayed calm and completed the recovery quickly and reliably? Honestly, no. On one hand, I am not practiced at the procedure, since database recoveries happen only a few times a year at most; on the other hand, with users and developers pressing, a DBA easily panics, which hurts efficiency. So the best approach is to rehearse in advance and write an operating procedure. When a failure occurs, follow the document and restore production as quickly as possible.

 

(B) Preparing the Environment

Item              Production environment               Heterogeneous environment (test)
----------------  -----------------------------------  --------------------------------------
Operating system  RedHat 6.7                           RedHat 6.7
Database version  11.2.0.4 (RAC, 2 nodes)              11.2.0.4 (single node)
db_name           prodb                                prodb
instance_name     prodb1, prodb2                       prodb
Installation      GI + DB software + database created  GI + DB software (no database created)
Disk groups       OCR : 3*1GB, normal                  OCR : 3*1GB, normal
                  DATA: 3*5GB, external                DATA: 3*5GB, external
                  ARCH: 1*5GB, external                ARCH: 1*5GB, external

Note: for convenience, in the rest of this article the database in the production environment is called the "production database", and the database in the heterogeneous environment is called the "test database".

 

(C) Test plan

[Figure: test plan]

 

(D) Detailed implementation

(4.1) Create the test tables

Two test tables are created here, with the following roles:

lijiaman.test01: used for the truncate test; in the end, table test01 is what must be recovered in the test database.

lijiaman.test02: simulates database transactions; rows are inserted into it continuously so that the database generates a large volume of archive logs.

(Ⅰ) Table test01, with 14 rows in total.

SQL> CREATE TABLE test01 AS SELECT * FROM scott.emp;
Table created

SQL> select count(*) from test01;
  COUNT(*)
----------
        14

(Ⅱ) Table test02, into which data is written continuously

-- Create table test02
CREATE TABLE test02
(
    col1 NUMBER,
    col2 NUMBER,
    col3 VARCHAR2(30),
    col4 DATE,
    col5 VARCHAR2(100)
);

-- Stored procedure that inserts 10,000 rows of random data, committing after each row
CREATE OR REPLACE PROCEDURE p_insert_test02 IS
BEGIN
  FOR i IN 1 .. 10000 LOOP
    INSERT INTO test02 (col1, col2, col3, col4, col5)
    VALUES (ROUND(DBMS_RANDOM.VALUE(1, 100000000)),
            ROUND(DBMS_RANDOM.VALUE(10000, 100000000)),
            DBMS_RANDOM.STRING('a', 25),
            SYSDATE,
            DBMS_RANDOM.STRING('a', 85));
    COMMIT;
  END LOOP;
END p_insert_test02;
/

-- Submit a job that runs the procedure above every 30 seconds,
-- so 10,000 random rows are inserted into test02 every 30s
DECLARE
  job1 NUMBER;
BEGIN
  sys.dbms_job.submit(job       => job1,
                      what      => 'p_insert_test02;',
                      next_date => SYSDATE,
                      interval  => 'SYSDATE + 30/(1440*60)');
  COMMIT;
END;
/

 

(4.2) Take a full backup of the database

rman target /

RMAN> run {
allocate channel c1 type disk;
allocate channel c2 type disk;
sql' alter system archive log current';
backup database format '/databaseBackup/full_db_%U';
sql' alter system archive log current';
backup archivelog all format '/databaseBackup/archlog_%U';
backup current controlfile format '/databaseBackup/controlfile_%U';
backup spfile format '/databaseBackup/spfile_%U';
release channel c1;
release channel c2;
}
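The backup above is typed at the RMAN prompt. Because the point of this exercise is a procedure that can be repeated under pressure, it may help to keep the same commands in a command file and run them non-interactively. A minimal sketch, assuming bash on the database host; the function name write_full_backup_cmdfile and the command-file path are hypothetical, and the RMAN commands are the ones from the run block above:

```bash
# Hypothetical helper: write the full-backup RMAN run block to a command file.
# $1 = directory for the backup pieces, $2 = command file to create.
write_full_backup_cmdfile() {
  local backup_dir="$1"
  local cmdfile="$2"
  cat > "$cmdfile" <<EOF
run {
allocate channel c1 type disk;
allocate channel c2 type disk;
sql 'alter system archive log current';
backup database format '${backup_dir}/full_db_%U';
sql 'alter system archive log current';
backup archivelog all format '${backup_dir}/archlog_%U';
backup current controlfile format '${backup_dir}/controlfile_%U';
backup spfile format '${backup_dir}/spfile_%U';
release channel c1;
release channel c2;
}
EOF
}

# Usage (paths are examples only):
#   write_full_backup_cmdfile /databaseBackup /tmp/full_backup.rman
#   rman target / cmdfile=/tmp/full_backup.rman log=/tmp/full_backup.log
```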

The resulting backup sets are:

[oracle@node1 databaseBackup]$ ls -l
total 4136752
-rw-r----- 1 oracle asmadmin 1451128832  Sep 27 19:27 archlog_0iucr7hg_1_1
-rw-r----- 1 oracle asmadmin 1462116352  Sep 27 19:27 archlog_0jucr7hh_1_1
-rw-r----- 1 oracle asmadmin     1406464   Sep 27 19:27 archlog_0kucr7lr_1_1
-rw-r----- 1 oracle asmadmin   18841600   Sep 27 19:28 controlfile_0lucr7m2_1_1
-rw-r----- 1 oracle asmadmin  805953536   Sep 27 19:25 full_db_0eucr7f7_1_1
-rw-r----- 1 oracle asmadmin  477528064   Sep 27 19:25 full_db_0fucr7f7_1_1
-rw-r----- 1 oracle asmadmin   18841600   Sep 27 19:25 full_db_0gucr7h3_1_1
-rw-r----- 1 oracle asmadmin        98304   Sep 27 19:25 full_db_0hucr7ha_1_1
-rw-r----- 1 oracle asmadmin        98304   Sep 27 19:28 spfile_0mucr7m5_1_1

Check the archive log backups: this full backup includes archive logs up to sequence 57 on thread 1 and sequence 48 on thread 2.

RMAN> list archivelog all;

List of Archived Log Copies for database with db_unique_name PRODB
=====================================================================

Key     Thrd  Seq     S  Low Time
------- ---- ------- - ---------
3            1    6       A 24-SEP-19
        Name: +ARCH/prodb/archivelog/2019_09_24/thread_1_seq_6.258.1019832847
......
100        1     57      A 27-SEP-19
        Name: +ARCH/prodb/archivelog/2019_09_27/thread_1_seq_57.355.1020108489

1           2     1       A 24-SEP-19
        Name: +ARCH/prodb/archivelog/2019_09_24/thread_2_seq_1.256.1019830885
......
80         2     48      A 24-SEP-19
        Name: +ARCH/prodb/archivelog/2019_09_24/thread_2_seq_48.335.1019838555

 

(4.3) Keep the database running to generate a large number of archive logs

Because log sequence numbers only increase (except after the database is opened with RESETLOGS), query the largest archived log sequence number on each instance (thread):

SELECT *
FROM 
(SELECT thread#,
       SEQUENCE#,
       NAME,
       ROW_NUMBER() OVER(PARTITION BY thread# ORDER BY SEQUENCE# DESC) rn
  FROM V$ARCHIVED_LOG)
WHERE rn=1;

The results are:

[Figure: query result, largest archived log sequence per thread]

That is, the largest archive log sequence number is 67 on thread 1 and 48 on thread 2. (Because too many virtual machines were running on this PC, only node 1 was started, so node 2 generated no logs; this does not affect the accuracy of the experiment.)

 

(4.4) Simulate truncating table test01, and record the time

SQL> select sysdate from dual;

SYSDATE
-------------------
2019-09-27 19:37:31

SQL> 
SQL> truncate table test01;

Table truncated.

 

(4.5) Keep the database running to generate a large number of archive logs

The last backup covered archive logs up to sequence 57; after that backup, another 25 logs were generated. These logs simulate heavy transaction activity on the production database.

[Figure: archive logs generated after the truncate]

 

(4.6) The developer notices that the table data was truncated

The developer found a program error, checked table test01, and saw that the data was gone; the developer then confirmed that they had deleted the data themselves (hypothetical scenario).

 

(4.7) The DBA performs recovery on a different machine

Approach:

[Figure: recovery approach]

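This recovery must bring table test01 back to its state before the truncate, so we need a full backup and archive log backups taken before the truncate. The first full backup's archive log backups reached thread1=57 and thread2=48, and many more log files were generated after that full backup. To recover the database to the point before the truncate (here we use the recorded time 2019-09-27 19:37:31 as the recovery point), we also need the newer logs. The logs required are: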

thread1: logs 57~67 are definitely needed; logs 67~82 may or may not be needed;

thread2: node 2 was not started, so no logs are needed for recovery.

 

step1: Copy the production database's backup sets to the test database host

[oracle@node1 databaseBackup]$ scp * 192.168.10.66:/databaseBackup/
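Backup pieces are large (over 1 GB each here), so it may be worth confirming they arrived intact before starting the restore. A minimal sketch, assuming GNU coreutils (md5sum, readlink) on both hosts; make_manifest, verify_manifest and the manifest path are hypothetical names. Run make_manifest on the production host, copy the manifest along with the pieces, then run verify_manifest on the test host:

```bash
# Hypothetical helper: record an MD5 checksum for every file in a backup directory.
# $1 = backup directory, $2 = manifest file (kept outside the directory).
make_manifest() {
  local dir="$1" manifest
  manifest=$(readlink -f "$2")          # absolute path, because we cd below
  ( cd "$dir" && md5sum -- * ) > "$manifest"
}

# Hypothetical helper: check the copied files against the manifest.
# Returns non-zero if any listed file is missing or has a different checksum.
verify_manifest() {
  local dir="$1" manifest
  manifest=$(readlink -f "$2")
  ( cd "$dir" && md5sum -c --quiet "$manifest" )
}

# Usage (paths are examples only):
#   production: make_manifest /databaseBackup /tmp/backup.md5
#   test host : verify_manifest /databaseBackup /tmp/backup.md5
```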

step2: Back up the archive logs needed for recovery again, to obtain the archive logs missing from the full backup

run {
allocate channel c1 type disk;
sql' alter system archive log current';
backup archivelog all format '/databaseBackup/archlog_20190927_%U';
release channel c1;
}
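The run block above backs up all archive logs. Since only a known sequence range is needed (thread 1, from 57 up to roughly 82 in this test), a smaller backup restricted to that range is one possible alternative; RMAN's BACKUP ARCHIVELOG accepts a SEQUENCE BETWEEN ... AND ... THREAD ... range. A minimal sketch, assuming bash; the function name and the archlog_range_ file prefix are hypothetical:

```bash
# Hypothetical helper: write an RMAN command file that backs up one thread's
# archive logs between two sequence numbers (inclusive).
# $1 = thread, $2 = first sequence, $3 = last sequence, $4 = backup dir, $5 = command file
write_arch_range_cmdfile() {
  local thread="$1" from_seq="$2" until_seq="$3" backup_dir="$4" cmdfile="$5"
  case "$thread$from_seq$until_seq" in
    *[!0-9]*|'') echo "thread and sequence numbers must be integers" >&2; return 1 ;;
  esac
  if [ "$from_seq" -gt "$until_seq" ]; then
    echo "first sequence ($from_seq) is greater than last sequence ($until_seq)" >&2
    return 1
  fi
  cat > "$cmdfile" <<EOF
run {
allocate channel c1 type disk;
sql 'alter system archive log current';
backup archivelog sequence between ${from_seq} and ${until_seq} thread ${thread} format '${backup_dir}/archlog_range_%U';
release channel c1;
}
EOF
}

# Usage (values from this test):
#   write_arch_range_cmdfile 1 57 82 /databaseBackup /tmp/arch_range.rman
#   rman target / cmdfile=/tmp/arch_range.rman
```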

The resulting archive log backup sets are:

-rw-r----- 1 oracle asmadmin 1621476864 Sep 27 20:50 archlog_20190927_0nucrcd2_1_1
-rw-r----- 1 oracle asmadmin 1643560960 Sep 27 20:51 archlog_20190927_0oucrcg5_1_1
-rw-r----- 1 oracle asmadmin 1581030912 Sep 27 20:53 archlog_20190927_0pucrcjj_1_1

Copy them to the test database host:
[oracle@node1 databaseBackup]$ scp archlog_20190927_0* 192.168.10.66:/databaseBackup/

step3: Build a pfile for the test database based on the production database's pfile

[oracle@test dbs]$ pwd
/u01/app/oracle/product/11.2.0/db_1/dbs

[oracle@test dbs]$ vim initprodb.ora
# Add the following parameters
audit_file_dest='/u01/app/oracle/admin/prodb/adump'
audit_trail='db'
compatible='11.2.0.4.0'
control_files='+DATA/prodb/controlfile/current.260.1019830577'
db_block_size=8192
db_create_file_dest='+DATA'
db_domain=''
db_name='prodb'
diagnostic_dest='/u01/app/oracle'
dispatchers='(PROTOCOL=TCP) (SERVICE=prodbXDB)'
enable_ddl_logging=TRUE
log_archive_dest_1='LOCATION=+arch'
log_archive_format='%t_%s_%r.dbf'
open_cursors=300
pga_aggregate_target=399507456
processes=200
remote_login_passwordfile='exclusive'
sessions=225
sga_target=1199570944
prodb.undo_tablespace='UNDOTBS1'

[oracle@test dbs]$ ls
hc_prodb.dat  hc_testdb1.dat  hc_testdb.dat  init.ora  initprodb.ora  lkTESTDB

Create the directories referenced in the pfile:

[oracle@test ~]$ mkdir -p /u01/app/oracle/admin/prodb/adump
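When the pfile is copied from a different environment, it is easy to miss one of its local paths, and the instance then fails at startup nomount. A minimal sketch that reads the paths from the pfile instead of typing them, assuming bash and GNU sed; the function name is hypothetical, and only audit_file_dest and diagnostic_dest are checked (an assumption; extend the list if your pfile has other local paths):

```bash
# Hypothetical helper: create the local directories named in a pfile.
# ASM locations (values starting with '+') are skipped.
make_pfile_dirs() {
  local pfile="$1" param dir
  for param in audit_file_dest diagnostic_dest; do
    # Extract the value after "param=", dropping optional surrounding single quotes.
    dir=$(sed -n "s/^[[:space:]]*${param}[[:space:]]*=[[:space:]]*'\{0,1\}\([^']*\)'\{0,1\}[[:space:]]*$/\1/p" "$pfile" | head -n 1)
    [ -n "$dir" ] || continue
    case "$dir" in
      +*) continue ;;   # ASM disk group, nothing to create on the file system
    esac
    mkdir -p "$dir" && echo "created $dir"
  done
}

# Usage:
#   make_pfile_dirs $ORACLE_HOME/dbs/initprodb.ora
```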

step4: Start the test database instance in NOMOUNT state

[oracle@test ~]$ export ORACLE_SID=prodb
[oracle@test ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Fri Sep 27 20:58:15 2019

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount;
ORACLE instance started.

Total System Global Area 1202556928 bytes
Fixed Size            2252704 bytes
Variable Size          402653280 bytes
Database Buffers      788529152 bytes
Redo Buffers            9121792 bytes
SQL>

step5: Register the database with HA (Oracle Restart) so that it can use ASM storage

[oracle@test ~]$ srvctl add database -d prodb -o /u01/app/oracle/product/11.2.0/db_1

step6: Restore the control file, modify the pfile, and restart the database in MOUNT state

RMAN> restore controlfile from "/databaseBackup/controlfile_0lucr7m2_1_1";

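Note: there is one issue at this point. When we built the pfile, we filled in the control file location used by the production database. After restoring the control file, we must change the control_files parameter in the parameter file, as follows: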

-- First, find the location of the control file in ASM:

ASMCMD> pwd
+data/prodb/controlfile
ASMCMD> ls -lt
Type         Redund  Striped  Time             Sys  Name
CONTROLFILE  UNPROT  FINE     SEP 27 21:00:00  Y    current.256.1020114329

-- Next, modify the control_files parameter in the pfile:

[oracle@test ~]$ cd $ORACLE_HOME/dbs
[oracle@test dbs]$ vim initprodb.ora 
# Change the control_files location
control_files='+data/prodb/controlfile/current.256.1020114329'
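The vim edit above can also be scripted, which removes one manual step from the procedure. A minimal sketch, assuming bash and GNU sed (sed -i); the function name set_control_files is hypothetical, and it keeps a .bak copy of the pfile before changing it:

```bash
# Hypothetical helper: replace the control_files line in a pfile.
# $1 = pfile, $2 = new control file name (e.g. the ASM name shown by asmcmd ls)
set_control_files() {
  local pfile="$1" new_cf="$2"
  cp -p "$pfile" "${pfile}.bak"   # keep the original for rollback
  # '|' is the sed delimiter because the value contains '/'.
  sed -i "s|^[[:space:]]*control_files[[:space:]]*=.*$|control_files='${new_cf}'|" "$pfile"
  # Return success only if the new line is actually present.
  grep -q "^control_files='${new_cf}'$" "$pfile"
}

# Usage (value from this test):
#   set_control_files $ORACLE_HOME/dbs/initprodb.ora '+data/prodb/controlfile/current.256.1020114329'
```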

-- Restart the database in MOUNT state:

[oracle@test ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Fri Sep 27 21:17:26 2019

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> shutdown immediate
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1202556928 bytes
Fixed Size            2252704 bytes
Variable Size          402653280 bytes
Database Buffers      788529152 bytes
Redo Buffers            9121792 bytes
Database mounted.
SQL>

step7: Register (catalog) the new archive log backup sets in the test database's control file

RMAN> catalog backuppiece "/databaseBackup/archlog_20190927_0nucrcd2_1_1";
RMAN> catalog backuppiece "/databaseBackup/archlog_20190927_0oucrcg5_1_1";
RMAN> catalog backuppiece "/databaseBackup/archlog_20190927_0pucrcjj_1_1";
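Typing one CATALOG BACKUPPIECE command per file is error-prone when there are many pieces. RMAN can also register everything under a prefix itself with CATALOG START WITH '/databaseBackup/archlog_20190927_' NOPROMPT; the sketch below instead generates the explicit per-piece commands, assuming bash; the function name is hypothetical:

```bash
# Hypothetical helper: print one "catalog backuppiece" command per file matching a prefix.
# Returns non-zero if nothing matched, so an empty run is noticed.
gen_catalog_cmds() {
  local prefix="$1" piece found=0
  for piece in "${prefix}"*; do
    [ -f "$piece" ] || continue   # the glob matched nothing
    printf 'catalog backuppiece "%s";\n' "$piece"
    found=1
  done
  [ "$found" -eq 1 ]
}

# Usage:
#   gen_catalog_cmds /databaseBackup/archlog_20190927_ > /tmp/catalog.rman
#   rman target / cmdfile=/tmp/catalog.rman
```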

step8: Recover the database to the point before the truncate

RMAN>SQL"ALTER SESSION SET NLS_LANGUAGE=''AMERICAN''";
RMAN>SQL"ALTER SESSION SET NLS_DATE_FORMAT=''YYYY-MM-DD HH24:MI:SS''";
RUN{
SET UNTIL TIME '2019-09-27 19:37:31';
RESTORE DATABASE;
RECOVER DATABASE;
}
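The session settings above exist so that RMAN parses '2019-09-27 19:37:31' in the intended format. Another common way to avoid depending on NLS settings is to give SET UNTIL TIME an explicit TO_DATE expression. A minimal sketch that writes such a script for any target time, assuming bash; the function name is hypothetical, and the time string is checked against the YYYY-MM-DD HH24:MI:SS shape before use:

```bash
# Hypothetical helper: write a point-in-time restore/recover RMAN script.
# $1 = target time as 'YYYY-MM-DD HH24:MI:SS', $2 = command file to create
write_pitr_cmdfile() {
  local until_time="$1" cmdfile="$2"
  if ! [[ "$until_time" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}\ [0-9]{2}:[0-9]{2}:[0-9]{2}$ ]]; then
    echo "expected YYYY-MM-DD HH24:MI:SS, got: $until_time" >&2
    return 1
  fi
  cat > "$cmdfile" <<EOF
run {
set until time "to_date('${until_time}','YYYY-MM-DD HH24:MI:SS')";
restore database;
recover database;
}
EOF
}

# Usage (recovery point recorded in 4.4):
#   write_pitr_cmdfile '2019-09-27 19:37:31' /tmp/pitr.rman
#   rman target / cmdfile=/tmp/pitr.rman log=/tmp/pitr.log
```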
step9: Before opening with RESETLOGS, confirm that the data is back
-- First open the database read-only; if there is a problem, recovery can still be run again
SQL> alter database open read only;
Database altered.

-- Check whether the data has been recovered
SQL> select count(*) from lijiaman.test01;
  COUNT(*)
----------
    14

step10: If everything looks correct, shut down the database and open it with RESETLOGS

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup mount
ORACLE instance started.

Total System Global Area 1202556928 bytes
Fixed Size            2252704 bytes
Variable Size          402653280 bytes
Database Buffers      788529152 bytes
Redo Buffers            9121792 bytes
Database mounted.

SQL> alter database open resetlogs;

Database altered.

The restoration is complete.

(4.8) Import the recovered data into the production environment

You can use expdp/impdp or a database link (DB_LINK) to move the data from the test database into the production database.
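One way to combine the two is a Data Pump import over a database link (the NETWORK_LINK parameter), which pulls the rows directly from the test database without an intermediate dump file. Because test01 still exists (empty) in production after the truncate, TABLE_EXISTS_ACTION=APPEND is used so the rows are inserted into the existing table. A minimal sketch that writes the parameter file, assuming bash; TESTLIB_LINK is a hypothetical database link created on the production database that points at the test database, and the function name and file paths are also hypothetical. A plain INSERT INTO lijiaman.test01 SELECT * FROM lijiaman.test01@TESTLIB_LINK over the same link is the simpler alternative for a small table.

```bash
# Hypothetical helper: write an impdp parameter file that imports one table over a DB link.
# $1 = database link name, $2 = owner.table, $3 = parameter file to create
write_impdp_parfile() {
  local dblink="$1" table="$2" parfile="$3"
  cat > "$parfile" <<EOF
network_link=${dblink}
tables=${table}
table_exists_action=append
directory=DATA_PUMP_DIR
logfile=impdp_$(echo "${table}" | tr '.' '_').log
EOF
}

# Usage (run on the production host; impdp prompts for the password):
#   write_impdp_parfile TESTLIB_LINK lijiaman.test01 /tmp/imp_test01.par
#   impdp system parfile=/tmp/imp_test01.par
```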

【Finish】


Origin www.cnblogs.com/lijiaman/p/11577001.html