Deleting large amounts of data in Oracle: notes from practice

1. Introduction
  It never occurred to me that deleting data from a table could be a problem. But when a table holds a very large amount of data, deleting from it can become a big one.
  Here is a brief account of a small problem I ran into and how I worked through it. Only the process is discussed; the exact SQL and stored procedures are not covered. The approach is very simple, so experts can skip this post.

2. Scenario
  The production database has been running for more than a year, taking in about 50,000 rows per hour, for a total of over 400 million rows. Today I discovered that a program had been misbehaving for a week and had written an entire week of duplicate data. The duplicates needed to be removed.

3. The solution process
  (1)
  The SQL to delete the duplicates is straightforward: use ROWID to keep one row per group and exclude the rest. The first idea was to solve it with a single SQL statement. I wrote it quickly and verified it successfully on the test database.
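  The statement itself is not shown in the post, but a typical ROWID-based de-duplication delete looks roughly like this (a minimal sketch; the table t_meter_data and the key columns device_id and record_time are hypothetical stand-ins for the real schema):
  DELETE FROM t_meter_data t
   WHERE t.ROWID NOT IN (SELECT MIN(d.ROWID)
                           FROM t_meter_data d
                          GROUP BY d.device_id, d.record_time);
  For every (device_id, record_time) group this keeps only the row with the smallest ROWID and deletes the rest.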
  I opened SQL Plus and ran the DELETE. It ran for more than a day without finishing, and because it took so long, the connection between SQL Plus and the server was dropped.
  I don't know whether the timeout meant the SQL never finished, or whether it finished but the transaction was rolled back because the connection dropped before it could be committed. I looked into how to control the idle-disconnect time: Oracle has a profile parameter for this, IDLE_TIME, but the material I found says the connection should not be dropped in the middle of a long-running query (that turned out not to be the case here, which is a bit strange; perhaps a very long DELETE is not treated the same as a long query).
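  For reference, the IDLE_TIME limit can be checked per profile; it is expressed in minutes, and profile resource limits are only enforced when the RESOURCE_LIMIT parameter is TRUE:
  SELECT profile, limit
    FROM dba_profiles
   WHERE resource_name = 'IDLE_TIME';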
  (2)
  The next idea was to first find the ROWIDs of the duplicate rows and save them to a work table, then have a stored procedure delete according to that table, committing once every 10,000 rows to keep the amount of UNDO small (Oracle has no hint to skip generating UNDO, presumably for safety reasons), improve efficiency, and avoid connection timeouts and transaction rollbacks.
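  A minimal sketch of this second attempt, reusing the hypothetical names from above (the work table, cursor loop, and commit interval are illustrative, not the author's actual code):
  -- collect the ROWIDs of the redundant rows into a work table
  CREATE TABLE tmp_dup_rowids AS
  SELECT t.ROWID AS rid
    FROM t_meter_data t
   WHERE t.ROWID NOT IN (SELECT MIN(d.ROWID)
                           FROM t_meter_data d
                          GROUP BY d.device_id, d.record_time);

  -- delete by ROWID, committing every 10,000 rows to keep UNDO per transaction small
  DECLARE
    v_count PLS_INTEGER := 0;
  BEGIN
    FOR r IN (SELECT rid FROM tmp_dup_rowids) LOOP
      DELETE FROM t_meter_data WHERE ROWID = r.rid;
      v_count := v_count + 1;
      IF MOD(v_count, 10000) = 0 THEN
        COMMIT;
      END IF;
    END LOOP;
    COMMIT;  -- commit the final partial batch
  END;
  /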
  This also ran for more than a day with no result; the connection timed out again.
  (3)
  Since the output of dbms_output.put_line is only displayed after the stored procedure finishes, I built a dedicated table to record debugging information while the procedure runs. It turned out that finding the ROWIDs of the duplicate rows and inserting them into the temporary table was what took so long.
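  The usual way to do that is to write the log rows from an autonomous transaction, so they are committed and visible no matter what happens to the main delete transaction. A sketch with illustrative names, not the author's actual table:
  CREATE TABLE proc_debug_log (
    log_time DATE DEFAULT SYSDATE,
    message  VARCHAR2(4000)
  );

  CREATE OR REPLACE PROCEDURE log_debug(p_msg IN VARCHAR2) IS
    PRAGMA AUTONOMOUS_TRANSACTION;  -- commits independently of the caller
  BEGIN
    INSERT INTO proc_debug_log (message) VALUES (p_msg);
    COMMIT;
  END;
  /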
  So I narrowed the scope of the duplicate search to one hour of data at a time, and used BULK COLLECT INTO with FORALL for the delete. Tested on the production database, one hour took about 5 minutes, which was a good result.
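  A sketch of what such an hourly procedure might look like, matching the delete_by_hour calls shown further down; the schema names and the YYYYMMDDHH24 parameter format are assumptions:
  CREATE OR REPLACE PROCEDURE delete_by_hour(p_hour IN VARCHAR2) IS
    TYPE t_rid_tab IS TABLE OF ROWID;
    v_rids t_rid_tab;
    v_from DATE := TO_DATE(p_hour, 'YYYYMMDDHH24');
  BEGIN
    -- collect the ROWIDs of the redundant rows for this one hour only
    SELECT t.ROWID BULK COLLECT INTO v_rids
      FROM t_meter_data t
     WHERE t.record_time >= v_from
       AND t.record_time <  v_from + 1/24
       AND t.ROWID NOT IN (SELECT MIN(d.ROWID)
                             FROM t_meter_data d
                            WHERE d.record_time >= v_from
                              AND d.record_time <  v_from + 1/24
                            GROUP BY d.device_id, d.record_time);

    -- delete them in one bulk statement and commit the hour
    IF v_rids.COUNT > 0 THEN
      FORALL i IN 1 .. v_rids.COUNT
        DELETE FROM t_meter_data WHERE ROWID = v_rids(i);
    END IF;
    COMMIT;
  END;
  /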
  Then I wrote another stored procedure that calls the hourly delete procedure for each hour in turn; that is, the hour-by-hour deletes were wrapped in one large stored procedure and invoked in a single call. This still did not succeed: the connection timed out.
  (4)
  So the connection timeout really is the problem; the operation apparently fails because of it. But each small stored procedure commits on its own, so at least some of the data should have been deleted, shouldn't it? In fact no data was deleted at all. I never figured this out.
  In the end, I fell back on a crude but simple method:
  exec delete_by_hour('2016101400');
  exec delete_by_hour('2016101401');
  exec delete_by_hour('2016101402');
  ...
  I wrote out a long list of such calls, copied them to the clipboard, and pasted them into SQL Plus. The commands obediently executed one after another and finally deleted all the data smoothly, taking 13 hours in total.
  Of course, you can also save the calls above in a text file and run it in SQL Plus with @filename.
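  One way to produce that long list of calls without typing it by hand is to generate it with a query and spool it to a file, then run the file with @ (a sketch; the starting hour and the one-week range are illustrative):
  SET HEADING OFF FEEDBACK OFF PAGESIZE 0
  SPOOL delete_week.sql
  SELECT 'exec delete_by_hour(''' ||
         TO_CHAR(TO_DATE('2016101400', 'YYYYMMDDHH24') + (LEVEL - 1) / 24,
                 'YYYYMMDDHH24') || ''');'
    FROM dual
   CONNECT BY LEVEL <= 7 * 24;  -- one call per hour of the week
  SPOOL OFF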
