Batch update method for big data

1. First, the problem I ran into. The user table in our database holds 24 million rows, and I needed to update every user's information. A full-table update such as update table_user set xxx='?',xxa='?'; was ruled out, because company policy says a single transaction may not cover more than 500,000 rows (50W). A full-table update would touch 24 million rows (2400W), far over that limit, and updating the whole table in one statement is also very slow at this data volume. So how should the update be done?

2. Here is the batch method I used. The idea: first count the rows that match the condition (24 million in my case), then split them into pages of 10,000 rows and have one thread process each page. That gives 2,400 pages. The project allows only 30 threads to run at the same time, so the other 2,370 tasks wait their turn; this cap matters, because without it the server would be overwhelmed and hang. This runs much faster than a full-table update: in my tests it took only 30 minutes to update all 24 million rows, whereas the fastest full-table update took 60 minutes, and when it was slow it did not finish at all.
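
The paging and thread cap described in step 2 can be sketched in Python as follows. This is only an illustration under assumptions: the article does not show the project's actual threading code, update_page is a hypothetical stand-in for the real per-page UPDATE, and the names page_count and run_all_pages are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor

TOTAL_ROWS = 24_000_000   # row count from the article
PAGE_SIZE = 10_000        # rows per page, as in the article
MAX_THREADS = 30          # concurrency cap from the article


def page_count(total_rows: int, page_size: int) -> int:
    """Number of pages needed to cover total_rows (integer ceiling)."""
    return (total_rows + page_size - 1) // page_size


def update_page(page_index: int) -> int:
    """Hypothetical stand-in: a real worker would run one UPDATE for
    this page and commit it as its own transaction."""
    return page_index


def run_all_pages(total_rows: int, page_size: int, max_threads: int) -> int:
    """Submit one task per page. At most max_threads tasks run at once;
    the remaining tasks wait in the executor's queue."""
    pages = page_count(total_rows, page_size)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        finished = list(pool.map(update_page, range(pages)))
    return len(finished)


if __name__ == "__main__":
    print(page_count(TOTAL_ROWS, PAGE_SIZE))  # 2400 pages in total
```

With the article's numbers there are 2,400 tasks, of which 30 run while the other 2,370 wait in the queue. Because each page is committed separately, no single transaction comes near the 500,000-row limit.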


3. I left out a few important details in step 2:
  • Getting the row count is simple: just run a count on the table (with the same business condition as the query below).
  • As the title says, this is a batch update, so here is the actual SQL I use to split the data into batches:
  • select t1.sysId from (select t.rowid as sysId,rownum as num from (select rowid from taf_user where status=#status# order by rowid)t)t1 where mod(t1.num,#pageSize#)=0 or t1.num=1 or t1.num=#recordSum#
    There are three parameters. status is my own business condition and is not covered here. pageSize is the number of rows each thread updates at a time (10,000). recordSum is the row number of the last row that meets the condition, that is, the count from above (24,000,000 in my case). The query returns a list of rowid boundaries, one every 10,000 rows, as the mod condition shows. The extra conditions num=1 and num=#recordSum# (24,000,000 here) keep the first and the last rowid, because I use them as the overall start and end, and the order by in the inner query guarantees that the rowids come back in order. The boundaries at index 0 and 1 of this list enclose exactly rows 1~10000. The next thread takes the boundaries at index 1 and 2 as rowidMin and rowidMax, which enclose rows 10001~20000, and so on. Each thread is given a number that tells it which segment to process, so every batch of 10,000 rows can be updated quickly by rowid range.
  • I have not shown the update statement yet. There are two variants: the first segment uses >=#rowidMin# and <=#rowidMax#, and every later segment uses >#rowidMin# and <=#rowidMax#, so that the boundary row shared by two neighbouring segments is not updated twice. Here they are:
  • update taf_user t set xxx='?',xxa='?' where t.rowid>=#rowidMin# and t.rowid<=#rowidMax# and status='?'

    update taf_user t set xxx='?',xxa='?' where t.rowid>#rowidMin# and t.rowid<=#rowidMax# and status='?'
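
The boundary selection and the two range predicates from step 3 can be checked with a small pure-Python simulation. It is a sketch under assumptions, not the author's code: plain integers stand in for Oracle rowid values, no database is involved, and the function names pick_boundaries, build_segments and rows_in_segment are invented for this example.

```python
def pick_boundaries(sorted_ids, page_size):
    """Mimic the boundary query: keep the id at row number 1, at every
    multiple of page_size, and at the last row number (record_sum)."""
    record_sum = len(sorted_ids)
    return [rid for num, rid in enumerate(sorted_ids, start=1)
            if num % page_size == 0 or num == 1 or num == record_sum]


def build_segments(boundaries):
    """Pair consecutive boundaries into (row_min, row_max, min_inclusive).
    Only the first segment includes its lower bound, matching the two
    UPDATE variants (>= for the first segment, > for the rest)."""
    if len(boundaries) == 1:
        # Single matching row: one segment that includes it.
        return [(boundaries[0], boundaries[0], True)]
    return [(boundaries[i], boundaries[i + 1], i == 0)
            for i in range(len(boundaries) - 1)]


def rows_in_segment(sorted_ids, segment):
    """Ids a segment's WHERE clause would match."""
    row_min, row_max, min_inclusive = segment
    return [rid for rid in sorted_ids
            if (rid >= row_min if min_inclusive else rid > row_min)
            and rid <= row_max]


if __name__ == "__main__":
    ids = list(range(100, 125))            # 25 stand-in "rowids"
    segments = build_segments(pick_boundaries(ids, 10))
    covered = [rid for seg in segments for rid in rows_in_segment(ids, seg)]
    print(covered == ids)                  # every id matched exactly once
```

With 25 ids and a page size of 10, the boundaries fall at row numbers 1, 10, 20 and 25, giving segments of 10, 10 and 5 rows. Because only the first segment includes its lower bound, each boundary row belongs to exactly one segment.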
