Million-level data - program migration follow-up

Million-level data - program migration: http://donald-draper.iteye.com/blog/2327909
    In the article above, with 2G of memory, a single thread, a page size of 100,000 records, and a batch size of 5,000, updating 1.2 million records took about 20 minutes and the JVM heap was fully occupied. Because each update was expected to involve few records, I did no optimization at the time. Later a single update reached the million-record level and the application could not cope, so I have finally found time to optimize it. Previously all pages were processed by one thread. Now each page is updated by its own thread, and each thread gets its own JDBC connection. Note that when the data volume is large and the page size is small, many JDBC connections may have to be open at the same time, so make sure the maximum number of connections allowed by the database is sufficient; Oracle defaults to 100.
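To make the connection-limit point concrete, here is a minimal sketch, assuming each connection is opened when a pool thread starts running a page task rather than when the task is queued. `FakeConnection` and `LazyConnectionDemo` are hypothetical stand-ins used only to count open connections; no real JDBC driver is involved. Under that assumption, the number of simultaneously open connections is bounded by the thread pool size rather than by the number of pages.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class LazyConnectionDemo {

    /** Hypothetical stand-in for a JDBC connection; it only tracks how many are open. */
    static final class FakeConnection implements AutoCloseable {
        static final AtomicInteger open = new AtomicInteger();
        static final AtomicInteger maxOpen = new AtomicInteger();

        FakeConnection() {
            int now = open.incrementAndGet();
            maxOpen.accumulateAndGet(now, Math::max); // remember the peak
        }

        @Override
        public void close() {
            open.decrementAndGet();
        }
    }

    /** Runs `pages` tasks on `threads` workers and returns the peak number of open connections. */
    static int run(int pages, int threads) {
        FakeConnection.open.set(0);
        FakeConnection.maxOpen.set(0);
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(pages);
        for (int i = 0; i < pages; i++) {
            exec.submit(() -> {
                // Opened when the task starts running, not when it is queued.
                try (FakeConnection con = new FakeConnection()) {
                    Thread.sleep(5); // pretend to update one page
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        exec.shutdown();
        return FakeConnection.maxOpen.get();
    }

    public static void main(String[] args) {
        System.out.println("Peak open connections: " + run(42, 8));
    }
}
```

With 42 pages and 8 threads, the peak can never exceed 8, because each worker holds at most one connection at a time.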
The plan is to update each page in its own thread. The main ideas are as follows:
####### Variables used
pageUpateSize: page size (records per page)
threadPoolSize: number of threads
batchSize: number of records per batch save

sums is the number of records to update; we tested with 1.26 million. Paging is used only when sums is greater than 100,000.
####### Main thread core code:
ExecutorService exec = null;
int batches = 0;
if( sums > 100000){
	if(sums % pageUpateSize ==0){
		batches = sums/pageUpateSize;
	}
	else{
		batches = sums/pageUpateSize  + 1;
	}
}
AtomicInteger counts = new AtomicInteger(0);//Counter of updated records
CountDownLatch doneSignal = new CountDownLatch(batches);
exec = Executors.newFixedThreadPool(threadPoolSize);
for(int i =1;i<=batches;i++){
	//getConnection() obtains a database connection; each page gets its own
	exec.submit(new PageUpdateThread(getConnection(), tableName,(i-1)*pageUpateSize+1,(i)*pageUpateSize,counts,doneSignal));
}
doneSignal.await();//Wait for all paging threads to end
exec.shutdown();//Release the pool threads once every page is done
logger.info("============All Insert Sizes:"+counts.get());
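The page arithmetic in the main thread can be sketched in isolation. This is a minimal illustration, not the author's code: the class and method names (`PageRanges`, `pageCount`, `ranges`) are hypothetical, the if/else on the remainder is replaced by an equivalent ceiling division, and the last page is clamped to the total as an extra.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the page-splitting arithmetic used by the main thread. */
class PageRanges {

    /** Number of pages needed for `total` records (ceiling division). */
    static int pageCount(int total, int pageSize) {
        return (total + pageSize - 1) / pageSize;
    }

    /** 1-based inclusive [start, end] row ranges, one per page. */
    static List<int[]> ranges(int total, int pageSize) {
        List<int[]> result = new ArrayList<>();
        int pages = pageCount(total, pageSize);
        for (int i = 1; i <= pages; i++) {
            int start = (i - 1) * pageSize + 1;
            int end = Math.min(i * pageSize, total); // clamp the last page
            result.add(new int[] {start, end});
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] r : ranges(100_001, 30_000)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Clamping is optional with the ROWNUM query used here, since rows past the end simply do not exist, but it keeps the logged ranges accurate.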




Paging update thread:
/**
 * Paging update thread
 * @author donald
 * @date 2017-4-13
 * @time 4:37:07 PM
 */
public class PageUpdateThread implements Runnable {
	private static final Logger log = LoggerFactory.getLogger(PageUpdateThread.class);
	private static int batchSize = 2500;
	private Connection con;
	private String tableName;
	private int startPos;
	private int endPos;
	private final  AtomicInteger totalCount;
	private final CountDownLatch doneSignal;
	private SynService synService = null;
	private String threadName;
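	/**
	 * @param con        JDBC connection used by this page; closed when the page is done
	 * @param tableName  table to read from
	 * @param startPos   first row number of the page (inclusive)
	 * @param endPos     last row number of the page (inclusive)
	 * @param totalCount shared counter of processed records
	 * @param doneSignal latch counted down when the page finishes
	 */
	public PageUpdateThread(Connection con, String tableName,
			 int startPos, int endPos,
			AtomicInteger totalCount, CountDownLatch doneSignal) {
		super();
		this.con = con;
		this.tableName = tableName;
		this.startPos = startPos;
		this.endPos = endPos;
		this.totalCount = totalCount;
		this.doneSignal = doneSignal;
	}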
	/**
	 *
	 */
	private void init(){
		synService = new SynService ();
		threadName = Thread.currentThread().getName();
	}
	@Override
	public void run() {
		init();
		try {
			log.info(threadName+"Updating records:"+startPos+","+endPos);
			work();
			log.info(threadName+"Update record completed:"+startPos+","+endPos);
		} catch (BatchUpdateException e) {
			e.printStackTrace ();
		} catch (SQLException e) {
			e.printStackTrace ();
		}
		finally{
			doneSignal.countDown();
		}
	}
	/**
	 *
	 * @throws BatchUpdateException
	 * @throws SQLException
	 */
	private void work() throws BatchUpdateException, SQLException{
		ResultSet addRs = null;
		PreparedStatement ps = null;
		List<PageData> insertList = new ArrayList<PageData>();
		// paging statement
		String sql = "SELECT * FROM (SELECT t.*, ROWNUM as rowno FROM ( SELECT * FROM "
				+ tableName
				+ " ORDER BY CREATETIME"
				+ " ) t WHERE ROWNUM <= ?)" + " WHERE rowno >= ?";
		log.info(threadName+"======Search insert records sql:" + sql + ",startPos:"
				+ startPos + ",endPos:" + endPos);
		int counts = 0; // record count
		try {
			ps = con.prepareStatement(sql, ResultSet.TYPE_SCROLL_INSENSITIVE,
					ResultSet.CONCUR_READ_ONLY);
			ps.setInt(1, endPos);
			ps.setInt(2, startPos);
			addRs = ps.executeQuery();
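			while (addRs.next()) {
				// Convert the current row to a PageData. The helper's name is garbled in the
				// source ("switch" is a Java keyword), so convertRow is a placeholder name.
				PageData pd = convertRow(addRs);
				insertList.add(pd);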
				if (counts % batchSize == 0 && counts > 0) {
					long childStartTime = System.currentTimeMillis();
					synService.batchInsertSync(tableName + "Mapper.save", insertList);
					long childEndTime = System.currentTimeMillis();
					log.info(threadName+"Time (s) to save "+insertList.size()+" records:"
							+ (childEndTime - childStartTime) / 1000.00);
					insertList.clear();
					log.info(threadName+"============Records:" + counts);
				}
				if (addRs.isLast()) {
					synService.batchInsertSync(tableName + "Mapper.save", insertList);
					insertList.clear();
				}

				pd = null;
				counts++;
				totalCount.incrementAndGet();
			}
		} catch (SQLException e) {
			e.printStackTrace ();
		} catch (IOException e) {
			e.printStackTrace ();
		}
		finally {
		        insertList = null;
			sql = null;
			if (addRs != null) {
				addRs.close();
				addRs = null;
			}
			if (ps != null) {
				ps.close();
				ps = null;
			}
			if (con != null) {
				con.close();
			}
		}
	}
}
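The batching inside work() depends on ResultSet.isLast(), which is why the statement is opened as scrollable. A minimal alternative sketch, assuming a small hypothetical helper (`BatchBuffer`, not part of the original program): rows are buffered, a full batch is flushed as soon as it is reached, and the remainder is flushed once after the loop, so a forward-only result set would be enough.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: group a stream of rows into batches and flush the remainder at the end. */
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // e.g. a batch insert
    private final List<T> buffer = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one row and flushes as soon as a full batch is available. */
    void add(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Hands the buffered rows to the sink; call once more after the last row. */
    void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer)); // copy so the sink may keep the list
            buffer.clear();
        }
    }
}
```

In the paging thread, the sink would be the call to synService.batchInsertSync, add() would be called once per row inside the while loop, and flush() once after it.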

Let's test.
######## Hardware environment:
Core i7 (4 cores), 2G JVM memory, 1.26 million records, Oracle database
######## JVM parameters:
-server
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-Xloggc:E:\gc.log


During the test, we mainly tune 3 parameters:
pageUpateSize: page size (records per page)
threadPoolSize: number of threads
batchSize: number of records per batch save


First, tune the number of threads.
Parameter settings, peak memory, and elapsed time:
threads, page size, batch size, peak memory (G), elapsed time (s)
8, 30000, 5000, 1.039, 353.661

JConsole memory usage and garbage collection counts and times:
Time: 2017-04-13 14:50:43
Used: 1,023,612 KB
Committed: 1,155,084 KB
Max: 2,038,528 KB
GC time:
45.212 seconds on ParNew (4,582 collections)
0.620 seconds on ConcurrentMarkSweep (20 collections)
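The JConsole figures above (used, committed, and max heap, plus per-collector counts and times) are also available from inside the JVM through the standard java.lang.management API. A minimal sketch, assuming you want to log the same values at the end of a run; the class name `GcSnapshot` is hypothetical, and the collector names printed depend on the GC flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class GcSnapshot {

    /** Formats heap usage and per-collector GC totals for the current JVM. */
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        StringBuilder sb = new StringBuilder();
        sb.append("Used: ").append(heap.getUsed() / 1024).append(" KB\n");
        sb.append("Committed: ").append(heap.getCommitted() / 1024).append(" KB\n");
        sb.append("Max: ").append(heap.getMax() / 1024).append(" KB\n");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection time is reported in milliseconds since JVM start.
            sb.append(gc.getCollectionTime() / 1000.0).append(" seconds on ")
              .append(gc.getName()).append(" (")
              .append(gc.getCollectionCount()).append(" collections)\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```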

VisualVM-memory usage graph:




Parameter settings, peak memory, and elapsed time:
threads, page size, batch size, peak memory (G), elapsed time (s)
12, 30000, 5000, 1.645, 254.612

JConsole memory usage and garbage collection counts and times:
Time: 2017-04-13 15:09:40
Used: 296,974 KB
Committed: 1,741,000 KB
Max: 2,038,528 KB
GC time:
42.666 seconds on ParNew (3,767 collections)
0.177 seconds on ConcurrentMarkSweep (15 collections)

VisualVM - Memory Usage Graph:




Compared with 8 threads, the run took 99 seconds less, and the number and time of garbage collections in both the young and old generations decreased, but peak memory rose by about 600M (from 1.039G to 1.645G).
This is because each thread holds the records it is updating, so more threads means a higher peak and larger swings in memory usage.

Parameter settings, peak memory, and elapsed time:
threads, page size, batch size, peak memory (G), elapsed time (s)
16, 30000, 5000, 1.607, 187.303

JConsole memory usage and garbage collection counts and times:
Time: 2017-04-13 15:23:08
Used: 840,564 KB
Committed: 1,770,688 KB
Max: 2,038,528 KB
GC time:
29.268 seconds on ParNew (2,560 collections)
0.163 seconds on ConcurrentMarkSweep (13 collections)

VisualVM - Memory usage graph:



Compared with 12 threads, the elapsed time dropped by 67 seconds, and the number and time of garbage collections in both the young and old generations decreased. Peak memory barely changed even though the number of threads increased.
Section summary:
With the same page size and batch size, more threads means less elapsed time and fewer, shorter garbage collections in the young and old generations, while peak memory tends to rise.
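The effect of the thread count can be summarized as a speedup relative to the 8-thread run. A minimal sketch using only the elapsed times measured above; the class and method names (`ThreadScaling`, `speedup`) are hypothetical.

```java
class ThreadScaling {

    /** Speedup of a run relative to a baseline run: baselineSeconds / seconds. */
    static double speedup(double baselineSeconds, double seconds) {
        return baselineSeconds / seconds;
    }

    public static void main(String[] args) {
        int[] threads = {8, 12, 16};
        double[] seconds = {353.661, 254.612, 187.303}; // elapsed times from the tests above
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%d threads: %.2fx the speed of the 8-thread run%n",
                    threads[i], speedup(seconds[0], seconds[i]));
        }
    }
}
```

Doubling the threads from 8 to 16 gives a little under twice the speed in these runs, so the gain is close to linear over this range on the 4-core test machine.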

Next, tune the page size.
Parameter settings, peak memory, and elapsed time:
threads, page size, batch size, peak memory (G), elapsed time (s)
8, 30000, 5000, 1.039, 353.661

JConsole memory usage and garbage collection counts and times:
Time: 2017-04-13 14:50:43
Used: 1,023,612 KB
Committed: 1,155,084 KB
Max: 2,038,528 KB
GC time:
45.212 seconds on ParNew (4,582 collections)
0.620 seconds on ConcurrentMarkSweep (20 collections)

The configuration, memory usage, and elapsed time above serve as the baseline for comparison.


Parameter settings, peak memory, and elapsed time:
threads, page size, batch size, peak memory (G), elapsed time (s)
8, 20000, 5000, 0.851, 411.734

JConsole memory usage and garbage collection counts and times:
Time: 2017-04-13 15:40:23
Used: 202,855 KB
Committed: 893,508 KB
Max: 2,038,528 KB
GC time:
42.290 seconds on ParNew (4,530 collections)
0.236 seconds on ConcurrentMarkSweep (23 collections)

VisualVM - Memory Usage Graph:



With 8 threads and a batch size of 5000, reducing the page size by 10000 lowered peak memory by 188M but increased the elapsed time by 58 seconds. Garbage collection changed little: ParNew ran 4,530 times instead of 4,582, and ConcurrentMarkSweep 23 times instead of 20.

Parameter settings, peak memory, and elapsed time:
threads, page size, batch size, peak memory (G), elapsed time (s)
8, 10000, 5000, 0.622, 696.168

JConsole memory usage and garbage collection counts and times:
Time: 2017-04-13 16:07:15
Used: 398,122 KB
Committed: 622,284 KB
Max: 2,038,528 KB
GC time:
41.930 seconds on ParNew (4,970 collections)
0.301 seconds on ConcurrentMarkSweep (29 collections)

VisualVM - Memory Usage Graph:

With the same number of threads (8) and the same batch size (5000), reducing the page size lowers peak memory but increases the elapsed time. The number of garbage collections in the young and old generations also goes up between page sizes 30000 and 10000 (ParNew from 4,582 to 4,970, ConcurrentMarkSweep from 20 to 29), although total GC time does not.

Then tune the batch size. The previous configuration (8 threads, page size 10000, batch size 5000), with its memory usage and elapsed time, serves as the baseline for comparison.
Parameter settings, peak memory, and elapsed time:
threads, page size, batch size, peak memory (G), elapsed time (s)
8, 10000, 2500, 0.474, 664.131

JConsole memory usage and garbage collection counts and times:
Time: 2017-04-13 16:31:18
Used: 256,886 KB
Committed: 535,884 KB
Max: 2,038,528 KB
GC time:
38.173 seconds on ParNew (4,693 collections)
0.380 seconds on ConcurrentMarkSweep (32 collections)

VisualVM - Memory usage graph:



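With 8 threads and a page size of 10000, reducing the batch size from 5000 to 2500 lowered peak memory from 0.622G to 0.474G. The elapsed time did not increase in this run (664.131s versus 696.168s). ParNew collections decreased (4,693 versus 4,970), while ConcurrentMarkSweep collections increased slightly (32 versus 29).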


Summary:
With sufficient memory, more threads means less elapsed time and fewer, shorter garbage collections in the young and old generations, but a higher peak memory. In the best run of this test (16 threads), 1.26 million records were updated in 187.303s with a peak memory of 1.607G, an average of about 6737 records per second. Suppose instead a machine with 2 CPUs of 8 cores each (16 cores in total) and 8G of memory. Scaling by CPU count, core count, and memory, a rough estimate of the throughput is (2x8x8)/(1x4x2)x6737, about 110,000 records per second. At that rate, the same 187.303s would process about 20.57 million records. Of course, more threads is not always better; there is a limit. An article I read earlier suggested that the best number of threads is 2xSum(CPU)xCore(CPU). In this test, performance was quite good with the number of threads at 4 times Sum(CPU)xCore(CPU), but the right value depends on the specific scenario and opinions differ.
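The estimate above can be written out step by step. This only restates the arithmetic in the text; it assumes throughput scales linearly with CPU count, core count, and memory, which is the text's own assumption and was not measured.

```latex
\[
\frac{1.26 \times 10^{6}\ \text{records}}{187.303\ \text{s}} \approx 6.7 \times 10^{3}\ \text{records/s}
\]
\[
\frac{2 \times 8 \times 8}{1 \times 4 \times 2} = \frac{128}{8} = 16,
\qquad
16 \times 6737 = 107{,}792 \approx 1.1 \times 10^{5}\ \text{records/s}
\]
\[
1.1 \times 10^{5}\ \text{records/s} \times 187\ \text{s} \approx 2.057 \times 10^{7}\ \text{records}
\]
```

The numerator is the hypothetical machine (2 CPUs, 8 cores each, 8G) and the denominator is the test machine (1 CPU, 4 cores, 2G).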
If memory is tight and there is no requirement on processing time, we can reduce the number of threads, the page size, and the batch size. One of our servers is just an ordinary PC with 4G of memory. Other databases also run on it, so the JVM ends up with only 650M available: the application runs in server mode, where the default maximum heap is 1/4 of physical memory, but only 650M can actually be used. Since this application has no time requirement, we use 8 threads, a page size of 10,000, and a batch size of 2,500. Updating 1.2 million records takes about 10 minutes, which is acceptable, and peak memory stays below 500M.
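Before choosing the thread count, page size, and batch size on a constrained server, it helps to confirm how much heap the JVM will actually use. A minimal sketch using the standard Runtime API; the class name `HeapCheck` is hypothetical.

```java
class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Running it with the same JVM options as the application shows whether the heap limit matches what the server can spare.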

