Thoughts on running a batch-processing task

Background

Compliance requires that sensitive user information in the database be desensitized. The account center's database still stores phone numbers in plaintext.

Solutions

The work splits into two parts, stock (existing) data and incremental data; handle the incremental data first.
For incremental data, encryption and decryption can be handled in the entity's getter and setter. In addition, the DAO (Repository) may contain queries such as findByPhone; these need to be adjusted to query by ciphertext first and, if the result is empty, query again by plaintext, as sketched below.
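A minimal sketch of what that could look like; PhoneCipher, AccountRepository, and AccountPo are hypothetical names standing in for the project's own crypto helper, DAO, and persistence object, not anything defined in the post:

```java
import java.util.Optional;

// Hypothetical PO: the setter encrypts and the getter decrypts, so business code
// keeps working with plaintext while only ciphertext reaches the database.
public class AccountPo {

    private Long id;
    private String phone; // ciphertext as stored in the database

    public Long getId() { return id; }

    public void setPhone(String plaintext) {
        this.phone = PhoneCipher.encrypt(plaintext); // PhoneCipher is an assumed helper
    }

    public String getPhone() {
        // Assumes decrypt() passes non-ciphertext values through unchanged,
        // since stock rows are still plaintext while the migration runs.
        return PhoneCipher.decrypt(this.phone);
    }
}

// Adjusted lookup: query by ciphertext first; while stock data is still being
// migrated, fall back to a plaintext lookup if nothing is found.
class AccountService {

    private final AccountRepository accountRepository; // hypothetical DAO with findByPhone

    AccountService(AccountRepository accountRepository) {
        this.accountRepository = accountRepository;
    }

    Optional<AccountPo> findByPhone(String plaintextPhone) {
        Optional<AccountPo> hit = accountRepository.findByPhone(PhoneCipher.encrypt(plaintextPhone));
        return hit.isPresent() ? hit : accountRepository.findByPhone(plaintextPhone);
    }
}
```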
For stock data, the plaintext phone numbers already in the database have to be encrypted. Encryption is a CPU-intensive operation and the database is not the right place to do that heavy lifting, so I wrote a Java scheduled task to run the encryption in batches.

The first version

The scheduled task runs on a single main thread that does nothing but encrypt phone numbers: fetch a batch of 1000 POs, read the plaintext out of each PO and set it back (the setter does the encryption), then save.
A batch of 1000 POs turned out to take about 5 s. I tried a batch of 10,000 POs and measured about 50 s, so single-threaded throughput is constant, roughly 200 rows per second.
Some quick arithmetic: the account center in the development environment has about ten million rows, and at 200 rows per second a full run would take roughly 50,000 s, well over 13 hours. Production has several times the data of the development environment, so this kind of efficiency clearly will not do.
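A rough sketch of that first version; the keyset pagination by id (findBatchAfterId) is my assumption about how the batches are fetched, not something spelled out in the post:

```java
import java.util.List;

// First version: one thread, 1000 POs per batch; reading the value out and
// setting it back lets the encrypting setter do the actual work.
public class FirstVersionJob {

    private static final int BATCH_SIZE = 1000;

    private final AccountRepository accountRepository; // hypothetical DAO

    public FirstVersionJob(AccountRepository accountRepository) {
        this.accountRepository = accountRepository;
    }

    public void run() {
        long lastId = 0;
        List<AccountPo> batch;
        while (!(batch = accountRepository.findBatchAfterId(lastId, BATCH_SIZE)).isEmpty()) {
            for (AccountPo po : batch) {
                String plaintext = po.getPhone(); // stock rows are still plaintext
                po.setPhone(plaintext);           // setter encrypts
            }
            accountRepository.saveAll(batch);     // persist the re-encrypted batch
            lastId = batch.get(batch.size() - 1).getId();
        }
    }
}
```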

The second version

The second version made two improvements:

  1. Multithreading. Phone-number encryption is CPU-intensive and our machines have plenty of spare CPU, so I used Executors to create a fixed-size thread pool that is responsible only for encryption, not for inserts or other IO.
  2. Bulk insert. Opening and committing a transaction requires round-trips to the database and ties up database resources. This scenario is a short burst of many writes, so they can go into a single transaction to cut the open/commit overhead. Concretely, Future.isDone() is used to tell when the whole current batch has been desensitized, and then the batch of records is saved in one go (see the sketch after this list).
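A sketch of how those two improvements could fit together, based on my reading of the post: the pool does only the CPU-bound encryption, and the bulk save happens once every future in the batch has completed. AccountRepository and the transactional wiring are assumed, as above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Second version: the fixed pool handles only encryption; IO stays on the batch
// job's own thread, and each batch is written back in a single bulk save.
public class SecondVersionJob {

    private final ExecutorService encryptPool;
    private final AccountRepository accountRepository; // hypothetical DAO

    public SecondVersionJob(int poolSize, AccountRepository accountRepository) {
        this.encryptPool = Executors.newFixedThreadPool(poolSize);
        this.accountRepository = accountRepository;
    }

    // In the real job this would run inside one transaction per batch.
    public void encryptBatch(List<AccountPo> batch) throws Exception {
        List<Future<?>> futures = new ArrayList<>();
        for (AccountPo po : batch) {
            // Submit only the CPU-bound work: read the plaintext, let the setter encrypt it.
            futures.add(encryptPool.submit(() -> po.setPhone(po.getPhone())));
        }
        // The post checks Future.isDone(); blocking on get() until every future
        // finishes achieves the same "whole batch is desensitized" condition.
        for (Future<?> f : futures) {
            f.get();
        }
        accountRepository.saveAll(batch); // one bulk write instead of thousands of single saves
    }
}
```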

In the implementation, to limit the loss when a bulk insert fails and has to roll back, I also tried sharding: splitting each batch as evenly as possible across the threads. Sharding then leads to thinking about work stealing (Java's implementation being ForkJoinPool), and after all those fancy moves it still was not as simple and efficient as just using a FixedThreadPool in the end.

Thread pool size

A FixedThreadPool created via Executors has coreSize equal to maxSize. My local machine is a single node with a 4-core CPU, while UAT is a multi-instance deployment with eight single-core CPUs; every environment is different, so maxSize needs to be set dynamically. Here maxSize is set to Runtime.getRuntime().availableProcessors() * 3.
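In code, the sizing can simply be computed at startup; the multiplier of 3 is the post's own choice:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// coreSize == maxSize for a FixedThreadPool; derive the size from the current
// host instead of hard-coding a number that only fits one environment.
public class PoolConfig {

    public static ExecutorService newEncryptPool() {
        int maxSize = Runtime.getRuntime().availableProcessors() * 3;
        return Executors.newFixedThreadPool(maxSize);
    }
}
```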

How much data per batch

FixedThreadPool uses an unbounded LinkedBlockingQueue, so the memory footprint needs to be assessed carefully. From desc table, one InnoDB record takes about 222 bytes; it will be even larger once it reaches the JVM, so assume 250 bytes.
Running Runtime.getRuntime().freeMemory() on the local development machine gave // todo. Memory is clearly sufficient, so a single pass could handle a lot of data, but whether querying ** rows at a time and writing them back would put pressure on the database was the real doubt, so in the end I chose to run 30,000 rows per batch.
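A back-of-the-envelope check along the same lines, using the ~250-byte-per-row estimate from above (the exact free-memory figure was left as a todo in the original):

```java
// Rough memory check: 30,000 rows at ~250 bytes each is only about 7 MB,
// tiny next to the heap, so the real constraint is database pressure, not RAM.
public class BatchSizeEstimate {

    public static void main(String[] args) {
        long bytesPerRow = 250;
        long batchSize = 30_000;
        long batchBytes = bytesPerRow * batchSize;
        long freeBytes = Runtime.getRuntime().freeMemory();
        System.out.printf("one batch ≈ %.1f MB, JVM free memory ≈ %.1f MB%n",
                batchBytes / 1_048_576.0, freeBytes / 1_048_576.0);
    }
}
```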

Source: www.cnblogs.com/mougg/p/12572756.html