Using multithreading to optimize an Excel data import and validation interface

The business requirement: the existing Excel import feature works like this: read the Excel data, validate each row on the backend to decide whether it meets the import requirements, return the results to the frontend, and show an import preview. (The frontend waits for this response, which is the hard part.) The user then clicks the Import button and the import runs asynchronously (the frontend doesn't wait, so that part is easy). The current interface supports only 300 rows; now I'm asked to support 3000.

Solving this problem takes some careful analysis.

  First, look at the interface code and find where the spreadsheet is read. There is a check: if the row count is greater than 300, return immediately. Change 300 to 3000.

  Then, analyze the import validation: what does each check need, and where does the data come from? It all comes from the database. Querying the database for every row is certainly slow. Even querying a Redis cache still costs a network round trip per row and adds load on the cache. A standalone Redis instance can serve roughly 120,000 queries per second; 120,000 divided by 3000 is 40, so if every import did this, about 40 concurrent users would saturate the system. And checking the same data 3000 times is just wasteful, isn't it??? So instead of hitting the database every time (or even adding a Redis cache in front of it), create a concurrency-safe container, a ConcurrentHashMap, inside the method to hold the data and avoid repeated lookups: within a single method call, each piece of data is fetched only once.

    Map<String, Object> map = new ConcurrentHashMap<>();
    Object obj = map.get("key");
    if (null == obj) {
        // Not in the local cache yet: query Redis, or the database
        String value = "data";
        map.put("key", value);
    }

Objects created inside a method become eligible for garbage collection once the method call completes and their references are released. But for the duration of validating the 3000 rows, the object stays in JVM memory and can be reused quickly, instead of being fetched again from the database or the cache. Call it a stack-level cache, a JVM cache, a local cache.

This is the key idea: within a single method call, query as little as possible and reuse the query results.
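Here is a minimal sketch of that idea, assuming a hypothetical ImportValidator class and a placeholder loadFromDbOrRedis lookup (neither is the actual project code):

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ImportValidator {

        // One cache per method call; it lives only as long as the call itself.
        public void validateRows(List<String> keys) {
            Map<String, Object> localCache = new ConcurrentHashMap<>();
            for (String key : keys) {
                // Each distinct key hits Redis/the database at most once per call
                Object value = localCache.computeIfAbsent(key, this::loadFromDbOrRedis);
                // ... validate the row against value ...
            }
            // localCache becomes garbage as soon as this method returns
        }

        // Placeholder for the real lookup
        private Object loadFromDbOrRedis(String key) {
            return "data";
        }
    }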

After implementing the ConcurrentHashMap method-level cache, I tested it.

Result: it supported up to 800 rows per import check. Beyond that, the frontend request took more than 10 seconds and timed out.

Now what???

I told the product manager the requirement couldn't be met, it just wasn't achievable... we went back and forth... and the back-and-forth got nowhere.

So I turned to multithreading to optimize it.

1. Create a thread pool

import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Shared thread pool.
 */
public class MyExecutor {

   /**
    * Minimum number of threads kept alive in the pool.
    */
   private static final int CORE_POOL_SIZE = 10;
   /**
    * Maximum number of threads the pool may hold; beyond this,
    * tasks are handled by the RejectedExecutionHandler rejection policy.
    */
   private static final int MAXIMUM_POOL_SIZE = 200;
   /**
    * Maximum idle lifetime of a thread, with two constraints:
    * one, it only applies to threads beyond corePoolSize;
    * two, it only applies to threads that are not running. For example:
    * if corePoolSize (minimum threads) is 10 and maximumPoolSize (maximum
    * threads) is 20, and 15 threads are currently alive, then once 3 of them
    * have been idle longer than keepAliveTime, those 3 are terminated,
    * leaving 12 threads in the pool.
    */
   private static final int KEEP_ALIVE_TIME = 30;
   /**
    * Capacity of the waiting task queue.
    */
   private static final int CAPACITY = 10000;

   private static final ExecutorService pool = new ThreadPoolExecutor(
         CORE_POOL_SIZE, MAXIMUM_POOL_SIZE, KEEP_ALIVE_TIME,
         TimeUnit.SECONDS, new LinkedBlockingDeque<>(CAPACITY));

   public static ExecutorService getPool() {
      return pool;
   }

}

2. Create an ordered collection to hold the tasks' return values (Futures), so the results are easy to collect later.

 List<Future<Object>> futureList = new LinkedList<>();

3. Get the thread pool

        // Get the thread pool
        ExecutorService pool = MyExecutor.getPool();

4. Read the Excel data, iterate over the rows, and submit each row's check to the thread pool as a task.

        // Submit a Callable task to the thread pool
        Future<Object> future = pool.submit(new Callable<Object>() {
            @Override
            public Object call() throws Exception {
                // The computation for a single row
                return null;
            }
        });

        // Add each individual result to the ordered collection.
        futureList.add(future);

5. Traverse futureList to collect the results.

        for (Future<Object> oneFuture : futureList) {
            try {
                // Each task's result; get() blocks until the computation completes.
                Object result = oneFuture.get();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

6. Combine all the results and return them to the frontend. That completes the thread pool refactoring of this method; a consolidated sketch of steps 2 through 6 follows.
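Putting steps 2 through 6 together, a minimal end-to-end sketch could look like this; RowData and validateRow are hypothetical stand-ins for the real row type and per-row check:

    import java.util.LinkedList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Future;

    public class ExcelImportCheck {

        public List<Object> checkAll(List<RowData> rows) {
            ExecutorService pool = MyExecutor.getPool();
            List<Future<Object>> futureList = new LinkedList<>();

            // Step 4: submit one validation task per row
            for (RowData row : rows) {
                futureList.add(pool.submit(new Callable<Object>() {
                    @Override
                    public Object call() {
                        return validateRow(row);
                    }
                }));
            }

            // Step 5: collect the results; get() blocks until each task finishes
            List<Object> results = new LinkedList<>();
            for (Future<Object> future : futureList) {
                try {
                    results.add(future.get());
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
            // Step 6: results come back in the same order the rows were submitted
            return results;
        }

        // Placeholder for the real per-row validation
        private Object validateRow(RowData row) {
            return Boolean.TRUE;
        }

        // Placeholder row type
        public static class RowData { }
    }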

7. At this point a new problem emerged: each of the 3000 rows has an id. With multiple threads processing them, how do we make sure no id is processed twice, and how do we flag the duplicates???

For this I used a concurrency-safe Set: ConcurrentSkipListSet.

        // Concurrency-safe, removes duplicates
        ConcurrentSkipListSet<Integer> idSet = new ConcurrentSkipListSet<>();

        boolean flag = idSet.add(id);
        if (!flag) {
            // add failed, which means this id is a duplicate.
        }
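A quick standalone sketch (not the import code itself) shows the deduplication holding up when two threads race to add the same ids:

    import java.util.concurrent.ConcurrentSkipListSet;
    import java.util.concurrent.atomic.AtomicInteger;

    public class DedupDemo {
        public static void main(String[] args) throws InterruptedException {
            ConcurrentSkipListSet<Integer> idSet = new ConcurrentSkipListSet<>();
            AtomicInteger duplicates = new AtomicInteger();
            Runnable task = () -> {
                for (int id = 0; id < 1000; id++) {
                    if (!idSet.add(id)) {
                        // add() returned false: another thread already claimed this id
                        duplicates.incrementAndGet();
                    }
                }
            };
            Thread t1 = new Thread(task);
            Thread t2 = new Thread(task);
            t1.start();
            t2.start();
            t1.join();
            t2.join();
            // Always 1000 unique and 1000 rejected: duplicates never get in
            System.out.println(idSet.size() + " unique, " + duplicates.get() + " duplicates");
        }
    }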

To see why, let's look at the source of ConcurrentSkipListSet's add() method:

    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element {@code e} to this set if
     * the set contains no element {@code e2} such that {@code e.equals(e2)}.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns {@code false}.
     *
     * @param e element to be added to this set
     * @return {@code true} if this set did not already contain the
     *         specified element
     * @throws ClassCastException if {@code e} cannot be compared
     *         with the elements currently in this set
     * @throws NullPointerException if the specified element is null
     */
    public boolean add(E e) {
        return m.putIfAbsent(e, Boolean.TRUE) == null;
    }

In plain terms: the element is added only if the set does not already contain it; if it does, the call leaves the set unchanged and returns false. So using add() to deduplicate ids here is exactly right.

Now let's look at the source of Future's get() method:

    /**
     * Waits if necessary for the computation to complete, and then
     * retrieves its result.
     *
     * @return the computed result
     * @throws CancellationException if the computation was cancelled
     * @throws ExecutionException if the computation threw an
     * exception
     * @throws InterruptedException if the current thread was interrupted
     * while waiting
     */
    V get() throws InterruptedException, ExecutionException;

So, as the javadoc says, Future's get() waits if necessary for the computation to complete and then retrieves its result: it is a blocking call.
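Since get() can block indefinitely, and the frontend already times out at 10 seconds, the overloaded get(timeout, unit) on the same Future interface is worth knowing. A fragment, reusing oneFuture from the loop above; the 5-second bound is an arbitrary example, not a value from the project:

    try {
        // Wait at most 5 seconds for this task instead of forever
        Object result = oneFuture.get(5, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
        // Task took too long: decide whether to cancel it, retry, or fail the row
        oneFuture.cancel(true);
    } catch (Exception e) {
        e.printStackTrace();
    }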

 

So: the interface started at 300 rows, reached 800 rows with a 10-second response after the caching work, and after the multithreading optimization it handles 3000 rows with roughly a 7-second response.

The requirement was met, and performance improved along the way.

Working through this with a thread pool, Future, Callable, and the concurrency-safe container classes ConcurrentHashMap and ConcurrentSkipListSet greatly sharpened my multithreaded and concurrent programming skills. And the method-level data cache, the stack-level JVM cache, was a real leap in thinking.

 
