When Parallelism Meets DI - Best Practices for Parallel Data Aggregation with Spring

Analyzing Taobao's PDP

Let's start by looking at Taobao's PDP (Product Detail Page).

Open the Chrome Network panel and look at how Taobao loads the page data. In my experience, this kind of data is usually loaded asynchronously, either as XHR or as js (JSONP), so you should be able to find the request quickly.

You can also see the performance of this interface in the panel.

Amazingly, Taobao pulls down the full PDP data in a single request, and the server-side processing takes less than 125ms.

First of all, what are the benefits of doing it this way?

  • Simple integration between the front end and the back end
  • As much data as possible is transferred over a single network connection (within reason: the payload should not hurt the user experience, generally no more than 300 KB), reducing the overhead of establishing connections and the bandwidth wasted on request headers.

So how is this done?

You might say caching. But keep in mind that such a critical e-commerce page involves a lot of teams, for example:

  • Product team
  • Seller team
  • Review team
  • Order team
  • Membership team
  • Pricing/promotions team
  • Q & A team
  • Recommendation team
  • Logistics team
  • etc.

Even if every team caches its data, fetching them one by one is still hard to finish within 125ms. And on a page that involves money, some data must be real-time, so there are not many places where caching is acceptable. If it were up to you, what would you do? Offline tagging? Data pre-warming? etc.

At this point, parallel calls are a good approach.

If you analyze the page, you will find that apart from belonging to the same product (sharing the same input parameter), the data of the individual modules do not depend on each other and can be fetched entirely in parallel.

Is parallelism problem-free?

Fetching data in parallel can improve interface performance, but it also introduces a number of issues, such as:

  • There may be many data items with dependencies among them; how do we keep the code simple and clear?
  • The dependencies are likely to form a directed graph; do you have to work out by hand which nodes can execute in parallel?
  • Once things are asynchronous, how do we handle timeouts? How do we handle exceptions thrown by business code?
  • What if there is a circular dependency?
  • After going asynchronous, what happens to the contents of ThreadLocal? What about features built on a ThreadLocal-based Context that stop working?
  • Transactions are isolated per thread; how do we deal with that?
  • How do we monitor each asynchronous execution and the performance of each node?

Below, we discuss how to fetch data in parallel simply, conveniently and efficiently, and how to solve the problems that asynchrony brings.

Common parallel approaches

Suppose you now need three pieces of data: the user's basic information, their post list, and their follower list. In what ways could you fetch them in parallel?

Parallelism with a plain Java thread pool

The most rudimentary way is to directly use the thread pool and Future mechanism that Java provides.

public User getUserDataByParallel(Long userId) throws InterruptedException, ExecutionException {
    // In real code the executor should be a shared, reused pool, not created per call.
    ExecutorService executorService = Executors.newFixedThreadPool(3);
    CountDownLatch countDownLatch = new CountDownLatch(3);
    // Submit the three independent queries so they run concurrently.
    Future<User> userFuture = executorService.submit(() -> {
        try {
            return userService.get(userId);
        } finally {
            countDownLatch.countDown();
        }
    });
    Future<List<Post>> postsFuture = executorService.submit(() -> {
        try {
            return postService.getPosts(userId);
        } finally {
            countDownLatch.countDown();
        }
    });
    Future<List<User>> followersFuture = executorService.submit(() -> {
        try {
            return followService.getFollowers(userId);
        } finally {
            countDownLatch.countDown();
        }
    });
    // Wait until all three tasks have finished, then aggregate the results.
    countDownLatch.await();
    User user = userFuture.get();
    user.setFollowers(followersFuture.get());
    user.setPosts(postsFuture.get());
    return user;
}
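A note on this sketch: the Future.get() calls themselves block until the corresponding task finishes, so the CountDownLatch mainly makes the "wait for all three tasks" step explicit; in real code you would also reuse a shared executor (and shut it down when the application stops) rather than creating a new pool on every call.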

Parallelism with Spring @Async

As we know, Spring supports the @Async annotation, which makes it easy to run a method asynchronously and also to capture its return value. Reference: www.baeldung.com/spring-asyn...

Under the hood, @Async works by generating a proxy while the bean is being processed: the proxy intercepts the method call and submits a Callable task to the taskExecutor bean. In principle it is not fundamentally different from writing the Java thread pool code yourself.
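For reference, here is a minimal configuration sketch for enabling @Async with a dedicated executor; the pool sizes, bean name and class name are illustrative, not taken from the original article (ThreadPoolTaskExecutor lives in org.springframework.scheduling.concurrent):

@Configuration
@EnableAsync
public class AsyncConfig {
    /* "taskExecutor" is the bean name Spring's @Async support looks up by default. */
    @Bean("taskExecutor")
    public Executor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(8);
        executor.setMaxPoolSize(16);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("async-query-");
        executor.initialize();
        return executor;
    }
}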

Let's implement the same function with Spring @Async. First we need to change the return types of the three methods to Future and add the @Async annotation to them:

class UserServiceImpl implements UserService {
    @Async
    public Future<User> get(Long userId) {
        // ... something
    }
}
class PostServiceImpl implements PostService {
    @Async
    public Future<List<Post>> getPosts(Long userId) {
        // ... something
    }
}
class FollowServiceImpl implements FollowService {
    @Async
    public Future<List<User>> getFollowers(Long userId) {
        // ... something
    }
}
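One detail worth noting: a Spring @Async method that declares a Future return type typically wraps its result in Spring's AsyncResult (org.springframework.scheduling.annotation.AsyncResult). A sketch of the first method, with hypothetical loading logic:

@Async
@Override
public Future<User> get(Long userId) {
    User user = loadUserFromDb(userId);   // hypothetical loading logic
    return new AsyncResult<>(user);       // AsyncResult implements Future
}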

Then the three pieces of user data are fetched in parallel and aggregated as follows:

public User getUserDataByParallel(Long userId) throws InterruptedException, ExecutionException {
    Future<User> userFuture = userService.get(userId);
    Future<List<Post>> postsFuture = postService.getPosts(userId);
    Future<List<User>> followersFuture = followService.getFollowers(userId);
    
    User user = whileGet(userFuture);
    user.setFollowers(whileGet(followersFuture));
    user.setPosts(whileGet(postsFuture));
    return user;
}

private <T> T whileGet(Future<T> future) throws ExecutionException, InterruptedException {
    // Busy-wait (spin) until the future completes, then read the result.
    while (true) {
        if (future.isDone()) {
            break;
        }
    }
    return future.get();
}

Here the asynchronous results are collected by spinning. Of course, as before, you could also pass a CountDownLatch into each Service and have the calling thread wait on it.

Combining parallelism with DI (Dependency Injection)

Both approaches above get the job done, but first of all they are not intuitive, and they do not address the asynchrony problems mentioned earlier: once a timeout, an exception, or ThreadLocal comes into play, the code may not behave as you expect. Is there a simpler, more convenient and more reliable way?

Imagine this: if the data you need could be fetched in parallel automatically and handed to you as method parameters, wouldn't that be convenient? Like this:

@Component
public class UserAggregate {
    @DataProvider("userWithPosts")
    public User userWithPosts(
            @DataConsumer("user") User user,
            @DataConsumer("posts") List<Post> posts,
            @DataConsumer("followers") List<User> followers) {
        user.setPosts(posts);
        user.setFollowers(followers);
        return user;
    }
}

Here @DataConsumer declares the id of the asynchronous data you want, and @DataProvider declares that this method provides data, with id userWithPosts.
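For completeness, here is a hedged sketch of how the three leaf providers consumed above ("user", "posts", "followers") might be declared. The @InvokeParameter binding of the userId invocation parameter is an assumption about the project's API (check its README for the exact form), and the service calls are illustrative:

@Component
public class UserDataProviders {
    @Autowired private UserService   userService;     // plain synchronous services assumed here (no @Async)
    @Autowired private PostService   postService;
    @Autowired private FollowService followService;

    @DataProvider("user")
    public User user(@InvokeParameter("userId") Long userId) {
        return userService.get(userId);
    }

    @DataProvider("posts")
    public List<Post> posts(@InvokeParameter("userId") Long userId) {
        return postService.getPosts(userId);
    }

    @DataProvider("followers")
    public List<User> followers(@InvokeParameter("userId") Long userId) {
        return followService.getFollowers(userId);
    }
}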

Or maybe you don't want to write such an Aggregate class because you don't need to reuse it, and you would rather create an "anonymous provider" directly. Then you can make a call like this anywhere to get the result:

User user = dataBeanAggregateQueryFacade.get(
     Collections.singletonMap("userId", 1L), 
     new Function3<User, List<Post>,List<User>, User>() {
            @Override
            public User apply(@DataConsumer("user") User user, 
                              @DataConsumer("posts") List<Post> posts,
                              @DataConsumer("followers") List<User> followers) {
                user.setPosts(posts);
                user.setFollowers(followers);
                return user;
            }
     });
Assert.notNull(user,"user not null");
Assert.notNull(user.getPosts(),"user posts not null");

Here Function3 takes four generic parameters; the last one (User) is the return type, and the first three correspond, in order, to the three parameter types of the apply method. The project predefines Function2 through Function5, supporting up to five parameters; if you need more, you can write your own functional interface that extends MultipleArgumentsFunction.

Obviously:

  • Each @DataConsumer corresponds to exactly one @DataProvider.
  • One @DataProvider may be consumed by multiple @DataConsumers.
  • One @DataProvider may, through several @DataConsumer parameters, depend on several other @DataProviders.

Now there is a project that implements exactly this. By simply adding a few annotations to your methods, you can quickly turn your call tree into parallel execution.

Project Address: github.com/lvyahui8/sp...

You don't need to care about the underlying implementation; only when you have custom requirements do you need to touch a few configuration parameters to extend its capabilities.

The principle

  1. When Spring starts, the application is scanned for @DataProvider and @DataConsumer annotations. The dependencies between them are recorded (as a directed graph), together with the mapping between each @DataProvider and its Spring bean.
  2. When an aggregation query is issued, the dependency tree for the query is assembled from the recorded dependencies; using a thread pool plus a CountDownLatch, the bean methods corresponding to the child nodes are invoked asynchronously and recursively, and their results are fed into the current node (roughly breadth-first, but because of the parallelism the order in which nodes are visited is nondeterministic).
  3. Before the recursive call is initiated, a map of invocation parameters is passed in; method parameters that are not annotated with @DataConsumer take their values from this map.
  4. The @DataProvider and @DataConsumer annotations support a few parameters to control timeouts, exception-handling mode, idempotent caching, and so on.

New problems introduced by parallelism/asynchrony, and how to solve them

How are timeouts controlled?

The @DataProvider annotation supports a timeout parameter to control the timeout. Under the hood this is simply a timed blocking wait:

java.util.concurrent.CountDownLatch#await(long, java.util.concurrent.TimeUnit)

How are exceptions handled?

Two options are provided for exceptions: swallow them, or propagate them to the caller.

The @DataConsumer annotation supports an exceptionProcessingMethod parameter, which lets the Consumer declare how exceptions thrown by the Provider should be handled.

It can, of course, also be configured globally; the global configuration has lower priority than the Consumer-level configuration.

What about circular dependencies?

When Spring initializes beans, because bean creation and property population happen in two separate steps, Spring can use so-called "early references" to resolve circular dependencies.

But if the circularly dependent beans declare their dependencies as constructor parameters, there is no way to resolve the cycle.

Similarly, we inject asynchronously fetched data through method parameters; since the method signature cannot change, the cycle would never terminate, so circular dependencies must be forbidden.

The question then becomes how to forbid circular dependencies, i.e., how to detect a cycle in the directed dependency graph. There are two approaches:

  • DFS with node coloring: before descending into a node, mark it as "visiting"; then recurse into its children; when the recursion for the node completes, mark it as "visited". If during the DFS we reach a node that is already marked "visiting", the graph contains a cycle.
  • Topological sort: arrange the nodes of the directed graph in a sequence such that no node points to a node with a lower index; if such an ordering exists, the graph has a topological sort. The usual implementation repeatedly deletes nodes with in-degree 0 and decrements the in-degree of their successors until all nodes are deleted. Clearly, if the graph contains a cycle, the nodes on that cycle never reach in-degree 0 and can never be deleted; so if deletion stalls while nodes remain and none of them has in-degree 0, there must be a cycle (see the sketch right after this list).
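For comparison, here is a minimal sketch of the topological-sort variant (Kahn's algorithm) applied to the same adjacency-list representation used below; it only reports whether a cycle exists:

private void checkCycleByTopologicalSort(Map<String, Set<String>> graphAdjMap) {
    // Compute in-degrees over all nodes that appear either as a key or as a target.
    Map<String, Integer> inDegree = new HashMap<>();
    graphAdjMap.keySet().forEach(node -> inDegree.putIfAbsent(node, 0));
    graphAdjMap.values().forEach(targets ->
            targets.forEach(target -> inDegree.merge(target, 1, Integer::sum)));

    // Start with all nodes whose in-degree is 0.
    Deque<String> queue = new ArrayDeque<>();
    inDegree.forEach((node, degree) -> {
        if (degree == 0) {
            queue.add(node);
        }
    });

    // Repeatedly remove a 0-in-degree node and decrement its successors' in-degrees.
    int removed = 0;
    while (!queue.isEmpty()) {
        String node = queue.poll();
        removed++;
        for (String next : graphAdjMap.getOrDefault(node, Collections.emptySet())) {
            if (inDegree.merge(next, -1, Integer::sum) == 0) {
                queue.add(next);
            }
        }
    }

    // Any node that could not be removed must lie on a cycle.
    if (removed < inDegree.size()) {
        throw new IllegalStateException("There are loops in the dependency graph.");
    }
}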

Here the cycle check is implemented with an adjacency list plus DFS coloring:

private void checkCycle(Map<String, Set<String>> graphAdjMap) {
    // unvisited = absent, 1 = visiting (on the current DFS path), 2 = visited
    Map<String, Integer> visitStatusMap = new HashMap<>(graphAdjMap.size() * 2);
    for (Map.Entry<String, Set<String>> item : graphAdjMap.entrySet()) {
        if (visitStatusMap.containsKey(item.getKey())) {
            continue;
        }
        dfs(graphAdjMap, visitStatusMap, item.getKey());
    }
}

private void dfs(Map<String, Set<String>> graphAdjMap, Map<String, Integer> visitStatusMap, String node) {
    if (visitStatusMap.containsKey(node)) {
        if (visitStatusMap.get(node) == 1) {
            // Reached a node that is still on the current DFS path: there is a cycle.
            List<String> relatedNodes = new ArrayList<>();
            for (Map.Entry<String, Integer> item : visitStatusMap.entrySet()) {
                if (item.getValue() == 1) {
                    relatedNodes.add(item.getKey());
                }
            }
            throw new IllegalStateException("There are loops in the dependency graph. Related nodes:" + StringUtils.join(relatedNodes));
        }
        return;
    }
    visitStatusMap.put(node, 1);
    log.info("visited:{}", node);
    for (String relateNode : graphAdjMap.get(node)) {
        dfs(graphAdjMap, visitStatusMap, relateNode);
    }
    visitStatusMap.put(node, 2);
}

How to deal with ThreadLocal?

Many frameworks use ThreadLocal to implement a Context that holds data shared within a single request, and Spring is no exception.

As we all know, a ThreadLocal is essentially a special key into the current Thread's internal map (ThreadLocalMap). A ThreadLocal can only access the current thread's copy of the data; across threads it cannot reach another thread's ThreadLocalMap.
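A minimal snippet illustrating the problem (checked-exception handling omitted):

ThreadLocal<String> tenantId = new ThreadLocal<>();
tenantId.set("tenant-42");

ExecutorService pool = Executors.newSingleThreadExecutor();
// The worker thread has its own (empty) ThreadLocalMap, so it cannot see "tenant-42".
Future<String> seenByWorker = pool.submit(() -> tenantId.get());
System.out.println(seenByWorker.get());   // prints "null"
pool.shutdown();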

Solution


  1. Before the current thread submits the asynchronous task, "bundle" the ThreadLocal data of the submitting thread into the task instance.
  2. When the task starts executing, take the data out of the task instance and restore it into the ThreadLocal of the asynchronous thread.
  3. After the task finishes, clean up the ThreadLocal of the asynchronous thread.

Here we first define an interface describing these three actions:

public interface AsyncQueryTaskWrapper {
    /**
     * Runs before the task is submitted. Executes in the thread that submits the task.
     */
    void beforeSubmit();

    /**
     * Runs before the task starts executing. Executes in the asynchronous thread.
     * @param taskFrom the thread that submitted the task
     */
    void beforeExecute(Thread taskFrom);

    /**
     * Runs after the task has finished executing. Executes in the asynchronous thread.
     * Note: this method runs regardless of what exception the user's method throws.
     * @param taskFrom the thread that submitted the task
     */
    void afterExecute(Thread taskFrom);
}

To make these three actions take effect, we need to override the java.util.concurrent.Callable#call method:

public abstract class AsyncQueryTask<T> implements Callable<T> {
    Thread                taskFromThread;
    AsyncQueryTaskWrapper asyncQueryTaskWrapper;

    public AsyncQueryTask(Thread taskFromThread, AsyncQueryTaskWrapper asyncQueryTaskWrapper) {
        this.taskFromThread = taskFromThread;
        this.asyncQueryTaskWrapper = asyncQueryTaskWrapper;
    }

    @Override
    public T call() throws Exception {
        try {
            if (asyncQueryTaskWrapper != null) {
                asyncQueryTaskWrapper.beforeExecute(taskFromThread);
            }
            return execute();
        } finally {
            if (asyncQueryTaskWrapper != null) {
                asyncQueryTaskWrapper.afterExecute(taskFromThread);
            }
        }
    }

    /**
     * When submitting a task, the business side implements this method instead of call().
     *
     * @return the task result
     * @throws Exception any exception thrown by the business code
     */
    public abstract T execute() throws Exception;
}

Then, when submitting the task to the thread pool, we no longer submit an anonymous Callable instance directly but an AsyncQueryTask instance, and we trigger taskWrapper.beforeSubmit() right before submission:

AsyncQueryTaskWrapper taskWrapper = new CustomAsyncQueryTaskWrapper();
// Run the pre-submit action in the submitting thread.
taskWrapper.beforeSubmit();
Future<?> future = executorService.submit(new AsyncQueryTask<Object>(Thread.currentThread(), taskWrapper) {
    @Override
    public Object execute() throws Exception {
        try {
            // something to do
            return null;
        } finally {
            stopDownLatch.countDown();
        }
    }
});
What do you need to do?

You only need to define a class that implements this interface and register it in the configuration file.

@Slf4j
public class CustomAsyncQueryTaskWrapper implements AsyncQueryTaskWrapper {
    /**
     * Data "bundled" into the task instance
     */
    private Long tenantId;
    private User user;

    @Override
    public void beforeSubmit() {
        /* Before submitting the task, copy the ThreadLocal data of the current thread into the task */
        log.info("asyncTask beforeSubmit. threadName: {}", Thread.currentThread().getName());
        this.tenantId = RequestContext.getTenantId();
        this.user = ExampleAppContext.getUser();
    }

    @Override
    public void beforeExecute(Thread taskFrom) {
        /* After submission and before execution, restore the ThreadLocal (Context) in the asynchronous thread */
        log.info("asyncTask beforeExecute. threadName: {}, taskFrom: {}", Thread.currentThread().getName(), taskFrom.getName());
        RequestContext.setTenantId(tenantId);
        ExampleAppContext.setLoggedUser(user);
    }

    @Override
    public void afterExecute(Thread taskFrom) {
        /* After the task finishes, clean up the ThreadLocal (Context) of the asynchronous thread */
        log.info("asyncTask afterExecute. threadName: {}, taskFrom: {}", Thread.currentThread().getName(), taskFrom.getName());
        RequestContext.removeTenantId();
        ExampleAppContext.remove();
    }
}

Add a configuration entry to make the TaskWrapper take effect:

io.github.lvyahui8.spring.task-wrapper-class=io.github.lvyahui8.spring.example.wrapper.CustomAsyncQueryTaskWrapper

How to monitor every asynchronous call?

Solution

We break a query down into the following lifecycle stages:

  • The query task is submitted (querySubmitted)
  • Before a node's Provider starts executing (queryBefore)
  • After a node's Provider finishes executing (queryAfter)
  • The whole query completes (queryFinished)
  • The query throws an exception (exceptionHandle)

These translate into the following interface:

public interface AggregateQueryInterceptor {
    /**
     * The query has been submitted and the Context has been created.
     *
     * @param aggregationContext query context
     * @return execution continues only if this returns true
     */
    boolean querySubmitted(AggregationContext aggregationContext);

    /**
     * Called before each Provider method executes. May be called concurrently.
     *
     * @param aggregationContext query context
     * @param provideDefinition the Provider about to be executed
     */
    void queryBefore(AggregationContext aggregationContext, DataProvideDefinition provideDefinition);

    /**
     * Called after each Provider method executes successfully. May be called concurrently.
     *
     * @param aggregationContext query context
     * @param provideDefinition the Provider that was executed
     * @param result query result
     * @return the result; if you do not want to modify it, return the result argument as-is
     */
    Object queryAfter(AggregationContext aggregationContext, DataProvideDefinition provideDefinition, Object result);

    /**
     * Called when a Provider throws an exception during execution. May be called concurrently.
     *
     * @param aggregationContext query context
     * @param provideDefinition the Provider that was executed
     * @param e the exception thrown by the Provider
     */
    void exceptionHandle(AggregationContext aggregationContext, DataProvideDefinition provideDefinition, Exception e);

    /**
     * The whole query has completed.
     *
     * @param aggregationContext query context
     */
    void queryFinished(AggregationContext aggregationContext);
}

When the Spring application starts, all beans implementing the AggregateQueryInterceptor interface are collected, sorted according to their @Order annotations, and used as the interceptor chain.

As for how the interceptors are executed: it's very simple. When the query tasks are submitted and executed recursively, hook functions are inserted at the appropriate points. There is a fair amount of code involved, so it is not posted here; if you are interested, clone the code from GitHub and take a look.
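To make the idea concrete, here is a rough sketch, not the project's actual code, of how the hook points could be woven around a single provider invocation; invokeProvider and interceptorChain are hypothetical names:

private Object invokeWithInterceptors(List<AggregateQueryInterceptor> interceptorChain,
                                      AggregationContext context,
                                      DataProvideDefinition provider) throws Exception {
    for (AggregateQueryInterceptor interceptor : interceptorChain) {
        interceptor.queryBefore(context, provider);
    }
    try {
        Object result = invokeProvider(context, provider);   // hypothetical helper that calls the @DataProvider method
        for (AggregateQueryInterceptor interceptor : interceptorChain) {
            // Interceptors may replace the result; otherwise they return it unchanged.
            result = interceptor.queryAfter(context, provider, result);
        }
        return result;
    } catch (Exception e) {
        for (AggregateQueryInterceptor interceptor : interceptorChain) {
            interceptor.exceptionHandle(context, provider, e);
        }
        throw e;
    }
}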

What do you need to do?

You can implement an interceptor that logs and monitors the execution of each node (processing results, input parameters), as follows:

@Component
@Order(2)
@Slf4j
public class SampleAggregateQueryInterceptor implements AggregateQueryInterceptor {
    @Override
    public boolean querySubmitted(AggregationContext aggregationContext) {
        log.info("begin query. root:{}",aggregationContext.getRootProvideDefinition().getMethod().getName());
        return true;
    }

    @Override
    public void queryBefore(AggregationContext aggregationContext, DataProvideDefinition provideDefinition) {
        log.info("query before. provider:{}",provideDefinition.getMethod().getName());
    }

    @Override
    public Object queryAfter(AggregationContext aggregationContext, DataProvideDefinition provideDefinition, Object result) {
        log.info("query after. provider:{},result:{}",provideDefinition.getMethod().getName(),result.toString());
        return result;
    }

    @Override
    public void exceptionHandle(AggregationContext aggregationContext, DataProvideDefinition provideDefinition, Exception e) {
        log.error(e.getMessage());
    }

    @Override
    public void queryFinished(AggregationContext aggregationContext) {
        log.info("query finish. root: {}",aggregationContext.getRootProvideDefinition().getMethod().getName());
    }
}

Project address

Finally, here is the project address once more: github.com/lvyahui8/sp...

Criticism is welcome, and stars are welcome too.


Origin: juejin.im/post/5e131d276fb9a047ec702d66