Pitfall caused by multi-threaded consumption with HttpClient

Recently, our company's SMS platform ran into a problem. Every client could call the platform normally, but no text messages reached users. Users complained on a large scale that they were not receiving messages, and the business could not proceed.

A simple sequence diagram of the SMS platform framework captures the general idea:

[Figure: sequence diagram of the SMS platform framework]

The same problem, text messages not arriving, had happened once before. That time the cause was an unstable SMS channel provider. While investigating, we found a large backlog of data in the cache queue and guessed that the consumer thread was blocked, which was then confirmed. Working slowly through the code, I found that the provider's API did not set a timeout on its HttpClient requests. When the consumer thread called that provider, HttpClient blocked with no timeout, so the consumer thread stayed blocked for a long time and every subsequent message went unconsumed.

For users, the symptom was that they could tap "send" normally but never received the message, and the backend logs showed nothing useful. The fix in production was to restart the service. When calls to the provider time out frequently and no timeout is configured, the consumer thread blocks frequently within a short period. Afterwards I reproduced the call privately against a test service, which confirmed the guess: HttpClient blocks for a long time when no timeout is set. After troubleshooting the problem at the time, I came up with a few solutions:

a. Rewrite the API client of the SMS channel providers

b. Modify the framework so that the consumer thread does not block, starting a thread for each message

c. Add a timeout mechanism on our own side when calling the service provider
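Option b can be sketched roughly as follows: instead of calling the provider itself, the consumer thread hands each message to a pool of workers, so a stuck HTTP call ties up one worker rather than the whole queue. This is a hedged sketch, not the platform's code: a bounded `ThreadPoolExecutor` is swapped in for "a thread per message" (unbounded thread creation can exhaust resources under load), and `SmsDispatcher`, `sender`, and the pool sizes are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical dispatcher: the consumer loop only hands messages off.
class SmsDispatcher {
    private final ExecutorService workers;
    private final Consumer<String> sender; // the (possibly blocking) provider call

    SmsDispatcher(int poolSize, int backlog, Consumer<String> sender) {
        this.sender = sender;
        // Bounded queue plus CallerRunsPolicy gives back-pressure instead of
        // dropping messages. Note: if every worker is stuck and the queue is
        // full, the consumer thread itself runs (and may block on) a send,
        // so timeouts on the HTTP call are still needed.
        this.workers = new ThreadPoolExecutor(poolSize, poolSize,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(backlog),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Called by the consumer loop for each message taken from the cache queue.
    void dispatch(String message) {
        workers.execute(() -> sender.accept(message));
    }

    void shutdown() {
        workers.shutdown();
    }
}
```

With two workers, one message that hangs occupies one worker while the other keeps draining the queue; this isolates a slow provider but does not bound the hang itself.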

Of these options, rewriting the providers' API client was too much trouble at the time and there was no time for it, and rewriting the framework was even more trouble. The solution we settled on was to make the API call asynchronous inside the consumer thread: when calling the corresponding channel provider, the call is wrapped in a Callable from the JDK's java.util.concurrent package and run asynchronously, which works around the missing timeout. The code is as follows:

        import java.util.concurrent.*;

        ExecutorService service = Executors.newSingleThreadExecutor();
        Callable<Integer> callable = new Callable<Integer>() {
            public Integer call() throws Exception {
                ISms sms = new CHttpPost();
                // Call the provider's API and return the result of the call
                return sms.send(...);
            }
        };
        Future<Integer> future = service.submit(callable);
        // Accept no new tasks; the submitted call keeps running
        service.shutdown();
        // Wait up to three minutes for the API call to finish
        if (!service.awaitTermination(180000L, TimeUnit.MILLISECONDS)) {
            // Try to interrupt the stuck worker (a blocked socket read may ignore it)
            service.shutdownNow();
            throw new TimeoutException("SMS call timeout");
        }
        result = future.get(); // Get the processing result

While the consumer thread waits in awaitTermination, the API request is processed in a separate thread. If the HTTP request completes normally, awaitTermination returns as soon as the task finishes and the result is fetched immediately. If the request blocks for a long time, the consumer thread waits at most three minutes, throws an exception, and moves on to the next message.
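A variant of the same idea, offered as a sketch rather than the original code: `Future.get(timeout, unit)` bounds the wait directly, so one shared executor can serve every call instead of creating and shutting down a new single-thread executor per message, and `future.cancel(true)` interrupts the worker on timeout. The class name `TimedCall`, the pool size of 4, and the daemon-thread choice are assumptions. One caveat: a thread blocked in a socket read may not respond to interruption, so this frees the consumer thread but not necessarily the worker; a real HTTP timeout remains the proper fix.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedCall {
    // Shared pool for provider calls (hypothetical size).
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "sms-provider-call");
        t.setDaemon(true); // a stuck call should not keep the JVM alive
        return t;
    });

    // Runs task on POOL and waits at most timeoutMillis for its result.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            throw e;
        }
    }
}
```

In the consumer loop this would replace the per-message executor, for example `result = TimedCall.callWithTimeout(() -> sms.send(...), 180000L);`.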

When the problem first occurred, it did not get enough attention. Now that it has happened again, we found that the APIs of more than one channel provider set no timeout, whereas last time we had only added timeout handling to the call of the provider that failed. Did the other providers set timeouts? At the time we assumed the risk from the other providers' services was low. Now that the problem is back, I have to admit that our own framework is flawed too. Had we anticipated earlier that the consumer thread could block for a long time and guarded against it in advance, problems in someone else's service would not have disrupted our own.
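The underlying fix for every provider is the one their APIs lacked: a timeout on the HTTP call itself. The providers' client code is not shown, so as a hedged sketch, here is how timeouts are set on the JDK 11+ `java.net.http.HttpClient` (a swapped-in client, not the providers' actual code); `SmsHttpTimeouts`, the timeout values, and the form content type are assumptions. The JDK documents that a request without `timeout(...)` behaves as if the timeout were infinite, which matches the blocking described above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Sketch only: timeout values are hypothetical and should fit the provider's latency.
class SmsHttpTimeouts {
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(5);
    static final Duration REQUEST_TIMEOUT = Duration.ofSeconds(15);

    // Bounds how long establishing the connection may take.
    static HttpClient buildClient() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT)
                .build();
    }

    // Bounds how long send() waits for the response; on expiry send()
    // throws java.net.http.HttpTimeoutException instead of blocking forever.
    static HttpRequest buildRequest(String url, String formBody) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(REQUEST_TIMEOUT)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
    }
}
```

A call then looks like `client.send(request, HttpResponse.BodyHandlers.ofString())` and fails fast with `HttpTimeoutException`. If a provider uses Apache HttpClient 4.x instead, a `RequestConfig` built with `setConnectTimeout(...)` and `setSocketTimeout(...)` plays the same role.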

This is a pitfall I stepped into myself, shared here for others to learn from. I hope everyone will share the problems they run into so we can all make progress together.