Thread blocking problem caused by FileUtils.copyURLToFile under multi-threaded task and Java virtual machine stac

Introduction: Multi-threaded development in Java brings real benefits to a program, but it also brings a growing number of problems, and for a novice programmer, finding and analyzing those problems can be a headache. Below I analyze in detail a problem I ran into while writing a multi-threaded program for a project, and share the solution.

Project description:
At work, I needed to write a program to crawl a large number of pictures from a certain website. The program traverses all the crawl tasks stored in HBase, opens a fixed-size thread pool with Executors.newFixedThreadPool(100), and submits one task per record. Each task uses FileUtils.copyURLToFile to download an image from its URL and save it locally. The detailed code is as follows:

Main thread:

public static void getAllRecord(String tableName, String prefix, String dir) {
    HTable table = null;
    try {
        table = new HTable(conf, tableName);
        Scan s = new Scan();
        // Only scan rows whose key starts with the given prefix.
        s.setFilter(new PrefixFilter(prefix.getBytes()));
        ResultScanner ss = table.getScanner(s);
        ExecutorService executor = Executors.newFixedThreadPool(100);
        for (Result r : ss) {
            // Pause between submissions so that not too many tasks start too early.
            try {
                TimeUnit.SECONDS.sleep(2);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            // One download task per HBase row.
            Thread task = new Thread(new DownLoadPicTask(r, dir, tableName));
            executor.submit(task);
        }
        executor.shutdown();
    } catch (IOException e) {
        // ignored in the original code
    } finally {
        ...close the resource
    }
}
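
A side note on the loop above: an ExecutorService accepts any Runnable, so the extra Thread wrapper is not required (the pool only ever calls its run() method). Also, shutdown() returns immediately; a caller that needs to wait for the queue to drain can use awaitTermination. The following is a minimal sketch of that pattern, not the project's code. The class name SubmitSketch, the pool size of 4, and the counter task are placeholders standing in for DownLoadPicTask.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the crawl loop: a counter task replaces DownLoadPicTask.
class SubmitSketch {

    /** Submits taskCount trivial tasks and returns how many of them completed. */
    static int runAll(int taskCount) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            // Submit the Runnable itself; no Thread wrapper is needed.
            Runnable task = () -> done.incrementAndGet();
            executor.submit(task);
        }
        executor.shutdown(); // stop accepting new tasks; queued tasks still run
        try {
            // Block until every submitted task has finished or the limit expires.
            if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
                executor.shutdownNow(); // give up and interrupt the workers
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            executor.shutdownNow();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("completed tasks: " + runAll(10));
    }
}
```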

Task thread:
public static String downloadFromUrl(String url, String dir, String cityName, String id) {
    try {
        URL httpurl = new URL(url);
        String fileName = getFileNameFromUrl(url);
        System.out.println(fileName);
        File f = new File(dir + File.separator + cityName + File.separator + id + File.separator + fileName);
        // Download the image to the local file.
        FileUtils.copyURLToFile(httpurl, f);

        // Read the image back to get its dimensions.
        FileInputStream fis = new FileInputStream(f);
        BufferedImage bufferedImg = ImageIO.read(fis);
        int imgWidth = bufferedImg.getWidth();
        int imgHeight = bufferedImg.getHeight();
        bufferedImg = null;
        fis.close();
        // Discard images that are smaller than 500 pixels in both dimensions.
        if (imgWidth < 500 && imgHeight < 500) {
            FileUtils.deleteQuietly(f);
            return null;
        }
        return imgWidth + "," + imgHeight;
    } catch (Exception e) {
        return null;
    }
}

Problem:

The program ran for a long time without finishing. In theory, once all the tasks have been executed, the thread pool is shut down and the main thread ends, but that did not happen. My first thought was that there must be a deadlock somewhere, so I opened Java VisualVM to take a look.



It can be seen that the pool-4 threads (that is, our own worker threads) are mostly in the waiting state.

Dumping all the stack information with jstack:
quote
"pool-4-thread-21" prio=6 tid=0x000000000d662800 nid=0x2364 waiting on condition [0x000000001175f000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0000000780faa230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
- None

"pool-4-thread-20" prio=6 tid=0x000000000d662000 nid=0x32f8 waiting on condition [0x00000000114ef000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0000000780faa230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
- None


The dump contains about 100 thread stacks like the ones above; only two are listed here.

For how to analyze the stack information of the virtual machine, please refer to the article: "Three Examples Demonstrate Java Thread Dump Log Analysis" http://www.cnblogs.com/zhengyun_ustc/archive/2013/01/06/dumpanalysis.html

Preliminary analysis:
Most of the threads (note: "most", not all; the reason is explained later as I walk through my analysis) are in the WAITING state. They are all waiting for the same resource, <0x0000000780faa230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject).

Since the threads are all waiting for the same resource, what is this resource?

With this question in mind, I searched the entire dump and found that only the pool-4-thread-** threads reference 0x0000000780faa230, all at the same position in the stack. So what is this thing?
quote
AbstractQueuedSynchronizer provides a basic framework, based on a FIFO queue, for building locks and other related synchronization devices.
Condition provides condition-wait functionality. Its methods must be called inside a Lock block, just as wait and notify must be called inside a synchronized block.
Compared with Object's wait and notify, Condition's await and signal provide a more general and flexible solution, allowing threads waiting on different conditions to communicate with each other.
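
To make the quoted description concrete, here is a minimal sketch of Condition.await() and signal() used inside a Lock block. The OneSlotBox class and its roundTrip helper are made up for illustration and are not part of the project. LinkedBlockingQueue.take, which appears in the stack frames above, waits on a Condition in the same way.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical one-slot mailbox, used only to illustrate Condition.await()/signal().
class OneSlotBox {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Condition notFull = lock.newCondition();
    private String item; // null means the slot is empty

    void put(String value) throws InterruptedException {
        lock.lock();
        try {
            while (item != null) {
                notFull.await(); // releases the lock and parks until signalled
            }
            item = value;
            notEmpty.signal(); // wake one thread waiting in take()
        } finally {
            lock.unlock();
        }
    }

    String take() throws InterruptedException {
        lock.lock();
        try {
            while (item == null) {
                notEmpty.await(); // a thread dump reports this as WAITING (parking)
            }
            String value = item;
            item = null;
            notFull.signal(); // wake one thread waiting in put()
            return value;
        } finally {
            lock.unlock();
        }
    }

    /** Hands one value from a producer thread to the calling thread. */
    static String roundTrip(String value) {
        OneSlotBox box = new OneSlotBox();
        Thread producer = new Thread(() -> {
            try {
                box.put(value);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            return box.take(); // waits on notEmpty until the producer signals
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```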


For more details about AbstractQueuedSynchronizer, please refer to:
"Introduction and Principle Analysis of AbstractQueuedSynchronizer"
"AbstractQueuedSynchronizer"

Having said so much and read through all of the documents above, are you still confused?

My reasoning at this point was: if a thread is idle because it cannot acquire a resource/condition (java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject), then that resource must be occupied by some other thread.

The problem of FileUtils.copyURLToFile

Based on this idea, I checked the status of all threads at the time and finally found 3 running threads that I had missed among the 100 threads.
One of them is as follows:
quote
"pool-4-thread-15" prio=6 tid=0x000000000d65e000 nid=0x28e4 runnable [0x00000000109fe000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked <0x00000007810fa6c0> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
- locked <0x00000007810fa770> (a sun.net.www.protocol.http.HttpURLConnection)
at java.net.URL.openStream(URL.java:1037)
at org.apache.commons.io.FileUtils.copyURLToFile(FileUtils.java:1460)
at com.esf.crawler.bootsStrap.DownLoadPicTask.downloadFromUrl(DownLoadPicTask.java:139)
at com.esf.crawler.bootsStrap.DownLoadPicTask.run(DownLoadPicTask.java:101)
at java.lang.Thread.run(Thread.java:745)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
- <0x00000007810e2060> (a java.util.concurrent.ThreadPoolExecutor$Worker)


That's right!!! This is the culprit: these 3 threads are in the running state (RUNNABLE).

According to the stack information, the FileUtils.copyURLToFile call in the download operation is currently executing, and it is reading from the socket stream without ever finishing (at java.net.SocketInputStream.read(SocketInputStream.java:152)).

The problem should be here!

So why doesn't the read end? If the network read cannot complete, it should time out and exit. With this question in mind, I went back to the line that downloads images from the web (see above):

FileUtils.copyURLToFile(httpurl, f);

Look at the API of FileUtils.copyURLToFile:

public static void copyURLToFile(URL source, File destination) throws IOException
public static void copyURLToFile(URL source, File destination, int connectionTimeout, int readTimeout) throws IOException


You can see that I am using the first method.
Its documentation carries a warning:

quote
Warning : this method does not set a connection or read timeout and thus might block forever. Use copyURLToFile(URL, File, int, int) with reasonable timeouts to prevent this.


That is, if no connection timeout and read timeout are set, the call may block forever when something goes wrong on the network.

Well, then, switch to the overload that takes the timeout settings.
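
To make the effect of a read timeout concrete, here is a minimal sketch that uses only the JDK. It is not the project's code. It assumes that the timeouts in question are the ones set through URLConnection.setConnectTimeout and setReadTimeout. The class name ReadTimeoutSketch, the silent local server, and the /pic.jpg path are made up for the illustration. The server socket is bound but never answers, so without a read timeout the getInputStream call would keep waiting in a socket read like the one in the RUNNABLE stack trace above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Illustration only: a local server that never replies stands in for a stalled image host.
class ReadTimeoutSketch {

    /** Returns true if reading from the silent server failed with a timeout. */
    static boolean timesOut(int connectTimeoutMs, int readTimeoutMs) {
        // The OS completes incoming connections in the backlog, but nobody ever responds.
        try (ServerSocket silentServer = new ServerSocket(0)) {
            URL url = URI.create(
                    "http://127.0.0.1:" + silentServer.getLocalPort() + "/pic.jpg").toURL();
            URLConnection conn = url.openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(connectTimeoutMs); // limit for establishing the connection
            conn.setReadTimeout(readTimeoutMs);       // limit for each blocking read
            try (InputStream in = conn.getInputStream()) {
                in.read();
                return false; // the server answered, which is not expected here
            } catch (SocketTimeoutException e) {
                return true;  // the read gave up instead of blocking forever
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + timesOut(1000, 300));
    }
}
```

In the download task itself, the corresponding change would be a call such as FileUtils.copyURLToFile(httpurl, f, 5000, 10000), where the two values are illustrative and given in milliseconds.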

Run it again! The result was still not what I expected >_<

HBase timeout

After restarting and letting the program run for a while, all 100 threads were still idle. Looking at the debug log, I found the following exception:

quote
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hbase.client.ScannerTimeoutException: 200243ms passed since the last invocation, timeout is currently set to 60000
at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:94)
at com.esf.crawler.bootsStrap.AjkPicDownload.getAllRecord(AjkPicDownload.java:32)
at com.esf.crawler.bootsStrap.AjkPicDownload.main(AjkPicDownload.java:75)
Caused by: org.apache.hadoop.hbase.client.ScannerTimeoutException: 200243ms passed since the last invocation, timeout is currently set to 60000
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:370)
at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:91)
... 2 more
Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 1679, already closed?
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3053)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
at java.lang.Thread.run(Thread.java:745)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:285)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:354)
... 3 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.UnknownScannerException): org.apache.hadoop.hbase.UnknownScannerException: Name: 1679, already closed?
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3053)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
at java.lang.Thread.run(Thread.java:745)

at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1657)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1715)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29900)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
... 7 more


The exception above is an HBase scanner timeout: 200243 ms passed between two calls to the scanner, which exceeds the timeout of 60000 ms shown in the log. The exception is thrown and the main thread terminates, so it no longer traverses records from the HBase scanner to generate tasks. As a result, the 100 threads have no tasks and wait for nothing.
It turned out that the thread sleep I had added to the scan loop in the program above was what caused the scanner to time out. The sleep was originally there to keep too many tasks from being started too early, but it ended up causing this exception.
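
One possible way to keep the pacing without holding the scanner open is sketched below. It is only one option under stated assumptions, not necessarily the change made in the project: it assumes the matching rows fit in memory, uses a plain Iterable<T> as a stand-in for the HBase ResultScanner, and uses a Consumer<T> as a stand-in for the download task. The class name DrainThenSubmit and the pool size of 4 are also made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: Iterable<T> stands in for ResultScanner, Consumer<T> for the task.
class DrainThenSubmit {

    /**
     * Reads every row up front, then submits one task per row with a pause
     * between submissions. Returns the number of rows submitted.
     */
    static <T> int run(Iterable<T> scanner, Consumer<T> task, long pauseMillis) {
        // Step 1: iterate without sleeping, so the gap between scanner calls stays short.
        List<T> rows = new ArrayList<>();
        for (T row : scanner) {
            rows.add(row);
        }
        // With HBase, the scanner would be closed at this point.

        // Step 2: pace the submissions; the scanner is no longer involved.
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            for (T row : rows) {
                Runnable job = () -> task.accept(row);
                executor.submit(job);
                TimeUnit.MILLISECONDS.sleep(pauseMillis);
            }
            executor.shutdown();
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow(); // harmless if the pool has already terminated
        }
        return rows.size();
    }

    public static void main(String[] args) {
        AtomicInteger handled = new AtomicInteger();
        int submitted = run(Arrays.asList("row-1", "row-2", "row-3"),
                row -> handled.incrementAndGet(), 10L);
        System.out.println(submitted + " submitted, " + handled.get() + " handled");
    }
}
```

Other options, such as raising the scanner timeout in the HBase configuration, trade a longer-lived server-side scanner for simpler client code.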

So far the general problem has been found.

Summary: Going from a problem to locating it, analyzing it, and solving it, we will inevitably make wrong conjectures along the way, as we saw together here. We still need to analyze and reason from different angles, verify each idea as far as possible, and follow the clues one by one. Multi-threading problems mostly come down to deadlocks, resource contention, and the like, caused by synchronization and concurrency.

The tools provided in the JDK can detect and export the relevant stack information. By analyzing why the threads in the dump log are in their various states, we can find the corresponding solution to the problem.
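
As a small in-process illustration of reading a thread's state, the sketch below (class and thread names are made up) runs one empty task on a fixed thread pool and then inspects the worker. Once the task is done, the idle worker should be parked in WAITING inside LinkedBlockingQueue.take, the same frames as in the first dump. That state means the worker is waiting for its next task, not for a lock held by another thread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: inspects the state of an idle worker in a fixed thread pool.
class IdleWorkerSketch {

    /** Returns the idle worker's state, plus the queue frame it is parked in if found. */
    static String describeIdleWorker() {
        AtomicReference<Thread> workerRef = new AtomicReference<>();
        ExecutorService executor = Executors.newFixedThreadPool(1, r -> {
            Thread t = new Thread(r, "sketch-worker");
            t.setDaemon(true);
            workerRef.set(t); // remember the worker so its state can be inspected
            return t;
        });
        try {
            Runnable noop = () -> { };
            executor.submit(noop).get(); // run one empty task to completion
            Thread worker = workerRef.get();
            // The worker returns to the queue right after the task; poll briefly for that.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (worker.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            StringBuilder sb = new StringBuilder(worker.getState().toString());
            for (StackTraceElement frame : worker.getStackTrace()) {
                if (frame.getClassName().endsWith("LinkedBlockingQueue")) {
                    sb.append(" in ").append(frame.getClassName())
                      .append('.').append(frame.getMethodName());
                    break;
                }
            }
            return sb.toString();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeIdleWorker());
    }
}
```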

The thread status in the dump is roughly as follows:

Deadlock, Deadlock (focus)
Executing, Runnable
Waiting for resources, Waiting on condition (focus)
Waiting to acquire a monitor, Waiting on monitor entry (focus)
Paused, Suspended
Object is waiting, Object.wait() or TIMED_WAITING
Blocked, Blocked (focus)
Stopped, Parked

Analyze the reason, then locate the corresponding code and change it!!!

Reference: http://ju.outofmemory.cn/entry/95925
