Several ways to implement HTTP requests

I need to fetch a batch of URLs from a program, something like a crawler, and came up with several approaches:

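1. Synchronous execution
Fetch the URLs one by one in a for loop. This is the simplest approach but also the least efficient: a URL whose site responds slowly blocks everything queued behind it.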
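
As a rough illustration of the sequential approach, here is a minimal sketch. The class name `SequentialFetch`, the methods `fetch` and `fetchAll`, and the timeout values are assumptions made for this example and are not from the original post. Setting connect and read timeouts bounds how long one slow site can hold up the loop, but the URLs are still processed strictly one after another.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class SequentialFetch {

    // Downloads one URL and returns its body, or an "error: ..." string on failure.
    // The timeouts (hypothetical values) limit how long a slow site can block the caller.
    static String fetch(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(3000);
            conn.setReadTimeout(5000);
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }

    // Applies the fetcher to each URL in order; every call blocks until that URL is done.
    static List<String> fetchAll(List<String> urls, Function<String, String> fetcher) {
        List<String> results = new ArrayList<String>();
        for (String url : urls) {
            results.add(fetcher.apply(url));
        }
        return results;
    }
}
```

A call such as `SequentialFetch.fetchAll(urls, SequentialFetch::fetch)` would download the list in order; passing the fetcher as a parameter also makes it easy to substitute a fake one when testing.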


2. Asynchronous approach
Start one thread per URL:

String[] urls = { "url1", "url2", "url3", "url4", "url5", "url6",
		"url7" };
for (final String url : urls) {
	Thread t = new Thread(new Runnable() {
		public void run() {
			// fetch site
		}
	});
	// start the thread inside the loop so that every URL gets its own thread
	t.start();
}


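This approach uses multiple threads to issue the HTTP requests concurrently, which maximizes throughput. But it has problems of its own:
1. One thread is started per image URL, so the thread count is unbounded: ten thousand images means ten thousand threads, which is clearly a resource problem.
2. Creating a large number of threads has a performance cost of its own.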
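
To make the first problem concrete, the following sketch starts one thread per URL and then waits for all of them with `join()`. The class `ThreadPerUrl` and its `fetchAll` method are hypothetical names for this illustration, and the download itself is only a placeholder. The number of threads created always equals the number of URLs, so nothing caps it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadPerUrl {

    // Starts one thread per URL, waits for all of them,
    // and returns how many fetches completed.
    static int fetchAll(List<String> urls) {
        final AtomicInteger finished = new AtomicInteger();
        List<Thread> threads = new ArrayList<Thread>();
        for (final String url : urls) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    // placeholder for the real download of `url`
                    finished.incrementAndGet();
                }
            });
            t.start();
            threads.add(t); // the list grows with the input: N URLs, N threads
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait until this thread has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return finished.get();
    }
}
```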

3. Using a thread pool
Configure a thread pool to keep resource usage under control.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ThreadPool {

	private final List<Thread> threads = new ArrayList<Thread>();

	// bounded queue: at most 20 tasks may be waiting at any time
	private final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<Runnable>(20);

	public ThreadPool(int size) {
		for (int i = 0; i < size; ++i) {
			Thread thread = new Thread(new Worker(queue));
//			thread.setDaemon(true);
			thread.start();
			threads.add(thread);
		}
	}

	public void submit(Runnable runnable) throws InterruptedException {
		// put() blocks while the queue is full; add() would throw IllegalStateException
		queue.put(runnable);
	}

	private static class Worker implements Runnable {

		private final BlockingQueue<Runnable> queue;

		public Worker(BlockingQueue<Runnable> queue) {
			super();
			this.queue = queue;
		}

		@Override
		public void run() {
			while (true) {
				try {
					// take() blocks until a task is available, so no polling or sleeping is needed
					Runnable runnable = queue.take();
					runnable.run();
				} catch (InterruptedException e) {
					// restore the interrupt flag and let this worker exit
					Thread.currentThread().interrupt();
					return;
				}
			}
		}

	}
}

public class ThreadPoolFetcher {

	public static void main(String[] args) throws InterruptedException {
		ThreadPool pool = new ThreadPool(7);
		String[] urls = { "url1", "url2", "url3", "url4", "url5", "url6",
				"url7" };

		for (String url : urls) {
			pool.submit(new Fetcher(url));
		}

	}

	private static class Fetcher implements Runnable {
		private final String url;

		public Fetcher(String url) {
			super();
			this.url = url;
		}

		public void run() {
			// print which pool thread handles which URL, then simulate a slow download
			System.out.println(Thread.currentThread().getName() + ":" + url);
			try {
				Thread.sleep(1000);
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}
		}
	}
}

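This implements the thread pool with the producer-consumer pattern and keeps resource usage under control.
One problem remains, though: the threads in the pool do not exit once all the work is done.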
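
One common way to let the workers exit is a sentinel task (a "poison pill") queued behind the real work. The sketch below is only one possible approach, not the original author's code: the class `StoppablePool`, the `STOP` sentinel, and the methods `shutdownAndWait` and `anyAlive` are all names invented for this example, and it uses an unbounded `LinkedBlockingQueue` for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class StoppablePool {

    // Sentinel task: a worker that takes it leaves its loop instead of running it.
    private static final Runnable STOP = new Runnable() {
        public void run() { }
    };

    private final List<Thread> threads = new ArrayList<Thread>();
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<Runnable>();

    StoppablePool(int size) {
        for (int i = 0; i < size; ++i) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            Runnable task = queue.take(); // blocks until work arrives
                            if (task == STOP) {
                                return; // this worker is done
                            }
                            task.run();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            t.start();
            threads.add(t);
        }
    }

    void submit(Runnable task) {
        queue.add(task);
    }

    // Queues one sentinel per worker behind the pending tasks,
    // then waits for every worker thread to exit.
    void shutdownAndWait() {
        for (int i = 0; i < threads.size(); ++i) {
            queue.add(STOP);
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Reports whether any worker thread is still running.
    boolean anyAlive() {
        for (Thread t : threads) {
            if (t.isAlive()) {
                return true;
            }
        }
        return false;
    }
}
```

Because the queue is first-in, first-out, the sentinels are only reached after every task submitted earlier has been taken, so pending work is not dropped.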

4. Using Executors
In the end, the easiest option is to let the Executors class introduced in JDK 5 handle it:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;


public class Fetcher {

	public static void main(String[] args) throws InterruptedException {
		
		String[] urls = {"url1","url2","url3","url4","url5","url6","url7"};
		ExecutorService exs =  Executors.newFixedThreadPool(100);
		try {
			List<Callable<String>> tasks = new ArrayList<Callable<String>>();
			for(String url :urls){
				tasks.add(new FetchImageTask(url));
			}
			// invokeAll blocks until every task has completed
			exs.invokeAll(tasks);
			System.out.println("end");
		} finally {
			// always shut the pool down so its threads exit and the JVM can terminate
			exs.shutdown();
		}
	}
	
	
	/**
	 * @author yunpeng
	 *
	 */
	private static  class FetchImageTask implements Callable<String>{
		
		private String url;
		
		public FetchImageTask(String url) {
			super();
			this.url = url;
		}
		@Override
		public String call() throws Exception {
			System.out.println(Thread.currentThread().getName());
			 Thread.sleep(3000);
			return "ok"+url;
		}
	}
}
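
The listing above discards the return value of `invokeAll`. If the results are needed, they can be read from the returned `Future` objects, which come back in the same order as the submitted tasks. The following is a minimal sketch under that assumption; the class `ResultCollector`, its `fetchAll` method, and the `poolSize` parameter are hypothetical names, and the task body is a placeholder that mirrors the `"ok" + url` return value used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ResultCollector {

    // Runs one task per URL on a fixed-size pool and returns the results in input order.
    static List<String> fetchAll(List<String> urls, int poolSize) {
        ExecutorService exs = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<String>> tasks = new ArrayList<Callable<String>>();
            for (final String url : urls) {
                tasks.add(new Callable<String>() {
                    public String call() {
                        return "ok" + url; // placeholder for the real download
                    }
                });
            }
            List<String> results = new ArrayList<String>();
            // invokeAll blocks until all tasks are done; futures match the task order
            for (Future<String> f : exs.invokeAll(tasks)) {
                try {
                    results.add(f.get());
                } catch (ExecutionException e) {
                    // the task threw an exception; record it instead of failing the batch
                    results.add("failed: " + e.getCause());
                }
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new ArrayList<String>();
        } finally {
            exs.shutdown();
        }
    }
}
```

Catching `ExecutionException` per future means one failed URL does not prevent the other results from being collected.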




Reposted from san-yun.iteye.com/blog/1651500