Java Web Crawler Techniques: Study Notes on Multithreading with Executor

1. Introduction

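The Executor framework is built into the JDK (package java.util.concurrent), so no third-party JAR is needed.

To make a crawler fetch pages faster, we use multiple threads.

Compared with creating and managing Thread objects by hand, the Java Executor framework is easier to use and easier to manage, and it supports thread pools. Reusing pooled threads avoids the cost of creating a new thread for every task, which usually gives better performance.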
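To make the thread-pool point concrete, here is a minimal sketch that submits more tasks than there are threads and records which threads actually ran them. The class name PoolReuseDemo and the helper workerNames are made up for this illustration; only the java.util.concurrent calls come from the JDK.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: shows that a fixed pool reuses a small set of threads.
public class PoolReuseDemo {

    // Runs `tasks` trivial tasks on a pool of `poolSize` threads and
    // returns the names of the threads that executed them.
    static Set<String> workerNames(int poolSize, int tasks) throws InterruptedException {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> names.add(Thread.currentThread().getName()));
        }
        pool.shutdown();                             // no new tasks; queued ones still run
        pool.awaitTermination(10, TimeUnit.SECONDS); // wait for the queue to drain
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        // 20 tasks, but never more than 3 distinct worker threads
        System.out.println(workerNames(3, 20));
    }
}
```

With one Thread per task, the same run would have created 20 threads; the pool keeps the number bounded by poolSize.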

2. Common APIs:

Create a thread pool with a fixed number of threads (a static method of java.util.concurrent.Executors):

public static ExecutorService newFixedThreadPool(int nThreads)

Submit a task (a Runnable) to be run by one of the pool's threads:

void java.util.concurrent.Executor.execute(Runnable command)

Get the approximate number of threads that are currently running tasks:

int java.util.concurrent.ThreadPoolExecutor.getActiveCount()

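Shut down the pool. No new tasks are accepted, but tasks that were already submitted still run to completion: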

void java.util.concurrent.ExecutorService.shutdown()
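The four calls above can be sketched together as follows. This is only an illustration under stated assumptions: the class name ExecutorApiDemo, the helper activeWhileBusy, and the use of CountDownLatch to hold the tasks in place are not part of the original notes. The latches are there so that getActiveCount() is read while every task is known to be running. The sketch also assumes tasks is no larger than poolSize; otherwise the queued tasks could never start and the first latch would never reach zero.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class tying together newFixedThreadPool, execute,
// getActiveCount and shutdown.
public class ExecutorApiDemo {

    // Assumes tasks <= poolSize (see the note above the code).
    static int activeWhileBusy(int poolSize, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch started = new CountDownLatch(tasks); // counts tasks that have begun
        CountDownLatch release = new CountDownLatch(1);     // lets the tasks finish

        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                started.countDown();
                try {
                    release.await(); // stay "active" until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        started.await(); // every task is now inside run()
        int active = ((ThreadPoolExecutor) pool).getActiveCount();

        release.countDown();                         // let the tasks complete
        pool.shutdown();                             // stop accepting new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS); // wait for the workers to exit
        return active;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Active threads: " + activeWhileBusy(4, 3));
    }
}
```

The cast to ThreadPoolExecutor is needed because getActiveCount() is not declared on the ExecutorService interface.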

3. Crawling 100 pages concurrently with 10 threads

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorTest {

    private static final int TOTAL_PAGES = 100; // number of pages to crawl

    public static void main(String[] args) throws InterruptedException {
        // Create an ExecutorService backed by a fixed pool of 10 threads
        ExecutorService executorService = Executors.newFixedThreadPool(10);

        for (int i = 1; i <= TOTAL_PAGES; i++) {
            // Each task captures its own page number, so the worker threads
            // do not read or modify a shared counter
            final int page = i;
            executorService.execute(new Runnable() {

                @Override
                public void run() {
                    // Placeholder for the real fetch-and-parse work
                    System.out.println("Crawled page " + page + "...");
                }
            });
        }

        // Stop accepting new tasks; tasks already queued still run
        executorService.shutdown();

        // Wait for all queued tasks to finish. This replaces polling
        // getActiveCount(), which only returns an approximate value
        if (executorService.awaitTermination(1, TimeUnit.MINUTES)) {
            System.out.println("Crawl finished");
        } else {
            System.out.println("Timed out before all pages were crawled");
        }
    }
}
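When worker threads do need to share a counter, such as a running total of pages crawled, incrementing a plain Integer field with ++ from several threads is not atomic and updates can be lost. A minimal sketch of one common alternative, java.util.concurrent.atomic.AtomicInteger, follows. The class name SharedCounterDemo and the helper crawlAll are hypothetical, and the "crawl" is reduced to a counter increment.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: a counter shared safely between pool threads.
public class SharedCounterDemo {

    // Submits totalPages tasks to a pool of `threads` threads and
    // returns how many of them ran.
    static int crawlAll(int threads, int totalPages) throws InterruptedException {
        AtomicInteger crawled = new AtomicInteger(); // starts at 0
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < totalPages; i++) {
            // incrementAndGet() is a single atomic read-modify-write
            pool.execute(() -> crawled.incrementAndGet());
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return crawled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Pages crawled: " + crawlAll(10, 100));
    }
}
```

Passing each task its own page number, where possible, avoids shared state altogether; the atomic counter is for cases where the tasks genuinely have to update one value.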
