How to read lines from a CSV to use in multiple threads

slaw300 :

Suppose I have a CSV file with hundreds of lines, each holding two random keywords in separate cells. I'd like to Google search each pair and have the first result on the page printed to the console or stored in some array. For this example, I imagine I could do this by reading one line at a time, using something like the following:

CSVReader reader = new CSVReader(new FileReader(FILE_PATH));
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    driver.get("http://google.com/");
    driver.findElement(By.name("q")).click();
    driver.findElement(By.name("q")).clear();
    driver.findElement(By.name("q")).sendKeys(nextLine[0] + " " + nextLine[1]);
    // print the text of the first result rather than the WebElement itself
    System.out.println(driver.findElement(By.xpath(XPATH_TO_1ST)).getText());
}

How would I go about having 5 (or however many) chromedriver threads process the CSV file through Selenium as fast as possible? I've been able to get 5 lines done at a time by implementing Runnable on a class that does this and starting 5 threads. What I would like is a solution where, as soon as one thread finishes, it picks up the next unprocessed line, as opposed to waiting for all 5 searches to finish before moving on to the next 5 lines. I would appreciate any suggested reading or tips on cracking this!

tom :

If you want to run 5 (or more) threads at the same time, you need to start 5 instances of WebDriver, because a WebDriver instance is not thread safe. As for updating the CSV, you need to synchronize writes across the threads to prevent corrupting the file, or you could batch up updates until some threshold is reached and then write several lines at once.
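The batching idea can be sketched roughly as follows. This is only an illustration under assumptions: BatchedLineWriter, its threshold parameter, and the row strings are hypothetical names made up here, not part of Selenium or any CSV library, and the lines are written raw with no CSV quoting or escaping.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: buffers lines from many threads and writes them in batches.
class BatchedLineWriter {
    private final Writer out;
    private final int threshold;
    private final List<String> pending = new ArrayList<>();

    BatchedLineWriter(Writer out, int threshold) {
        this.out = out;
        this.threshold = threshold;
    }

    // Called by worker threads; synchronized so two lines never interleave.
    synchronized void add(String line) throws IOException {
        pending.add(line);
        if (pending.size() >= threshold) {
            flush();
        }
    }

    // Writes everything buffered so far; call once more after all workers finish.
    synchronized void flush() throws IOException {
        for (String line : pending) {
            out.write(line);
            out.write(System.lineSeparator());
        }
        pending.clear();
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        StringWriter sink = new StringWriter(); // stands in for a FileWriter
        BatchedLineWriter writer = new BatchedLineWriter(sink, 3);
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                try {
                    for (int i = 0; i < 5; i++) {
                        writer.add("worker" + id + ",row" + i);
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        writer.flush(); // write the final partial batch
        System.out.println(sink.toString().split(System.lineSeparator()).length + " lines written");
    }
}
```

Passing a FileWriter instead of the StringWriter would send the batches to a real file. The final flush() matters because the last batch is usually smaller than the threshold.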

See this related question: "Can Selenium use multi threading in one browser?"

Update:

How about this? It ensures a WebDriver instance is never used by two threads at the same time, and each worker picks up the next line as soon as it is free.

// Imports needed: java.io.FileReader, java.util.Arrays, java.util.concurrent.*,
// org.openqa.selenium.By, org.openqa.selenium.WebDriver,
// org.openqa.selenium.chrome.ChromeDriver, plus CSVReader (assumed here to come from opencsv).
CSVReader reader = new CSVReader(new FileReader(FILE_PATH));

// number to do at same time
int concurrencyCount = 5;
ExecutorService executorService = Executors.newFixedThreadPool(concurrencyCount);
CompletionService<Boolean> completionService = new ExecutorCompletionService<Boolean>(executorService);
String[] nextLine;

// ensure we use a distinct WebDriver instance per thread
final LinkedBlockingQueue<WebDriver> webDrivers = new LinkedBlockingQueue<WebDriver>();
for (int i = 0; i < concurrencyCount; i++) {
    webDrivers.offer(new ChromeDriver());
}
int count = 0;
while ((nextLine = reader.readNext()) != null) {
    // copy the row with room for the result in line[2], since the CSV rows hold only two cells
    final String[] line = Arrays.copyOf(nextLine, Math.max(nextLine.length, 3));
    completionService.submit(new Callable<Boolean>() {
        public Boolean call() {
            WebDriver driver = null;
            try {
                // take a webdriver from the queue to use
                driver = webDrivers.take();
                driver.get("http://google.com/");
                driver.findElement(By.name("q")).click();
                driver.findElement(By.name("q")).clear();
                driver.findElement(By.name("q")).sendKeys(line[0] + " " + line[1]);
                System.out.println(line[1]);
                line[2] = driver.findElement(By.xpath(XPATH_TO_1ST)).getText();
                return true;
            } catch (InterruptedException e) {
                // restore the interrupt flag so the pool can see it
                Thread.currentThread().interrupt();
                return false;
            } finally {
                // always put the webdriver back, even if the search threw,
                // otherwise later tasks block forever on take()
                if (driver != null) {
                    webDrivers.offer(driver);
                }
            }
        }
    });
    count++;
}
reader.close();

boolean errors = false;
while (count-- > 0) {
    try {
        // take() returns tasks in the order they finish
        Future<Boolean> resultFuture = completionService.take();
        if (!resultFuture.get()) {
            errors = true;
        }
    } catch (Exception e) {
        e.printStackTrace();
        errors = true;
    }
}
System.out.println("done, errors=" + errors);
for (WebDriver webDriver : webDrivers) {
    // quit() ends the browser session and its chromedriver process
    webDriver.quit();
}
executorService.shutdown();
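
To see the scheduling behaviour without launching any browsers, here is a minimal sketch of the same pattern with stand-ins. Everything in it is an assumption for illustration: PoolDemo and run are hypothetical names, plain strings take the place of ChromeDriver instances, and Thread.sleep simulates searches that take different amounts of time. A fixed thread pool hands the next queued row to whichever worker frees up first, and the queue of stand-in drivers caps how many are in use at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class PoolDemo {
    // Processes every row with at most poolSize stand-in drivers.
    // Fills results in submission order and returns the peak number of drivers in use.
    static int run(List<String[]> rows, int poolSize, List<String> results) throws Exception {
        BlockingQueue<String> drivers = new LinkedBlockingQueue<>();
        for (int i = 0; i < poolSize; i++) {
            drivers.offer("driver-" + i); // a string stands in for a real WebDriver
        }

        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<Future<String>> futures = new ArrayList<>();

        for (String[] row : rows) {
            futures.add(pool.submit(() -> {
                String driver = drivers.take(); // borrow a driver
                try {
                    int now = inUse.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(row[0].length() * 10L); // simulate uneven search times
                    return driver + " searched " + row[0] + " " + row[1];
                } finally {
                    inUse.decrementAndGet();
                    drivers.offer(driver); // always return the driver
                }
            }));
        }

        for (Future<String> f : futures) {
            results.add(f.get()); // waits for each task and rethrows any failure
        }
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            rows.add(new String[] {"keyword" + i, "other" + i});
        }
        List<String> results = new ArrayList<>();
        int peak = run(rows, 3, results);
        for (String r : results) {
            System.out.println(r);
        }
        System.out.println("peak drivers in use: " + peak);
    }
}
```

With real drivers, the sleep would be replaced by the Selenium calls, and each driver would need to be quit once the pool has finished.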
