ScheduledExecutorService tasks are running later than expected

klonq :

I'm running tasks periodically and to provide flexibility for the intervals, the next timeout is calculated at the end of each task, converted to milliseconds from Instant.now(), and scheduled using ScheduledExecutorService#schedule.

This code is generally working fine (blue curve on the left), but other days not so good.

enter image description here

It appears to me that things sometimes go bad at startup (machines are restarted every night) and though that the program should, and does, correct itself ScheduledExecutorService#schedule does not recover and scheduled tasks run late all the time. It appears that a full JVM restart is the only solution.

My initial thoughts was that this was a bug and depending of the time of the machine startup, things could go wrong. But the following log output demonstrates that the problem is related to my usage of ScheduledExecutorService#schedule :

// Log time in GMT+2, other times are in GMT
// The following lines are written following system startup (all times are correct)
08 juin 00:08:49.993 [main] WARN  com.pgscada.webdyn.Webdyn - Scheduling next webdyn service time. Currently 2018-06-07T22:08:49.993Z, last connection null
08 juin 00:08:50.586 [main] INFO  com.pgscada.webdyn.Webdyn - The next data sample at 2018-06-07T22:10:00Z and the next FTP connection at 2018-06-07T22:30:00Z
08 juin 00:08:50.586 [main] WARN  com.pgscada.webdyn.Webdyn - Completed webdyn schedule in 9ms, next execution at 2018-06-07T22:10:00Z (in 69414 ms) will run as data-sample
// So we are expecting the next execution to occur at 00:10:00 (or in 69.4 seconds)
// Except that it runs at 00:11:21
08 juin 00:11:21.206 [pool-1-thread-4] INFO  com.pgscada.webdyn.Webdyn - Executing Webdyn service, isDataSample=true, isFtpConnection=false, nextTimeout=2018-06-07T22:10:00Z, lastFtpConnection=null
// But thats OK because it should correct itself
08 juin 00:13:04.151 [pool-1-thread-4] WARN  com.pgscada.webdyn.Webdyn - Scheduling next webdyn service time. Currently 2018-06-07T22:10:00Z, last connection null
08 juin 00:13:04.167 [pool-1-thread-4] INFO  com.pgscada.webdyn.Webdyn - The next data sample at 2018-06-07T22:20:00Z and the next FTP connection at 2018-06-07T22:30:00Z
08 juin 00:13:04.167 [pool-1-thread-4] WARN  com.pgscada.webdyn.Webdyn - Completed webdyn schedule in 0ms, next execution at 2018-06-07T22:20:00Z (in 415833 ms) will run as data-sample
// So now we are expecting the next execution to occur at 00:20:00 (or in 415.8 seconds)
// But it runs at 00:28:06
08 juin 00:28:06.145 [pool-1-thread-4] INFO  com.pgscada.webdyn.Webdyn - Executing Webdyn service, isDataSample=true, isFtpConnection=false, nextTimeout=2018-06-07T22:20:00Z, lastFtpConnection=null

Below is actual production code of the scheduling function.

ScheduledExecutorService EXECUTORS = Executors.newScheduledThreadPool(10);


private void scheduleNextTimeout(Instant currentTime, Instant lastFtpConnection) {

    try {

        log.info("Scheduling next webdyn service time. Currently {}, last connection {}", currentTime, lastFtpConnection);

        // Parse config files first
        getConfigIni().parse();

        long time = System.nanoTime();
        final Instant earliestPossibleTimeout = Instant.now().plusSeconds(5); 

        Instant nextDataSample = nextTimeout(currentTime);

        if (nextDataSample.isBefore(earliestPossibleTimeout)) {
            final Instant oldTime = nextDataSample;
            nextDataSample = nextTimeout(earliestPossibleTimeout);
            log.warn("Next data sample was calculated to a time in the past '{}', resetting to a future time: {}", oldTime, nextDataSample);
        }

        Instant nextFtp = nextFtpConnection(currentTime, lastFtpConnection);

        if (nextFtp.isBefore(earliestPossibleTimeout)) {
            final Instant oldTime = nextFtp;
            nextFtp = nextFtpConnection(earliestPossibleTimeout, lastFtpConnection);
            log.warn("Next FTP connection was calculated to a time in the past '{}', resetting to a future time: {}", oldTime, nextFtp);
        }

        final boolean isFtpConnection = !nextDataSample.isBefore(nextFtp);
        final boolean isDataSample = !isFtpConnection || nextDataSample.equals(nextFtp);
        log.info("The next data sample at {} and the next FTP connection at {}", nextDataSample, nextFtp);

        final Instant nextTimeout = nextDataSample.isBefore(nextFtp) ? nextDataSample : nextFtp;
        final long millis = Duration.between(Instant.now(), nextTimeout).toMillis();
        EXECUTORS.schedule(() -> {
            log.info("Executing Webdyn service, isDataSample={}, isFtpConnection={}, nextTimeout={}, lastFtpConnection={}",
                    isDataSample, isFtpConnection, nextTimeout, lastFtpConnection);

            long tme = System.nanoTime();
            try {
                connect(isDataSample, isFtpConnection, nextTimeout, lastFtpConnection);
                log.warn("Completed webdyn service in {}s", (System.nanoTime() - tme) / 1000000);
            } catch (final Throwable ex) {
                log.error("Failed webdyn service after {}ms : {}", (System.nanoTime() - tme) / 1000000, ex.getMessage(), ex);
            } finally {
                scheduleNextTimeout(nextTimeout, isFtpConnection ? nextTimeout : lastFtpConnection);
            }
        }, millis, TimeUnit.MILLISECONDS);

        log.warn("Completed webdyn schedule in {}ms, next execution at {} (in {} ms) will run as {}",
                (System.nanoTime() - time) / 1000000, nextTimeout, millis, isFtpConnection ? "ftp-connection" : "data-sample");

    } catch (final Throwable ex) {
        log.error("Fatal error in webdyn schedule : {}", ex.getMessage(), ex);
    }
}
Igand :

As stated in my comment bellow the question, the issue here is that there is a shared, mutable and not-thread-safe resource (EXECUTORS atribute) altered by more than one thread. It is altered by main thread at start and whichever thread is used from the pool for task execution.

The thing to note is, that even when you have only one thread accessing a shared resource at a time (simply because there is only one task running at a time), you still need to think about concurency. It is because without synchronization Java Memory Model does not guarantee the changes made by one thread to be ever visible to other threads, no matter how much later they run.

So the solution would be making method scheduleNextTimeout synchronized, thus guaranteeing changes not to be kept local to executing thread and written to main memory.

You also could make a synchronized block (synchronized on "this") around the part, which makes access to shared resource, but since the system doesn't seem to be heavy duty one and the rest of code doesn't seem to take a long time, there is no need for that...

Here is very gist of it in nice and short article I once learned from, when confronted with this sort of issues for the first time :) https://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#jsr133

I'm glad I could help.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=36431&siteId=1