Quartz scheduling source code analysis

foreword

The previous article, Quartz database table analysis , introduced the 11 tables provided by Quartz by default. This article will specifically analyze how Quartz schedules and how to use the database for distributed scheduling.

schedule thread

The scheduling class provided by Quartz is QuartzScheduler, and QuartzScheduler will entrust QuartzSchedulerThread to perform real-time scheduling; when the scheduling needs to execute the job, QuartzSchedulerThread does not directly execute the job,
but gives it to the ThreadPool to execute the job. Multithreading, which can be configured in the configuration file:

org.quartz.threadPool.class: org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount: 10
org.quartz.threadPool.threadPriority: 5

The commonly used thread pool is SimpleThreadPool, where 10 threads are started by default, 10 WorkerThreads will be created in SimpleThreadPool, and WorkerThreads will execute specific jobs;

Schedule Analysis

QuartzSchedulerThread is the core class of scheduling. For details on how Quartz implements scheduling, you can view the core source code of QuartzSchedulerThread:

public void run() {
    boolean lastAcquireFailed = false;
 
    while (!halted.get()) {
        try {
            // check if we're supposed to pause...
            synchronized (sigLock) {
                while (paused && !halted.get()) {
                    try {
                        // wait until togglePause(false) is called...
                        sigLock.wait(1000L);
                    } catch (InterruptedException ignore) {
                    }
                }
 
                if (halted.get()) {
                    break;
                }
            }
 
            int availThreadCount = qsRsrcs.getThreadPool().blockForAvailableThreads();
            if(availThreadCount > 0) { // will always be true, due to semantics of blockForAvailableThreads...
 
                List<OperableTrigger> triggers = null;
 
                long now = System.currentTimeMillis();
 
                clearSignaledSchedulingChange();
                try {
                    triggers = qsRsrcs.getJobStore().acquireNextTriggers(
                            now + idleWaitTime, Math.min(availThreadCount, qsRsrcs.getMaxBatchSize()), qsRsrcs.getBatchTimeWindow());
                    lastAcquireFailed = false;
                    if (log.isDebugEnabled()) 
                        log.debug("batch acquisition of " + (triggers == null ? 0 : triggers.size()) + " triggers");
                } catch (JobPersistenceException jpe) {
                    if(!lastAcquireFailed) {
                        qs.notifySchedulerListenersError(
                            "An error occurred while scanning for the next triggers to fire.",
                            jpe);
                    }
                    lastAcquireFailed = true;
                    continue;
                } catch (RuntimeException e) {
                    if(!lastAcquireFailed) {
                        getLog().error("quartzSchedulerThreadLoop: RuntimeException "
                                +e.getMessage(), e);
                    }
                    lastAcquireFailed = true;
                    continue;
                }
 
                if (triggers != null && !triggers.isEmpty()) {
 
                    now = System.currentTimeMillis();
                    long triggerTime = triggers.get(0).getNextFireTime().getTime();
                    long timeUntilTrigger = triggerTime - now;
                    while(timeUntilTrigger > 2) {
                        synchronized (sigLock) {
                            if (halted.get()) {
                                break;
                            }
                            if (!isCandidateNewTimeEarlierWithinReason(triggerTime, false)) {
                                try {
                                    // we could have blocked a long while
                                    // on 'synchronize', so we must recompute
                                    now = System.currentTimeMillis();
                                    timeUntilTrigger = triggerTime - now;
                                    if(timeUntilTrigger >= 1)
                                        sigLock.wait(timeUntilTrigger);
                                } catch (InterruptedException ignore) {
                                }
                            }
                        }
                        if(releaseIfScheduleChangedSignificantly(triggers, triggerTime)) {
                            break;
                        }
                        now = System.currentTimeMillis();
                        timeUntilTrigger = triggerTime - now;
                    }
 
                    // this happens if releaseIfScheduleChangedSignificantly decided to release triggers
                    if(triggers.isEmpty())
                        continue;
 
                    // set triggers to 'executing'
                    List<TriggerFiredResult> bndles = new ArrayList<TriggerFiredResult>();
 
                    boolean goAhead = true;
                    synchronized(sigLock) {
                        goAhead = !halted.get();
                    }
                    if(goAhead) {
                        try {
                            List<TriggerFiredResult> res = qsRsrcs.getJobStore().triggersFired(triggers);
                            if(res != null)
                                bndles = res;
                        } catch (SchedulerException se) {
                            qs.notifySchedulerListenersError(
                                    "An error occurred while firing triggers '"
                                            + triggers + "'", se);
                            //QTZ-179 : a problem occurred interacting with the triggers from the db
                            //we release them and loop again
                            for (int i = 0; i < triggers.size(); i++) {
                                qsRsrcs.getJobStore().releaseAcquiredTrigger(triggers.get(i));
                            }
                            continue;
                        }
 
                    }
 
                    for (int i = 0; i < bndles.size(); i++) {
                        TriggerFiredResult result =  bndles.get(i);
                        TriggerFiredBundle bndle =  result.getTriggerFiredBundle();
                        Exception exception = result.getException();
 
                        if (exception instanceof RuntimeException) {
                            getLog().error("RuntimeException while firing trigger " + triggers.get(i), exception);
                            qsRsrcs.getJobStore().releaseAcquiredTrigger(triggers.get(i));
                            continue;
                        }
 
                        // it's possible to get 'null' if the triggers was paused,
                        // blocked, or other similar occurrences that prevent it being
                        // fired at this time...  or if the scheduler was shutdown (halted)
                        if (bndle == null) {
                            qsRsrcs.getJobStore().releaseAcquiredTrigger(triggers.get(i));
                            continue;
                        }
 
                        JobRunShell shell = null;
                        try {
                            shell = qsRsrcs.getJobRunShellFactory().createJobRunShell(bndle);
                            shell.initialize(qs);
                        } catch (SchedulerException se) {
                            qsRsrcs.getJobStore().triggeredJobComplete(triggers.get(i), bndle.getJobDetail(), CompletedExecutionInstruction.SET_ALL_JOB_TRIGGERS_ERROR);
                            continue;
                        }
 
                        if (qsRsrcs.getThreadPool().runInThread(shell) == false) {
                            // this case should never happen, as it is indicative of the
                            // scheduler being shutdown or a bug in the thread pool or
                            // a thread pool being used concurrently - which the docs
                            // say not to do...
                            getLog().error("ThreadPool.runInThread() return false!");
                            qsRsrcs.getJobStore().triggeredJobComplete(triggers.get(i), bndle.getJobDetail(), CompletedExecutionInstruction.SET_ALL_JOB_TRIGGERS_ERROR);
                        }
 
                    }
 
                    continue; // while (!halted)
                }
            } else { // if(availThreadCount > 0)
                // should never happen, if threadPool.blockForAvailableThreads() follows contract
                continue; // while (!halted)
            }
 
            long now = System.currentTimeMillis();
            long waitTime = now + getRandomizedIdleWaitTime();
            long timeUntilContinue = waitTime - now;
            synchronized(sigLock) {
                try {
                  if(!halted.get()) {
                    // QTZ-336 A job might have been completed in the mean time and we might have
                    // missed the scheduled changed signal by not waiting for the notify() yet
                    // Check that before waiting for too long in case this very job needs to be
                    // scheduled very soon
                    if (!isScheduleChanged()) {
                      sigLock.wait(timeUntilContinue);
                    }
                  }
                } catch (InterruptedException ignore) {
                }
            }
 
        } catch(RuntimeException re) {
            getLog().error("Runtime error occurred in main trigger firing loop.", re);
        }
    } // while (!halted)
 
    // drop references to scheduler stuff to aid garbage collection...
    qs = null;
    qsRsrcs = null;
}

1.halted和paused

These are the flag parameters of two boolean values, respectively: stop and pause; halted is false by default, and will be updated to true when QuartzScheduler executes shutdown(); paused is true by default, and is
updated to false when QuartzScheduler executes start() ; QuartzSchedulerThread can be executed after normal startup;

2.availThreadCount

Query whether SimpleThreadPool has available WorkerThread, if availThreadCount>0, you can continue to execute other logic, otherwise continue to check;

3.acquireNextTriggers

Query the triggers that will be scheduled for a period of time. There are three important parameters here: idleWaitTime, maxBatchSize, batchTimeWindow. These three parameters can be configured in the configuration file:

org.quartz.scheduler.idleWaitTime:30000
org.quartz.scheduler.batchTriggerAcquisitionMaxCount:1
org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow:0

idleWaitTime: The amount of time (in milliseconds) that the scheduler will wait before re-querying for available triggers when the scheduler is idle, default is 30 seconds;
batchTriggerAcquisitionMaxCount: allows the scheduler node to acquire (for triggering) at a time maximum number of triggers, default is 1;
batchTriggerAcquisitionFireAheadTimeWindow: the amount of time (milliseconds) that a trigger is allowed to be acquired and fired before its scheduled fire time, default is 0;

Continue to view the source code of the acquireNextTriggers method:

public List<OperableTrigger> acquireNextTriggers(final long noLaterThan, final int maxCount, final long timeWindow)
    throws JobPersistenceException {
     
    String lockName;
    if(isAcquireTriggersWithinLock() || maxCount > 1) { 
        lockName = LOCK_TRIGGER_ACCESS;
    } else {
        lockName = null;
    }
    return executeInNonManagedTXLock(lockName, 
            new TransactionCallback<List<OperableTrigger>>() {
                public List<OperableTrigger> execute(Connection conn) throws JobPersistenceException {
                    return acquireNextTrigger(conn, noLaterThan, maxCount, timeWindow);
                }
            },
            ......
            });
}

It can be found that the LOCK_TRIGGER_ACCESS lock is only used when acquireTriggersWithinLock or batchTriggerAcquisitionMaxCount>1 is set, that is to say, in the case of the default parameter configuration, no lock is used here,
so if multiple nodes execute acquireNextTriggers at the same time, will it appear? The same trigger is executed on multiple nodes?
Note: acquireTriggersWithinLock can be configured in the configuration file:

org.quartz.jobStore.acquireTriggersWithinLock=true

acquireTriggersWithinLock: Whether to use locks when acquiring triggers, the default is false, if batchTriggerAcquisitionMaxCount>1, it is best to set acquireTriggersWithinLock to true at the same time;

Continue to view the source code of the acquireNextTrigger method inside TransactionCallback with the question:

protected List<OperableTrigger> acquireNextTrigger(Connection conn, long noLaterThan, int maxCount, long timeWindow)
    throws JobPersistenceException {
    if (timeWindow < 0) {
      throw new IllegalArgumentException();
    }
     
    List<OperableTrigger> acquiredTriggers = new ArrayList<OperableTrigger>();
    Set<JobKey> acquiredJobKeysForNoConcurrentExec = new HashSet<JobKey>();
    final int MAX_DO_LOOP_RETRY = 3;
    int currentLoopCount = 0;
    do {
        currentLoopCount ++;
        try {
            List<TriggerKey> keys = getDelegate().selectTriggerToAcquire(conn, noLaterThan + timeWindow, getMisfireTime(), maxCount);
             
            // No trigger is ready to fire yet.
            if (keys == null || keys.size() == 0)
                return acquiredTriggers;
 
            long batchEnd = noLaterThan;
 
            for(TriggerKey triggerKey: keys) {
                // If our trigger is no longer available, try a new one.
                OperableTrigger nextTrigger = retrieveTrigger(conn, triggerKey);
                if(nextTrigger == null) {
                    continue; // next trigger
                }
                 
                // If trigger's job is set as @DisallowConcurrentExecution, and it has already been added to result, then
                // put it back into the timeTriggers set and continue to search for next trigger.
                JobKey jobKey = nextTrigger.getJobKey();
                JobDetail job;
                try {
                    job = retrieveJob(conn, jobKey);
                } catch (JobPersistenceException jpe) {
                    try {
                        getLog().error("Error retrieving job, setting trigger state to ERROR.", jpe);
                        getDelegate().updateTriggerState(conn, triggerKey, STATE_ERROR);
                    } catch (SQLException sqle) {
                        getLog().error("Unable to set trigger state to ERROR.", sqle);
                    }
                    continue;
                }
                 
                if (job.isConcurrentExectionDisallowed()) {
                    if (acquiredJobKeysForNoConcurrentExec.contains(jobKey)) {
                        continue; // next trigger
                    } else {
                        acquiredJobKeysForNoConcurrentExec.add(jobKey);
                    }
                }
                 
                if (nextTrigger.getNextFireTime().getTime() > batchEnd) {
                  break;
                }
                // We now have a acquired trigger, let's add to return list.
                // If our trigger was no longer in the expected state, try a new one.
                int rowsUpdated = getDelegate().updateTriggerStateFromOtherState(conn, triggerKey, STATE_ACQUIRED, STATE_WAITING);
                if (rowsUpdated <= 0) {
                    continue; // next trigger
                }
                nextTrigger.setFireInstanceId(getFiredTriggerRecordId());
                getDelegate().insertFiredTrigger(conn, nextTrigger, STATE_ACQUIRED, null);
 
                if(acquiredTriggers.isEmpty()) {
                    batchEnd = Math.max(nextTrigger.getNextFireTime().getTime(), System.currentTimeMillis()) + timeWindow;
                }
                acquiredTriggers.add(nextTrigger);
            }
 
            // if we didn't end up with any trigger to fire from that first
            // batch, try again for another batch. We allow with a max retry count.
            if(acquiredTriggers.size() == 0 && currentLoopCount < MAX_DO_LOOP_RETRY) {
                continue;
            }
             
            // We are done with the while loop.
            break;
        } catch (Exception e) {
            throw new JobPersistenceException(
                      "Couldn't acquire next trigger: " + e.getMessage(), e);
        }
    } while (true);
     
    // Return the acquired trigger list
    return acquiredTriggers;
}

First look at the introduction of a new parameter when the selectTriggerToAcquire method is executed: misfireTime=current time-MisfireThreshold, MisfireThreshold can be configured in the configuration file:

org.quartz.jobStore.misfireThreshold: 60000

misfireThreshold: called trigger timeout, for example, there are 10 threads, but there are 11 tasks, so one task is delayed to execute, which can be understood as the scheduling engine can endure this timeout time; the specific query SQL is as follows:

SELECT TRIGGER_NAME, TRIGGER_GROUP, NEXT_FIRE_TIME, PRIORITY
  FROM qrtz_TRIGGERS
 WHERE SCHED_NAME = 'myScheduler'
   AND TRIGGER_STATE = 'WAITING'
   AND NEXT_FIRE_TIME <= noLaterThan
   AND (MISFIRE_INSTR = -1 OR
       (MISFIRE_INSTR != -1 AND NEXT_FIRE_TIME >= noEarlierThan))
 ORDER BY NEXT_FIRE_TIME ASC, PRIORITY DESC

Here noLaterThan=current time+idleWaitTime+batchTriggerAcquisitionFireAheadTimeWindow,
noEarlierThan=current time-MisfireThreshold;
after the query is completed, the updateTriggerStateFromOtherState() method will be traversed to update the state of the trigger from STATE_WAITING to STATE_ACQUIRED, and it will be judged whether rowsUpdated is greater than 0. All nodes query the same trigger, but only one node must be updated successfully; after updating the status, insert a record into the qrtz_fired_triggers table, indicating that the current trigger has been triggered and the status is STATE_ACQUIRED;

4.executeInNonManagedTXLock

Quartz's distributed locks are used in many places. Let's take a look at how Quartz implements distributed locks. The source code of the executeInNonManagedTXLock method is as follows:

protected <T> T executeInNonManagedTXLock(
        String lockName, 
        TransactionCallback<T> txCallback, final TransactionValidator<T> txValidator) throws JobPersistenceException {
    boolean transOwner = false;
    Connection conn = null;
    try {
        if (lockName != null) {
            // If we aren't using db locks, then delay getting DB connection 
            // until after acquiring the lock since it isn't needed.
            if (getLockHandler().requiresConnection()) {
                conn = getNonManagedTXConnection();
            }
             
            transOwner = getLockHandler().obtainLock(conn, lockName);
        }
         
        if (conn == null) {
            conn = getNonManagedTXConnection();
        }
         
        final T result = txCallback.execute(conn);
        try {
            commitConnection(conn);
        } catch (JobPersistenceException e) {
            rollbackConnection(conn);
            if (txValidator == null || !retryExecuteInNonManagedTXLock(lockName, new TransactionCallback<Boolean>() {
                @Override
                public Boolean execute(Connection conn) throws JobPersistenceException {
                    return txValidator.validate(conn, result);
                }
            })) {
                throw e;
            }
        }
 
        Long sigTime = clearAndGetSignalSchedulingChangeOnTxCompletion();
        if(sigTime != null && sigTime >= 0) {
            signalSchedulingChangeImmediately(sigTime);
        }
         
        return result;
    } catch (JobPersistenceException e) {
        rollbackConnection(conn);
        throw e;
    } catch (RuntimeException e) {
        rollbackConnection(conn);
        throw new JobPersistenceException("Unexpected runtime exception: "
                + e.getMessage(), e);
    } finally {
        try {
            releaseLock(lockName, transOwner);
        } finally {
            cleanupConnection(conn);
        }
    }
}

Roughly divided into 3 steps: acquire lock, execute logic, release lock; getLockHandler().obtainLock means acquire lock txCallback.execute(conn) means execution logic, commitConnection(conn) means release lock
Quartz's distributed lock interface class is Semaphore, The default specific implementation is StdRowLockSemaphore, the specific interface is as follows:

public interface Semaphore {
    boolean obtainLock(Connection conn, String lockName) throws LockException;
    void releaseLock(String lockName) throws LockException;
    boolean requiresConnection();
}

Let's take a look at how obtainLock() acquires the lock. The source code is as follows:

public boolean obtainLock(Connection conn, String lockName)
    throws LockException {
    if (!isLockOwner(lockName)) {
        executeSQL(conn, lockName, expandedSQL, expandedInsertSQL);
        getThreadLocks().add(lockName);
        
    } else if(log.isDebugEnabled()) {
        
    }
    return true;
}
 
protected void executeSQL(Connection conn, final String lockName, final String expandedSQL, final String expandedInsertSQL) throws LockException {
    PreparedStatement ps = null;
    ResultSet rs = null;
    SQLException initCause = null;
     
    int count = 0;
    do {
        count++;
        try {
            ps = conn.prepareStatement(expandedSQL);
            ps.setString(1, lockName);
             
            rs = ps.executeQuery();
            if (!rs.next()) {
                getLog().debug(
                        "Inserting new lock row for lock: '" + lockName + "' being obtained by thread: " + 
                        Thread.currentThread().getName());
                rs.close();
                rs = null;
                ps.close();
                ps = null;
                ps = conn.prepareStatement(expandedInsertSQL);
                ps.setString(1, lockName);
 
                int res = ps.executeUpdate();
                 
                if(res != 1) {
                   if(count < 3) {
                        try {
                            Thread.sleep(1000L);
                        } catch (InterruptedException ignore) {
                            Thread.currentThread().interrupt();
                        }
                        continue;
                    }
                }
            }
             
            return; // obtained lock, go
        } catch (SQLException sqle) {
            ......
    } while(count < 4);
 
}

obtainLock first determines whether the lock has been acquired. If the method executeSQL is not executed, there are two important SQLs: expandedSQL and expandedInsertSQL. Take SCHED_NAME = 'myScheduler' as an example:

SELECT * FROM QRTZ_LOCKS WHERE SCHED_NAME = 'myScheduler' AND LOCK_NAME = ? FOR UPDATE
INSERT INTO QRTZ_LOCKS(SCHED_NAME, LOCK_NAME) VALUES ('myScheduler', ?)

FOR UPDATE is added after the select statement. If LOCK_NAME exists, when multiple nodes execute this SQL, only the first node will succeed, and other nodes will enter the wait;
if LOCK_NAME does not exist, multiple nodes will execute expandedInsertSQL at the same time, Only one node will be successfully inserted, and the node that fails to execute the insertion will retry and re-execute expandedSQL;
after the txCallback is executed, execute the commitConnection operation, so that the current node releases LOCK_NAME, other nodes can compete to acquire the lock, and finally execute the releaseLock ;

5.triggersFired

Indicates triggering trigger, the specific code is as follows:

protected TriggerFiredBundle triggerFired(Connection conn,
        OperableTrigger trigger)
    throws JobPersistenceException {
    JobDetail job;
    Calendar cal = null;
 
    // Make sure trigger wasn't deleted, paused, or completed...
    try { // if trigger was deleted, state will be STATE_DELETED
        String state = getDelegate().selectTriggerState(conn,
                trigger.getKey());
        if (!state.equals(STATE_ACQUIRED)) {
            return null;
        }
    } catch (SQLException e) {
        throw new JobPersistenceException("Couldn't select trigger state: "
                + e.getMessage(), e);
    }
 
    try {
        job = retrieveJob(conn, trigger.getJobKey());
        if (job == null) { return null; }
    } catch (JobPersistenceException jpe) {
        try {
            getLog().error("Error retrieving job, setting trigger state to ERROR.", jpe);
            getDelegate().updateTriggerState(conn, trigger.getKey(),
                    STATE_ERROR);
        } catch (SQLException sqle) {
            getLog().error("Unable to set trigger state to ERROR.", sqle);
        }
        throw jpe;
    }
 
    if (trigger.getCalendarName() != null) {
        cal = retrieveCalendar(conn, trigger.getCalendarName());
        if (cal == null) { return null; }
    }
 
    try {
        getDelegate().updateFiredTrigger(conn, trigger, STATE_EXECUTING, job);
    } catch (SQLException e) {
        throw new JobPersistenceException("Couldn't insert fired trigger: "
                + e.getMessage(), e);
    }
 
    Date prevFireTime = trigger.getPreviousFireTime();
 
    // call triggered - to update the trigger's next-fire-time state...
    trigger.triggered(cal);
 
    String state = STATE_WAITING;
    boolean force = true;
     
    if (job.isConcurrentExectionDisallowed()) {
        state = STATE_BLOCKED;
        force = false;
        try {
            getDelegate().updateTriggerStatesForJobFromOtherState(conn, job.getKey(),
                    STATE_BLOCKED, STATE_WAITING);
            getDelegate().updateTriggerStatesForJobFromOtherState(conn, job.getKey(),
                    STATE_BLOCKED, STATE_ACQUIRED);
            getDelegate().updateTriggerStatesForJobFromOtherState(conn, job.getKey(),
                    STATE_PAUSED_BLOCKED, STATE_PAUSED);
        } catch (SQLException e) {
            throw new JobPersistenceException(
                    "Couldn't update states of blocked triggers: "
                            + e.getMessage(), e);
        }
    } 
         
    if (trigger.getNextFireTime() == null) {
        state = STATE_COMPLETE;
        force = true;
    }
 
    storeTrigger(conn, trigger, job, true, state, force, false);
 
    job.getJobDataMap().clearDirtyFlag();
 
    return new TriggerFiredBundle(job, trigger, cal, trigger.getKey().getGroup()
            .equals(Scheduler.DEFAULT_RECOVERY_GROUP), new Date(), trigger
            .getPreviousFireTime(), prevFireTime, trigger.getNextFireTime());
}

First, check whether the status of the trigger is STATE_ACQUIRED, if not, return null directly; then obtain the corresponding jobDetail through jobKey, and update the corresponding FiredTrigger to EXECUTING status; finally, determine whether the job's DisallowConcurrentExecution is enabled. If it is enabled, the job cannot be executed concurrently, then the trigger The status is STATE_BLOCKED, otherwise it is STATE_WAITING; if the status is STATE_BLOCKED, then
the trigger corresponding to the next scheduling will not be pulled, only after the corresponding job is executed, the update status is STATE_WAITING. serial;

6. Execute the job

Execute the JobRunShell that encapsulates the job through ThreadPool;

problem explanation

In the article Spring integrates Quartz distributed scheduling , I finally did several tests of distributed scheduling, and now I can make corresponding explanations

1. The same trigger will only be executed on one node at the same time

It can be found above that Quartz uses distributed locks and state to ensure that only one node can execute;

2. If the task is not completed, you can start again

Because the scheduling thread and the task execution thread are separate, it is considered that the execution is executed in the Threadpool and does not affect each other;

3. Ensure the serialization of tasks through the DisallowConcurrentExecution annotation

If DisallowConcurrentExecution is used in triggerFired, the STATE_BLOCKED state will be introduced to ensure the serialization of tasks;

Summarize

This article briefly introduces the Quartz scheduling process from the perspective of source code. Of course, the details are not in-depth; through this article, a reasonable explanation can be given to the phenomenon of multi-node scheduling.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=324896876&siteId=291194637