Distributed systems: an in-depth look at the distributed task scheduling platform xxl-job

xxl-job is a feature-rich distributed task scheduling framework. Under the hood it uses its own RPC implementation for registration and management, MySQL as its database, and a database lock as the scheduling lock that serializes trigger dispatch.

xxl-job is split into the scheduling center (admin) and executors. Once an executor application adds the dependency jar and registers the configuration class as a Spring-managed bean, the bean's initMethod automatically starts a thread that picks a port, registers with the scheduling center, and listens for task scheduling requests.
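As a concrete illustration, a Spring Boot integration typically wires the executor from a handful of properties. The keys below follow the xxl-job sample project; treat the exact names and values as assumptions to check against your version:

```properties
# address of the scheduling center (admin); comma-separated for a cluster
xxl.job.admin.addresses=http://127.0.0.1:8080/xxl-job-admin
# logical name under which this executor registers
xxl.job.executor.appname=demo-executor
# port the executor listens on for scheduling requests; pick a free one
xxl.job.executor.port=9999
# where client-side execution logs are written
xxl.job.executor.logpath=/data/applogs/xxl-job/jobhandler
```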

Our company has adopted the xxl-job framework in place of Quartz as its distributed task scheduling component and has done some custom development and optimization on top of it. This article therefore shares some in-depth usage, focusing on a detailed walkthrough of the concepts.


Introduction to the key concepts of the system

Executor

An executor configured in the scheduling center corresponds conceptually to a service that executes scheduled tasks. Executors support distributed scheduling and various routing-rule configurations. Registration can be automatic or via manually configured machine addresses. The default heartbeat interval is 30s and the registration expiry is 90s.

Even after an executor registers automatically, the scheduling center page can lag by up to 30 seconds. The reason is that after the registry table in the database is updated, the table backing the executor view is refreshed by a separate daemon thread, whose refresh interval equals the default heartbeat period of 30s. The console display is therefore delayed, but task scheduling itself is unaffected.

Task

Tasks are configured per executor: every task must belong to an executor. When a task is triggered, the address list of its executor is looked up, and the task is then dispatched according to the configured routing and blocking rules.

Tasks can be local or remote. A local task runs business logic written in the executor application. A remote task uses GLUE: the code is written in the scheduling center's console and shipped to the executor for execution. Unless there is a special requirement, local tasks are recommended throughout.
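A minimal sketch of a local (BEAN-mode) task: in a real executor the method would sit in a Spring bean annotated with `@XxlJob("demoJobHandler")`, but it is written as a plain static method here so the sketch is self-contained. The parameter handling and "processing" logic are illustrative, not part of the framework.

```java
// Sketch of a BEAN-mode local task. In a real executor this method would
// live in a Spring bean and carry the @XxlJob("demoJobHandler") annotation;
// it is a plain static method here so the example is self-contained.
public class DemoJobHandler {

    // The task parameter configured in the console arrives as a single string;
    // this sketch treats it as a batch size and reports how many rows it "processed".
    public static int execute(String param) {
        int batchSize = (param == null || param.isEmpty()) ? 100 : Integer.parseInt(param);
        int processed = 0;
        for (int i = 0; i < batchSize; i++) {
            processed++; // real business logic (e.g. one order per iteration) would go here
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(execute("5")); // prints 5
    }
}
```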

Task configuration item description


  1. Executor: which executor will run the task
  2. Task description: a brief summary of what the task does, e.g. "order batch processing"
  3. Routing strategy: how an executor instance is chosen when the task fires. For high-frequency tasks, consistent hashing or "first" is recommended
  4. Cron: a cron expression describing when the task runs
  5. Run mode: BEAN runs a handler locally in the integrating service; the other modes take code written in the console and ship it to the integrating service for remote execution
  6. JobHandler: required when the run mode is BEAN; its value is the name of the handler that executes the task locally in the integrating service
  7. Blocking strategy: the strategy the executor applies when the same task is dispatched to it multiple times concurrently
  8. Subtask ID: if configured, the subtask is triggered automatically once this task completes
  9. Task timeout: if configured, task execution is terminated automatically when the timeout is exceeded
  10. Failure retry count: the number of retries after the task fails
  11. Owner: usually the person responsible for the task on the integrating side
  12. Alarm email: the address alert emails are sent to when the task raises an alarm
  13. Task parameters: if configured, these parameters are passed to the executor's handler when the task is scheduled

Blocking strategy

The blocking strategy governs how an executor handles concurrent triggers of the same task; it is enforced by the executor. Typical scenario: task A is dispatched to executor A and, while it is still running, task A is triggered again and dispatched to executor A. Depending on the chosen blocking strategy, one of the following three things happens:

  1. Serial execution: when an executor receives a trigger for a task that is already running, it puts the new trigger into the execution thread's queue, where it waits for the thread to poll it. Too many queued triggers can drive memory usage up, so use this with caution for high-frequency, long-running tasks.
  2. Discard later: when an executor receives a trigger for a task that is already running, the new trigger is simply discarded. Recommended.
  3. Cover early: when an executor receives a trigger for a task that is already running, the running execution is stopped (via thread interruption and a volatile flag) and the new trigger is queued. Generally not recommended.
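The three strategies boil down to a single decision per incoming trigger. The sketch below is illustrative only; the enum and action names are made up for the example and do not mirror xxl-job's own classes.

```java
// Illustrative decision logic for the three blocking strategies; the enum and
// action names are invented for this sketch and are not xxl-job's classes.
public class BlockingStrategyDemo {

    enum Strategy { SERIAL_EXECUTION, DISCARD_LATER, COVER_EARLY }

    // Decide what the executor does with a new trigger of a task,
    // given whether an execution of that task is already running.
    static String decide(Strategy strategy, boolean alreadyRunning) {
        if (!alreadyRunning) {
            return "RUN_NOW";
        }
        switch (strategy) {
            case SERIAL_EXECUTION:
                return "ENQUEUE";      // wait in the job thread's queue
            case DISCARD_LATER:
                return "DISCARD";      // drop the new trigger
            case COVER_EARLY:
                return "KILL_AND_RUN"; // interrupt the running one, queue the new one
            default:
                return "RUN_NOW";
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(Strategy.DISCARD_LATER, true)); // prints DISCARD
    }
}
```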

Routing strategy

The routing strategy determines how the scheduling center picks an executor instance when dispatching a task; it is enforced by the scheduling center. Typical scenario: task A fires, and its executor group contains instances A, B, C, and D. Depending on the chosen routing strategy, dispatch works as follows:

  1. First: always dispatch to the first instance, regardless of whether it is healthy
  2. Last: always dispatch to the last instance
  3. Round robin: instances take turns executing
  4. Random: a random instance is chosen
  5. Consistent hash: the instance is chosen by consistent hashing on the task ID, so the same task always lands on the same instance. Recommended for high-frequency or long-running tasks
  6. Least frequently used: the instance with the lowest average usage frequency is chosen
  7. Least recently used: the least recently used instance is chosen
  8. Failover: heartbeat checks are run one by one, and the first instance with a healthy heartbeat is chosen
  9. Busy over: busy checks are run one by one, and the first idle instance is chosen
  10. Sharding broadcast: the trigger is broadcast to all instances, each of which receives sharding parameters. At trigger time the application can dynamically ask which shard it is and how many shards there are in total.
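Under sharding broadcast, a common pattern is for each instance to take only the rows whose ID modulo the shard total equals its shard index, so the full ID set is covered exactly once. In current xxl-job versions the two values come from `XxlJobHelper.getShardIndex()`/`getShardTotal()` (older versions use a sharding utility class; check your version). The self-contained sketch below shows only the selection logic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Self-contained sketch of the usual sharding-broadcast pattern: each of the
// shardTotal instances processes only the IDs whose value modulo shardTotal
// equals its own shardIndex, so the full ID set is covered exactly once.
public class ShardingDemo {

    static List<Long> idsForShard(List<Long> allIds, int shardIndex, int shardTotal) {
        List<Long> mine = new ArrayList<>();
        for (long id : allIds) {
            if (id % shardTotal == shardIndex) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L);
        // shard 1 of 3 takes IDs 1 and 4
        System.out.println(idsForShard(ids, 1, 3)); // prints [1, 4]
    }
}
```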

Logging issues

xxl-job uses slf4j as its logging framework by default. When the dedicated logging API is used, two kinds of logs are produced: client-side logs and server-side logs.

Client-side logs

Client-side logs go to the location given by the logpath setting in the configuration file. From reading the source, the client writes these logs to the corresponding file through a FileOutputStream, and this cannot be changed through configuration alone. We therefore modified the source logic so that when the value is empty or unset, logs are written directly through slf4j instead.
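Our change boils down to a fallback in the log-write path. The sketch below is a minimal self-contained version of that idea (java.util.logging stands in for slf4j, and the class and method names are ours, not xxl-job's):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

// Sketch of the modified client-side log write: when logPath is empty or
// unset, fall through to the application's logging framework instead of
// writing the file with a FileOutputStream. java.util.logging stands in
// for slf4j so the sketch is self-contained.
public class JobLogWriter {

    private static final Logger LOGGER = Logger.getLogger(JobLogWriter.class.getName());

    static void appendLog(String logPath, String line) throws IOException {
        if (logPath == null || logPath.isEmpty()) {
            LOGGER.info(line); // fallback: let the normal log framework handle it
            return;
        }
        // original behaviour: append the line to the per-execution log file
        try (FileOutputStream out = new FileOutputStream(logPath, true)) {
            out.write((line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        appendLog("", "goes to the log framework, not a file");
    }
}
```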

Server-side logs

When the xxl-job logging API is used, the log is also visible in the scheduling console. This works because the admin console calls an executor interface over RPC, and the executor reads the log of that execution from the corresponding log file and returns it. The awkward part is that xxl-job's logging logic does not integrate well with a project's unified logging module, which is very inconvenient.

So in practice we changed how the xxl-job console queries logs: instead of fetching them over RPC, it queries related logs from our log-management search engine by the executed job ID. Combined with the client-side logging change, this unifies log management between xxl-job and our own systems.

Current shortcomings and known problems

  1. Scheduling serialization currently relies on a DB lock. That works inside a single sub-microservice or subsystem, but not for company-wide sharing: coupling to the database is high, there is no local cache, and high availability depends heavily on the database.
  2. The management and permission modules are fine for a team or a small company, but not suitable as middleware shared across many systems and departments.
  3. The admin console has some basic security bugs; SQL injection and JS script injection are trivially easy (confirmed by our company's security testing).
  4. Protocol support is poor. It uses its own RPC protocol; Dubbo or Spring Cloud support requires your own extension. And although the bottom layer is Netty, Netty exceptions are not wrapped well, so odd network problems or foreign protocols produce inexplicable errors that are hard to understand without some Netty knowledge. I raised an issue, but given the framework's decoupled, self-contained design, Dubbo and Spring Cloud will probably not be supported.
  5. As mentioned above, we recommend rewriting the log module's write logic wholesale. It does not matter if the console cannot show logs (hardly anyone reads scheduled-task logs in the console, and this logging carries real performance and network overhead); write locally and query with ELK instead.
  6. The serial number / trace ID attached to task triggers also requires changes to the original framework, otherwise downstream log tracing suffers as well.

Overall this is a very good framework: the relationship between the executor and the scheduler is elegantly designed, and it is well worth extending or customizing. We have seen no stability problems so far.



Origin blog.csdn.net/weixin_48655626/article/details/109393147