DolphinDB Scheduled Job Tutorial

The scheduled job feature of the DolphinDB database allows the system to automatically execute jobs at a specified time and frequency. It is useful when you need the database to automatically run scripts for calculation and analysis (such as computing daily K-lines after the market closes, or generating monthly statistical reports), database management (such as database backup and data synchronization), or operating-system management (such as deleting expired log files).

A scheduled job is represented by a function, which gives great flexibility in job definition: any task that can be expressed as a function can run as a scheduled job. Scheduled jobs are submitted with the scheduleJob function and run in the background at the scheduled times. After a job is created, its definition is serialized and saved to a file on the data node's disk; when the node restarts, the system deserializes and reloads the scheduled jobs. The result of each run is also saved on the node's disk. We can use getJobMessage and getJobReturn to view each job's run log and return value.

1. Function introduction

1.1 Create a scheduled job

Create a scheduled job using the function scheduleJob. After the job is created, the system serializes the job definition and saves it to the file <homeDir>/sysmgmt/jobEditlog.meta. The function syntax is as follows:

scheduleJob(jobId, jobDesc, jobFunc, scheduledTime, startDate, endDate, frequency, [days])

Points to note:

  • The parameter jobFunc (job function) is a function without parameters.
  • The parameter scheduledTime (scheduled time) can be a scalar or vector of type minute. When it is a vector, note that the interval between two adjacent time points cannot be less than 30 minutes.
  • The return value of the function is the job ID of the scheduled job. If the entered jobId does not duplicate the job ID of an existing scheduled job, the system returns the entered jobId. Otherwise, the current date, then "000", "001", etc. are appended to jobId as suffixes until a unique job ID is generated.

To execute a function, all of its required parameters must be provided. In functional programming, a function with all of its parameters fixed is a special case of partial application of the original function, that is, a function without parameters. In DolphinDB, we use curly braces {} to denote partial application.
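
For example, the following minimal sketch (the function add and its arguments are illustrative, not from the original tutorial) shows how fixing all parameters with {} produces a zero-parameter function that could serve as a job function:

def add(a, b){
	return a + b
}
addOneTwo = add{1, 2}   // partial application: all parameters fixed
addOneTwo()             // returns 3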

Custom functions, built-in functions, plug-in functions, function views, and functions in modules can all be used as job functions, so scheduled jobs can do almost anything: custom functions and plug-in functions for calculation and analysis, the built-in function run to run a script file, the shell function for operating-system management, and so on. The job in the following example calls the custom function getMaxTemperature, which calculates the maximum temperature of a device over the previous day; its parameter is the device ID. When creating the job, getMaxTemperature{1} fixes the device ID to 1, and the job executes at 0:00 every day.

def getMaxTemperature(deviceID){
    // maximum temperature of the given device over the previous day
    maxTemp=exec max(temperature) from loadTable("dfs://dolphindb","sensor")
            where ID=deviceID ,ts between (today()-1).datetime():(today().datetime()-1)
    return maxTemp
}
scheduleJob(`testJob, "getMaxTemperature", getMaxTemperature{1}, 00:00m, today(), today()+30, 'D');

The following example executes a script file. The job function uses the run function and specifies the full path of the script file monthlyJob.dos as a parameter. The job is executed at 0:00 on the 1st of each month in 2020.

scheduleJob(`monthlyJob, "Monthly Job 1", run{"/home/DolphinDB/script/monthlyJob.dos"}, 00:00m, 2020.01.01, 2020.12.31, 'M', 1);

The following example executes an operating-system command to delete a log file. The job function uses the shell function and specifies the command "rm /home/DolphinDB/server/dolphindb.log" as the parameter. The job executes every Sunday at 1:00.

scheduleJob(`weeklyjob, "rm log", shell{"rm /home/DolphinDB/server/dolphindb.log"}, 1:00m, 2020.01.01, 2021.12.31, 'W', 6);

In practical applications, using function parameters and return values for input and output is somewhat inconvenient. The more common practice is to read data from the database and write the results back after calculation. The following example calculates minute-level K-lines after the market closes each day. In the custom function computeK, market data is read from the distributed table trades and the results are written to the distributed table OHLC after calculation. With frequency 'W', days [1,2,3,4,5], and scheduledTime 15:00m, the job executes at 15:00 every Monday through Friday.

def computeK(){
	barMinutes = 7
	// start times of the morning and afternoon trading sessions
	sessionsStart=09:30:00.000 13:00:00.000
	OHLC =  select first(price) as open, max(price) as high, min(price) as low, last(price) as close, sum(volume) as volume 
		from loadTable("dfs://stock","trades")
		where time >today() and time < now()
		group by symbol, dailyAlignedBar(time, sessionsStart, barMinutes*60*1000) as barStart
	append!(loadTable("dfs://stock","OHLC"),OHLC)
}
scheduleJob(`kJob, "7 Minutes", computeK, 15:00m, 2020.01.01, 2021.12.31, 'W', [1,2,3,4,5]);

1.2 Query scheduled jobs

The function getScheduledJobs can be used to query the scheduled job definitions on the node. The function syntax is as follows:

getScheduledJobs([jobIdPattern])

The parameter jobIdPattern is a string representing a job ID or a job ID pattern; it supports the wildcard characters "%" and "?". The return value is the scheduled job information in table form. If jobIdPattern is not specified, all jobs are returned.
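
For example, a quick sketch of typical queries (reusing job IDs from the examples above):

getScheduledJobs();            // all scheduled jobs on the node
getScheduledJobs("testJob%");  // jobs whose IDs start with "testJob"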

The system saves the execution results of each job, including the run log and the return value. The run log is saved in the jobId.msg file and the return value in the jobId.object file; both are stored in the directory <homeDir>/batchJobs. We can use getJobMessage and getJobReturn to view each job's run log and return value, but pay attention to the value of jobId. First, as mentioned earlier, if the jobId entered when creating a job duplicates the job ID of an existing scheduled job, the system does not return the entered jobId. Second, for a job that executes multiple times, the job ID differs for each execution. Therefore, we can use getRecentJobs to view recently completed scheduled jobs. For example, we define the following scheduled job:

def foo(){
	print "test scheduled job at"+ now()
	return now()
}
scheduleJob(`testJob, "foo", foo, 17:00m+0..2*30, today(), today(), 'D');

After running getRecentJobs(), get the following information:

jobId	            jobDesc	startTime	            endTime
------              ------- ----------------------- ----------------------
testJob	            foo1	2020.02.14T17:00:23.636	2020.02.14T17:00:23.639
testJob20200214	    foo1	2020.02.14T17:30:23.908	2020.02.14T17:30:23.910
testJob20200214000  foo1	2020.02.14T18:00:23.148	2020.02.14T18:00:26.749

From this we can see that the job ID is "testJob" for the first execution and "testJob20200214" for the second; it changes with each execution. As shown below, we can use getJobMessage and getJobReturn to view the third execution:

>getJobMessage(`testJob20200214000);
2020-02-14 18:00:23.148629 Start the job [testJob20200214000]: foo
2020-02-14 18:00:23.148721 test the scheduled job at 2020.02.14T18:00:23.148
2020-02-14 18:00:26.749111 The job is done.

>getJobReturn(`testJob20200214000);
2020.02.14T18:00:23.148

1.3 Delete a scheduled job

To delete a scheduled job, use the function deleteScheduledJob. The syntax is as follows:

deleteScheduledJob(jobId)

The parameter jobId is the job ID. Before deleting, you can use getScheduledJobs to obtain the ID of the job you want to delete.
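
For example, a sketch of deleting the K-line job from section 1.1, after confirming its ID:

getScheduledJobs(`kJob);
deleteScheduledJob(`kJob);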

2. Permissions when a scheduled job runs

A scheduled job runs under the identity of the user who was logged in when it was created. Therefore, when creating a scheduled job, the user must have permission to access the resources the job uses. For example, if the logged-in user is not authorized to access the cluster's distributed functionality but the job uses it, errors occur during execution. In the following example, user guestUser1 does not have permission to access DFS:

def foo1(){
	print "Test scheduled job "+ now()
	cnt=exec count(*) from loadTable("dfs://FuturesContract","tb")
	print "The count of table is "+cnt
	return cnt
}
login("guestUser1","123456")
scheduleJob(`guestGetDfsjob, "dfs read", foo1, [12:00m, 21:03m, 21:45m], 2020.01.01, 2021.12.31, "D");

After the job executes, query with getJobMessage(`guestGetDfsjob); as shown below, the scheduled job did not have permission to read the distributed database:

2020-02-14 21:03:23.193039 Start the job [guestGetDfsjob]: dfs read
2020-02-14 21:03:23.193092 Test the scheduled job at 2020.02.14T21:03:23.193
2020-02-14 21:03:23.194914 Not granted to read table dfs://FuturesContract/tb

Therefore, to remotely execute certain functions of the control node or access a distributed table in the cluster, you need to log in as an administrator (admin) or another authorized user before creating the job. This can be done with the login function.
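
For example, a sketch of creating the same job as an authorized user (the credentials here are placeholders; use a real authorized account):

login("admin","123456")
scheduleJob(`adminGetDfsjob, "dfs read", foo1, [12:00m, 21:03m, 21:45m], 2020.01.01, 2021.12.31, "D");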

The log above also shows that the statements after the distributed-table access were not executed: if an error occurs while a job is running, execution is interrupted. To prevent an exception from stopping the execution of subsequent statements, you can use try-catch statements to capture the exception. If you need to output information while the job runs, print it with print; the output is recorded in the jobId.msg log file.
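
The following sketch (reusing the table from the previous example) shows a job function that catches the exception so that the remaining statements still run; everything printed is recorded in the jobId.msg file:

def foo2(){
	try{
		cnt=exec count(*) from loadTable("dfs://FuturesContract","tb")
		print "The count of table is "+cnt
	}
	catch(ex){
		print "Failed to read the table:"
		print ex
	}
	print "This statement still runs after the exception is caught."
}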

3. Serialization of scheduled jobs

After a scheduled job is created, the system persists the creating user (userID), job ID, description, start time, frequency, job definition, and so on. The storage path is <homeDir>/sysmgmt/jobEditlog.meta. The job is represented by a DolphinDB function. A function definition consists of a series of statements, which may call other functions and reference global objects such as shared variables. Shared variables are serialized by name; when deserializing, the shared variable must exist, otherwise deserialization fails. Job functions and the functions they depend on fall into two categories according to whether they are compiled: compiled functions, including built-in functions and plug-in functions, and script functions, including custom functions, function views, and functions in modules. The two categories are serialized differently, as explained below.

3.1 Serialization of compiled functions

For compiled functions, only the function name and module name are serialized. During deserialization, these modules and functions are looked up in the system; if they are not found, deserialization fails. Therefore, if a scheduled job uses plug-in functions, the plug-in must be loaded before the job is deserialized. The system initializes resources in this order: system-level initialization script (dolphindb.dos), function views, user-level startup script (startup.dos), then scheduled jobs. Scheduled jobs are loaded after the startup script has executed. In the following example, the job function jobDemo uses the odbc plugin:

use odbc
def jobDemo(){
	conn = odbc::connect("dsn=mysql_factorDBURL");
}
scheduleJob("job demo","example of init",jobDemo,15:48m, 2019.01.01, 2020.12.31, 'D')

However, the odbc plug-in is not loaded when the system starts, so the function cannot be recognized when the scheduled job is read; the following log is output and the system exits.

<ERROR>:Failed to unmarshall the job [job demo]. Failed to deserialize assign statement.. Invalid message format

After adding the following code to the startup script to load the odbc plug-in, the system starts successfully.

loadPlugin("plugins/odbc/odbc.cfg")

3.2 Serialization of script functions

For a script function, the function's parameters and every statement in its definition are serialized. If a statement references other script functions, the definitions of those dependencies are serialized as well.

After a scheduled job is created, deleting or modifying these script functions, or modifying the script functions they depend on, does not affect the job's execution. If you want the scheduled job to execute the new function, you must delete the scheduled job and then recreate it; otherwise the old serialized function runs. Note that dependent functions also need to be redefined. The following examples illustrate this:

  • Example 1: The job function is modified after the scheduled job is created. As shown below, the job function f is redefined after scheduleJob is called:
def f(){
	print "The old function is called " 
}
scheduleJob(`test, "f", f, 11:05m, today(), today(), 'D');
go
def f(){
	print "The new function is called " 
}

After the scheduled job executes, use getJobMessage(`test) to get the following information, which shows that the job still executed the old custom function.

2020-02-14 11:05:53.382225 Start the job [test]: f
2020-02-14 11:05:53.382267 The old function is called 
2020-02-14 11:05:53.382277 The job is done.
  • Example 2: A function that the job function depends on is modified after the scheduled job is created. As shown below, the job function is the function view fv, which calls the function foo. After scheduleJob is called, foo is redefined and the function view is regenerated:
def foo(){
	print "The old function is called " 
}
def fv(){
	foo()
}
addFunctionView(fv)  

scheduleJob(`testFvJob, "fv", fv, 11:36m, today(), today(), 'D');
go
def foo(){
	print "The new function is called " 
}
dropFunctionView(`fv)
addFunctionView(fv) 

After the scheduled job executes, getJobMessage(`testFvJob) returns the following information, which shows that the job still executed the old function.

2020-02-14 11:36:23.069892 Start the job [testFvJob]: fv
2020-02-14 11:36:23.069939 The old function is called 
2020-02-14 11:36:23.069951 The job is done.

The same applies to module functions. We create a module printLog.dos with the following content:

module printLog
def printLogs(logText){
	writeLog(string(now()) + " : " + logText)
	print "The old function is called"
}

Then create a scheduled job that calls the printLog::printLogs function:

use printLog
def f5(){
	printLogs("test my log")
}
scheduleJob(`testModule, "f5", f5, 13:32m, today(), today(), 'D');

Modify the module as follows before the scheduled job runs:

module printLog
def printLogs(logText){
	writeLog(string(now()) + " : " + logText)
	print "The new function is called"
}

After the scheduled job executes, getJobMessage(`testModule) returns the following information, which shows that the job still executed the old function.

2020-02-14 13:32:22.870855 Start the job [testModule]: f5
2020-02-14 13:32:22.871097 The old function is called
2020-02-14 13:32:22.871106 The job is done.
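
In all three cases, for the new definition to take effect, the scheduled job must be deleted and recreated, because the job is serialized again at creation time. A sketch using Example 1's job:

deleteScheduledJob(`test)
scheduleJob(`test, "f", f, 11:05m, today(), today(), 'D');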

4. Run a script file on a schedule

When creating a scheduled job whose job function runs a script file, only the file name is serialized, not the file contents. Therefore, all dependent custom functions must be placed in the script file; otherwise the job fails because the custom functions cannot be found. For example, create a script file testjob.dos with the following content:

foo()

Then execute the following script in the DolphinDB GUI:

def foo(){
	print ("Hello world!")
}
run "/home/xjqian/testjob.dos"

The results show that it can be executed normally:

2020.02.14 13:47:00.992: executing code (line 104-108)...
Hello world!

Then create a scheduled job that runs the script file, with the following code:

scheduleJob(`dailyfoofile1, "Daily Job 1", run {"/home/xjqian/testjob.dos"}, 16:14m, 2020.01.01, 2020.12.31, 'D');

But the following exception occurred when running this job:

Exception was raised when running the script [/home/xjqian/testjob.dos]:Syntax Error: [line #3] Cannot recognize the token foo

This is because the definition of foo and the execution of the scheduled job are not in the same session, so the function definition cannot be found when the job executes. Put the definition of foo() into the script file, modifying testjob.dos as follows:

def foo(){
	print ("Hello world!")
}
foo()

Then recreate the scheduled job to run this script file; it completes smoothly.

5. Summary and outlook

Common faults and troubleshooting

  • The job function references a shared variable, but the shared variable is not defined before the job is loaded. It is generally recommended to define shared variables in the user's startup script (see the sketch after this list).
  • The job function references a plug-in function, but the plug-in is not loaded before the job is loaded. It is generally recommended to load plug-ins in the user's startup script (see the sketch after this list).
  • A script file is run on a schedule, but a dependent function cannot be found. The script file must contain all the custom functions it depends on.
  • The user who created the scheduled job does not have permission to access the distributed database tables. Grant the user access to the corresponding databases.
  • An exception is thrown when the functions scheduleJob, getScheduledJobs, or deleteScheduledJob are used in the startup script. When a node starts, scheduled jobs are initialized after the startup script runs, so scheduled-job-related functions cannot be used in the startup script.
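
For the first two items, a minimal sketch of what the user startup script (startup.dos) might contain (the plug-in path, table schema, and shared-variable name are placeholders):

loadPlugin("plugins/odbc/odbc.cfg")     // load the plug-ins that job functions use
t = table(1:0, `id`val, [INT, DOUBLE])
share t as sharedT                      // define the shared variables that job functions reference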

In rare cases, scheduled jobs may fail to load when the system restarts, or the system may even fail to start. Especially when upgrading versions, the interfaces of built-in functions or plug-in functions may change, causing jobs to fail to load, or compatibility bugs may cause the restart to fail. Therefore, keep the scripts that define the scheduled jobs during development. If the system cannot start because of a scheduled job, first delete the scheduled jobs' serialized file <homeDir>/sysmgmt/jobEditlog.meta, and recreate the scheduled jobs after the system restarts.

Planned follow-up features

  • Add the ability to browse the definitions of job functions and their dependent functions.
  • Define and implement dependency relationships between scheduled jobs.

 
