DolphinDB time series database: cross-sectional aggregation engine tutorial

Processing real-time streaming data requires not only vertical aggregation over time (the time series aggregation engine), but also horizontal comparison and calculation over the latest data of every group. Examples include finding the percentiles of the latest quotes of all stocks in finance, or calculating the average temperature of a batch of devices in the industrial Internet of Things. DolphinDB database provides a cross-sectional aggregation engine that performs aggregation operations on the latest data of all groups in the streaming data.

The cross-sectional engine consists of two parts: the cross-sectional data table and the calculation engine. The cross-sectional data table is the engine's internal table, which holds the latest record of every group. The calculation engine is a set of aggregate calculation expressions plus a trigger. The system triggers the aggregation in the specified way and writes the result to another table.
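
The two parts described above can be modelled in a few lines. The following is a minimal Python sketch, not DolphinDB code: the class name CrossSectionSketch, its methods, and the device/temp fields are hypothetical and exist only to illustrate the idea of a keyed "latest row" table plus metrics that run when triggered.

```python
from statistics import mean

class CrossSectionSketch:
    """Toy model: a keyed latest-row table plus metrics evaluated on demand."""

    def __init__(self, key, metrics):
        self.key = key
        self.metrics = metrics   # metric name -> function over the latest rows
        self.latest = {}         # cross-sectional table: key value -> latest row
        self.output = []         # stands in for the output table

    def append(self, row):
        # A newer row for the same key overwrites the older one.
        self.latest[row[self.key]] = row

    def trigger(self):
        # Run every metric over the current cross section.
        rows = list(self.latest.values())
        self.output.append({name: f(rows) for name, f in self.metrics.items()})

engine = CrossSectionSketch(
    "device", {"avgTemp": lambda rows: mean(r["temp"] for r in rows)}
)
engine.append({"device": "d1", "temp": 20.0})
engine.append({"device": "d2", "temp": 30.0})
engine.append({"device": "d1", "temp": 22.0})  # replaces the first d1 row
engine.trigger()
print(engine.output)  # [{'avgTemp': 26.0}]
```

Only two rows remain in the sketch's latest table after three inserts, which is why the average is taken over 22.0 and 30.0.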

1. Basic usage

In the DolphinDB database, create a cross-section aggregation engine through createCrossSectionalAggregator. It returns a cross-section data table, which saves the latest cross-section data of all groups. Writing data to this table means that these data enter the cross-section aggregation engine for calculation. The specific usage is as follows:

createCrossSectionalAggregator(name, [metrics], dummyTable, [outputTable], keyColumn, [triggeringPattern="perBatch"], [triggeringInterval=1000])
  • name is a string representing the name of the cross-section aggregation engine and is the unique identifier of the cross-section aggregation engine. It can contain letters, numbers and underscores, but it must start with a letter.
  • metrics is metacode. It can use built-in or user-defined functions, such as <[sum(qty), avg(price)]>; expressions over aggregation results, such as <[avg(price1)-avg(price2)]>; or aggregations over calculated columns, such as <[std(price1-price2)]>. For details, please refer to metaprogramming.
  • dummyTable is a table object. It does not need to contain data, but its structure must be the same as that of the subscribed streaming data table.
  • outputTable is a table object, used to save calculation results. The number of columns in the output table is the number of metrics + 1, the first column is of the TIMESTAMP type, used to store the timestamp when the calculation occurred, and the data types of other columns must be consistent with the data types of the results returned by metrics.
  • keyColumn is a string that specifies a column of dummyTable as the key of the cross-section aggregation engine. Each distinct value in that column corresponds to exactly one row in the cross-sectional data table.
  • triggeringPattern is a string representing the method of triggering calculation. It can have the following values:
    • "perRow": A calculation is triggered every time a row of data is inserted
    • "perBatch": A calculation is triggered every time a batch of data is inserted
    • "interval": Trigger the calculation at a certain time interval
  • triggeringInterval is an integer. It takes effect only when the value of triggeringPattern is interval, which indicates the time interval for triggering calculation. The default value is 1000 milliseconds.
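
To compare the three triggering patterns, the following Python sketch counts how many calculations an idealized engine would run. The function expected_triggers and its parameters are hypothetical. It assumes that every insert call arrives as exactly one batch and that the interval timer fires exactly once per interval; a real subscription may deliver data in different batch boundaries.

```python
def expected_triggers(batch_sizes, pattern, elapsed_ms=0, interval_ms=1000):
    """Idealized number of calculations for a sequence of insert batches."""
    if pattern == "perRow":
        return sum(batch_sizes)            # one calculation per inserted row
    if pattern == "perBatch":
        return len(batch_sizes)            # one calculation per insert call
    if pattern == "interval":
        return elapsed_ms // interval_ms   # timer-driven, independent of inserts
    raise ValueError(f"unknown pattern: {pattern}")

print(expected_triggers([4], "perRow"))                # 4
print(expected_triggers([4, 4, 4], "perBatch"))        # 3
print(expected_triggers([2] * 6, "interval", 3000))    # 3
```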

2. Example

Here is an example to illustrate the application of the cross-section aggregation engine. In financial trading, it is often necessary to know, in real time, the average of the latest prices of all stocks, the sum of their latest trade volumes, and the sum of their latest trade amounts (price times volume). DolphinDB's cross-sectional aggregation engine, combined with streaming data subscription, can easily accomplish these tasks.

(1) Create a real-time transaction table

The real-time stock transaction table, trades, contains the following main fields:

sym: stock code
time: time
price: transaction price
qty: volume

Whenever a transaction occurs, real-time data will be written into the trades table. The script to create the trades table is as follows:

share streamTable(10:0,`time`sym`price`qty,[TIMESTAMP,SYMBOL,DOUBLE,INT]) as trades

(2) Create a cross-sectional aggregation engine

outputTable = table(1:0, `time`avgPrice`sumqty`Total, [TIMESTAMP,DOUBLE,INT,DOUBLE])
tradesCrossAggregator=createCrossSectionalAggregator("CrossSectionalDemo", <[avg(price), sum(qty), sum(price*qty)]>, trades, outputTable, `sym, `perRow)

tradesCrossAggregator is the cross-sectional data table. It is keyed by stock code, so each stock has exactly one row, which holds its latest transaction. Each time a calculation is triggered, avg(price), sum(qty) and sum(price*qty) are computed across the latest rows of all stocks. With perRow, a calculation is triggered every time a row is inserted.

(3) Subscribe the cross-sectional data table to the real-time transaction table

subscribeTable(,"trades","tradesCrossAggregator",-1,append!{tradesCrossAggregator},true)

Through the streaming data subscription function, real-time data is written into the cross-sectional data table.

(4) Simulation data generation

def writeData(n){
   timev = 2000.10.08T01:01:01.001 + timestamp(1..n)
   symv   = take(`A`B, n)
   pricev = take(102.1 33.4 73.6 223,n)
   qtyv   = take(60 74 82 59, n)
   insert into trades values(timev, symv, pricev,qtyv)
}
writeData(4);

Check the real-time transaction table. It contains 4 records in total.

select * from trades
time                    sym price qty
----------------------- --- ----- ---
2000.10.08T01:01:01.002 A   102.1 60 
2000.10.08T01:01:01.003 B   33.4  74 
2000.10.08T01:01:01.004 A   73.6  82 
2000.10.08T01:01:01.005 B   223   59

Check the cross-sectional data table. It holds the most recent transaction record of each of the two stocks, A and B.

select * from tradesCrossAggregator
time                    sym price qty
----------------------- --- ----- ---
2000.10.08T01:01:01.004 A   73.6  82 
2000.10.08T01:01:01.005 B   223   59

Check the output table of the cross-section engine. Because the engine uses perRow triggering, it performs a calculation every time a row is written to the cross-sectional table, so there are 4 records in total. The time column holds the time at which each calculation ran, not the transaction time.

select * from outputTable
time                    avgPrice sumqty Total  
----------------------- -------- ------ -------
2019.07.08T10:04:41.731 102.1    60     6126   
2019.07.08T10:04:41.732 67.75    134    8597.6 
2019.07.08T10:04:41.732 53.5     156    8506.8 
2019.07.08T10:04:41.732 148.3    141    19192.2
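
The four output rows can be checked by hand. The following Python sketch (not DolphinDB code) replays the four sample trades one at a time, keeps only the latest row per symbol, and recomputes the three metrics after every insert. Rounding is applied only to make the printed values match the display precision above.

```python
# (sym, price, qty) in insertion order, as written by writeData(4)
rows = [("A", 102.1, 60), ("B", 33.4, 74), ("A", 73.6, 82), ("B", 223.0, 59)]

latest = {}    # sym -> (price, qty), the cross section
results = []   # one tuple per triggered calculation

for sym, price, qty in rows:
    latest[sym] = (price, qty)   # overwrite the previous row of this symbol
    prices = [p for p, _ in latest.values()]
    qtys = [q for _, q in latest.values()]
    results.append((
        round(sum(prices) / len(prices), 2),                # avg(price)
        sum(qtys),                                          # sum(qty)
        round(sum(p * q for p, q in latest.values()), 1),   # sum(price*qty)
    ))

for r in results:
    print(r)
# (102.1, 60, 6126.0)
# (67.75, 134, 8597.6)
# (53.5, 156, 8506.8)
# (148.3, 141, 19192.2)
```

The third row drops below the second because the new A trade replaces the earlier A row rather than being added to it.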

View the status of the cross section engine through the getAggregatorStat function.

getAggregatorStat().CrossSectionalAggregator
name               user  status lastErrMsg numRows numMetrics metrics            triggeringPattern triggeringInterval
------------------ ----- ------ ---------- ------- ---------- ------------------ ----------------- ------------------
CrossSectionalDemo guest OK                2       3          [ avg(price), su...perRow            1000

Remove the cross section engine through the removeAggregator function.

removeAggregator("CrossSectionalDemo")

3. Several ways to trigger calculations

The cross section engine has three ways to trigger calculations: perRow, perBatch, and interval. In the above example, a calculation is triggered every time a row of data is inserted. Here are two other ways to trigger calculations.

  • perBatch

The perBatch setting means that a calculation is triggered every time a batch of data is appended. The following example runs the cross-section engine in perBatch mode. The script generates a total of 12 records and writes them in three batches, so the output table is expected to have 3 records.

share streamTable(10:0,`time`sym`price`qty,[TIMESTAMP,SYMBOL,DOUBLE,INT]) as trades
outputTable = table(1:0, `time`avgPrice`sumqty`Total, [TIMESTAMP,DOUBLE,INT,DOUBLE])
tradesCrossAggregator=createCrossSectionalAggregator("CrossSectionalDemo", <[avg(price), sum(qty), sum(price*qty)]>, trades, outputTable, `sym, `perBatch)
subscribeTable(,"trades","tradesCrossAggregator",-1,append!{tradesCrossAggregator},true)
def writeData(n){
   timev = 2000.10.08T01:01:01.001 + timestamp(1..n)
   symv   = take(`A`B, n)
   pricev = take(102.1 33.4 73.6 223,n)
   qtyv   = take(60 74 82 59, n)
   insert into trades values(timev, symv, pricev,qtyv)
}
//Write three batches of data, it is expected that three calculations will be triggered and three aggregation results will be output.
writeData(4);
writeData(4);
writeData(4);

View the cross-sectional data table.

select * from tradesCrossAggregator
time                    sym price qty
----------------------- --- ----- ---
2000.10.08T01:01:01.004 A   73.6  82 
2000.10.08T01:01:01.005 B   223   59

View the output table. Three batches of data are inserted, so there are 3 records in the output table.

select * from outputTable
time                    avgPrice sumqty Total  
----------------------- -------- ------ -------
2019.07.08T10:14:54.446 148.3    141    19192.2
2019.07.08T10:14:54.446 148.3    141    19192.2
2019.07.08T10:14:54.446 148.3    141    19192.2
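
The three output rows are identical because every batch ends with the same latest row for A and for B. The following Python sketch (not DolphinDB code) illustrates this under the assumption that each writeData(4) call reaches the engine as one batch. The helpers write_batch and snapshot are hypothetical stand-ins for the insert and for one triggered calculation.

```python
def write_batch(latest, n):
    """Mimic writeData(n): cycle through two symbols and four price/qty pairs."""
    syms = ["A", "B"]
    prices = [102.1, 33.4, 73.6, 223.0]
    qtys = [60, 74, 82, 59]
    for i in range(n):
        latest[syms[i % 2]] = (prices[i % 4], qtys[i % 4])

def snapshot(latest):
    """avg(price), sum(qty), sum(price*qty) over the latest row of every key."""
    avg_price = sum(p for p, _ in latest.values()) / len(latest)
    sum_qty = sum(q for _, q in latest.values())
    total = sum(p * q for p, q in latest.values())
    return round(avg_price, 2), sum_qty, round(total, 1)

latest, output = {}, []
for _ in range(3):
    write_batch(latest, 4)           # one batch of 4 rows
    output.append(snapshot(latest))  # perBatch: calculate once per batch

for row in output:
    print(row)  # (148.3, 141, 19192.2) three times
```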

  • interval

When the triggering pattern is interval, you need to specify triggeringInterval, which means that a calculation is triggered every triggeringInterval milliseconds. In the following example, 12 records are written in 6 batches, 500 milliseconds apart. The cross-section engine is set to trigger a calculation every 1000 milliseconds, so 3 records are expected in the output.

share streamTable(10:0,`time`sym`price`qty,[TIMESTAMP,SYMBOL,DOUBLE,INT]) as trades
outputTable = table(1:0, `time`avgPrice`sumqty`Total, [TIMESTAMP,DOUBLE,INT,DOUBLE])
tradesCrossAggregator=createCrossSectionalAggregator("CrossSectionalDemo", <[avg(price), sum(qty), sum(price*qty)]>, trades, outputTable, `sym, `interval,1000)
subscribeTable(,"trades","tradesCrossAggregator",-1,append!{tradesCrossAggregator},true)
def writeData(n){
   timev = 2000.10.08T01:01:01.001 + timestamp(1..n)
   symv   = take(`A`B, n)
   pricev = take(102.1 33.4 73.6 223,n)
   qtyv   = take(60 74 82 59, n)
   insert into trades values(timev, symv, pricev,qtyv)
}
a = now()
writeData(2);
sleep(500)
writeData(2);
sleep(500)
writeData(2);
sleep(500)
writeData(2);
sleep(500)
writeData(2);
sleep(500)
writeData(2);
sleep(500)
b = now()
select count(*) from outputTable

3

If you execute select count(*) from outputTable again, you will find that the number of records in the output table will continue to grow over time. This is because in the interval mode, the calculation is triggered regularly according to the actual time and does not depend on whether new data comes in.
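
The following Python sketch (not DolphinDB code) illustrates why the count keeps growing. It replaces the wall clock with a simulated one and assumes an idealized timer that fires exactly every interval_ms milliseconds. The function run_interval is hypothetical.

```python
def run_interval(insert_times_ms, end_ms, interval_ms=1000):
    """Return (tick_time, batches_seen_so_far) for each timer tick up to end_ms."""
    ticks = []
    for t in range(interval_ms, end_ms + 1, interval_ms):
        seen = sum(1 for it in insert_times_ms if it <= t)
        ticks.append((t, seen))
    return ticks

# Six batches, 500 ms apart, as in the script above.
inserts = [0, 500, 1000, 1500, 2000, 2500]

print(len(run_interval(inserts, 3000)))   # 3 calculations after 3 seconds
print(len(run_interval(inserts, 6000)))   # 6 calculations after 6 seconds
print(run_interval(inserts, 6000)[-1])    # (6000, 6): no new data, still fired
```

In this model the ticks after 3000 ms see exactly the same six batches as the tick at 3000 ms, yet each one still appends a row to the output.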

4. Using the cross-sectional data table on its own

As the examples above show, although the cross-sectional table is an intermediate table provided for aggregation, it is also useful on its own in many situations. For example, suppose we need to refresh the latest trading price of a certain stock at regular intervals. The conventional approach is to filter the real-time transaction table by stock code and take the last record. However, the transaction table grows rapidly over time, so running such a query frequently is poor practice in terms of both system resource consumption and query performance. The cross-sectional table always holds only the latest transaction of every stock, so its size is stable, which makes it well suited to this kind of periodic polling.

If you want to use the cross-sectional table on its own, leave metrics and outputTable empty when creating the cross-section engine.

tradesCrossAggregator=createCrossSectionalAggregator("CrossSectionalDemo", , trades,, `sym, `perRow)
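
The difference between the two polling strategies can be sketched in Python (not DolphinDB code). The function latest_by_scan is a hypothetical stand-in for filtering the full transaction table, and the dictionary stands in for the cross-sectional table.

```python
# (time, sym, price, qty): the four sample trades from the example above
trades = [
    ("2000.10.08T01:01:01.002", "A", 102.1, 60),
    ("2000.10.08T01:01:01.003", "B", 33.4, 74),
    ("2000.10.08T01:01:01.004", "A", 73.6, 82),
    ("2000.10.08T01:01:01.005", "B", 223.0, 59),
]

def latest_by_scan(trades, sym):
    """Conventional approach: walk the whole log backwards for one symbol."""
    for row in reversed(trades):
        if row[1] == sym:
            return row
    return None

# Cross-sectional approach: one row per symbol, later rows overwrite earlier ones.
latest = {row[1]: row for row in trades}

print(latest["A"][2])            # 73.6
print(len(trades), len(latest))  # 4 2
```

Both approaches return the same row, but the log grows with every trade while the keyed table stays at one row per stock.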


Related Links:

Streaming Data Tutorial

Time series engine tutorial

Anomaly Detection Engine Tutorial


Origin blog.51cto.com/15022783/2596733