I'm using Apache Flink in a stock market project to calculate the current price change. The formula is
price_change = (current_price - previous_close_price) / previous_close_price
where previous_close_price is a security's closing price on the preceding trading day. Every day before the market opens, I need to update previous_close_price.
Now I've come up with several solutions, but I don't know which one is the best.
1. Store previous_close_price in Redis and fetch the price in every calculation. It's easy and flexible to update the price, but this solution could kill the performance.
2. Set the TTL of the state to 1 day. Get the new state when the old state has expired. But it's not flexible, as the TTL is hardcoded.
3. Broadcast State Pattern. I'm not sure if this solution works.
4. Send a special message to Flink. When Flink receives the message, it updates the previous_close_price.
Any suggestions are appreciated.
I suggest a variant on #4:
Have two sources, one used only for the closing prices, and the other for the stream of trades. Key both streams by the security, and connect them with a CoProcessFunction. Store the previous_close_price in keyed state, within the CoProcessFunction.
Every day, before the market opens, stream in the updated closing prices.
This could be done with a RichCoFlatMapFunction, but I'm suggesting a CoProcessFunction because you might want to use a side output to report errors (e.g. securities where the previous_close_price is missing).
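To make the logic concrete, here is a minimal plain-Java sketch of what the two-input function would do. It deliberately does not use the Flink API: the HashMap stands in for keyed state, and the names PriceChangeTracker, onClosingPrice and onTrade are hypothetical. In a real job, the two methods would roughly correspond to processElement1 and processElement2 of the CoProcessFunction. The guard against a zero closing price is my own assumption, not something from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalDouble;

// Plain-Java stand-in for the logic of a keyed two-input function.
// The HashMap plays the role of Flink keyed state (one value per security).
class PriceChangeTracker {
    private final Map<String, Double> previousClose = new HashMap<>();

    // Closing-price input: overwrite the stored close for this security.
    void onClosingPrice(String symbol, double closePrice) {
        previousClose.put(symbol, closePrice);
    }

    // Trade input: compute the price change against the stored close.
    // An empty result means the close is missing (or zero, an assumed guard);
    // in Flink this is where a side output could report the error.
    OptionalDouble onTrade(String symbol, double currentPrice) {
        Double close = previousClose.get(symbol);
        if (close == null || close == 0.0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of((currentPrice - close) / close);
    }

    public static void main(String[] args) {
        PriceChangeTracker tracker = new PriceChangeTracker();
        tracker.onClosingPrice("ACME", 100.0);               // before the market opens
        System.out.println(tracker.onTrade("ACME", 105.0));  // close is known
        System.out.println(tracker.onTrade("XYZ", 10.0));    // close is missing
    }
}
```

Streaming in a new closing price simply overwrites the stored value, so nothing has to be expired or reloaded on a timer.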
As for the other approaches:
- #1 (Redis): I don't see any advantage to keeping the previous_close_price data in an external data store.
- #2 (state TTL): I don't think this works very well. There's no hook available for triggering the loading of the new data, and moreover, the state will only be cleared when it's accessed.
- #3 (broadcast state): This doesn't feel like a good use case for broadcast state, unless there's a need for everyone in the cluster to know the closing prices for all securities.