Alibaba Cloud Big Data in Practice 2: Why Is the Table Empty After a Scheduled Instance Runs?

Source of the problem

Last week I created a table, call it Table A. Because its source data is not updated often, I scheduled it to refresh twice a week, at 0:10 am on Tuesday and Friday.
Last Friday, business users reported that some related dashboard data was inaccurate. After checking, I pinned down the problem: Table A had no data, which broke the downstream data logic. I had other tasks on hand at the time, so I did not investigate the cause in detail. I simply re-ran Table A's scheduled task to fix the immediate problem.

When I went to use the table today (Tuesday), it was empty again! I re-ran it, and the data was there. This caught my attention: a manual run works fine, but something goes wrong when the scheduled run executes?

Investigating the cause

In [DataWorks O&M Center], find the instance that was scheduled this morning and check its run logs.
Because I had already re-run the early-morning scheduled instance, I could only reach the original log through [Run Log] under [Expand Details]. The path is as follows:
image.png
image.png
As the screenshots above show, there is a warning. The partitioned table I read from, call it Table B, had no data for yesterday, so when the result was written to Table A, the record count was 0.

Table B is a daily partitioned table, so it should never be without data unless its task failed or there is a problem with when it is updated.
A failed task can probably be ruled out, or is at least less likely than a timing problem, because the task showed no abnormality last Friday. So I looked at the update timing first.

Find Table B in [DataWorks Data Map] and check the table information. The partition information shows that the partitions are mostly written at around 3:00. In other words, when my task read the table at 0:10, the partition it needed had no data yet.
image.png
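The timing conflict can be spelled out in a few lines of Python. This is only an illustrative sketch: the 0:10 schedule and the roughly 3:00 partition write time come from the observations above, while the function name `reads_before_write` is made up for this example.

```python
from datetime import time


def reads_before_write(read_at: time, partition_ready_at: time) -> bool:
    """Return True if the downstream read runs before the upstream
    partition has been written on the same day."""
    return read_at < partition_ready_at


# Table A's task runs at 0:10; Table B's partition shows up at around 3:00.
table_a_schedule = time(0, 10)
table_b_ready = time(3, 0)  # approximate, taken from the partition info

print(reads_before_write(table_a_schedule, table_b_ready))  # True
```

A result of `True` means the scheduled read happens while the partition is still empty, which matches the 0-record warning in the run log.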

Solution

To make sure Table B can be read correctly, Table A's task has to be scheduled after Table B's partition has been created and its data written. Because that time varies from day to day, the schedule has to be set comfortably later. I changed the scheduled time to 8:10.
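One way to reason about "comfortably later" is to take the latest observed partition write time and add a safety margin. The Python sketch below does that. The sample write times and the two-hour margin are hypothetical values chosen for illustration (the post only says "mostly around 3:00"), and `earliest_safe_start` is not a DataWorks function.

```python
from datetime import date, datetime, time, timedelta
from typing import Sequence


def earliest_safe_start(observed: Sequence[time], margin: timedelta) -> time:
    """Latest observed upstream write time plus a safety margin.

    Assumes the result stays within the same day (no midnight wrap).
    """
    latest = max(observed)
    # time objects do not support arithmetic, so attach a dummy date first.
    return (datetime.combine(date(2000, 1, 1), latest) + margin).time()


# Hypothetical sample of Table B partition write times, all "around 3:00".
observed_writes = [time(2, 50), time(3, 5), time(3, 20)]
safe_start = earliest_safe_start(observed_writes, timedelta(hours=2))

print(safe_start)                # 05:20:00
print(time(8, 10) >= safe_start) # True: 8:10 leaves room for late runs
```

A fixed later time is still a guess about the upstream task. If the upstream write ever lands after the chosen time, the same empty-table symptom would come back.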

In theory this should work, but it still needs to be verified in practice. If other bugs turn up, I will add them here later.
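While observing, it may help to make the failure loud rather than silent, for example by refusing to rebuild Table A when the source partition is empty. The following is a minimal Python sketch of that idea, not DataWorks or MaxCompute code: `count_rows` and `write_table` are hypothetical callables standing in for whatever query and write steps the real task uses, and the partition names are made up.

```python
from typing import Callable


class EmptySourcePartition(Exception):
    """Raised when the source partition has no rows yet."""


def refresh_table(
    partition: str,
    count_rows: Callable[[str], int],
    write_table: Callable[[str], int],
) -> int:
    """Rebuild the target table only if the source partition has data.

    count_rows and write_table are hypothetical stand-ins for the real
    query and write steps; they are not DataWorks APIs.
    """
    if count_rows(partition) == 0:
        # Fail loudly instead of silently producing an empty table.
        raise EmptySourcePartition(f"source partition {partition} is empty")
    return write_table(partition)


# Usage with fake data: one empty partition, one populated partition.
fake_counts = {"ds=empty_day": 0, "ds=full_day": 1200}

print(refresh_table("ds=full_day", fake_counts.__getitem__, fake_counts.__getitem__))  # 1200
try:
    refresh_table("ds=empty_day", fake_counts.__getitem__, fake_counts.__getitem__)
except EmptySourcePartition as err:
    print(err)  # source partition ds=empty_day is empty
```

With a check like this, an early read shows up as a failed instance instead of an empty table that business users discover later.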

Origin blog.csdn.net/qq_45476428/article/details/127636629