I have a CSV with two columns: the first holds a date, the second a rate value. Some dates are missing, so the file has gaps in the date column.
I would like some Python code that fills in the missing dates between the first row and the last row (here, between 01/01/2019 and 14/01/2019). The second task is then to fill in each missing rate with the previous day's rate.
For example, 04 and 05 Jan are missing. These rows need to be created, and since the previous day's rate (03 Jan) is 1.12, that rate needs to be populated for 04 and 05 Jan.
The code needs to be dynamic, because the first and last rows will not be the same for every file. For example, a second file could have first and last row dates of 03/02/2019 and 25/02/2019. The same code needs to run on each file if possible.
The input will be a csv and the output also needs to be a csv file.
Input -
Date,Rate
01/01/2019,1.12
02/01/2019,1.13
03/01/2019,1.12
06/01/2019,1.11
07/01/2019,1.13
08/01/2019,1.14
09/01/2019,1.13
10/01/2019,1.11
12/01/2019,1.12
13/01/2019,1.13
14/01/2019,1.14
Please let me know if you have any questions.
First you need to make sure your date is datetime type, and then you can use resample:
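import pandas as pd
# parse the day-first (dd/mm/yyyy) dates into datetime
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
# resample to daily rows; ffill carries the previous day's rate into the new rows
new_df = df.set_index('Date').resample('D').ffill().reset_index()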
Output:
Date Rate
0 2019-01-01 1.12
1 2019-01-02 1.13
2 2019-01-03 1.12
3 2019-01-04 1.12
4 2019-01-05 1.12
5 2019-01-06 1.11
6 2019-01-07 1.13
7 2019-01-08 1.14
8 2019-01-09 1.13
9 2019-01-10 1.11
10 2019-01-11 1.11
11 2019-01-12 1.12
12 2019-01-13 1.13
13 2019-01-14 1.14
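Since the question asks for CSV in and CSV out, here is a minimal end-to-end sketch of the same resample approach. The function name fill_missing_dates and its src/dst parameters are made up for illustration, and the sketch assumes the columns are named Date and Rate and that dates are always dd/mm/yyyy. Because resample works from the first to the last date found in the data, nothing is hard-coded per file. The demo uses in-memory buffers so it runs as-is; file paths can be passed instead.

```python
import io

import pandas as pd


def fill_missing_dates(src, dst):
    """Hypothetical helper: read a Date,Rate CSV, add missing days, write a CSV."""
    df = pd.read_csv(src)
    # Assumption: dates are dd/mm/yyyy, as in the sample input.
    df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
    # One row per calendar day between the first and last date in the data;
    # ffill copies the most recent earlier rate into each newly created row.
    filled = df.set_index('Date').sort_index().resample('D').ffill().reset_index()
    # Write the dates back in the original dd/mm/yyyy format.
    filled.to_csv(dst, index=False, date_format='%d/%m/%Y')
    return filled


# Small in-memory demo; with real files, pass paths for src and dst instead.
sample = io.StringIO("Date,Rate\n01/01/2019,1.12\n03/01/2019,1.12\n06/01/2019,1.11\n")
out = io.StringIO()
result = fill_missing_dates(sample, out)
print(out.getvalue())
```

Passing an explicit format rather than dayfirst=True makes the parse fail loudly if a file ever uses a different date layout, which may or may not be what you want.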