Pandas - How to add new empty rows with an index coming from a list of DatetimeIndex?

pierre_j :

I am sorry, I am kind of stuck.

I would like to use the DatetimeIndex objects stored in a column of one dataframe to create new rows in another dataframe.

The timestamps in these DatetimeIndex objects have to be used as the index of the new rows.

So, with the following data:

import pandas as pd

df = pd.DataFrame({'Start': [pd.Timestamp('1970-01-02 00:00:00'),pd.Timestamp('1970-03-02 00:00:00')], 'End': [pd.Timestamp('1970-01-02 00:10:00'), pd.Timestamp('1970-03-02 00:10:00')], 'Freq': [pd.Timedelta(5,'m'),pd.Timedelta(5,'m')]})
df = df.apply(lambda x: pd.date_range(start = x.Start, end = x.End, freq = x.Freq), axis=1)

df2 = pd.DataFrame({'Timestamp':[pd.Timestamp('1970-01-03 00:00:00')], 'Data':[4]}).set_index('Timestamp')

I get the inputs:

In [62]: df2.index
Out[62]: DatetimeIndex(['1970-01-03'], dtype='datetime64[ns]', name='Timestamp', freq=None)

In [63]: df.to_list()
Out[63]: 
[DatetimeIndex(['1970-01-02 00:00:00', '1970-01-02 00:05:00',
                '1970-01-02 00:10:00'],
               dtype='datetime64[ns]', freq='5T'),
 DatetimeIndex(['1970-03-02 00:00:00', '1970-03-02 00:05:00',
                '1970-03-02 00:10:00'],
               dtype='datetime64[ns]', freq='5T')]

What I would like to get is a dataframe based on df2, with new empty rows whose timestamps come from df:

df2_new
                    Data
Timestamp       
1970-01-03 00:00:00    4
1970-01-02 00:00:00
1970-01-02 00:05:00
1970-01-02 00:10:00
1970-03-02 00:00:00
1970-03-02 00:05:00
1970-03-02 00:10:00

I tried the following line, but I get an error:

df2.reindex(df2.index.to_list() + df.to_list())

TypeError: unhashable type: 'DatetimeIndex'

The example I give is simplified: df has only two rows here, but it could have more.

Do you have any idea how I could do this?

Thanks in advance for your help! Have a good evening. Best!

Scott Boston :

IIUC, you can define your 'time range' a little differently, but the key step is to use pd.Index.union:

import pandas as pd

df = pd.DataFrame({'Start':[pd.Timestamp('1970-01-02 00:00:00')], 
                   'End':[pd.Timestamp('1970-01-02 00:10:00')], 
                   'Freq':[pd.Timedelta(5,'m')]})
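# Expand each row's date range into columns, stack into one Series,
# then keep the timestamps of row 0 (df has a single row here).
timerange = df.apply(lambda x: pd.Series(pd.date_range(start = x.Start, 
                                                       end = x.End, 
                                                       freq = x.Freq)), 
                     axis=1).stack()[0]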

df2 = pd.DataFrame({'Timestamp':[pd.Timestamp('1970-01-03 00:00:00')], 
                    'Data':[4]}).set_index('Timestamp')

df2 = df2.reindex(df2.index.union(timerange))
df2

Output:

                     Data
1970-01-02 00:00:00   NaN
1970-01-02 00:05:00   NaN
1970-01-02 00:10:00   NaN
1970-01-03 00:00:00   4.0
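
Since the question mentions that df can have several rows, here is a minimal sketch of the same union idea extended to more than one range. It is not part of the answer above. The names ranges, new_index and df2_new are only illustrative, and the ranges are built with a plain list comprehension instead of apply to keep the example short.

```python
import pandas as pd

# Two-row input, as in the question.
df = pd.DataFrame({
    'Start': [pd.Timestamp('1970-01-02 00:00:00'), pd.Timestamp('1970-03-02 00:00:00')],
    'End': [pd.Timestamp('1970-01-02 00:10:00'), pd.Timestamp('1970-03-02 00:10:00')],
    'Freq': [pd.Timedelta(5, 'm'), pd.Timedelta(5, 'm')],
})

df2 = pd.DataFrame({'Timestamp': [pd.Timestamp('1970-01-03 00:00:00')],
                    'Data': [4]}).set_index('Timestamp')

# One DatetimeIndex per row of df (illustrative name: ranges).
ranges = [pd.date_range(start=s, end=e, freq=f)
          for s, e, f in zip(df['Start'], df['End'], df['Freq'])]

# Fold every range into the existing index; union sorts and drops duplicates.
new_index = df2.index
for r in ranges:
    new_index = new_index.union(r)

# union drops the index name when the two names differ, so restore it.
new_index = new_index.rename(df2.index.name)

# Rows that did not exist in df2 are filled with NaN.
df2_new = df2.reindex(new_index)
print(df2_new)
```

Note that union returns a sorted index, so the original row of df2 ends up between the two ranges rather than first. If the order shown in the question matters, df2.index.append(ranges) should keep the existing rows first, but it does not remove duplicate timestamps, and reindex raises an error on a duplicated index.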
