pandas groupby year month



41 down vote accepted

You can use either resample or TimeGrouper (which resample uses under the hood).

First make the datetime column is actually of datetimes (hit it with pd.to_datetime). It's easier if it'd a DatetimeIndex:

In [11]: df1
Out[11]:
            abc  xyz
Date
2013-06-01  100  200
2013-06-03  -20   50
2013-08-15   40   -5
2014-01-20   25   15
2014-02-21   60   80

In [12]: g = df1.groupby(pd.TimeGrouper("M"))  # DataFrameGroupBy (grouped by Month)

In [13]: g.sum()
Out[13]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

In [14]: df1.resample("M", how='sum')  # the same
Out[14]:
            abc  xyz
Date
2013-06-30   40  125
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

I had thought the following would work, but it doesn't (due to as_index not being respected? I'm not sure.), I'm including this for interests' sake.

If it's a column (it has to be a datetime64 column! as I say, hit it with to_datetime), you can use the PeriodIndex:

In [21]: df
Out[21]:
        Date  abc  xyz
0 2013-06-01  100  200
1 2013-06-03  -20   50
2 2013-08-15   40   -5
3 2014-01-20   25   15
4 2014-02-21   60   80

In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
Out[22]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-06, ..., 2014-02]
Length: 5, Freq: M

In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

In [24]: g = df.groupby(per)

In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
Out[25]:
         abc  xyz
2013-06   80  250
2013-08   40   -5
2014-01   25   15
2014-02   60   80

To get the desired result we have to reindex...

41 down vote accepted

You can use either resample or TimeGrouper (which resample uses under the hood).

First make the datetime column is actually of datetimes (hit it with pd.to_datetime). It's easier if it'd a DatetimeIndex:

In [11]: df1
Out[11]:
            abc  xyz
Date
2013-06-01  100  200
2013-06-03  -20   50
2013-08-15   40   -5
2014-01-20   25   15
2014-02-21   60   80

In [12]: g = df1.groupby(pd.TimeGrouper("M"))  # DataFrameGroupBy (grouped by Month)

In [13]: g.sum()
Out[13]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

In [14]: df1.resample("M", how='sum')  # the same
Out[14]:
            abc  xyz
Date
2013-06-30   40  125
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

I had thought the following would work, but it doesn't (due to as_index not being respected? I'm not sure.), I'm including this for interests' sake.

If it's a column (it has to be a datetime64 column! as I say, hit it with to_datetime), you can use the PeriodIndex:

In [21]: df
Out[21]:
        Date  abc  xyz
0 2013-06-01  100  200
1 2013-06-03  -20   50
2 2013-08-15   40   -5
3 2014-01-20   25   15
4 2014-02-21   60   80

In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
Out[22]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-06, ..., 2014-02]
Length: 5, Freq: M

In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

In [24]: g = df.groupby(per)

In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
Out[25]:
         abc  xyz
2013-06   80  250
2013-08   40   -5
2014-01   25   15
2014-02   60   80

To get the desired result we have to reindex...

猜你喜欢

转载自blog.csdn.net/qq_15230053/article/details/79777071