Pandas: Select all data from Pandas DataFrame between two dates

Maksim Kostinov :

I have DataFrame in which there is a column with event dates (dates are not unique). I need to select all the data that is in this period. I try next:

start_day = datetime.date(datetime.strptime(start_day, '%d.%m.%Y')) #change user data to date format

end_day = datetime.date(datetime.strptime(end_day, '%d.%m.%Y'))

df = df[df['Transaction_date'].between(start_day, end_day)]

The type of column Transaction_date is datetime64[ns]. When I run the code with the request for the period 01/01/2020 to 31/01/2020 - part of the data for the specified period is lost (information is displayed only for 21 days, it should be for 31 days).

I see that the Data Frame is contained data for the entire requested period. Can you help me please, where is my mistake?

Valdi_Bo :

I used the following test DataFrame:

   Transaction_date  Value
0        2019-12-31     10
1        2020-01-10     10
2        2020-01-15     10
3        2020-01-20     10
4        2020-01-25     10
5        2020-01-28     10
6        2020-01-29     10
7        2020-01-30     10
8        2020-01-31     10
9        2020-02-01     10
10       2020-02-01     10

Start / end dates are:

start_day = '01.01.2020'
end_day = '31.01.2020'

The code is:

# Convert start / end dates to datetime
start_day = pd.to_datetime(start_day)
end_day = pd.to_datetime(end_day)

I noticed that you use datetime module, which I think is a bad practice. To this end use dedicated pandasonic function (to_datetime). It is even clever enough to recognize many of commonly used date formats, so there is no need to specify it on your own.

To get the actual result, run:

df[df['Transaction_date'].between(start_day, end_day)]

The result is:

  Transaction_date  Value
1       2020-01-10     10
2       2020-01-15     10
3       2020-01-20     10
4       2020-01-25     10
5       2020-01-28     10
6       2020-01-29     10
7       2020-01-30     10
8       2020-01-31     10

So as you can see:

  • The date from 2019 has been eliminated.
  • The output contains only January dates, to the very end.
  • Dates from the beginning of February have also been eliminated.

Repeat my example on your computer. Maybe the source of your problem (that the result dous not contain all dates from January) is somewhere else?

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=13755&siteId=1