[Repost] Pandas data processing (5): datetime format handling


datetime is Python's date and time data type, and it makes converting between different time formats convenient. Pandas also supports datetime data, which enables many useful operations, such as:

1. The function to_datetime() converts a Series (a column of the DataFrame) to the datetime type:

# Convert the Date column to datetime
apple['Date'] = pd.to_datetime(apple['Date'])
apple['Date'].head()

# Output:
0   2014-07-08
1   2014-07-07
2   2014-07-03
3   2014-07-02
4   2014-07-01
Name: Date, dtype: datetime64[ns]

2. DataFrame.resample(freq) regroups the data along its datetime index into segments of frequency freq, after which the sum, mean, variance, and other statistics of each segment can be computed. In the following example the index of the original data is already in datetime format, and the month is used as the time unit to compute the mean of each column:

# Resample the data based on the offset and take the mean of each segment
# BM: business month end frequency (pandas 2.2 and later spell this alias "BME")
apple_month = apple.resample("BM").mean()
apple_month.head()

 

 

The following exercises briefly show how Pandas handles datetime data in a DataFrame.

1, to_datetime() and resample() operations

1.1, read data

import pandas as pd

url = "https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/09_Time_Series/Apple_Stock/appl_1980_2014.csv"
apple = pd.read_csv(url)
apple.head()

As you can see, the dates are in the Date column, but they are stored as plain strings rather than as datetime values, so the column needs to be converted.

 

 

1.2, datetime format conversion

# Convert the Date column to datetime
apple['Date'] = pd.to_datetime(apple['Date'])
apple['Date'].head()
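As a small sketch of what the conversion does, the snippet below runs to_datetime() on a few made-up strings (they are not taken from the Apple file). The format and errors arguments are optional; they are shown here only as one way to make the expected layout explicit and to handle bad entries.

```python
import pandas as pd

# Hypothetical sample values; the last one is deliberately not a date.
raw = pd.Series(["2014-07-08", "2014-07-07", "not a date"])

# format= states the expected layout explicitly;
# errors="coerce" turns unparseable entries into NaT instead of raising.
dates = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(dates.dtype)         # a datetime64 dtype
print(dates.isna().sum())  # number of entries that failed to parse
```

Without errors="coerce", the invalid entry would raise an exception, which is often the safer default when the data is expected to be clean.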

 

 

1.3, set the Date column to index

# Set the Date column as the index
apple = apple.set_index("Date")
apple.head()
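The read, convert, and set-index steps can also be combined into a single read_csv() call. The sketch below assumes a tiny inline CSV with made-up numbers in place of the downloaded file, so it runs without network access.

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for the downloaded file.
csv_text = """Date,Open,Close
2014-07-08,10.0,11.0
2014-07-07,12.0,13.0
"""

# parse_dates converts the column while reading; index_col makes it the index.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(type(sample.index).__name__)
print(sample)
```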

Although Date is now the index, the rows are not in chronological order: the data runs from the newest date to the oldest. A datetime index can be sorted directly, and here sort_index(ascending=True) is used to do so.

 

 

1.4, sort the index

# Sort the DataFrame by its Date index; sort_index returns a new DataFrame, so assign it back
apple = apple.sort_index(ascending=True)
apple.head()
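To see the effect of the sort in isolation, here is a minimal sketch on a made-up three-row frame. It uses is_monotonic_increasing to check whether an index is in chronological order.

```python
import pandas as pd

# Hypothetical three-row frame in newest-first order.
frame = pd.DataFrame(
    {"Close": [3.0, 2.0, 1.0]},
    index=pd.to_datetime(["2014-07-08", "2014-07-07", "2014-07-03"]),
)

# sort_index returns a new DataFrame; the original is left unchanged.
ordered = frame.sort_index(ascending=True)

print(frame.index.is_monotonic_increasing)    # False
print(ordered.index.is_monotonic_increasing)  # True
```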

 

 

1.5, resample the data by month and compute the mean

# Resample the data based on the offset and take the mean of each segment
# BM: business month end frequency (pandas 2.2 and later spell this alias "BME")
apple_month = apple.resample("BM").mean()
apple_month.head()

 

 

BM stands for business month end, that is, the last business day of each month. Frequency strings like this are called offset aliases in Pandas and are backed by DateOffset objects. Besides the month, Pandas also provides year, day, hour, minute, second, and other units for resampling, and custom offsets can be defined as well.

 

 

For details on offset aliases, see: https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases
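The following sketch illustrates resampling on made-up business-day data rather than the Apple file. It passes the offset object pd.offsets.BMonthEnd() instead of an alias string, which is an assumption made here to avoid depending on the alias spelling, and it also shows several aggregates computed at once with agg().

```python
import pandas as pd

# Hypothetical business-day data for illustration (not the Apple data set).
idx = pd.date_range("2021-01-01", "2021-03-31", freq="B")
prices = pd.DataFrame({"Close": range(len(idx))}, index=idx)

# An offset object plays the same role as the business-month-end alias string.
monthly_mean = prices.resample(pd.offsets.BMonthEnd()).mean()

# Several aggregates at once, here per calendar week.
weekly_stats = prices["Close"].resample("W").agg(["sum", "mean", "var"])

print(monthly_mean)
print(weekly_stats.head())
```

Each row of monthly_mean is labelled with the last business day of its month, so the first label is 2021-01-29 rather than 2021-01-31.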

1.6, calculate the number of days between the earliest and the latest date in the index

(apple.index.max() - apple.index.min()).days

# Output:
12261
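Subtracting two timestamps yields a Timedelta, and .days reads off its whole-day part. A minimal sketch with three made-up dates:

```python
import pandas as pd

# Hypothetical index with three dates.
idx = pd.to_datetime(["2020-01-01", "2020-03-01", "2020-12-31"])

span = idx.max() - idx.min()  # a pandas Timedelta
print(span.days)              # 365 (2020 is a leap year)
```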

2, Collect stock data for Apple, Tesla, IBM and LinkedIn (LNKD) over the past two years

2.1, get the data with pandas_datareader

import pandas as pd
from pandas_datareader import data as web
import datetime as dt

# Time range: from 2019-01-01 until today
start = dt.datetime(2019, 1, 1)
end = dt.datetime.today()
stocks = ['APPLE', 'TSLA', 'IBM', 'LNKD']
df = web.DataReader(stocks, 'yahoo', start, end)
df

Before running this, make sure that the pandas_datareader package is installed (for example with pip install pandas-datareader). The package fetches each company's stock data for the past two years directly from an online source, here Yahoo Finance. The start and end datetimes bound the time range.

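The results show that this call does not return any data for Apple or LinkedIn. The likely reasons are that 'APPLE' is not Apple's ticker symbol (it is AAPL) and that LNKD was delisted after Microsoft acquired LinkedIn in 2016. This does not matter here, because the main goal is to learn how datetime is used in Pandas.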
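Because the download depends on the network and on the data provider, here is a local sketch of the same idea of bounding data with start and end datetimes. The series and its values are made up for illustration; only the slicing behaviour is the point.

```python
import datetime as dt
import pandas as pd

# Hypothetical daily volume series; values are made up for illustration.
idx = pd.date_range("2018-12-25", periods=14, freq="D")
volume = pd.Series(range(14), index=idx)

start = dt.datetime(2019, 1, 1)
end = dt.datetime(2019, 1, 5)

# Label-based slicing on a DatetimeIndex includes both endpoints.
window = volume.loc[start:end]
print(window)
```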

 

 

 

2.2, get the trading volume data

# Copy the Volume columns so that new columns can be added without touching df
vol = df['Volume'].copy()
vol

 

 

2.3, create new columns: week and year

The data will be grouped later, with week and year as the grouping keys, so the two columns (week, year) need to be created in advance.

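# DatetimeIndex.week was removed in pandas 2.0; isocalendar().week (pandas 1.1+) replaces it
vol['week'] = vol.index.isocalendar().week
vol['year'] = vol.index.year
vol.head()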
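One detail worth knowing when combining a week number with a year: ISO week numbers belong to ISO years, which can differ from calendar years around New Year. The sketch below uses three made-up dates to show the difference between isocalendar() and the plain .year attribute.

```python
import pandas as pd

# Hypothetical dates around a year boundary.
idx = pd.to_datetime(["2019-12-27", "2019-12-30", "2020-01-02"])

iso = idx.isocalendar()  # DataFrame with year, week and day columns
print(iso)
print(idx.year)          # calendar years, which may differ from iso["year"]
```

Here 2019-12-30 falls in ISO week 1 of ISO year 2020 although its calendar year is 2019, so grouping by ISO week together with the calendar year can split or mislabel the days around a year boundary. Using iso["year"] as the year key is one way to keep the two consistent.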

 

 

2.4, groupby aggregation (first by week, then by year)

# Sum each company's volume within every (week, year) group
week = vol.groupby(['week','year']).sum()
week.head()

This makes it easy to compare each company's total trading volume for every week from 2019 to 2020.
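The grouping step can be tried without downloading anything. The sketch below uses two made-up tickers (AAA and BBB are placeholders, not real symbols) and invented volumes to show how groupby() on the week and year columns produces one summed row per (week, year) pair.

```python
import pandas as pd

# Hypothetical volumes for two made-up tickers on four dates.
idx = pd.to_datetime(["2019-01-07", "2019-01-08", "2019-01-14", "2019-01-15"])
vol = pd.DataFrame({"AAA": [1, 2, 3, 4], "BBB": [10, 20, 30, 40]}, index=idx)

# Grouping keys derived from the datetime index.
vol["week"] = idx.isocalendar().week.astype(int)
vol["year"] = idx.year

# One row per (week, year) pair, with volumes summed inside each group.
weekly = vol.groupby(["week", "year"]).sum()
print(weekly)
```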

 

 

 

Okay, that's all of the content of this article; finally, thank you all for reading!

References:

1, https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases

2, https://github.com/guipsamora/pandas_exercises/blob/master/09_Time_Series/Getting_Financial_Data

Posted on 2020-12-31


Origin blog.csdn.net/weixin_52071682/article/details/112460735