Python vs. MySQL (6): Reproducing MySQL date functions with pandas

1. Introduction

Environment:
Windows 11 64-bit
Python 3.9
MySQL 8
pandas 1.4.2

This article shows how to reproduce the following MySQL date functions with pandas, and where the two differ: date_add()/date_sub(), datediff(), date_format(), year()/month()/day()/hour()/minute()/second(), and from_unixtime()/unix_timestamp().

Note: Python is a very flexible language. There may be multiple ways to achieve the same goal. What I provide is only one of the solutions. If you have other methods, please leave a message to discuss.

2. Syntax comparison

Dataset

The dataset used in this article can be built in Python as follows:

import pandas as pd

df1 = pd.DataFrame({
    'col1': list(range(1, 7)),
    'col2': ['AA', 'AA', 'AA', 'BB', 'AA', 'BB'],
    'col3': ['2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02', '2022-01-03', '2022-01-03'],
    'col4': ['2022-02-01', '2022-01-21', '2022-01-23', '2022-01-12', '2022-02-03', '2022-01-05'],
    'col5': [1643673600, 1642723200, 1642896000, 1641945600, 1643846400, 1641340800],
})
# Convert the date strings to datetime values so that date arithmetic works
df1['col3'] = pd.to_datetime(df1.col3)
df1['col4'] = pd.to_datetime(df1.col4)
df1

Note: Paste the code into a Jupyter cell and run it. The examples below use df1 directly.

The syntax for constructing this dataset using MySQL is as follows:

with t1 as(
  select  1 as col1, 'AA' as col2, '2022-01-01' as col3, '2022-02-01' as col4, 1643673600 as col5 union all
  select  2 as col1, 'AA' as col2, '2022-01-01' as col3, '2022-01-21' as col4, 1642723200 as col5 union all
  select  3 as col1, 'AA' as col2, '2022-01-02' as col3, '2022-01-23' as col4, 1642896000 as col5 union all
  select  4 as col1, 'BB' as col2, '2022-01-02' as col3, '2022-01-12' as col4, 1641945600 as col5 union all
  select  5 as col1, 'AA' as col2, '2022-01-03' as col3, '2022-02-03' as col4, 1643846400 as col5 union all
  select  6 as col1, 'BB' as col2, '2022-01-03' as col3, '2022-01-05' as col4, 1641340800 as col5 
)
select * from t1;

Note: Paste the code into a MySQL query window and run it. The SQL examples below assume that the dataset definition (lines 1 to 8 of the code above) is prepended, and show only the query statement (such as line 9).

The two datasets correspond as follows:

Python dataset | MySQL dataset
df1            | t1

date_add()/date_sub()

In MySQL, adding to or subtracting from a time is done with date_add()/date_sub(). The two are interchangeable: negating the interval turns one into the other (see the examples below).
In pandas, the same is achieved with Timedelta() or DateOffset(), but the two differ: offsets in months or years can only be expressed with DateOffset(), while days, hours, minutes, and seconds work with either.
The parameter names for each time unit are listed in the following table:

Unit   | date_add()/date_sub() | pandas.Timedelta() | pandas.DateOffset()
Year   | year                  | -                  | years
Month  | month                 | -                  | months
Day    | day                   | days               | days
Hour   | hour                  | hours              | hours
Minute | minute                | minutes            | minutes
Second | second                | seconds            | seconds
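
The difference between the two pandas classes can be sketched as follows. This is only an illustration: the sample dates and variable names are invented for the example and are not part of df1.

```python
import pandas as pd

# Hypothetical sample dates, chosen to include a month-end.
dates = pd.Series(pd.to_datetime(['2022-01-01', '2022-01-31']))

# For fixed-length units such as days, both classes give the same result.
plus_day_td = dates + pd.Timedelta(days=1)
plus_day_do = dates + pd.DateOffset(days=1)
same_for_days = bool((plus_day_td == plus_day_do).all())

# Calendar units (months, years) are only available through DateOffset.
plus_month = dates + pd.DateOffset(months=1)

# Timedelta has no 'months' keyword, so constructing it is expected to fail.
try:
    pd.Timedelta(months=1)
    timedelta_accepts_months = True
except (ValueError, TypeError):
    timedelta_accepts_months = False

print(plus_month.tolist())
print(same_for_days, timedelta_accepts_months)
```

Adding one month to 2022-01-31 gives 2022-02-28 here: DateOffset clamps the day to the end of the shorter month, which is also how MySQL's date_add() with interval 1 month behaves.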

1. Add 1 day
In MySQL, 1 day can be added either with date_add() and an interval of 1 day, or with date_sub() and an interval of -1 day.
In pandas, add pd.Timedelta(days=1) or pd.DateOffset(days=1) to the DataFrame's datetime column.

Python:
【Python1】
df1.col3 + pd.Timedelta(days=1)
【Python2】
df1.col3 + pd.DateOffset(days=1)

MySQL:
【MySQL1】
select date_add(t1.col3,interval 1 day) as col3_1 from t1;
【MySQL2】
select date_sub(t1.col3,interval -1 day) as col3_1 from t1;

2. Subtract 1 day
Subtracting 1 day is the mirror image of adding 1 day: change the addition to a subtraction. The code below needs no further explanation.

Python:
【Python1】
df1.col3 + pd.Timedelta(days=-1)
【Python2】
df1.col3 + pd.DateOffset(days=-1)

MySQL:
【MySQL1】
select date_add(t1.col3,interval -1 day) as col3_1 from t1;
【MySQL2】
select date_sub(t1.col3,interval 1 day) as col3_1 from t1;

datediff()

In MySQL, the difference between two dates is computed with datediff(<minuend>,<subtrahend>), i.e. <minuend> - <subtrahend>, in days. In pandas the operation is simpler: just subtract one Series from the other. The result, however, has dtype timedelta64[ns]; to compare it with numbers or convert it to an integer, the value has to be extracted, which can be done with apply() and the days attribute. The example below shows the code.

Python:
(df1.col4-df1.col3).apply(lambda x:x.days)

MySQL:
select datediff(col4,col3) as diff from t1;
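
As an alternative to apply(), the .dt accessor of a timedelta Series exposes the same days attribute for the whole column at once. The sketch below rebuilds the col3 and col4 values of df1 so that it runs on its own; the variable names are chosen for the example.

```python
import pandas as pd

# Same col3/col4 values as df1, rebuilt here so the snippet is self-contained.
col3 = pd.to_datetime(pd.Series(['2022-01-01', '2022-01-01', '2022-01-02',
                                 '2022-01-02', '2022-01-03', '2022-01-03']))
col4 = pd.to_datetime(pd.Series(['2022-02-01', '2022-01-21', '2022-01-23',
                                 '2022-01-12', '2022-02-03', '2022-01-05']))

# Subtracting two datetime Series yields a timedelta Series.
delta = col4 - col3

# .dt.days extracts the whole-day component of every element at once.
diff_days = delta.dt.days

print(diff_days.tolist())  # [31, 20, 21, 10, 31, 2]
```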

date_format()

For formatting, MySQL uses date_format() and Python uses strftime(); both convert a time value into a string. The format specifiers differ slightly: the former uses %i for minutes and %s for seconds, while the latter uses %M for minutes and %S for seconds.
Refer to the following table for the specific format specifiers:

Time part (range) | date_format() | strftime()
Year, 0000~9999   | %Y            | %Y
Month, 01~12      | %m            | %m
Day, 01~31        | %d            | %d
Hour, 00~23       | %H            | %H
Minute, 00~59     | %i            | %M
Second, 00~59     | %s            | %S

Format: year-month
In MySQL, date_format(<column>,"<format string>") is applied to the column directly. In Python, strftime('<format string>') is a method of a single time value, whereas df1.col3 is a Series, so apply() is used here to process each value (see the Python code below).

Python:
df1.col3.apply(lambda x:x.strftime('%Y-%m'))

MySQL:
select date_format(t1.col3,'%Y-%m') as col3_1 from t1;
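
Another way to format a whole column, shown here only as a sketch, is the Series.dt.strftime() method, which applies the format string to every element without apply(). The sample values and variable names below are chosen for the example.

```python
import pandas as pd

# A few col3-style dates, rebuilt here so the snippet is self-contained.
col3 = pd.to_datetime(pd.Series(['2022-01-01', '2022-01-02', '2022-01-03']))

# Equivalent of date_format(col3, '%Y-%m') in MySQL.
year_month = col3.dt.strftime('%Y-%m')

# MySQL's %i (minutes) and %s (seconds) become %M and %S in strftime.
full = col3.dt.strftime('%Y-%m-%d %H:%M:%S')

print(year_month.tolist())
print(full.tolist())
```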

year()/month()/day()/hour()/minute()/second()

To extract one part of a time (year, month, day, hour, minute, or second), MySQL applies the function of the same name directly to the column.
In Python, a time value has matching attributes that return the same parts. As before, df1.col3 is a Series, so apply() is used here to process each value (see the Python code below).

Python:
df_timepart = pd.concat([
    df1.col4.apply(lambda x:x.year)
    ,df1.col4.apply(lambda x:x.month)
    ,df1.col4.apply(lambda x:x.day)
    ,df1.col4.apply(lambda x:x.hour)
    ,df1.col4.apply(lambda x:x.minute)
    ,df1.col4.apply(lambda x:x.second)
    ],axis=1
)
df_timepart.columns=['year','month','day','hour','minute','second']
df_timepart

MySQL:
select year(col4),month(col4),day(col4),hour(col4),minute(col4),second(col4) from t1;
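
The same table of parts can also be built with the .dt accessor, which exposes year, month, day, hour, minute and second for the whole Series. This is a minimal sketch; the sample datetimes are hypothetical and include a time of day so that the last three columns are not all zero.

```python
import pandas as pd

# Hypothetical sample datetimes with a time-of-day component.
col4 = pd.to_datetime(pd.Series(['2022-02-01 08:30:15', '2022-01-21 23:05:59']))

# Each .dt attribute returns an integer Series aligned with col4.
df_timepart = pd.DataFrame({
    'year': col4.dt.year,
    'month': col4.dt.month,
    'day': col4.dt.day,
    'hour': col4.dt.hour,
    'minute': col4.dt.minute,
    'second': col4.dt.second,
})

print(df_timepart)
```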

from_unixtime()/unix_timestamp()

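Take special care when working with timestamps: pandas uses UTC, while MySQL defaults to local time (the session time zone). Beijing time is UTC+8, so it is 8 hours ahead of UTC; in other words, for the same timestamp the Beijing time reads 8 hours later than the UTC time. For example, 1577836800 converts to [2020-01-01 08:00:00] in Beijing time and to [2020-01-01 00:00:00] in UTC.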
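
The example timestamp can be checked with a short sketch using pd.Timestamp, with and without a time zone. The variable names are chosen for the example; 'Asia/Shanghai' is the time zone name for Beijing time.

```python
import pandas as pd

ts = 1577836800  # the example timestamp, in seconds

# Without tz, pandas interprets the timestamp as UTC and returns a naive value.
utc_time = pd.Timestamp(ts, unit='s')

# With tz, the same instant is shown as Beijing wall-clock time.
beijing_time = pd.Timestamp(ts, unit='s', tz='Asia/Shanghai')

print(utc_time)      # 2020-01-01 00:00:00
print(beijing_time)  # 2020-01-01 08:00:00+08:00
```

Both values describe the same instant; only the displayed wall-clock time differs.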

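1. Timestamp to time
In MySQL, apply from_unixtime() directly to the column; a format can also be specified, using the format specifiers from the table under date_format().
In pandas, use to_datetime(). Note that unit must be specified according to the precision of the timestamp; common values are D, s, and ms, for the number of days, seconds, and milliseconds since 1970-01-01 00:00:00.
Note: to get UTC+8 time, 8 hours are added manually here.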

Python:
【Python 1: default time zone】
pd.to_datetime(df1.col5, unit='s')
【Python 2: UTC+8】
pd.to_datetime(df1.col5, unit='s')+pd.Timedelta(hours=8)

MySQL:
select from_unixtime(col5) from t1;
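
Besides adding 8 hours by hand, pandas can do the conversion with time zone support. The following is a sketch under the assumption that the timestamps are in seconds; it rebuilds a few col5 values so that it runs on its own, and the variable names are chosen for the example.

```python
import pandas as pd

# A few col5 values from df1, rebuilt here so the snippet is self-contained.
col5 = pd.Series([1643673600, 1642723200, 1642896000])

# utc=True returns tz-aware values in UTC instead of naive ones.
utc_aware = pd.to_datetime(col5, unit='s', utc=True)

# Convert the same instants to Beijing wall-clock time.
beijing_aware = utc_aware.dt.tz_convert('Asia/Shanghai')

# Optionally drop the tz info and keep the local wall-clock values.
beijing_naive = beijing_aware.dt.tz_localize(None)

print(beijing_naive.tolist())
```

For a fixed-offset zone such as Asia/Shanghai this gives the same values as adding pd.Timedelta(hours=8); for zones with daylight saving time, tz_convert() also accounts for the changing offset.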

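2. Time to timestamp
In MySQL, apply unix_timestamp() directly to the column.
In pandas, use apply() together with timestamp(), which returns a float. To treat the values as UTC+8 times, first apply tz_localize("Asia/Shanghai") to each time, then convert.
Note: one small detail. The returned values are displayed in scientific notation by default, but I want to see the full digit string and the values have no fractional part, so I wrap them in int(). If your times are precise to the millisecond, i.e. have a fractional part, int() will lose precision; handle this according to your own situation and needs.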

Python:
【Python 1: default time zone】
df1.col4.apply(lambda x:int(x.timestamp()))
【Python 2: UTC+8】
df1.col4.apply(lambda x:int(x.tz_localize("Asia/Shanghai").timestamp()))

MySQL:
select unix_timestamp(col4) from t1;
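
A vectorized alternative to apply() is to subtract the epoch and floor-divide by one second, which returns integers directly. This is only a sketch: the sample values are taken from col4, and the names epoch and one_second are introduced for the example.

```python
import pandas as pd

# Two col4 values from df1, rebuilt here so the snippet is self-contained.
col4 = pd.to_datetime(pd.Series(['2022-02-01', '2022-01-21']))

epoch = pd.Timestamp('1970-01-01')
one_second = pd.Timedelta(seconds=1)

# Naive values interpreted as UTC: whole seconds since the epoch.
ts_utc = (col4 - epoch) // one_second

# Values interpreted as Beijing wall-clock time: localize them, convert to
# UTC while dropping the tz info, then apply the same arithmetic.
col4_as_utc = col4.dt.tz_localize('Asia/Shanghai').dt.tz_convert(None)
ts_beijing = (col4_as_utc - epoch) // one_second

print(ts_utc.tolist())      # [1643673600, 1642723200]
print(ts_beijing.tolist())  # [1643644800, 1642694400]
```

The Beijing-based values are 28800 seconds (8 hours) smaller, and are the ones that should match unix_timestamp(col4) in MySQL when the session time zone is UTC+8.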

3. Summary

1. Use Timedelta() or DateOffset() to add to or subtract from a time;
2. Subtract two times directly to get their difference;
3. Use strftime() for formatting;
4. Use the attributes year, month, day, hour, minute, and second to extract the corresponding part of a time;
5. Convert between timestamps and times with to_datetime() and timestamp().

Origin blog.csdn.net/qq_45476428/article/details/129129020