Comparing the efficiency of two ways to export database data into a DataFrame

Method 1:

import pymysql
import pandas as pd
import time

first = time.time()  # 150 s when done in the database, 320 s when done in Python


# Method 1: pass a raw pymysql connection to pd.read_sql
con = pymysql.connect(host="localhost",user="root",password="root",db="test")
data_sql=pd.read_sql("SELECT `2018设备大表`.`设备ID` as ID1,SUM(`2018设备大表`.`投件量`)/SUM(`2018设备大表`.`箱格数`) as 投件率1 from `2018设备大表`  where `日期` BETWEEN '2018/11/03' AND '2018/11/10' GROUP BY `设备ID`",con)
print(data_sql)
last = time.time()
print('Total time: %s s'%(last-first))

Result:

           ID1     投件率1
0      1000021  0.451389
1      1000022  0.500000
2      1000027  0.876389
3      1000028  0.273438
4      1000029  0.763889
5      1000031  0.946181
...        ...       ...
79900  1121542  0.000000
79901  1121545  0.000000
79902       21  0.364583
79903     4055  0.000000
79904     4081  0.385417
79905      491  0.523611
79906       52  0.182870

[79907 rows x 2 columns]
Total time: 53.555063247680664 s

Method 2:


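from sqlalchemy import create_engine

first = time.time()  # restart the timer so Method 2 is timed on its own
# Method 2: pass a SQLAlchemy engine to pd.read_sql_query
con = create_engine('mysql+pymysql://root:root@localhost:3306/test')
data_sql2 = pd.read_sql_query("SELECT `2018设备大表`.`设备ID` as ID1,SUM(`2018设备大表`.`投件量`)/SUM(`2018设备大表`.`箱格数`) as 投件率1 from `2018设备大表`  where `日期` BETWEEN '2018/11/03' AND '2018/11/10' GROUP BY `设备ID`", con)

print(data_sql2)
last = time.time()
print('Total time: %s s'%(last-first))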

Result:

           ID1     投件率1
0      1000021  0.451389
1      1000022  0.500000
2      1000027  0.876389
3      1000028  0.273438
4      1000029  0.763889
5      1000031  0.946181
6      1000032  1.611111
7      1000033  0.002315
...        ...       ...
79904     4081  0.385417
79905      491  0.523611
79906       52  0.182870

[79907 rows x 2 columns]
Total time: 46.47865843772888 s

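Method 1 (pd.read_sql with a pymysql connection) took about 53 s.

Method 2 (pd.read_sql_query with a SQLAlchemy engine) took about 46 s.

Each method was timed once here; in that run, Method 2 was about 7 s (roughly 13%) faster.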
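Since each method above was timed only once, the gap could partly reflect caching or other load on the machine. A minimal sketch of how the comparison might be repeated several times is shown here. The helper names time_calls and summarize are hypothetical and not part of the original post, and the workload is a stand-in so that the snippet runs without a database.

```python
import statistics
import time


def time_calls(func, repeats=5):
    """Call func() `repeats` times and return the elapsed seconds of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic clock intended for timing
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (fastest, median) of the measured durations, in seconds."""
    return min(durations), statistics.median(durations)


if __name__ == "__main__":
    # Stand-in workload (an assumption for illustration). With the post's
    # objects it could instead be something like:
    #   lambda: pd.read_sql(query, pymysql_connection)
    #   lambda: pd.read_sql_query(query, sqlalchemy_engine)
    durations = time_calls(lambda: sum(range(100_000)), repeats=5)
    fastest, median = summarize(durations)
    print("fastest %.6f s, median %.6f s" % (fastest, median))
```

Comparing the median of several runs per method, rather than a single run, gives a steadier basis for saying which one is faster.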


Reposted from blog.csdn.net/HUIxihuanni/article/details/84988340