Using Sqlite Efficiently with Pandas

1. Introduction

By way of introduction, let's talk briefly about why we need sqlite.

1.1 Compared with other SQL databases (MySQL, MS SQL Server, ...)

  • Firstly, sqlite is built into Python 3: the sqlite3 module ships with the standard library.
  • Secondly, sqlite is a lightweight SQL database. A full SQL database needs a lot of extra work to get started: setting up a server and client, managing users, passwords and permissions, and so on. With sqlite there is none of this overhead, so we get quick access to the SQL world. In fact, sqlite only creates a single .db database file.

1.2 Compared with flat files (csv, txt, ...)

  • With a normal flat file such as a csv, any query, even one that looks up a single value, usually means loading all the data into memory first. In the SQL world this is not necessary: SQL loads only part of the data into memory to answer the query.
  • In SQL, all tables are contained in a single database file (.db). With flat files we have to store one table per file, which certainly increases our management cost.
  • Thirdly, SQL has better input-output efficiency. With a flat file, changing one value in a table means rewriting the whole table to disk. SQL only changes the affected part of the table.
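The first and third points can be sketched with a small in-memory database. This is only an illustration: the table name `price` and its rows are made up and are not part of the example used later.

```python
import sqlite3

# Hypothetical table and data, kept in memory so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price(item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO price VALUES(?, ?)",
    [("apple", 1.5), ("pear", 2.0), ("plum", 3.25)],
)
conn.commit()

# Look up one value: only the matching row is handed back to Python.
pear = conn.execute(
    "SELECT amount FROM price WHERE item = ?", ("pear",)
).fetchone()[0]

# Change one value in place with a single UPDATE statement.
conn.execute("UPDATE price SET amount = ? WHERE item = ?", (2.5, "pear"))
conn.commit()
updated = conn.execute(
    "SELECT amount FROM price WHERE item = ?", ("pear",)
).fetchone()[0]

print(pear, updated)
conn.close()
```

With a csv file, both steps would normally involve reading the whole file, and the update would also mean writing the whole file back.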

Sqlite is a great library in Python. Even better, pandas works very well with sqlite (and with other kinds of SQL database). So today we will talk about using sqlite efficiently with pandas.

import sqlite3
import pandas as pd

  

2. Basics

Before going further, let's look at some sqlite basics.

2.1 Create database

There are two important objects in sqlite: the connection and the cursor. We use them to communicate with the database and operate on it.

If we connect to a database that doesn't exist, sqlite creates one for us under the hood. So creating a database is easy.

conn = sqlite3.connect('Pricing.db')
c = conn.cursor()
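As a quick check of this behaviour, the sketch below connects to a file that does not exist yet and confirms that it is there afterwards. The file name `Example.db` and the table `t` are hypothetical, and a temporary directory is used so that nothing is left behind.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Example.db")
    existed_before = os.path.exists(path)

    # Connecting to a missing file makes sqlite create the database.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t(x INTEGER)")
    conn.commit()
    conn.close()

    exists_after = os.path.exists(path)

print(existed_before, exists_after)
```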

2.2 Common operations

We can create tables, insert data, update data, and so on with SQL syntax. We will not go into the details of SQL grammar, but we will demonstrate how it is done in sqlite.

One quick note: for safety reasons, an INSERT, UPDATE or DELETE operation is not made permanent right away. The change belongs to a pending transaction, and sqlite keeps a journal so that an unfinished transaction can be undone. So we need to issue an extra commit.

(The classic example: a transfer between two bank accounts, and the power suddenly goes out halfway through.)

# create table
# column names: 价格 = price, 时间 = time period, 渠道 = channel, 分类 = category
# be careful: a misplaced comma will NOT raise an error, but it still makes the affected columns unusable
sql = "CREATE TABLE MarchPrice(价格 REAL, 时间 TEXT, 渠道 TEXT, 分类 TEXT)"
c.execute(sql)

# manual insert ('新增' means "new")
sql = "INSERT INTO MarchPrice VALUES(150.0, '01MAR20-07MAR20', 'MOBILE APP', '新增')"
c.execute(sql)
conn.commit()

# select
sql = "SELECT * FROM MarchPrice"
c.execute(sql)
for row in c.fetchall():
    print(row)
# output:
# (150.0, '01MAR20-07MAR20', 'MOBILE APP', '新增')
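The bank account example mentioned above can be sketched as follows. This is a minimal illustration, not part of the pricing example: the `account` table, the names and the `transfer` helper are all hypothetical, and the "power failure" is simulated with an exception. It relies on the connection being usable as a context manager, which commits if the block succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account(name TEXT PRIMARY KEY, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES(?, ?)", [("alice", 100.0), ("bob", 50.0)]
)
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Hypothetical helper: move money, optionally failing between the two updates."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                raise RuntimeError("simulated power failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
    except RuntimeError:
        pass  # the half-finished transfer has already been rolled back


transfer(conn, "alice", "bob", 30.0, fail_midway=True)
after_failure = dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, "alice", "bob", 30.0)
after_success = dict(conn.execute("SELECT name, balance FROM account"))

print(after_failure, after_success)
conn.close()
```

After the failed attempt both balances are unchanged, because the first UPDATE was never committed.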

We also want to vary the values in the SQL statement. For this we can use "?" as a placeholder for each variable.

Combined with a for-loop, this gives us semi-automatic row insertion.

# some variables
price = [140, 145, 155, 200]
time = ['01MAR20-07MAR20', '02MAR20-08MAR20', '03MAR20-09MAR20', '04MAR20-10MAR20']
channel = 'MOBILE APP'
cls = '新增'

# semi-auto insert
sql = "INSERT INTO MarchPrice VALUES(?, ?, ?, ?)"
for i in range(len(price)):
    c.execute(sql, (float(price[i]), time[i], channel, cls))
conn.commit()

sql = "SELECT * FROM MarchPrice"
c.execute(sql)
for row in c.fetchall():
    print(row)

# output
# (150.0, '01MAR20-07MAR20', 'MOBILE APP', '新增')
# (140.0, '01MAR20-07MAR20', 'MOBILE APP', '新增')
# (145.0, '02MAR20-08MAR20', 'MOBILE APP', '新增')
# (155.0, '03MAR20-09MAR20', 'MOBILE APP', '新增')
# (200.0, '04MAR20-10MAR20', 'MOBILE APP', '新增')
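The same insert can also be written without an explicit loop around `execute`, using the cursor and connection method `executemany`. The sketch below is self-contained and uses its own in-memory database, so it does not touch Pricing.db; the variable `rows` is introduced only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MarchPrice(价格 REAL, 时间 TEXT, 渠道 TEXT, 分类 TEXT)")

price = [140, 145, 155, 200]
time = ['01MAR20-07MAR20', '02MAR20-08MAR20', '03MAR20-09MAR20', '04MAR20-10MAR20']
channel = 'MOBILE APP'
cls = '新增'

# Build one tuple per row, then hand the whole batch to sqlite at once.
rows = [(float(p), t, channel, cls) for p, t in zip(price, time)]
conn.executemany("INSERT INTO MarchPrice VALUES(?, ?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM MarchPrice").fetchone()[0]
first = conn.execute("SELECT * FROM MarchPrice").fetchone()
print(count, first)
conn.close()
```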

Of course, the values don't have to come from a list or an array; they can also come from a pandas Series. When using a Series, we can check Series.dtypes to make sure our data fits the requirements of the database table.
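A minimal sketch of that check, assuming a hypothetical one-column table `p`. The sqlite3 module natively binds plain Python int, float, str, bytes and None, so one cautious approach is to cast the Series to the dtype the column expects and convert it to plain Python values with `tolist()` before inserting.

```python
import sqlite3
import pandas as pd

price = pd.Series([140, 145, 155, 200])
print(price.dtype)  # an integer dtype, while the column below is REAL

# Cast to match the REAL column, then convert to plain Python floats.
as_float = price.astype(float)
values = as_float.tolist()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p(价格 REAL)")  # hypothetical table
conn.executemany("INSERT INTO p VALUES(?)", [(v,) for v in values])
conn.commit()

stored = [row[0] for row in conn.execute("SELECT 价格 FROM p")]
print(stored)
conn.close()
```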

But as the old saying goes: never use a for-loop!

I know this is not literally true, but in data science it reminds us to keep exploring what the Python ecosystem already offers.

3. Pandas

Besides the famous quote we have just mentioned, there is another problem.

The "?" placeholder gives us some flexibility, but what if we have a big table with 50 columns? Do we have to write 50 question marks to specify the variables?

No. We can use the pandas API to communicate with SQL instead.

3.1 To SQL

For example, we have a new table as below:

raw = {
    '价格': [200, 210, 140, 156, 70],
    '时间': '05MAR20-31MAR20',
    '渠道': 'Website',
    '分类': '新增'
}

MarchPrice = pd.DataFrame(raw)

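Now we can use DataFrame.to_sql() to insert the data. If the index of our DataFrame carries no meaning, we can pass index=False so that it is not written as a column. Passing if_exists='append' adds the rows to the existing table.

After the operation, we use the old method again to see what is in our table.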

MarchPrice.to_sql('MarchPrice', conn, index=False, if_exists='append')

sql = "SELECT * FROM MarchPrice"
c.execute(sql)
for row in c.fetchall():
    print(row)

# output
# (150.0, '01MAR20-07MAR20', 'MOBILE APP', '新增')
# (140.0, '01MAR20-07MAR20', 'MOBILE APP', '新增')
# (145.0, '02MAR20-08MAR20', 'MOBILE APP', '新增')
# (155.0, '03MAR20-09MAR20', 'MOBILE APP', '新增')
# (200.0, '04MAR20-10MAR20', 'MOBILE APP', '新增')
# (200.0, '05MAR20-31MAR20', 'Website', '新增')
# (210.0, '05MAR20-31MAR20', 'Website', '新增')
# (140.0, '05MAR20-31MAR20', 'Website', '新增')
# (156.0, '05MAR20-31MAR20', 'Website', '新增')
# (70.0, '05MAR20-31MAR20', 'Website', '新增')

Using this method, no matter how many columns we have, we can insert data into SQL with ease.
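The `if_exists` argument deserves a closer look, since it decides what happens when the table is already there. The sketch below compares the three values `'fail'` (the default), `'append'` and `'replace'` on a hypothetical table `Demo` in an in-memory database.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"价格": [200.0, 210.0], "渠道": "Website"})

# First call: the table does not exist yet, so it is created.
df.to_sql("Demo", conn, index=False)

# 'append' adds the rows to the existing table.
df.to_sql("Demo", conn, index=False, if_exists="append")
n_append = conn.execute("SELECT COUNT(*) FROM Demo").fetchone()[0]

# 'replace' drops the table and writes it again from scratch.
df.to_sql("Demo", conn, index=False, if_exists="replace")
n_replace = conn.execute("SELECT COUNT(*) FROM Demo").fetchone()[0]

# 'fail' (the default) raises ValueError if the table already exists.
try:
    df.to_sql("Demo", conn, index=False, if_exists="fail")
    failed = False
except ValueError:
    failed = True

print(n_append, n_replace, failed)
conn.close()
```

Since `'replace'` discards the existing rows, `'append'` is the safer choice when adding to a table that already holds data.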

3.2 From SQL

We can use statements like the ones above to see what is in our SQL table, but the output is not pleasant to read. Besides, we cannot see the column names.

After a little searching, we find that pandas.read_sql_query() reads the data straight into a DataFrame, which is much more convenient.

sql = "SELECT * FROM MarchPrice"
pd.read_sql_query(sql, conn)

Clean and clear. Once again we discover how powerful the Python ecosystem is. This is a level of convenience that R can hardly match.
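The "?" placeholders from section 2 also work here, through the `params` argument of `pandas.read_sql_query()`. The sketch below builds a small stand-in for the MarchPrice table in memory (only two of the four columns, with made-up rows) and reads back just the rows that match a filter.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")

# Small stand-in table with made-up rows.
pd.DataFrame(
    {
        "价格": [150.0, 200.0, 70.0],
        "渠道": ["MOBILE APP", "Website", "Website"],
    }
).to_sql("MarchPrice", conn, index=False)

# Only rows matching the filter are read into the DataFrame.
sql = "SELECT * FROM MarchPrice WHERE 渠道 = ? AND 价格 >= ?"
df = pd.read_sql_query(sql, conn, params=("Website", 100))

print(df)
conn.close()
```

Filtering in SQL keeps the DataFrame small, which matters once the table no longer fits comfortably in memory.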

Finally, it is a good habit to close the cursor and the connection once our job is done.

c.close()
conn.close()
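One way to make sure the closing step is never forgotten is `contextlib.closing` from the standard library, sketched below on an in-memory database. Note that `with conn:` on its own only manages the transaction (commit or rollback) and does not close the connection.

```python
import sqlite3
from contextlib import closing

# closing() calls .close() on exit, even if an exception is raised inside.
with closing(sqlite3.connect(":memory:")) as conn:
    with closing(conn.cursor()) as c:
        c.execute("SELECT 1 + 1")
        result = c.fetchone()[0]

# The connection is closed now, so using it raises ProgrammingError.
try:
    conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(result, still_open)
```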

  

Reposted from www.cnblogs.com/drvongoosewing/p/12383418.html