关系型数据库
1、启动SQL引擎。 将创建一个引擎以连接到工作目录中的SQLite数据库'Chinook.sqlite'。
# Import necessary module
from sqlalchemy import create_engine
# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')
这里,'sqlite:///Chinook.sqlite'被称为SQLite数据库Chinook.sqlite的连接字符串。
2、使用引擎上的方法table_names()可以知道数据库里面有哪些表。
# Import necessary module
from sqlalchemy import create_engine
# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')
# Save the table names to a list: table_names
table_names = engine.table_names()
# Print the table names to the shell
print(table_names)
<script.py> output:
['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']
3、使用python对数据库进行查询操作。(方法一)
# Import packages
from sqlalchemy import create_engine
import pandas as pd
# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')
# Open engine connection: con
con = engine.connect()
# Perform query: rs
rs = con.execute('select * from Album')
# Save results of the query to DataFrame: df
df = pd.DataFrame(rs.fetchall())
# Close connection
con.close()
# Print head of DataFrame df
print(df.head())
<script.py> output:
0 1 2
0 1 For Those About To Rock We Salute You 1
1 2 Balls to the Wall 2
2 3 Restless and Wild 2
3 4 Let There Be Rock 1
4 5 Big Ones 3
自定义查询 fetchmany(size=3)—— size:要返回的行数, fetchall()——取全部
使用 df.columns = rs.keys() 将列索引 换为 真实的索引。
from sqlalchemy import create_engine
import pandas as pd
# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
rs = con.execute('select LastName,Title from Employee')
df = pd.DataFrame(rs.fetchmany(size=3))
df.columns = rs.keys()
# Print the length of the DataFrame df
print(len(df))
# Print the head of the DataFrame df
print(df.head())
<script.py> output:
3
LastName Title
0 Adams General Manager
1 Edwards Sales Manager
2 Peacock Sales Support Agent
不使用df.columns = rs.keys() 结果:
0 1
0 Adams General Manager
1 Edwards Sales Manager
2 Peacock Sales Support Agent
使用SQL的WHERE过滤数据库记录
# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')
# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
rs = con.execute('select * from Employee where EmployeeId >= 6')
df = pd.DataFrame(rs.fetchall())
df.columns = rs.keys()
# Print the head of the DataFrame df
print(df.head())
<script.py> output:
EmployeeId LastName FirstName Title ReportsTo BirthDate \
0 6 Mitchell Michael IT Manager 1 1973-07-01 00:00:00
1 7 King Robert IT Staff 6 1970-05-29 00:00:00
2 8 Callahan Laura IT Staff 6 1968-01-09 00:00:00
HireDate Address City State Country \
0 2003-10-17 00:00:00 5827 Bowness Road NW Calgary AB Canada
1 2004-01-02 00:00:00 590 Columbia Boulevard West Lethbridge AB Canada
2 2004-03-04 00:00:00 923 7 ST NW Lethbridge AB Canada
PostalCode Phone Fax Email
0 T3B 0C5 +1 (403) 246-9887 +1 (403) 246-9899 [email protected]
1 T1K 5N8 +1 (403) 456-9986 +1 (403) 456-8485 [email protected]
2 T1H 1Y8 +1 (403) 467-3351 +1 (403) 467-8772 [email protected]
使用ORDER BY排序SQL记录
# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')
# Open engine in context manager
with engine.connect() as con:
rs = con.execute('select * from Employee order by BirthDate')
df = pd.DataFrame(rs.fetchall())
# Set the DataFrame's column names
df.columns = rs.keys()
# Print head of DataFrame
print(df.head())
<script.py> output:
EmployeeId LastName FirstName Title ReportsTo \
0 4 Park Margaret Sales Support Agent 2.0
1 2 Edwards Nancy Sales Manager 1.0
2 1 Adams Andrew General Manager NaN
3 5 Johnson Steve Sales Support Agent 2.0
4 8 Callahan Laura IT Staff 6.0
BirthDate HireDate Address City \
0 1947-09-19 00:00:00 2003-05-03 00:00:00 683 10 Street SW Calgary
1 1958-12-08 00:00:00 2002-05-01 00:00:00 825 8 Ave SW Calgary
2 1962-02-18 00:00:00 2002-08-14 00:00:00 11120 Jasper Ave NW Edmonton
3 1965-03-03 00:00:00 2003-10-17 00:00:00 7727B 41 Ave Calgary
4 1968-01-09 00:00:00 2004-03-04 00:00:00 923 7 ST NW Lethbridge
State Country PostalCode Phone Fax \
0 AB Canada T2P 5G3 +1 (403) 263-4423 +1 (403) 263-4289
1 AB Canada T2P 2T3 +1 (403) 262-3443 +1 (403) 262-3322
2 AB Canada T5K 2N1 +1 (780) 428-9482 +1 (780) 428-3457
3 AB Canada T3B 1Y7 1 (780) 836-9987 1 (780) 836-9543
4 AB Canada T1H 1Y8 +1 (403) 467-3351 +1 (403) 467-8772
Email
0 [email protected]
1 [email protected]
2 [email protected]
3 [email protected]
4 [email protected]
4、直接使用pandas 查询 利用pandas的强大功能将SQL查询的结果写入一个DataFrame
pd.read_sql_query('select * from Employee ',engine)
如下代码为两种方法对比:
# Import packages
from sqlalchemy import create_engine
import pandas as pd
# Create engine: engine
engine = create_engine("sqlite:///Chinook.sqlite")
# Execute query and store records in DataFrame: df
df = pd.read_sql_query('select * from Album', engine)
# Print head of DataFrame
print(df.head())
# Open engine in context manager and store query result in df1
with engine.connect() as con:
rs = con.execute("SELECT * FROM Album")
df1 = pd.DataFrame(rs.fetchall())
df1.columns = rs.keys()
# Confirm that both methods yield the same result
print(df.equals(df1))
<script.py> output:
AlbumId Title ArtistId
0 1 For Those About To Rock We Salute You 1
1 2 Balls to the Wall 2
2 3 Restless and Wild 2
3 4 Let There Be Rock 1
4 5 Big Ones 3
True
使用where 和order by
# Import packages
from sqlalchemy import create_engine
import pandas as pd
# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')
# Execute query and store records in DataFrame: df
df = pd.read_sql_query('select * from Employee where EmployeeId >=6 order by BirthDate',engine)
# Print head of DataFrame
print(df.head())
SQL的强大之处在于表之间的关系:INNER JOIN
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('sqlite:///Chinook.sqlite')
# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
rs = con.execute('select Title,Name from Album inner join Artist on Album.ArtistID = Artist.ArtistID')
df = pd.DataFrame(rs.fetchall())
df.columns =rs.keys()
# Print head of DataFrame df
print(df.head())
过滤你的INNER JOIN
df = pd.read_sql_query('select * from PlaylistTrack inner join Track on PlaylistTrack.TrackId =Track.TrackId where Milliseconds <250000',engine)