Importing Data with Python (Part 3)

Relational Databases

1. Create the SQL engine. We create an engine that connects to the SQLite database 'Chinook.sqlite' in the working directory.

# Import necessary module
from sqlalchemy import create_engine

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

Here, 'sqlite:///Chinook.sqlite' is called the connection string for the SQLite database Chinook.sqlite.
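To make the structure of the connection string visible, here is a minimal sketch that splits it into its parts with SQLAlchemy's make_url helper. The variable name url is only illustrative, and nothing is actually connected or opened here.

```python
from sqlalchemy.engine.url import make_url

# Parse the connection string into its components (no database is opened)
url = make_url('sqlite:///Chinook.sqlite')

print(url.drivername)  # sqlite
print(url.database)    # Chinook.sqlite
```

The part before the colon names the database dialect, and the part after the three slashes is the path to the database file, relative to the working directory.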

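2. Use the engine method table_names() to find out which tables the database contains. (This method was deprecated in SQLAlchemy 1.4 and removed in 2.0; on newer versions, inspect(engine).get_table_names() does the same job.)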

# Import necessary module
from sqlalchemy import create_engine

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Save the table names to a list: table_names
table_names = engine.table_names()

# Print the table names to the shell
print(table_names)
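On SQLAlchemy 1.4 and later, the same listing can be obtained through the inspector. The sketch below assumes an in-memory SQLite database with two stand-in tables rather than the real Chinook file, so the table definitions are illustrative only.

```python
from sqlalchemy import create_engine, inspect, text

# In-memory SQLite database; the two tables are stand-ins, not the real Chinook schema
engine = create_engine('sqlite://')
with engine.begin() as con:
    con.execute(text('create table Album (AlbumId integer primary key, Title text)'))
    con.execute(text('create table Artist (ArtistId integer primary key, Name text)'))

# Ask the inspector for the table names
table_names = inspect(engine).get_table_names()
print(table_names)
```

With the real database, only the create_engine line would change to 'sqlite:///Chinook.sqlite' and the table creation would be dropped.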

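3. Query the database with Python (method 1). Note that SQLAlchemy 2.0 and later no longer accept a plain string in execute(); on those versions the query must be wrapped with text(), as in con.execute(text('select * from Album')).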

# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Open engine connection: con
con = engine.connect()

# Perform query: rs
rs = con.execute('select * from Album')

# Save results of the query to DataFrame: df
df = pd.DataFrame(rs.fetchall())

# Close connection
con.close()

# Print head of DataFrame df
print(df.head())

Customizing the query: fetchmany(size=3) returns at most size rows, while fetchall() returns all remaining rows.

Use df.columns = rs.keys() to replace the default integer column labels with the actual column names.

# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
    rs = con.execute('select LastName,Title from Employee')
    df = pd.DataFrame(rs.fetchmany(size=3))
    df.columns = rs.keys()

# Print the length of the DataFrame df
print(len(df))

# Print the head of the DataFrame df
print(df.head())

Result without df.columns = rs.keys():

         0                    1
0    Adams      General Manager
1  Edwards        Sales Manager
2  Peacock  Sales Support Agent
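The difference between fetchmany and fetchall, and the effect of the column names, can be sketched on a small in-memory table. The four sample rows are made up for illustration, and the queries are wrapped in text() so the snippet also runs on SQLAlchemy 2.0; the names first_rows, remaining, and columns are assumptions of this sketch.

```python
from sqlalchemy import create_engine, text
import pandas as pd

# In-memory stand-in for Chinook.sqlite with illustrative rows
engine = create_engine('sqlite://')

with engine.connect() as con:
    con.execute(text('create table Employee (LastName text, Title text)'))
    con.execute(
        text('insert into Employee values (:last, :title)'),
        [{'last': 'Adams', 'title': 'General Manager'},
         {'last': 'Edwards', 'title': 'Sales Manager'},
         {'last': 'Peacock', 'title': 'Sales Support Agent'},
         {'last': 'Smith', 'title': 'Sales Support Agent'}],
    )

    rs = con.execute(text('select LastName, Title from Employee'))
    columns = list(rs.keys())          # real column names from the result
    first_rows = rs.fetchmany(size=3)  # at most 3 rows
    remaining = rs.fetchall()          # whatever has not been fetched yet

# Passing the column names up front avoids the 0, 1 labels
df = pd.DataFrame([tuple(row) for row in first_rows], columns=columns)

print(len(first_rows), len(remaining))
print(df)
```

Because a result is consumed as it is read, fetchall() after fetchmany(size=3) returns only the rows that are left.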

Filtering database records with SQL WHERE

# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
    rs = con.execute('select * from Employee where EmployeeId >= 6')
    df = pd.DataFrame(rs.fetchall())
    df.columns = rs.keys()

# Print the head of the DataFrame df
print(df.head())
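When the threshold in the WHERE clause comes from a variable, it is usually safer to pass it as a bound parameter than to paste it into the SQL string. The following is a sketch under assumptions: an in-memory table with made-up employees, and a hypothetical variable min_id.

```python
from sqlalchemy import create_engine, text

# In-memory stand-in table with illustrative rows (EmployeeId 1 to 8)
engine = create_engine('sqlite://')

with engine.connect() as con:
    con.execute(text('create table Employee (EmployeeId integer primary key, LastName text)'))
    con.execute(
        text('insert into Employee (EmployeeId, LastName) values (:id, :name)'),
        [{'id': i, 'name': 'Employee%d' % i} for i in range(1, 9)],
    )

    # :min_id is a named placeholder filled in from the dictionary
    min_id = 6
    rs = con.execute(
        text('select * from Employee where EmployeeId >= :min_id'),
        {'min_id': min_id},
    )
    ids = [row[0] for row in rs.fetchall()]

print(ids)
```

The database driver handles quoting of the value, so the same query text can be reused with different thresholds.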

Sorting SQL records with ORDER BY

# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Open engine in context manager
with engine.connect() as con:
    rs = con.execute('select * from Employee order by BirthDate')
    df = pd.DataFrame(rs.fetchall())

    # Set the DataFrame's column names
    df.columns = rs.keys()

# Print head of DataFrame
print(df.head())
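ORDER BY sorts in ascending order by default, and adding DESC reverses it. A minimal sketch on an in-memory table, with made-up birth dates stored as ISO-formatted text so that text order matches chronological order:

```python
from sqlalchemy import create_engine, text

# In-memory stand-in table; the names and dates are illustrative
engine = create_engine('sqlite://')

with engine.connect() as con:
    con.execute(text('create table Employee (LastName text, BirthDate text)'))
    con.execute(
        text('insert into Employee values (:name, :born)'),
        [{'name': 'Adams', 'born': '1962-02-18'},
         {'name': 'Edwards', 'born': '1958-12-08'},
         {'name': 'Peacock', 'born': '1973-08-29'}],
    )

    # Ascending (default): earliest birth date first
    rs = con.execute(text('select LastName from Employee order by BirthDate'))
    oldest_first = [row[0] for row in rs.fetchall()]

    # Descending: latest birth date first
    rs = con.execute(text('select LastName from Employee order by BirthDate desc'))
    youngest_first = [row[0] for row in rs.fetchall()]

print(oldest_first)    # ['Edwards', 'Adams', 'Peacock']
print(youngest_first)  # ['Peacock', 'Adams', 'Edwards']
```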

4. Query directly with pandas (method 2). pd.read_sql_query runs a SQL query and returns the result as a DataFrame in a single call:

df = pd.read_sql_query('select * from Employee', engine)

The following code compares the two methods:

# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine("sqlite:///Chinook.sqlite")

# Execute query and store records in DataFrame: df
df = pd.read_sql_query('select * from Album', engine)

# Print head of DataFrame
print(df.head())

# Open engine in context manager and store query result in df1
with engine.connect() as con:
    rs = con.execute("SELECT * FROM Album")
    df1 = pd.DataFrame(rs.fetchall())
    df1.columns = rs.keys()

# Confirm that both methods yield the same result
print(df.equals(df1))

Using WHERE and ORDER BY

# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Execute query and store records in DataFrame: df
df = pd.read_sql_query('select * from Employee where EmployeeId >=6 order by BirthDate',engine)

# Print head of DataFrame
print(df.head())
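read_sql_query also accepts a params argument, so the filter value does not have to be written into the query string. To keep the sketch self-contained it uses a plain sqlite3 connection to an in-memory database instead of the SQLAlchemy engine (pandas accepts both); the rows are made up, and the ? placeholder style is the one sqlite3 expects.

```python
import sqlite3
import pandas as pd

# In-memory stand-in table with illustrative rows
con = sqlite3.connect(':memory:')
con.execute('create table Employee (EmployeeId integer primary key, BirthDate text)')
con.executemany(
    'insert into Employee values (?, ?)',
    [(5, '1965-03-03'), (6, '1973-07-01'), (7, '1970-05-29'), (8, '1968-01-09')],
)

# The ? placeholder is filled from params; rows come back sorted by BirthDate
df = pd.read_sql_query(
    'select * from Employee where EmployeeId >= ? order by BirthDate',
    con,
    params=(6,),
)
con.close()

print(df)
```

Employee 5 is filtered out by the WHERE clause, and the remaining rows are returned from the earliest to the latest birth date.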

The power of SQL lies in the relationships between tables: INNER JOIN

import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
    rs = con.execute('select Title,Name from Album inner join Artist on Album.ArtistID = Artist.ArtistID')
    df = pd.DataFrame(rs.fetchall())
    df.columns = rs.keys()

# Print head of DataFrame df
print(df.head())
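An INNER JOIN keeps only the rows that have a match in both tables. The sketch below illustrates this on two tiny in-memory tables with made-up contents, again using a plain sqlite3 connection with pandas so that it runs without the Chinook file.

```python
import sqlite3
import pandas as pd

# Two small stand-in tables; 'Artist C' has no album
con = sqlite3.connect(':memory:')
con.executescript('''
    create table Artist (ArtistId integer primary key, Name text);
    create table Album (AlbumId integer primary key, Title text, ArtistId integer);
    insert into Artist values (1, 'Artist A'), (2, 'Artist B'), (3, 'Artist C');
    insert into Album values (10, 'Album X', 1), (11, 'Album Y', 1), (12, 'Album Z', 2);
''')

# Each album is paired with the artist whose ArtistId matches
df = pd.read_sql_query(
    'select Title, Name from Album '
    'inner join Artist on Album.ArtistId = Artist.ArtistId '
    'order by AlbumId',
    con,
)
con.close()

print(df)
```

The result has one row per album, and 'Artist C' does not appear because no album refers to it.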

Filtering your INNER JOIN

# Keep only the joined rows for tracks shorter than 250,000 ms
df = pd.read_sql_query(
    'select * from PlaylistTrack inner join Track '
    'on PlaylistTrack.TrackId = Track.TrackId '
    'where Milliseconds < 250000',
    engine
)
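With select * on a join, TrackId is returned once from each table, so the result contains two columns with the same name. One possible way around this is to list the columns explicitly with their table names. The sketch assumes simplified in-memory versions of PlaylistTrack and Track with made-up rows, and a plain sqlite3 connection in place of the engine.

```python
import sqlite3
import pandas as pd

# Simplified stand-in tables with illustrative rows
con = sqlite3.connect(':memory:')
con.executescript('''
    create table Track (TrackId integer primary key, Name text, Milliseconds integer);
    create table PlaylistTrack (PlaylistId integer, TrackId integer);
    insert into Track values (1, 'Short', 200000), (2, 'Long', 300000), (3, 'Medium', 249999);
    insert into PlaylistTrack values (1, 1), (1, 2), (2, 3);
''')

# Table-qualified column names avoid the duplicated TrackId column
df = pd.read_sql_query(
    'select PlaylistTrack.PlaylistId, Track.TrackId, Track.Name, Track.Milliseconds '
    'from PlaylistTrack inner join Track '
    'on PlaylistTrack.TrackId = Track.TrackId '
    'where Track.Milliseconds < 250000 '
    'order by Track.TrackId',
    con,
)
con.close()

print(df)
```

Only the tracks below the 250,000 ms limit survive the filter, each with the playlist it belongs to.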
