Loading datasets

# Load scikit-learn's datasets
from sklearn import datasets

# Load digits dataset (handwritten digits)
digits = datasets.load_digits()

# Create features matrix
features = digits.data

# Create target vector
target = digits.target

# View first observation
features[0]

Some of the available datasets:

load_boston
Contains 506 observations on Boston housing prices. It is a good dataset for
exploring regression algorithms. Note that load_boston was deprecated in
scikit-learn 1.0 and removed in 1.2, so it is only available in older versions.
load_iris
Contains 150 observations on the measurements of Iris flowers. It is a good
dataset for exploring classification algorithms.
load_digits
Contains 1,797 observations from images of handwritten digits. It is a good
dataset for teaching image classification.
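
As a quick sanity check on the observation counts listed above, here is a minimal sketch that loads two of these datasets and inspects the shape of their feature matrices. The variable names are illustrative only and do not come from the original notes.

```python
from sklearn import datasets

# Load two of the bundled toy datasets
iris = datasets.load_iris()
digits = datasets.load_digits()

# .data is a NumPy array shaped (observations, features)
iris_shape = iris.data.shape      # 150 flowers, 4 measurements each
digits_shape = digits.data.shape  # 1,797 images, 8x8 = 64 pixels each

print(iris_shape, digits_shape)
```

The first number in each shape is the observation count, which should match the descriptions in the list.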

CSV file

From a URL:

# Load library
import pandas as pd

# Create URL
url = 'https://tinyurl.com/simulated_data'

# Load dataset
dataframe = pd.read_csv(url)

# View first two rows
dataframe.head(2)

From a local file:

dataframe = pd.read_csv(r'path')
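
Since 'path' above is only a placeholder, the following sketch shows the same call end to end. It assumes a small made-up CSV (the file name, column names, and values are hypothetical) that is written to a temporary directory first so the example can run anywhere.

```python
import os
import tempfile

import pandas as pd

# Write a tiny CSV file to a temporary directory (hypothetical columns and values)
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "sample.csv")
with open(csv_path, "w", newline="") as f:
    f.write("integer,category\n5,0\n7,1\n9,0\n")

# Load the local file; the first row is used as the header by default
local_dataframe = pd.read_csv(csv_path)

# View first two rows
first_two = local_dataframe.head(2)
print(first_two)
```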

Excel file

# Load library
import pandas as pd

# Create URL
url = 'https://tinyurl.com/simulated_excel'

# Load data (the argument is sheet_name; older pandas versions called it sheetname)
dataframe = pd.read_excel(url, sheet_name=0, header=1)

# View the first two rows
dataframe.head(2)

Note: sheet_name (sheetname in older pandas versions) accepts both strings
containing the name of the sheet and integers pointing to sheet positions
(zero-indexed). If we need to load multiple sheets, include them as a list.
For example, sheet_name=[0, 1, 2, "Monthly Sales"] will return a dictionary of
pandas DataFrames containing the first, second, and third sheets and the sheet
named Monthly Sales.
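
To illustrate the dictionary returned for a list of sheets, here is a rough sketch. It assumes the openpyxl package is installed for reading and writing .xlsx data, and it builds a small hypothetical workbook in memory (the sheet names and values are made up) instead of downloading one.

```python
import io

import pandas as pd

# Build a two-sheet workbook in memory (hypothetical data; assumes openpyxl is installed)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="First", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Monthly Sales", index=False)
buffer.seek(0)

# Mix a position (0) and a sheet name in one list
sheets = pd.read_excel(buffer, sheet_name=[0, "Monthly Sales"])

# The result is a dict keyed by exactly the values passed in the list
for key, frame in sheets.items():
    print(key, frame.shape)
```

Each value in the dictionary is an ordinary DataFrame, so sheets["Monthly Sales"].head(2) works as usual.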

JSON file

# Load library
import pandas as pd

# Create URL
url = 'https://tinyurl.com/simulated_json'

# Load data
dataframe = pd.read_json(url, orient='columns')

# View the first two rows
dataframe.head(2)

Note: the orient parameter indicates to pandas how the JSON file is
structured. It might take some experimenting to figure out which argument
(split, records, index, columns, or values) is the right one. Another helpful
tool pandas offers is json_normalize, which can help convert semistructured JSON
data into a pandas DataFrame.
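
To make the orient options more concrete, the sketch below parses the same small table from two different JSON layouts and then flattens a nested record with json_normalize. The JSON strings and field names are invented for illustration, and pd.json_normalize assumes pandas 1.0 or newer.

```python
import io

import pandas as pd

# "records" layout: a list of row objects
records_json = '[{"id": 1, "score": 0.5}, {"id": 2, "score": 0.8}]'
df_records = pd.read_json(io.StringIO(records_json), orient="records")

# "split" layout: columns, index, and data stored separately
split_json = '{"columns": ["id", "score"], "index": [0, 1], "data": [[1, 0.5], [2, 0.8]]}'
df_split = pd.read_json(io.StringIO(split_json), orient="split")

# Semistructured data: nested objects become dotted column names
nested = [
    {"id": 1, "user": {"name": "a", "city": "x"}},
    {"id": 2, "user": {"name": "b", "city": "y"}},
]
flat = pd.json_normalize(nested)

print(df_records)
print(df_split)
print(flat.columns.tolist())
```

Both read_json calls yield the same two-row table, which shows that orient describes the layout of the file, not the resulting DataFrame.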

SQL database access

# Load libraries
import pandas as pd
from sqlalchemy import create_engine

# Create a connection to the database
database_connection = create_engine('sqlite:///sample.db')

# Load data
dataframe = pd.read_sql_query('SELECT * FROM data', database_connection)

# View first two rows
dataframe.head(2)
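
The snippet above needs an existing sample.db file. As a self-contained variant, this sketch uses Python's built-in sqlite3 module with an in-memory database instead of SQLAlchemy (a substitution made here, not the approach in the notes), and creates a small hypothetical data table before querying it.

```python
import sqlite3

import pandas as pd

# Create an in-memory SQLite database with a small table (hypothetical rows)
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE data (first_name TEXT, age INTEGER)")
connection.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("Jason", 36), ("Molly", 25), ("Tina", 57)],
)
connection.commit()

# Load every row and column of the table
all_rows = pd.read_sql_query("SELECT * FROM data", connection)

# Pass values through params instead of formatting them into the SQL string
over_30 = pd.read_sql_query(
    "SELECT first_name, age FROM data WHERE age > ? ORDER BY age",
    connection,
    params=(30,),
)
connection.close()

print(all_rows.head(2))
print(over_30)
```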
