How to use data from the new UC Irvine Machine Learning Repository website
New website: https://archive.ics.uci.edu/
Reading website data with pd.read_csv()
The new version of the website no longer has the Data Folder page of the old version, so the old way of reading a file directly from its URL is no longer generally available:
data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data")
Already-known links like the one above still work, but other datasets can no longer be read this way. I searched the new website for a long time and could not find the Data Folder.
So for datasets on the new website, the only option is to download them.
How to use the downloaded data?
Download the data first. Among the downloaded files, the .data file is the one you want; read it with pd.read_csv():
data = pd.read_csv('../data/breast_cancer_wisconsin_original/breast-cancer-wisconsin.data')
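A .data file like this one has no header row, and in this dataset missing values are written as "?". The sketch below shows one way to handle both when reading. To stay runnable without the download, it reads three sample rows from an in-memory string instead of the real file. The column names are an assumption based on the dataset's description, and the rows are only illustrative of the file format.

```python
import io

import pandas as pd

# Assumed column names (the .data file itself has no header row).
names = [
    "Sample code number", "Clump Thickness", "Uniformity of Cell Size",
    "Uniformity of Cell Shape", "Marginal Adhesion",
    "Single Epithelial Cell Size", "Bare Nuclei", "Bland Chromatin",
    "Normal Nucleoli", "Mitoses", "Class",
]

# Illustrative rows in the same comma-separated format as the .data file.
# Replace `sample` with the path to the downloaded file in real use.
sample = io.StringIO(
    "1000025,5,1,1,1,2,1,3,1,1,2\n"
    "1002945,5,4,4,5,7,10,3,2,1,2\n"
    "1057013,8,4,5,1,2,?,7,3,1,4\n"
)

# names= supplies the header; na_values="?" turns "?" into NaN.
data = pd.read_csv(sample, names=names, na_values="?")

# Count the missing entries, then drop the incomplete rows.
n_missing = int(data["Bare Nuclei"].isna().sum())
data = data.dropna()
print(n_missing, data.shape)
```

Dropping incomplete rows is only one option; filling the missing values is another, depending on the model.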
Some ways to obtain data for machine learning
pd.read_csv()
- Read a local CSV file
facebook = pd.read_csv('../data/FBlocation/train.csv')
- Read from a URL of the old version of the UCI ML website (only some known links still work)
# names is a list of column labels that must be defined beforehand
data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data", names=names)
- Datasets from the new version of the website must be downloaded first
data = pd.read_csv('../data/breast_cancer_wisconsin_original/breast-cancer-wisconsin.data')
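The three cases above can be combined into one small helper: use the local copy when it exists, otherwise fall back to a URL. This is only a sketch; `read_local_or_url` is a hypothetical name, not part of pandas, and the fallback only helps for links that still work.

```python
from pathlib import Path

import pandas as pd


def read_local_or_url(local_path, url, **read_csv_kwargs):
    """Read a CSV-like file from local_path if it exists, else from url.

    Hypothetical helper for illustration; extra keyword arguments are
    passed straight through to pd.read_csv (e.g. names=..., header=None).
    """
    path = Path(local_path)
    source = path if path.exists() else url
    return pd.read_csv(source, **read_csv_kwargs)
```

For example, `read_local_or_url('../data/breast_cancer_wisconsin_original/breast-cancer-wisconsin.data', old_url, header=None)` would read the downloaded file if it is present, where `old_url` stands for one of the old links.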
Use the datasets that come with sklearn.datasets
from sklearn.datasets import load_iris
iris = load_iris()
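load_iris() returns a Bunch object rather than a table. If you prefer to work with pandas, as with pd.read_csv() above, a minimal sketch (assuming scikit-learn 0.23 or later, where the as_frame parameter is available) looks like this:

```python
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn to return pandas objects
# instead of NumPy arrays.
iris = load_iris(as_frame=True)

df = iris.frame                 # features plus a "target" column
X, y = iris.data, iris.target   # features only, and the label column

print(df.shape)
print(list(iris.target_names))
```

The bundled datasets ship with scikit-learn itself, so this needs no download.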
OpenML datasets
Dataset website: https://www.openml.org/search?type=data&status=active
from sklearn.datasets import fetch_openml
data = fetch_openml(data_id=853)
# data_id is the unique identifier of each dataset on the website