Foreword
Fairlearn is an open-source project designed to help data scientists improve the fairness of artificial intelligence systems. At present there are no tutorials in China that explain how to use this library, so the author is writing a series of blog posts that explain how to use Fairlearn in as much detail as possible, adding personal insights on top of the official tutorials.
This post is the second in the series, and the next five posts will explain Fairlearn's API in detail. This article covers the datasets built into Fairlearn.
Fairlearn ships with three built-in datasets: UCI Adult, Bank Marketing, and Boston House.
1. UCI Adult
**Introduction:** UCI Adult is a dataset for predicting whether a person's annual income exceeds $50K.
**Importing the dataset:** Importing the dataset requires the `fairlearn.datasets` package. The code is as follows:
```python
from fairlearn import datasets

adult = datasets.fetch_adult(cache=True, data_home=None, as_frame=False, return_X_y=False)
```
1.1 fetch_adult() parameters
The UCI Adult dataset is obtained through the `fetch_adult()` method. Next, let's look at its parameters.
- `cache`: Boolean, default `True`. Whether to cache the dataset when downloading it.
- `data_home`: String, default `None`. The directory in which the downloaded dataset is cached. By default, all data is stored in the `~/.fairlearn-data` directory.
- `as_frame`: Boolean, default `False`. Whether the data is returned as a pandas `DataFrame` (the target is then also a pandas object).
- `return_X_y`: Boolean, default `False`. If `True`, only the data and the labels are returned, as the tuple `(data.data, data.target)`.
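To show how these parameters combine, here is a minimal sketch that loads Adult as pandas objects with `as_frame=True` and `return_X_y=True`. The helper names `load_adult_xy` and `default_cache_dir` are hypothetical, not part of Fairlearn, and the sketch assumes Fairlearn is installed and that the first call can reach the internet to download the data (later calls read from the cache).

```python
from pathlib import Path


def default_cache_dir():
    """Hypothetical helper: the default cache folder used when data_home=None."""
    return Path.home() / ".fairlearn-data"


def load_adult_xy(data_home=None):
    """Hypothetical helper: return UCI Adult as an (X, y) pair of pandas objects.

    as_frame=True   -> X is a pandas DataFrame and y a pandas object.
    return_X_y=True -> a (data, target) tuple instead of the full result object.
    data_home       -> cache folder; None means the default ~/.fairlearn-data.
    """
    # Imported here so the helpers can be defined without Fairlearn installed;
    # calling load_adult_xy() needs Fairlearn and, on first use, network access.
    from fairlearn.datasets import fetch_adult

    return fetch_adult(cache=True, data_home=data_home, as_frame=True, return_X_y=True)
```

For example, `X, y = load_adult_xy(data_home="./fairlearn-cache")` keeps the download in a project-local folder instead of the home directory, and `X.head()` then shows the first rows.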
1.2 fetch_adult() return value
Having covered the parameters of `fetch_adult()`, let's look at its return value. It is a dictionary-like object (a scikit-learn `Bunch`) with four parts:
- `data`: the feature values.
- `target`: the labels.
- `feature_names`: the feature names.
- `DESCR`: String, a description of the dataset.
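As an illustration of working with these four fields, the sketch below summarizes a result object and turns the Adult labels into 0/1 integers. Both function names are hypothetical, and `binarize_income` assumes the labels are the strings `'>50K'` and `'<=50K'`; check the label values returned by your Fairlearn version before relying on it.

```python
def describe_dataset(result):
    """Hypothetical helper: row count, feature count and first DESCR line."""
    n_rows = len(result.data)  # works for NumPy arrays and pandas DataFrames
    n_features = len(result.feature_names)
    lines = (result.DESCR or "").strip().splitlines()
    first_line = lines[0] if lines else ""
    return f"{n_rows} rows, {n_features} features; {first_line}"


def binarize_income(target):
    """Hypothetical helper: map Adult labels to 1 for '>50K' and 0 otherwise.

    Assumes string labels '>50K' / '<=50K'; surrounding spaces and a trailing
    '.' (as in the raw UCI test file) are ignored.
    """
    return [1 if str(label).strip().rstrip(".") == ">50K" else 0 for label in target]
```

A typical use would be `adult = datasets.fetch_adult()`, then `print(describe_dataset(adult))` and `y = binarize_income(adult.target)`.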
2. Bank Marketing
**Introduction:** Bank Marketing is a dataset for predicting whether a customer will buy a financial product (subscribe to a term deposit) after a bank's direct marketing campaign.
The code to import the dataset is as follows:
```python
from fairlearn import datasets

bank = datasets.fetch_bank_marketing(cache=True, data_home=None, as_frame=False, return_X_y=False)
```
The only difference from loading UCI Adult is the function name: the parameters and return value of `fetch_bank_marketing()` are the same as those of `fetch_adult()`.
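Because the loaders share the same parameters, a single dispatcher can load any of them with the same keyword arguments. The sketch below is hypothetical (`load_builtin` and `FETCHERS` are not Fairlearn names) and assumes your installed Fairlearn version still provides all three `fetch_*` functions listed in this post.

```python
# Short names mapped to the three built-in loaders discussed in this post.
FETCHERS = {
    "adult": "fetch_adult",
    "bank_marketing": "fetch_bank_marketing",
    "boston": "fetch_boston",
}


def load_builtin(name, **kwargs):
    """Hypothetical helper: load a built-in Fairlearn dataset by short name.

    The same keyword arguments (cache, data_home, as_frame, return_X_y)
    are forwarded unchanged to whichever loader is selected.
    """
    if name not in FETCHERS:
        raise ValueError(f"unknown dataset {name!r}; choose from {sorted(FETCHERS)}")
    from fairlearn import datasets  # needs Fairlearn installed

    fetch = getattr(datasets, FETCHERS[name])
    return fetch(**kwargs)
```

For example, `X, y = load_builtin("bank_marketing", as_frame=True, return_X_y=True)` mirrors the Adult call; the Boston loader's extra `warn` argument can be passed the same way.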
3. Boston House
**Introduction:** The Boston house price dataset is a classic machine learning dataset that predicts house prices in the Boston area from a set of features. What is less well known is that it is also an unfair dataset: one of its features, `B`, is derived from the proportion of Black residents in each town.
The code to import the dataset is as follows:
```python
from fairlearn import datasets

boston = datasets.fetch_boston(cache=True, data_home=None, as_frame=False, return_X_y=False, warn=True)
```
Compared with the other two loaders, `fetch_boston()` has one extra parameter, `warn`. It is a Boolean and defaults to `True`. When it is `True`, a warning is raised to remind the user that the dataset is unfair.
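The sketch below shows both sides of the `warn` parameter: turning Fairlearn's warning off, or keeping it on and capturing the messages with Python's standard `warnings` module. The function names are hypothetical, and the sketch assumes Fairlearn is installed and the data can be downloaded; the captured list may also contain unrelated warnings raised during loading.

```python
import warnings


def load_boston_silently():
    """Hypothetical helper: load Boston with warn=False (no fairness warning)."""
    from fairlearn.datasets import fetch_boston  # needs Fairlearn installed

    return fetch_boston(as_frame=True, return_X_y=True, warn=False)


def load_boston_with_warnings():
    """Hypothetical helper: load Boston with warn=True and collect the messages."""
    from fairlearn.datasets import fetch_boston  # needs Fairlearn installed

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        data = fetch_boston(as_frame=True, return_X_y=True, warn=True)
    # `caught` can include unrelated warnings, not only Fairlearn's own one.
    return data, [str(w.message) for w in caught]
```

Silencing the warning is a deliberate choice; keeping `warn=True` and reading the message is a cheap reminder of why the dataset should be handled with care.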