API in Fairlearn (1)

Foreword

Fairlearn is an open-source project designed to help data scientists improve the fairness of artificial intelligence systems. At present there are no tutorials in China that explain how to use this library, so the author is writing a series of blog posts that teach Fairlearn in as much detail as possible, adding personal insights on top of the official website tutorial.

Fairlearn official website

This post is the second in the series. Starting here, five posts will explain the Fairlearn API in detail. This post covers the datasets built into Fairlearn.

Fairlearn has three built-in datasets: UCI Adult, Bank Marketing, and Boston House.

1. UCI Adult

**Introduction:** UCI Adult is a dataset for predicting whether a person's annual income is greater than $50K.

**Importing the dataset:** The dataset is loaded through the fairlearn.datasets module. The code is as follows:

from fairlearn import datasets
adult = datasets.fetch_adult(cache=True, data_home=None, as_frame=False, return_X_y=False)

1.1 fetch_adult() parameters

The UCI Adult dataset is obtained through the fetch_adult() function. Let's look at its parameters.

  • cache: Boolean, default True. Whether to use the cache when downloading the dataset.
  • data_home: string, default None. Specifies the cache directory for the downloaded dataset. By default, all data is stored in the ~/.fairlearn-data directory.
  • as_frame: Boolean, default False. Whether the downloaded data is returned as a pandas DataFrame.
  • return_X_y: Boolean, default False. If True, only two parts are returned, the data and the labels, i.e. data.data and data.target.
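The parameters above can be combined as in the following minimal sketch. It assumes that fairlearn and pandas are installed and that the machine can download the data on the first call. The helper name load_adult_xy is hypothetical and is not part of Fairlearn.

```python
def load_adult_xy(data_home=None):
    """Return the UCI Adult features and labels as an (X, y) pair."""
    # Imported inside the function so this snippet can be loaded
    # even before fairlearn is installed.
    from fairlearn.datasets import fetch_adult

    # return_X_y=True yields only (data, target);
    # as_frame=True asks for pandas objects instead of arrays.
    X, y = fetch_adult(
        cache=True,
        data_home=data_home,  # None keeps the default cache directory
        as_frame=True,
        return_X_y=True,
    )
    return X, y
```

Calling `X, y = load_adult_xy()` should then give the features and the labels as two separate objects, ready to pass to a model.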

1.2 fetch_adult() return value

Having covered the parameters of fetch_adult(), let's look at its return value, which consists of four parts:

  • data: the feature data.
  • target: the labels.
  • feature_names: the feature names.
  • DESCR: string, a description of the dataset.
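As a rough sketch of how these four parts can be read, the helper below takes the object returned by a fetch call (with return_X_y=False) and collects a few basic facts from it. The name summarize and the keys of the returned dictionary are hypothetical. The only assumption about the dataset object is that it exposes the four attributes listed above.

```python
def summarize(dataset):
    """Collect basic facts from a fetch_* return value (return_X_y=False)."""
    return {
        "n_samples": len(dataset.data),          # number of rows in the features
        "n_labels": len(dataset.target),         # should match n_samples
        "feature_names": list(dataset.feature_names),
        "description_start": dataset.DESCR[:80],  # first 80 characters only
    }
```

For example, `summarize(datasets.fetch_adult())` would describe the UCI Adult dataset imported earlier.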

2. Bank Marketing

**Introduction:** Bank Marketing is a dataset used to predict whether a bank's marketing campaign succeeds, i.e. whether a customer subscribes to a term deposit.

The specific code to import the dataset is as follows:

bank = datasets.fetch_bank_marketing(cache=True, data_home=None, as_frame=False, return_X_y=False)

As you can see, the only difference from loading the previous dataset is the function name. The parameters and return value of fetch_bank_marketing() are the same as those of fetch_adult().
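Because the two functions share one interface, a single helper can serve both datasets. The sketch below is only an illustration: the FETCHERS table and the fetch_dataset name are hypothetical, and it assumes fairlearn is installed when the function is called.

```python
import importlib

# Short dataset name -> function name in fairlearn.datasets.
FETCHERS = {
    "adult": "fetch_adult",
    "bank_marketing": "fetch_bank_marketing",
}


def fetch_dataset(name, **kwargs):
    """Call the matching fetch_* function, forwarding the shared keyword arguments."""
    module = importlib.import_module("fairlearn.datasets")
    fetch = getattr(module, FETCHERS[name])
    return fetch(**kwargs)
```

With this, `fetch_dataset("bank_marketing", as_frame=True)` and `fetch_dataset("adult", as_frame=True)` differ only in the name passed in.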

3. Boston House

**Introduction:** The Boston house price dataset is a classic machine learning dataset used to predict house prices in the Boston area from a series of features. What is less well known is that this dataset also has fairness problems: one of its features (B) is derived from the proportion of Black residents in each town.

The specific code to import the dataset is as follows:

boston = datasets.fetch_boston(cache=True, data_home=None, as_frame=False, return_X_y=False, warn=True)

You can see that fetch_boston() has one additional parameter, warn. It is a Boolean that defaults to True. When it is True, a warning is issued to remind the user that the dataset has fairness problems.
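To read the warning text in code rather than on the console, one option is to record it with Python's standard warnings module. This is a sketch under the assumption that the warning is issued through warnings.warn. The helper name call_and_collect_warnings is hypothetical.

```python
import warnings


def call_and_collect_warnings(fetch, **kwargs):
    """Call a fetch_* function and return (result, list of warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones that are filtered by default.
        warnings.simplefilter("always")
        result = fetch(**kwargs)
    return result, [str(w.message) for w in caught]
```

For example, `call_and_collect_warnings(datasets.fetch_boston, warn=True)` returns the dataset together with any recorded messages, while passing warn=False should leave out the fairness warning.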

Reposted from: blog.csdn.net/jiaweilovemingming/article/details/127726301