API in Fairlearn (2)

foreword

Fairlearn is an open-source project designed to help data scientists improve the fairness of artificial intelligence systems. At the time of writing there are few Chinese-language tutorials explaining how to use this library, so in this series of blog posts I walk through Fairlearn in as much detail as possible, adding my own insights on top of the official tutorial.

Fairlearn official website

This post, the third in the series, focuses on the fairness evaluation methods built into Fairlearn.

1. Metrics module

Functionality: the fairlearn.metrics module provides methods for evaluating fairness-related metrics of a model.

1.1 Ungrouped Metrics

In the simplest case, an evaluation metric relates the set of true labels $Y_{true}$ to the set of predictions $Y_{pred}$. For example, the true positive rate is $TPR = P(Y_{pred}=1 \mid Y_{true}=1)$, the probability that a sample whose actual label is positive is also predicted positive. The false negative rate is $FNR = P(Y_{pred}=0 \mid Y_{true}=1)$, the probability that a sample whose actual label is positive is predicted negative. Recall is $Recall = \frac{TP}{TP+FN}$. In code, recall is implemented as sklearn.metrics.recall_score().

Code display "Jupyter NoteBook"

import sklearn.metrics as skm
y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]
skm.recall_score(y_true,y_pred)

The output is 0.5.
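The recall here can be verified by hand from the formula above, by counting true positives and false negatives directly (a quick pure-Python check, independent of scikit-learn):

```python
# Recall = TP / (TP + FN), counted directly from the label lists
y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
recall = tp / (tp + fn)
print(tp, fn, recall)  # 5 5 0.5 -- matches skm.recall_score(y_true, y_pred)
```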

1.2 Metrics with grouping

When considering fairness, we want to see how metrics differ between groups, which is where grouped metrics come in.

1.2.1 Grouping data into groups

import numpy as np
import pandas as pd

group_membership_data = ['d', 'a', 'c', 'b', 'b', 'c', 'c', 'c',
                         'b', 'd', 'c', 'a', 'b', 'd', 'c', 'c']
pd.set_option('display.max_columns',20)
pd.set_option('display.width',80)
pd.DataFrame({'y_true': y_true,
              'y_pred': y_pred,
              'group_membership_data': group_membership_data})

Here we assign the y_true and y_pred values from the previous example to four different groups: a, b, c, and d.

1.2.2 Prediction between group recall

from fairlearn.metrics import MetricFrame
grouped_metrics = MetricFrame(metrics=skm.recall_score, y_true=y_true, y_pred=y_pred, sensitive_features=group_membership_data)
print(grouped_metrics.overall)
print(grouped_metrics.by_group)

1.2.3 Output inter-group statistics

In addition to the metric value for each group, MetricFrame also supports other statistics, such as the minimum and maximum recall over the groups, their difference, and their ratio.

print("min recall over groups = ", grouped_metrics.group_min())  # minimum recall over the groups
print("max recall over groups = ", grouped_metrics.group_max())  # maximum recall over the groups
print("difference in recall = ", grouped_metrics.difference(method='between_groups'))  # largest between-group difference in recall
print("ratio in recall = ", grouped_metrics.ratio(method='between_groups'))  # minimum recall divided by maximum recall

The result is as follows:
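What MetricFrame computes here can be reproduced by hand: split the samples by the sensitive feature, compute recall within each group, then aggregate. A pure-Python sketch (no fairlearn required):

```python
# Per-group recall and the between-group statistics, computed by hand
y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]
groups = ['d', 'a', 'c', 'b', 'b', 'c', 'c', 'c',
          'b', 'd', 'c', 'a', 'b', 'd', 'c', 'c']

by_group = {}
for g in sorted(set(groups)):
    tp = sum(1 for t, p, s in zip(y_true, y_pred, groups) if s == g and t == 1 and p == 1)
    fn = sum(1 for t, p, s in zip(y_true, y_pred, groups) if s == g and t == 1 and p == 0)
    by_group[g] = tp / (tp + fn)  # every group here has at least one positive sample

print(by_group)  # {'a': 0.0, 'b': 0.5, 'c': 0.75, 'd': 0.0}
print(max(by_group.values()) - min(by_group.values()))  # 0.75, the between-groups difference
print(min(by_group.values()) / max(by_group.values()))  # 0.0, the between-groups ratio
```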

1.2.4 Assessing Multiple Indicators Between Groups

The example above only shows recall. In fact, MetricFrame can evaluate multiple metrics at once.

from fairlearn.metrics import count
multi_metrics = MetricFrame(metrics={'precision': skm.precision_score,
                                     'recall': skm.recall_score,
                                     'count': count},
                            y_true=y_true,
                            y_pred=y_pred,
                            sensitive_features=group_membership_data)  # precision_score is precision, i.e. TP/(TP+FP)
print(multi_metrics.overall)
print(multi_metrics.by_group)

The result is as follows:

1.2.5 Evaluation indicators between groups with weights

Often different samples carry different weights. To obtain weighted between-group metrics, we only need to pass one extra argument, sample_params.

s_w = [1, 2, 1, 3, 2, 3, 1, 2, 1, 2, 3, 1, 2, 3, 2, 3]
s_p = {'sample_weight': s_w}
weighted_metrics = MetricFrame(metrics=skm.recall_score,
                       y_true=y_true,
                       y_pred=y_pred,
                       sensitive_features=group_membership_data,
                       sample_params=s_p)
print(weighted_metrics.overall)
print(weighted_metrics.by_group)

The result is as follows:
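With weights, recall simply becomes a ratio of weight sums instead of counts: the total weight of the true positives divided by the total weight of all actual positives. A hand check in plain Python:

```python
# Weighted recall = (weight of true positives) / (weight of actual positives)
y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]
s_w    = [1, 2, 1, 3, 2, 3, 1, 2, 1, 2, 3, 1, 2, 3, 2, 3]

tp_w  = sum(w for t, p, w in zip(y_true, y_pred, s_w) if t == 1 and p == 1)
pos_w = sum(w for t, w in zip(y_true, s_w) if t == 1)
print(tp_w / pos_w)  # 0.45 -- what skm.recall_score(..., sample_weight=s_w) returns overall
```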

1.2.6 Using evaluation indicators with parameters

The metrics we have passed to MetricFrame so far take no extra parameters. What should we do if the metric we want to use does need parameters?

Passing such a metric directly to metrics raises an error, as follows

The solution is to use functools.partial() to bind the beta parameter to fbeta_score, wrapping the two into a single method, as follows

import functools

fbeta_06 = functools.partial(skm.fbeta_score, beta=0.6)
metric_beta = MetricFrame(metrics=fbeta_06, y_true=y_true, y_pred=y_pred, sensitive_features=group_membership_data)
print(metric_beta.overall)
print(metric_beta.by_group)

Note: a quick recap of fbeta_score. It is a weighted combination of recall and precision: $F_\beta = (1+\beta^2)\cdot\frac{precision\cdot recall}{(\beta^2\cdot precision)+recall}$. When $\beta < 1$ the score is biased toward precision; when $\beta > 1$ it is biased toward recall.

The output is as follows:
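The effect of beta can be sanity-checked directly from the formula, in plain Python without sklearn:

```python
# F-beta from precision and recall; beta < 1 leans toward precision, beta > 1 toward recall
def fbeta(precision, recall, beta):
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

p, r = 0.5, 0.8
print(fbeta(p, r, beta=0.5))  # ~0.541, pulled toward precision (0.5)
print(fbeta(p, r, beta=1.0))  # ~0.615, the ordinary F1 score
print(fbeta(p, r, beta=2.0))  # ~0.714, pulled toward recall (0.8)
```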

1.2.7 Evaluation of multiple sensitive attributes

So far we have used only a single sensitive feature. What if we need several?

g_2 = [8, 6, 8, 8, 8, 8, 6, 6, 6, 8, 6, 6, 6, 6, 8, 6]
s_f_frame = pd.DataFrame(np.stack([group_membership_data, g_2], axis=1), columns=['SX 0', 'SX 1'])
metric_2_f = MetricFrame(metrics=skm.recall_score, y_true=y_true, y_pred=y_pred, sensitive_features=s_f_frame)
print(metric_2_f.overall)
print(metric_2_f.by_group)

The output is as follows:

In the code above, the only new piece is the np.stack() method, which joins arrays of the same shape along a given axis (axis=0 stacks them as rows, axis=1 as columns).

a = np.array([1,2,3])
b = np.array([4,5,6])
np.stack([a,b],axis=1)

The combined result is [[1, 4], [2, 5], [3, 6]].

1.3 Scalar Results from MetricFrame

Advanced machine learning algorithms often use metric functions to guide their optimization. Such algorithms usually work with scalar results, so if we wish to tune on fairness metrics, we need to perform aggregations on the MetricFrame.

In this section, we use fairlearn's make_derived_metric method to build a custom scalar metric. make_derived_metric wraps a metric with one of four aggregations, chosen via the transform argument: 'group_min', 'group_max', 'difference', or 'ratio'.

from fairlearn.metrics import make_derived_metric
fbeta_difference = make_derived_metric(metric=skm.fbeta_score, transform='difference')  # transform='difference': the maximum F-beta minus the minimum F-beta
fbeta_difference(y_true, y_pred, beta=0.7, sensitive_features=group_membership_data)

The output is as follows:

The same result can be obtained with the code below; the difference is that this version needs functools.partial.

fbeta_07 = functools.partial(skm.fbeta_score, beta=0.7)
fbeta_07_metrics = MetricFrame(metrics=fbeta_07, y_true=y_true, y_pred=y_pred, sensitive_features=group_membership_data)
fbeta_07_metrics.difference()

I personally prefer the latter, MetricFrame-based approach, because the resulting object also exposes the other aggregation methods.

1.4 Control features for grouped metrics

Control features, also called conditional features, enable more detailed fairness insights by providing a further means of dividing the data into subgroups. When splitting the data, a control feature acts much like a sensitive feature. The difference is that overall is then computed separately for each value of the control feature (and likewise for the other aggregations such as group_max and group_min).

Control features are useful when some variation across an attribute is expected, so we want to measure disparities while conditioning on that attribute. For example, in a lending scenario, people in different income bands may legitimately be approved at different rates; but within each income band, we still want to measure differences across sensitive groups.

MetricFrame accepts these through its control_features constructor parameter.

decision = [
   0,0,0,1,1,0,1,1,0,1,
   0,1,0,1,0,1,0,1,0,1,
   0,1,1,0,1,1,1,1,1,0
]
prediction = [
   1,1,0,1,1,0,1,0,1,0,
   1,0,1,0,1,1,1,0,0,0,
   1,1,1,0,0,1,1,0,0,1
]
control_feature = [
   'H','L','H','L','H','L','L','H','H','L',
   'L','H','H','L','L','H','L','L','H','H',
   'L','H','L','L','H','H','L','L','H','L'
]
sensitive_feature = [
   'A','B','B','C','C','B','A','A','B','A',
   'C','B','C','A','C','C','B','B','C','A',
   'B','B','C','A','B','A','B','B','A','A'
]
metric_c_f = MetricFrame(metrics=skm.accuracy_score,
                         y_true=decision,
                         y_pred=prediction,
                         sensitive_features={'SF': sensitive_feature},
                         control_features={'CF': control_feature})
print(metric_c_f.overall)
print(metric_c_f.by_group)

The result is as follows:

Here is my understanding of control features: when we want to compare differences between groups, we use sensitive features, for example income levels across regions. But if we want to compare subgroups within each group, such as the incomes of men and women within a given region, we add a control feature.
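The split this produces can be mimicked by hand: the data is partitioned into (control, sensitive) cells, and "overall" is computed once per control-feature value rather than once over the whole dataset. A pure-Python sketch using the data above:

```python
# Accuracy per (control, sensitive) cell, and an "overall" per control-feature value
decision = [0,0,0,1,1,0,1,1,0,1, 0,1,0,1,0,1,0,1,0,1, 0,1,1,0,1,1,1,1,1,0]
prediction = [1,1,0,1,1,0,1,0,1,0, 1,0,1,0,1,1,1,0,0,0, 1,1,1,0,0,1,1,0,0,1]
control_feature = ['H','L','H','L','H','L','L','H','H','L',
                   'L','H','H','L','L','H','L','L','H','H',
                   'L','H','L','L','H','H','L','L','H','L']
sensitive_feature = ['A','B','B','C','C','B','A','A','B','A',
                     'C','B','C','A','C','C','B','B','C','A',
                     'B','B','C','A','B','A','B','B','A','A']

def accuracy(pairs):
    return sum(1 for t, p in pairs if t == p) / len(pairs)

# Partition the samples into (CF, SF) cells
cells = {}
for t, p, cf, sf in zip(decision, prediction, control_feature, sensitive_feature):
    cells.setdefault((cf, sf), []).append((t, p))

by_group = {k: accuracy(v) for k, v in sorted(cells.items())}  # one value per (CF, SF) cell
overall = {cf: accuracy([(t, p) for t, p, c in zip(decision, prediction, control_feature) if c == cf])
           for cf in sorted(set(control_feature))}             # one "overall" per CF value
print(overall)
print(by_group)
```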

1.5 Plotting grouped metrics

1.5.1 Basic drawing method

Plotting mainly uses the pandas plot method on the by_group DataFrame, which calls matplotlib internally.

from fairlearn.metrics import false_positive_rate, true_positive_rate, selection_rate
from sklearn.metrics import accuracy_score, recall_score, precision_score

metrics = {
    'accuracy': accuracy_score,
    'precision': precision_score,
    'recall': recall_score,
    'false positive rate': false_positive_rate,
    'true positive rate': true_positive_rate,
    'selection rate': selection_rate,
    'count': count}
metric_frame = MetricFrame(metrics=metrics,
                           y_true=y_true,
                           y_pred=y_pred,
                           sensitive_features=group_membership_data)
metric_frame.by_group.plot.bar(
    subplots=True,
    layout=[3, 3],
    legend=False,
    figsize=[12, 8],
    title="Show all metrics",
)

1.5.2 Modify the unified ylim

In the figure from the previous section, the y-axis ranges differ between subplots. If we need to make them consistent (to facilitate comparison), we can pass the ylim parameter to the plot() call.

metric_frame.by_group.plot.bar(
    subplots=True,
    ylim=[0,1],
    layout=[3, 3],
    legend=False,
    figsize=[12, 8],
    title="Show all metrics",
)

The result is as follows:

1.5.3 Modify the color system of the drawing

At this point, some readers may want to complain about the drab color scheme in the figure above. Can't we pick nicer colors ourselves?

matplotlib has this covered: we can adjust the colors through the colormap parameter. For more color options, see here: Good-looking colors

Here we choose the Accent colormap.

metric_frame.by_group.plot.bar(
    subplots=True,
    ylim=[0,1],
    layout=[3, 3],
    legend=False,
    figsize=[12, 8],
    colormap = 'Accent',
    title="Show all metrics",
)

The result is as follows:

1.5.4 Define the drawing style

At this point, readers may ask again: is a bar chart really all you can draw?

Of course, as a professional machine learning fairness library, fairlearn has this covered too (in fact, it is handled by pandas): we can change the chart type through the kind parameter.

metric_frame.by_group.plot(
    kind = 'pie',
    subplots=True,
    ylim=[0,1],
    layout=[3, 3],
    legend=False,
    figsize=[12, 8],
    colormap = 'Accent',
    title="Show all metrics",
)

The result is as follows:

So how do we know which chart types can be drawn? See the following link: Drawable types

Origin blog.csdn.net/jiaweilovemingming/article/details/127739932