Foreword
Fairlearn is an open-source project designed to help data scientists improve the fairness of artificial intelligence systems. At the time of writing there are no Chinese-language tutorials explaining how to use this library, so this series of blog posts teaches how to use Fairlearn in as much detail as possible, adding my own insights on top of the official tutorial.
This blog, the third in a series, focuses on fairness evaluation methods built into Fairlearn.
1. Metrics module
Functionality : The Metrics module provides methods for evaluating fairness-related metrics of a model.
1.1 Ungrouped Metrics
In the simplest case, an evaluation metric is a function of the labels $Y_{true}$ and the predictions $Y_{pred}$. For example, the true positive rate is $TPR = P(Y_{pred}=1 \mid Y_{true}=1)$, the probability that a sample that is actually positive is predicted positive. The false negative rate is $FNR = P(Y_{pred}=0 \mid Y_{true}=1)$, the probability that a sample that is actually positive is predicted negative. Recall is $Recall = \frac{TP}{TP+FN}$. A concrete implementation of recall in code is sklearn.metrics.recall_score()
Code example (Jupyter Notebook)
import sklearn.metrics as skm
y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]
skm.recall_score(y_true,y_pred)
The output is 0.5
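To see where that number comes from, recall can be verified by hand: count the true positives and false negatives and apply $Recall = TP/(TP+FN)$. A minimal pure-Python sketch over the same data:

```python
# Verify recall = TP / (TP + FN) by direct counting
y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # actually 1, predicted 1
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # actually 1, predicted 0
recall = tp / (tp + fn)
print(tp, fn, recall)  # 5 5 0.5
```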
1.2 Metrics with grouping
When considering fairness, we want to see how the indicators differ between different groups, which involves group indicators.
1.2.1 Grouping data into groups
import numpy as np
import pandas as pd
group_membership_data = ['d', 'a', 'c', 'b', 'b', 'c', 'c', 'c',
'b', 'd', 'c', 'a', 'b', 'd', 'c', 'c']
pd.set_option('display.max_columns',20)
pd.set_option('display.width',80)
pd.DataFrame({'y_true': y_true, 'y_pred': y_pred,
              'group_membership_data': group_membership_data})
We divide the y_true and y_pred from the previous example into four different groups: a, b, c, d.
1.2.2 Computing recall per group
from fairlearn.metrics import MetricFrame
grouped_metrics = MetricFrame(metrics=skm.recall_score,
                              y_true=y_true,
                              y_pred=y_pred,
                              sensitive_features=group_membership_data)
print(grouped_metrics.overall)
print(grouped_metrics.by_group)
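Under the hood, by_group simply restricts the metric to each group. As a sanity check (not Fairlearn's actual implementation), per-group recall can be reproduced with plain pandas: recall is the fraction of actual positives that are predicted positive.

```python
import pandas as pd

y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]
groups = ['d', 'a', 'c', 'b', 'b', 'c', 'c', 'c',
          'b', 'd', 'c', 'a', 'b', 'd', 'c', 'c']

df = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred, 'group': groups})
positives = df[df['y_true'] == 1]                        # rows that are actually positive
by_group = positives.groupby('group')['y_pred'].mean()   # fraction predicted positive = recall
print(by_group)  # a 0.00, b 0.50, c 0.75, d 0.00
```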
1.2.3 Output inter-group statistics
In addition to the per-group metric values, MetricFrame also supports aggregate statistics across groups, such as the minimum and maximum recall, the between-group difference, and the between-group ratio.
print("min recall over groups = ", grouped_metrics.group_min())  # minimum recall
print("max recall over groups = ", grouped_metrics.group_max())  # maximum recall
print("difference in recall = ", grouped_metrics.difference(method='between_groups'))  # largest between-group difference in recall
print("ratio in recall = ", grouped_metrics.ratio(method='between_groups'))  # minimum recall divided by maximum recall
The result is as follows:
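These aggregations are simple reductions over the per-group values. A pure-Python sketch of what the four calls compute, using the per-group recall values of this running example (computed by hand; treat them as illustrative):

```python
# Per-group recall values for the running example (hand-computed)
recalls = {'a': 0.0, 'b': 0.5, 'c': 0.75, 'd': 0.0}

group_min = min(recalls.values())
group_max = max(recalls.values())
difference = group_max - group_min   # largest between-group gap
ratio = group_min / group_max        # smallest divided by largest
print(group_min, group_max, difference, ratio)  # 0.0 0.75 0.75 0.0
```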
1.2.4 Assessing Multiple Indicators Between Groups
The example above only shows recall; in fact, MetricFrame can evaluate multiple metrics at once.
from fairlearn.metrics import count
multi_metrics = MetricFrame(metrics={'precision': skm.precision_score,
                                     'recall': skm.recall_score,
                                     'count': count},
                            y_true=y_true,
                            y_pred=y_pred,
                            sensitive_features=group_membership_data)  # precision_score is precision, i.e. TP / (TP + FP)
print(multi_metrics.overall)
print(multi_metrics.by_group)
The result is as follows:
1.2.5 Weighted evaluation metrics between groups
Often, different individuals carry different weights. To obtain weighted between-group metrics, we only need to pass one extra sample_params argument:
s_w = [1, 2, 1, 3, 2, 3, 1, 2, 1, 2, 3, 1, 2, 3, 2, 3]
s_p = {'sample_weight': s_w}
weighted_metrics = MetricFrame(metrics=skm.recall_score,
y_true=y_true,
y_pred=y_pred,
sensitive_features=group_membership_data,
sample_params=s_p)
print(weighted_metrics.overall)
print(weighted_metrics.by_group)
The result is as follows:
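With sample weights, recall becomes the weighted fraction of actual positives that are predicted positive. A pure-Python sketch of the weighted overall recall for the data above (a hand check, not Fairlearn's implementation):

```python
y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]
s_w    = [1, 2, 1, 3, 2, 3, 1, 2, 1, 2, 3, 1, 2, 3, 2, 3]

# weighted TP over weighted (TP + FN)
num = sum(w for t, p, w in zip(y_true, y_pred, s_w) if t == 1 and p == 1)
den = sum(w for t, p, w in zip(y_true, y_pred, s_w) if t == 1)
print(num, den, num / den)  # 9 20 0.45
```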
1.2.6 Using evaluation indicators with parameters
The metrics we have passed to MetricFrame so far take no extra parameters. What should we do if we need a metric that does take parameters? Passing such a metric directly to MetricFrame raises an error, as follows
The solution is to use functools.partial() to bind fbeta_score and its beta parameter into a single callable, as follows
import functools
fbeta_06 = functools.partial(skm.fbeta_score, beta=0.6)
metric_beta = MetricFrame(metrics=fbeta_06, y_true=y_true, y_pred=y_pred, sensitive_features=group_membership_data)
print(metric_beta.overall)
print(metric_beta.by_group)
Note: a quick recap of fbeta_score. It is a weighted combination of precision and recall. With recall $recall$ and precision $precision$, $F_\beta = (1+\beta^2)\cdot\frac{precision\cdot recall}{(\beta^2\cdot precision)+recall}$. When $\beta < 1$, the score is biased towards precision; when $\beta > 1$, it is biased towards recall.
The output is as follows:
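To check the formula, here is F-beta computed directly from precision and recall. The values p = 5/9 and r = 0.5 are the hand-computed overall precision and recall of the running y_true/y_pred example:

```python
def fbeta(precision, recall, beta):
    # F-beta = (1 + beta^2) * p * r / (beta^2 * p + r)
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

p, r = 5 / 9, 0.5
print(fbeta(p, r, beta=1.0))  # beta = 1 gives the ordinary F1 score, 2pr/(p+r)
print(fbeta(p, r, beta=0.6))  # beta < 1 weights precision more heavily
```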
1.2.7 Evaluation of multiple sensitive attributes
Previously, we only used a single sensitive attribute. How do we evaluate with multiple sensitive attributes?
g_2 = [ 8,6,8,8,8,8,6,6,6,8,6,6,6,6,8,6]
s_f_frame = pd.DataFrame(np.stack([group_membership_data,g_2],axis=1),columns=['SX 0','SX 1'])
metric_2_f = MetricFrame(metrics=skm.recall_score,y_pred=y_pred,y_true=y_true,sensitive_features=s_f_frame)
print(metric_2_f.overall)
print(metric_2_f.by_group)
The output is as follows:
The only new thing in the code above is np.stack(), which joins arrays of the same shape along a given axis (axis=0 stacks them as rows, axis=1 as columns).
a = np.array([1,2,3])
b = np.array([4,5,6])
np.stack([a,b],axis=1)
The combined result is array([[1, 4], [2, 5], [3, 6]]).
1.3 Scalar Results from MetricFrame
Advanced machine learning algorithms often use metric functions to guide their optimization. Such algorithms usually work with scalar results, so if we wish to tune on fairness metrics, we need to perform aggregations on the MetricFrame.
In this section, we use the make_derived_metric function provided by fairlearn to build a custom scalar metric. make_derived_metric derives a scalar from a base metric using one of four transforms: group_min, group_max, difference, or ratio (choose one of the four).
from fairlearn.metrics import make_derived_metric
fbeta_difference = make_derived_metric(metric=skm.fbeta_score, transform='difference')  # the chosen transform is 'difference', i.e. the gap between the largest and smallest F-beta
fbeta_difference(y_true, y_pred, beta=0.7, sensitive_features=group_membership_data)
The output is as follows:
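As a sanity check on what transform='difference' returns, the same number can be computed by hand: the per-group F-beta scores (beta = 0.7), then max minus min. A pure-Python sketch over the running example (using the convention that F-beta is 0 when a group has no true positives):

```python
def fbeta(tp, fp, fn, beta):
    if tp == 0:
        return 0.0  # convention: F-beta is 0 when there are no true positives
    p, r = tp / (tp + fp), tp / (tp + fn)
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]
groups = ['d', 'a', 'c', 'b', 'b', 'c', 'c', 'c',
          'b', 'd', 'c', 'a', 'b', 'd', 'c', 'c']

scores = {}
for g in sorted(set(groups)):
    rows = [(t, p) for t, p, gg in zip(y_true, y_pred, groups) if gg == g]
    tp = sum(1 for t, p in rows if t == 1 and p == 1)
    fp = sum(1 for t, p in rows if t == 0 and p == 1)
    fn = sum(1 for t, p in rows if t == 1 and p == 0)
    scores[g] = fbeta(tp, fp, fn, beta=0.7)

diff = max(scores.values()) - min(scores.values())
print(scores, diff)
```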
The call above can also be written as the following code; the difference is that the latter needs functools.partial
fbeta_07 = functools.partial(skm.fbeta_score, beta=0.7)
fbeta_07_metrics = MetricFrame(metrics=fbeta_07, y_true=y_true, y_pred=y_pred, sensitive_features=group_membership_data)
fbeta_07_metrics.difference()
Personally, I prefer the latter (MetricFrame) version, because it makes it easy to call the other aggregation methods as well.
1.4 Control features for grouped metrics
Control features, also known as conditional features, enable more detailed fairness insights by providing a further way of dividing the data into subgroups. When the data is divided, control features act like sensitive features, with one difference: the overall value is computed separately for each subgroup defined by the control feature (the same applies to the other methods, such as group_max and group_min).
Control features are useful when some variation across a feature is expected, and we want to measure differences while controlling for that feature. For example, in a loan scenario, we may accept that people with different incomes are approved at different rates. However, within each income bracket, we still want to measure differences across sensitive features.
In the MetricFrame constructor, control features are passed through the control_features parameter.
decision = [
0,0,0,1,1,0,1,1,0,1,
0,1,0,1,0,1,0,1,0,1,
0,1,1,0,1,1,1,1,1,0
]
prediction = [
1,1,0,1,1,0,1,0,1,0,
1,0,1,0,1,1,1,0,0,0,
1,1,1,0,0,1,1,0,0,1
]
control_feature = [
'H','L','H','L','H','L','L','H','H','L',
'L','H','H','L','L','H','L','L','H','H',
'L','H','L','L','H','H','L','L','H','L'
]
sensitive_feature = [
'A','B','B','C','C','B','A','A','B','A',
'C','B','C','A','C','C','B','B','C','A',
'B','B','C','A','B','A','B','B','A','A'
]
metric_c_f = MetricFrame(metrics=skm.accuracy_score,
                         y_true=decision,
                         y_pred=prediction,
                         sensitive_features={'SF': sensitive_feature},
                         control_features={'CF': control_feature})
print(metric_c_f.overall)
print(metric_c_f.by_group)
The result is as follows:
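To see what the resulting MultiIndex contains, the per-(CF, SF) accuracies can be reproduced with plain pandas (accuracy = mean of y_true == y_pred per subgroup), using the same lists as above. This is a hand check, not Fairlearn's implementation:

```python
import pandas as pd

decision = [0,0,0,1,1,0,1,1,0,1,
            0,1,0,1,0,1,0,1,0,1,
            0,1,1,0,1,1,1,1,1,0]
prediction = [1,1,0,1,1,0,1,0,1,0,
              1,0,1,0,1,1,1,0,0,0,
              1,1,1,0,0,1,1,0,0,1]
control_feature = ['H','L','H','L','H','L','L','H','H','L',
                   'L','H','H','L','L','H','L','L','H','H',
                   'L','H','L','L','H','H','L','L','H','L']
sensitive_feature = ['A','B','B','C','C','B','A','A','B','A',
                     'C','B','C','A','C','C','B','B','C','A',
                     'B','B','C','A','B','A','B','B','A','A']

df = pd.DataFrame({'y_true': decision, 'y_pred': prediction,
                   'CF': control_feature, 'SF': sensitive_feature})
df['correct'] = (df['y_true'] == df['y_pred']).astype(float)

overall_per_cf = df.groupby('CF')['correct'].mean()    # one "overall" per control group
by_group = df.groupby(['CF', 'SF'])['correct'].mean()  # accuracy per (CF, SF) subgroup
print(overall_per_cf)
print(by_group)
```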
Here is my understanding of control features: to compare differences between groups, we use a sensitive feature, e.g. the income levels of people in different regions. But if we want to compare subgroups within each group, e.g. the incomes of men and women within each region, we make the region a control feature and gender the sensitive feature.
1.5 Plotting grouped metrics
1.5.1 Basic drawing method
Plotting mainly uses the plot method, which internally calls matplotlib
from fairlearn.metrics import false_positive_rate, true_positive_rate, selection_rate
from sklearn.metrics import accuracy_score, recall_score, precision_score
metrics = {
'accuracy': accuracy_score,
'precision': precision_score,
'recall': recall_score,
'false positive rate': false_positive_rate,
'true positive rate': true_positive_rate,
'selection rate': selection_rate,
'count': count}
metric_frame = MetricFrame(metrics=metrics,
y_true=y_true,
y_pred=y_pred,
sensitive_features=group_membership_data)
metric_frame.by_group.plot.bar(
subplots=True,
layout=[3, 3],
legend=False,
figsize=[12, 8],
title="Show all metrics",
)
1.5.2 Unifying the ylim
In the figure from the previous section, we can see that the y-axis ranges differ across subplots. If we want to make the ranges consistent (to ease comparison), we can set the ylim parameter of the plot() method.
metric_frame.by_group.plot.bar(
subplots=True,
ylim=[0,1],
layout=[3, 3],
legend=False,
figsize=[12, 8],
title="Show all metrics",
)
The result is as follows:
1.5.3 Changing the plot colors
At this point, some readers may want to complain about the ghastly color scheme in the figure above!! Can't we pick nicer colors ourselves?
matplotlib's designers have this covered: we can adjust the colors through the colormap parameter. For more good-looking colors, refer to here: Good-looking colors
Here we choose the Accent colormap
metric_frame.by_group.plot.bar(
subplots=True,
ylim=[0,1],
layout=[3, 3],
legend=False,
figsize=[12, 8],
colormap = 'Accent',
title="Show all metrics",
)
The result is as follows:
1.5.4 Changing the plot type
At this point, some readers may ask: all you draw are bar charts; only being able to draw bar charts is far too limited!!
Well, as a professional machine-learning fairness library, how could fairlearn not have considered this? (In fact, it is handled by pandas.) We can change the plot type through the kind parameter
metric_frame.by_group.plot(
kind = 'pie',
subplots=True,
ylim=[0,1],
layout=[3, 3],
legend=False,
figsize=[12, 8],
colormap = 'Accent',
title="Show all metrics",
)
The result is as follows:
So how do we know which chart types can be drawn? You can refer to the following link: Drawable type