Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals

A PREPRINT
Babak Salimi
University of California, San Diego
bsalimi@ucsd.edu
Abstract
There has been a recent resurgence of interest in explainable artificial intelligence (XAI) that aims to reduce the opaqueness of AI-based decision-making systems, allowing humans to scrutinize and trust them. Prior work in this context has focused on the attribution of responsibility for an algorithm's decisions to its inputs, wherein responsibility is typically approached as a purely associational concept. In this paper, we propose a principled causality-based approach for explaining black-box decision-making systems that addresses limitations of existing methods in XAI. At the core of our framework lie probabilistic contrastive counterfactuals, a concept that can be traced back to the philosophical, cognitive, and social foundations of theories on how humans generate and select explanations. We show how such counterfactuals can quantify the direct and indirect influences of a variable on decisions made by an algorithm, and provide actionable recourse for individuals negatively affected by the algorithm's decision. Unlike prior work, our system, LEWIS: (1) can compute provably effective explanations and recourse at local, global and contextual levels; (2) is designed to work with users with varying levels of background knowledge of the underlying causal model; and (3) makes no assumptions about the internals of an algorithmic system except for the availability of its input-output data. We empirically evaluate LEWIS on three real-world datasets and show that it generates human-understandable explanations that improve upon state-of-the-art approaches in XAI, including the popular LIME and SHAP. Experiments on synthetic data further demonstrate the correctness of LEWIS's explanations and the scalability of its recourse algorithm.
1 Introduction
Algorithmic decision-making systems are increasingly used to automate consequential decisions, such as lending, assessing job applications, informing release on parole, and prescribing life-altering medications. There is growing concern that the opacity of these systems can inflict harm on stakeholders distributed across different segments of society. The resulting calls for transparency have created a resurgence of interest in explainable artificial intelligence (XAI), which aims to provide human-understandable explanations of the outcomes or processes of algorithmic decision-making systems (see [33, 61, 62] for recent surveys).
Effective explanations should serve the following purposes: (1) help to build trust by providing a mechanism for normative evaluation of an algorithmic system, assuring different stakeholders that the system's decision rules are justifiable [82]; and (2) provide users with actionable recourse to change the results of algorithms in the future [8, 96, 95]. Existing methods in XAI can be broadly categorized based on whether explainability is achieved by design
[Figure 1 contents: Maeve's application attributes (purpose: repairs, credit amount: 1,275 DM, duration: 10 months, savings: < 100 DM, credit history: existing paid duly) with their sufficiency scores; the statement "A decline in credit history is most likely to change a positive decision into negative"; the statement "Your loan would have been rejected with 53% probability were credit history = 'Delay in paying off in the past'"; a recommended recourse table changing purpose (repairs → furniture), credit amount (1,275 DM → 3,000–5,000 DM), and savings (< 100 DM → 500–1,000 DM), noted to lead to a positive decision; and the statement "Increasing status of checking account is more likely to flip a negative decision for Sex=Male than for Sex=Female."]
Figure 1: An overview of explanations generated by LEWIS for a loan approval algorithm built using the UCI German credit dataset (see Section 5 for details). Given a black-box classification algorithm, LEWIS generates: (a) Local explanations, which explain the algorithm's output for an individual; (b) Global explanations, which explain the algorithm's behavior across different attributes; and (c) Contextual explanations, which explain the algorithm's predictions for a sub-population of individuals.
(intrinsic) or by post factum system analysis (post hoc), and whether the methods assume access to system internals
(model dependent) or can be applied to any black-box algorithmic system (model agnostic).
In this work, we address post hoc and model-agnostic explanation methods that are applicable to any proprietary black-
box algorithm. Prior work in this context has focused on the attribution of responsibility of an algorithm’s decisions to
its inputs. These approaches include methods for quantifying the global (population-level) or local (individual-level)
influence of an algorithm’s input on its output [23, 28, 5, 35, 31, 22, 54, 53, 14]; they also include methods based on
surrogate explainability, which search for a simple and interpretable model (such as a decision tree or a linear model)
that mimics the behavior of a black-box algorithm [72, 73]. However, these methods can produce incorrect and
misleading explanations primarily because they focus on the correlation between the input and output of algorithms as
opposed to their causal relationship [36, 44, 24, 33, 62, 4]. Furthermore, several recent works have argued for the use
of counterfactual explanations, which are typically obtained by considering the smallest perturbation in an algorithm’s
input that can lead to the algorithm’s desired outcome [96, 47, 92, 56, 64]. However, due to the causal dependency
between variables, these perturbations are not translatable into real-world interventions and therefore fail to generate
insights that are actionable in the real world [7, 40, 85, 56, 41].
This paper describes a new causality-based framework for generating post-hoc explanations for black-box decision-making algorithms that unifies existing methods in XAI and addresses their limitations. Our system, LEWIS,² reconciles the aforementioned objectives of XAI by: (1) providing insights into what causes an algorithm's decisions at the global, local and contextual (sub-population) levels, and (2) generating actionable recourse translatable into real-world interventions. At the heart of our proposal are probabilistic contrastive counterfactuals of the following form:

"For individual(s) with attribute(s) <actual-value> for whom an algorithm made the decision <actual-outcome>, the decision would have been <foil-outcome> with probability <score> had the attribute been <counterfactual-value>."    (1)

² Our system is named after David Lewis (1941–2001), who made significant contributions to modern theories of causality and explanation in terms of counterfactuals. In his essay on causal explanation [48], Lewis argued that "to explain an event is to provide some information about its causal history." He further highlighted the role of counterfactual contrasts in explanations when he wrote, "One way to indicate what sort of explanatory information is wanted is through the use of contrastive why-questions . . . [where] information is requested about the difference between the actualized causal history of the explanandum and the unactualized causal histories of its unactualized alternatives [(termed "foils" by Peter Lipton [51])]. Why did I visit Melbourne in 1979, rather than Oxford or Uppsala or Wellington?"
Contrastive counterfactuals are at the core of the philosophical, cognitive, and social foundations of theories that address how humans generate and select explanations [15, 26, 67, 51, 97, 32, 63]. Their probabilistic interpretation has been formalized and studied extensively in AI, biostatistics, political science, epistemology, biology and legal reasoning [29, 75, 30, 87, 74, 11, 67, 32, 57]. While their importance in achieving the objectives of XAI has been recognized in the literature [60], very few attempts have been made to operationalize causality-based contrastive counterfactuals for XAI. The following example illustrates how LEWIS employs contrastive counterfactuals to generate different types of explanations.
Example 1.1. Consider the black-box loan-approval algorithm in Figure 1, for which LEWIS generates different kinds of explanations. For local explanations, LEWIS ranks attributes in terms of their causal responsibility for the algorithm's decision. For individuals whose loans were rejected, the responsibility of an attribute is measured by its sufficiency score, defined as "the probability that the algorithm's decision would have been positive if that attribute had a counterfactual value." For Maeve, the sufficiency score of 28% for purpose of loan means that if purpose were 'Furniture', Maeve's loan would have been approved with a 28% probability. For individuals whose loans were approved, the responsibility of an attribute is measured by its necessity score, defined as "the probability that the algorithm's decision would have been negative if that attribute had a counterfactual value." For Irrfan, the necessity score of 53% for credit history means that had credit history been worse, Irrfan would have been denied the loan with a 53% probability. Furthermore, individuals with a negative decision, such as Maeve, would want to know the actions they could take that would likely change the algorithm's decision. For such users, LEWIS suggests the minimal causal interventions on the set of actionable attributes that are sufficient, with high probability, to change the algorithm's decision in the future. Additionally, LEWIS generates insights about the algorithm's global behavior with respect to each attribute by computing its necessity, sufficiency, and necessity and sufficiency scores at the population level. For instance, a higher necessity score for credit history indicates that a decline in its value is more likely to reverse a positive decision than a lower value of savings; a lower sufficiency score for age indicates that increasing it is less likely to overturn a negative decision compared to credit history or savings. By further customizing the scores for a context or sub-population of individuals that share some attributes, LEWIS illuminates the contextual behavior of the algorithm in different sub-populations. In Figure 1, LEWIS indicates that increasing the status is more likely to reverse a negative decision for {sex=Male} than for {sex=Female}.
To compute these scores, LEWIS relies on the ordinal importance of attribute values, e.g., applicants with higher savings are more likely to be granted a loan than those with lower savings. In case the attribute values do not possess a natural ordering or the ordering is not known a priori, LEWIS infers it from the output of the black-box algorithm (more in Section 4.1).
Our contributions. This paper proposes a principled approach for explaining black-box decision-making systems using probabilistic contrastive counterfactuals. Key contributions include:
1. Adopting standard definitions of sufficient and necessary causation based on contrastive counterfactuals to propose novel probabilistic measures, called necessity scores and sufficiency scores, which respectively quantify the extent to which an attribute is necessary and sufficient for an algorithm's decision (Section 3.1). We show that these measures play unique, complementary roles in generating effective explanations for algorithmic systems. While the necessity score addresses the attribution of causal responsibility for an algorithm's decisions to an attribute, the sufficiency score addresses the tendency of an attribute to produce the desired algorithmic outcome.
2. Demonstrating that our newly proposed measures can generate a wide range of explanations for algorithmic systems that quantify the necessity and sufficiency of attributes that implicitly or explicitly influence an algorithm's decision-making process (Section 3.2). More importantly, LEWIS generates contextual explanations at global or local levels and for a user-defined sub-population.
3. Showing that the problem of generating actionable recourse can be framed as an optimization problem that searches for a minimal intervention on a pre-specified set of actionable variables that has a high sufficiency score for producing the algorithm's desired future outcome.
4. Establishing conditions under which the class of probabilistic contrastive counterfactuals we use can be bounded and estimated using historical data (Section 4.1). Unlike previous attempts to generate actionable recourse using counterfactual reasoning, LEWIS leverages established bounds and integer programming to generate reliable recourse under partial background knowledge of the underlying causal models (Section 4.2).
5. Comparing LEWIS to state-of-the-art methods in XAI (Sections 5 and 6). We present an end-to-end experimental evaluation on both real and synthetic data. On real datasets, we show that LEWIS generates intuitive and actionable explanations that are consistent with insights from existing literature and surpass state-of-the-art methods in XAI. Evaluation on synthetic data demonstrates the accuracy and correctness of the explanation scores and actionable recourse that LEWIS generates.

Table 1: Notation used in this paper.
Symbol — Meaning
X, Y, Z — attributes (variables)
X, Y, Z (boldface) — sets of attributes
Dom(X), Dom(X) — their domains
x ∈ Dom(X) — an attribute value
x ∈ Dom(X) — a tuple of attribute values
k ∈ Dom(K) — a tuple of context attribute values
G — causal diagram
⟨M, Pr(u)⟩ — probabilistic causal model
O_{X←x} — potential outcome
Pr(V = v), Pr(v) — joint probability distribution
Pr(o_{X←x}) — abbreviates Pr(O_{X←x} = o)
2 Preliminaries
The notation we use in this paper is summarized in Table 1. We denote variables by uppercase letters, X, Y, Z, V; their values by lowercase letters, x, y, z, v; and sets of variables or values using boldface (X or x). The domain of a variable X is Dom(X), and the domain of a set of variables X is Dom(X) = ∏_{X∈X} Dom(X). All domains are discrete and finite; continuous domains are assumed to be binned. We use Pr(x) to represent the joint probability distribution Pr(X = x). The basic semantic framework of our proposal rests on probabilistic causal models [67], which we review next.
Probabilistic causal models. A probabilistic causal model (PCM) is a tuple ⟨M, Pr(u)⟩, where M = ⟨U, V, F⟩ is a causal model consisting of a set of observable or endogenous variables V, a set of background or exogenous variables U that are outside of the model, and a set F = (F_X)_{X∈V} of structural equations of the form F_X : Dom(Pa_V(X)) × Dom(Pa_U(X)) → Dom(X), where Pa_U(X) ⊆ U and Pa_V(X) ⊆ V − {X} are called the exogenous parents and endogenous parents of X, respectively. The values of U are drawn from the distribution Pr(u). A PCM ⟨M, Pr(u)⟩ can be represented as a directed graph G = ⟨V, E⟩, called a causal diagram, where each node represents a variable and there are directed edges from the elements of Pa_U(X) ∪ Pa_V(X) to X. We say a variable Z is a descendant of another variable X if Z is caused (either directly or indirectly) by X, i.e., if there is a directed edge or path from X to Z in G; otherwise, we say that Z is a non-descendant of X.
Interventions and potential outcomes. An intervention or an action on a set of variables X ⊆ V, denoted X ← x, is an operation that modifies the underlying causal model by replacing the structural equations associated with X with a constant x ∈ Dom(X). The potential outcome of a variable Y after the intervention X ← x in a context u ∈ Dom(U), denoted Y_{X←x}(u), is the solution to Y in the modified set of structural equations. Potential outcomes satisfy the following consistency rule, used in the derivations presented in Section 4.1:

X(u) = x  ⟹  Y_{X←x}(u) = Y(u)    (2)

This rule states that in contexts where X = x, the outcome is invariant to the intervention X ← x. For example, changing the income level of applicants to high does not change the loan decisions for those who already had high incomes before the intervention.

The distribution Pr(u) induces a probability distribution over endogenous variables and potential outcomes. Using PCMs, one can express counterfactual queries of the form Pr(Y_{X←x} = y | k), or simply Pr(y_{X←x} | k); this reads as "For contexts with attributes k, what is the probability that we would observe Y = y had X been x?" and is given by the following expression:

Pr(y_{X←x} | k) = Σ_u Pr(y_{X←x}(u)) Pr(u | k)    (3)

Equation (3) readily suggests Pearl's three-step procedure for answering counterfactual queries [67, Chapter 7]: (1) update Pr(u) to obtain Pr(u | k) (abduction), (2) modify the causal model to reflect the intervention X ← x (action), and (3) evaluate the RHS of (3) using Pr(Y_{X←x}(u) = y) (prediction). However, performing this procedure requires the underlying PCM to be fully observed, i.e., the distribution Pr(u) and the underlying structural equations must be known, which is an impractical requirement. In this paper, we assume that only background knowledge of the underlying causal diagram is available, but exogenous variables and structural equations are unknown.
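Although, as just noted, a fully specified PCM is rarely available, the three-step procedure is easy to illustrate concretely. The following sketch (ours, not part of LEWIS) evaluates a counterfactual query on a toy PCM with two binary exogenous and two binary endogenous variables; the structural equations and the prior Pr(u) are hypothetical.

```python
import itertools

# A toy, fully specified PCM (hypothetical structural equations and prior).
# Exogenous: U1, U2 independent Bernoulli; endogenous: X, Y.
pr_u = {(u1, u2): (0.7 if u1 else 0.3) * (0.6 if u2 else 0.4)
        for u1, u2 in itertools.product([0, 1], repeat=2)}

def F_X(u1):            # structural equation  X := U1
    return u1

def F_Y(x, u2):         # structural equation  Y := X OR U2
    return int(x == 1 or u2 == 1)

def counterfactual(y, x_int, evidence):
    """Pr(Y_{X<-x_int} = y | evidence) via abduction-action-prediction."""
    # Abduction: restrict Pr(u) to contexts consistent with the evidence (X, Y).
    consistent = {u: p for u, p in pr_u.items()
                  if (F_X(u[0]), F_Y(F_X(u[0]), u[1])) == evidence}
    z = sum(consistent.values())
    # Action + prediction: replace the equation for X by the constant x_int
    # and evaluate Y in each retained context, weighted by Pr(u | evidence).
    return sum(p for u, p in consistent.items() if F_Y(x_int, u[1]) == y) / z

# "For units in which we observed X = 1 and Y = 1, what is the probability
#  that Y would have been 1 had X been 0?"  ->  0.6
print(counterfactual(y=1, x_int=0, evidence=(1, 1)))
```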
The do-operator. For causal diagrams, Pearl defined the do-operator as a graphical operation that gives semantics to interventional queries of the form "What is the probability that we would observe Y = y (at the population level) had X been x?", denoted Pr(y | do(x)). Further, he proved a set of necessary and sufficient conditions under which interventional queries can be answered using historical data. A sufficient condition is the backdoor criterion,³ which states that if there exists a set of variables C that satisfies a graphical condition relative to X and Y in the causal diagram G, the following holds (see [67, Chapter 3] for details):

Pr(y | do(x)) = Σ_{c∈Dom(C)} Pr(y | c, x) Pr(c)    (4)

In contrast to (3), notice that the RHS of (4) is expressed in terms of observed probabilities and can be estimated from historical data using existing statistical and ML algorithms.

³ Since it is not needed in our work, we do not discuss the graph-theoretic notion of the backdoor criterion.
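As a simple illustration (ours, not LEWIS's implementation), Equation (4) can be estimated from a data frame of historical records once an admissible backdoor set C has been chosen; the column names below are hypothetical.

```python
import pandas as pd

def backdoor_estimate(df, y_col, y_val, x_col, x_val, c_cols):
    """Estimate Pr(Y = y | do(X = x)) = sum_c Pr(Y = y | c, x) Pr(c), Eq. (4)."""
    total = 0.0
    for _, group in df.groupby(c_cols):
        stratum = group[group[x_col] == x_val]
        if len(stratum) == 0:              # no support for X = x in this stratum
            continue
        p_y_given_cx = (stratum[y_col] == y_val).mean()
        p_c = len(group) / len(df)
        total += p_y_given_cx * p_c
    return total

# e.g., Pr(O = 1 | do(D = 24)) adjusting for the backdoor set {G, A}:
# backdoor_estimate(loans_df, "O", 1, "D", 24, ["G", "A"])
```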
Counterfactuals vs. interventional queries. The do-operator is a population-level operator, meaning it can only express queries about the effect of an intervention at the population level; in contrast, counterfactuals can express queries about the effect of an intervention on a sub-population or an individual. Therefore, every interventional query can be expressed in terms of counterfactuals, but not vice versa (see [70, Chapter 4] for more details). For instance, Pr(y | do(x)) = Pr(y_{X←x}); however, the counterfactual query Pr(y_{X←x} | x', y'), which asks about the effect of the intervention X ← x on a sub-population with attributes x' and y', cannot be expressed in terms of the do-operator (see Example 2.1 below). Note that the probabilistic contrastive counterfactual statements in (1), used throughout this paper to explain black-box decision-making systems, are concerned with the effect of interventions at the sub-population and individual levels; they cannot be expressed using the do-operator and therefore cannot be assessed in general when the underlying probabilistic causal model is not fully observed. Nevertheless, in Section 4.1 we establish conditions under which these counterfactuals can be estimated or bounded using data.
[Figure 2: A causal diagram for the loan application domain with nodes G (gender), A (age), D (repayment duration), R (remaining attributes), and O (loan decision).]
Example 2.1. Continuing Example 1.1, Figure 2 represents a simple causal diagram for the loan application domain, where G corresponds to the attribute gender, A to age, D to the repayment duration in months, O to the decision on a loan application, and R compactly represents the rest of the attributes, e.g., status of checking account, employment, savings, etc. Note that the loan decision is binary: O = 1 and O = 0 indicate whether the loan has been approved or not, respectively. The interventional query Pr(O = 1 | do(D = 24 months)), which is equivalent to the counterfactual Pr(O_{D←24 months} = 1), reads as "What is the probability of loan approval at the population level had all applicants selected a repayment duration of 24 months?" This query can be answered using data and the causal diagram (since {G, A} satisfies the backdoor criterion in the causal diagram in Figure 2). However, the counterfactual query Pr(O_{D←24 months} = 1 | O = 0), which reads as "What is the probability of loan approval for a group of applicants whose loan applications were denied had they selected a repayment duration of 24 months?", cannot be expressed using the do-operator.
3 Explanations and Recourse Using Probabilistic Counterfactuals

3.1 Explanation Scores

We are given a decision-making algorithm f : Dom(I) → Dom(O), where I is a set of input attributes (a.k.a. features for ML algorithms) and O is a binary attribute, where O = o denotes the positive decision (loan approved) and O = o' denotes the negative decision (loan denied). Let us assume we are given a PCM ⟨M, Pr(u)⟩ with a corresponding causal diagram G (this assumption will be relaxed in Section 4.1) such that I ⊆ V, i.e., the inputs of f are a subset of the observed attributes. Consider an attribute X ∈ V and a pair of attribute values x, x' ∈ Dom(X). We quantify the influence of the attribute value x relative to a baseline x' on decisions made by the algorithm using the following scores, herein referred to as explanation scores (we implicitly assume an order x > x').

Definition 3.1 (Explanation Scores). Given a PCM ⟨M, Pr(u)⟩ and an algorithm f : Dom(I) → Dom(O), a variable X ∈ V, and a pair of attribute values x, x' ∈ Dom(X), we quantify the influence of x relative to x' on the algorithm's decisions in the context k ∈ Dom(K), where K ⊆ V − {X, O}, using the following measures:

NEC_x^{x'}(k) ≝ Pr(o'_{X←x'} | x, o, k)        (necessity score)    (5)
SUF_x^{x'}(k) ≝ Pr(o_{X←x} | x', o', k)        (sufficiency score)    (6)
NeSUF_x^{x'}(k) ≝ Pr(o_{X←x}, o'_{X←x'} | k)   (necessity and sufficiency score)    (7)

where the distribution Pr(o_{X←x}) is well-defined and can be computed from the algorithm f(I).⁴ For simplicity of notation, we drop x' from NEC_x^{x'}, SUF_x^{x'} and NeSUF_x^{x'} whenever it is clear from the context.
The necessity score in (5) formalizes the probabilistic contrastive counterfactual in (1), where <actual-value> and <counterfactual-value> are respectively k ∪ x and k ∪ x', and <actual-outcome> and <foil-outcome> are respectively the positive decision o and the negative decision o'. It reads as "What is the probability that for individuals with attributes k, the algorithm's decision would be negative instead of positive had X been x' instead of x?" In other words, NEC_X(.) measures the percentage of the algorithm's positive decisions that are attributable to, or due to, the attribute value x. The sufficiency score in (6) is the dual of the necessity score; it reads as "What would be the probability that for individuals with attributes k, the algorithm's decision would be positive instead of negative had X been x instead of x'?" Finally, the necessity and sufficiency score in (7) establishes a balance between necessity and sufficiency; it measures the probability that the algorithm responds in both ways. Hence, it can be used to measure the general explanatory power of an attribute. In Section 4.1, we show that the necessity and sufficiency score is non-zero iff X causally influences the algorithm's decisions. (Note that the explanation scores are also well-defined for a set of attributes.)
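For intuition, the following sketch (ours) computes the three scores of Definition 3.1 exactly by enumerating the exogenous variables of a small, fully specified PCM; the structural equations and the classifier are hypothetical, and the attribute X influences the decision only indirectly, through a proxy Z.

```python
import itertools

# Hypothetical PCM: exogenous U1, U2; endogenous X and Z, with Z caused by X.
# The black-box decision f uses only Z, so X acts through a proxy.
pr_u = {(u1, u2): 0.5 * (0.3 if u2 else 0.7)
        for u1, u2 in itertools.product([0, 1], repeat=2)}
X_of = lambda u: u[0]                          # X := U1
Z_of = lambda x, u: int(x == 1 or u[1] == 1)   # Z := X OR U2
f    = lambda z: z                             # positive decision o = 1

def world(u, x_int=None):
    x = X_of(u) if x_int is None else x_int
    return x, f(Z_of(x, u))

def scores(x, x_prime):
    nec_num = nec_den = suf_num = suf_den = nesuf = 0.0
    for u, p in pr_u.items():
        x_obs, o_obs = world(u)
        _, o_if_low  = world(u, x_int=x_prime)   # outcome had X been x'
        _, o_if_high = world(u, x_int=x)         # outcome had X been x
        if x_obs == x and o_obs == 1:            # conditioning event of NEC
            nec_den += p
            nec_num += p * (o_if_low == 0)
        if x_obs == x_prime and o_obs == 0:      # conditioning event of SUF
            suf_den += p
            suf_num += p * (o_if_high == 1)
        nesuf += p * (o_if_high == 1 and o_if_low == 0)
    return nec_num / nec_den, suf_num / suf_den, nesuf

print(scores(x=1, x_prime=0))   # (NEC, SUF, NeSUF) = (0.7, 1.0, 0.7)
```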
Remark 3.2. A major difference between our proposal and existing methods in XAI is the ability to account for the indirect influence of attributes that may not be explicitly used in an algorithm's decision-making process but implicitly influence its decisions via their proxies. The ability to account for such influences is particularly important in auditing algorithms for fairness, where typically sensitive attributes, such as race or gender, are not explicitly used as input to algorithms. For instance, in [93] Wall Street Journal investigators reported that a seemingly innocent online pricing algorithm that simply adjusts online prices based on users' proximity to competitors' stores discriminates against lower-income individuals. In this case, the algorithm does not explicitly use income; however, it turns out that living farther from competitors' stores is a proxy for low income.⁵
3.2 LEWIS's Explanations

Based on the explanation scores proposed in Section 3.1, LEWIS generates the following types of explanations.
Global, local and contextual explanations. To understand the influence of each variable X ∈ V on an algorithm's decision, LEWIS computes the necessity score NEC_x(k), sufficiency score SUF_x(k), and necessity and sufficiency score NeSUF_x(k) for each value x ∈ Dom(X) in the following contexts: (1) K = ∅: the scores measure the global influence of X on the algorithm's decision. (2) K = V: the scores measure the individual-level or local influence of X on the algorithm's decision. (3) A user-defined K = k with ∅ ⊊ K ⊊ V: the scores measure the contextual influence of X on the algorithm's decision. In the context k, LEWIS calculates the explanation scores for an attribute X by computing the maximum score over all pairs of attribute values x, x' ∈ Dom(X). In addition to singleton variables, LEWIS can calculate explanation scores for any user-defined set of attributes.

⁴ For a deterministic f(I), Pr(o_{X←x}) = Σ_{i∈Dom(I)} 1{f(i) = o} Pr(I_{X←x} = i), where 1{f(i) = o} is an indicator function.
⁵ In contrast to mediation analysis in causal inference, which studies direct and indirect causal effects [66, 69], in this paper we are interested in quantifying the sufficiency and necessity scores of attributes explicitly and implicitly used by the algorithm.
For a given individual, LEWIS estimates the positive and negative contributions of a specific attribute value toward the outcome. Consider an individual with a negative outcome O = o' having the attribute value X = x'. The negative contribution of x' is characterized by the probability of obtaining a positive outcome upon intervening X ← x, i.e., max_{x > x'} SUF_x^{x'}(k), and the positive contribution of x' for the individual is calculated as max_{x'' < x'} SUF_{x'}^{x''}(k). Similarly, for an individual with a positive outcome O = o and attribute value X = x', the positive contribution of x' is calculated by estimating the probability of o' if the attribute value were intervened to be smaller than x', i.e., max_{x'' < x'} NEC_{x'}^{x''}(k), and the negative contribution of X = x' is max_{x > x'} NEC_x^{x'}(k). Note that the negative contribution of the attribute value X = x' is calculated by intervening on the individual at hand, but the positive contribution is estimated by intervening on individuals with X = x'' that satisfy the same context k. In Figure 1, low credit amount contributes negatively to the outcome for Maeve, as increasing credit amount improves her chances of getting the loan approved. Attributes like credit history contribute both positively and negatively: a poor credit history worsens the chances of approval, while improving credit history further improves them.
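A minimal sketch (ours) of how these contributions can be read off from the explanation scores, assuming helper functions suf(x, x_prime, k) and nec(x, x_prime, k) that return SUF_x^{x'}(k) and NEC_x^{x'}(k) (e.g., estimated as in Section 4.1); the helper names are hypothetical.

```python
def contributions(x_cur, domain, k, outcome_is_positive, nec, suf):
    """Positive and negative contribution of the current value x_cur of X."""
    higher = [x for x in domain if x > x_cur]
    lower  = [x for x in domain if x < x_cur]
    if not outcome_is_positive:
        # Negative contribution: chance that raising X flips the negative decision.
        neg = max((suf(x, x_cur, k) for x in higher), default=0.0)
        # Positive contribution: sufficiency of x_cur relative to lower baselines.
        pos = max((suf(x_cur, x, k) for x in lower), default=0.0)
    else:
        # Positive contribution: necessity of x_cur relative to lower baselines.
        pos = max((nec(x_cur, x, k) for x in lower), default=0.0)
        # Negative contribution: necessity of higher values relative to x_cur.
        neg = max((nec(x, x_cur, k) for x in higher), default=0.0)
    return pos, neg
```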
Counterfactual recourse. For individuals for whom an algorithm's decision is negative, LEWIS generates explanations in terms of minimal interventions on a user-specified set of actionable variables A ⊆ V that have a high sufficiency score, i.e., interventions that can produce the positive decision with high probability. These explanations can be used either as a justification in case the decision is challenged or as a feasible action that the individual may perform in order to improve the outcome in the future ("recourse"). For example, in Figure 1, the set of actionable items for Maeve may consist of her credit amount, loan duration, savings and purpose. Examples of specific actions include "increase the loan repayment duration" or "raise the amount in savings."

Given an individual with attributes v, a set of actionable variables A ⊆ V, and a cost function Cost(a, â) that determines the cost of an intervention that changes A from its current value a to â, for â ∈ Dom(A), a counterfactual recourse can be computed using the following optimization problem:

minimize_{â ∈ Dom(A)}  Cost(a, â)    subject to    Pr(o_{A←â} | v) ≥ α    (8)

where α is a user-defined threshold on the probability of obtaining the positive decision. The optimization problem in (8) treats the decision-making algorithm as a black box; hence, it can be solved merely using historical data (see Section 4.2). The solutions to this problem provide end-users with informative, feasible and actionable explanations and recourse by answering questions such as "What are the best courses of action that, if performed in the real world, would with high probability change the outcome for this individual?"
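A minimal brute-force sketch (ours, in place of the integer-programming solver discussed in Section 4.2) of the optimization in (8): it searches the actionable domain for the cheapest intervention whose estimated probability of a positive decision clears the threshold α; estimate_suf and the cost function are placeholders supplied by the caller.

```python
import itertools

def recourse(a_current, actionable_domains, cost, estimate_suf, alpha=0.9):
    """Cheapest intervention A <- a_hat with estimated Pr(o_{A<-a_hat} | v) >= alpha."""
    names = list(actionable_domains)
    best, best_cost = None, float("inf")
    for combo in itertools.product(*(actionable_domains[n] for n in names)):
        a_hat = dict(zip(names, combo))
        if estimate_suf(a_hat) < alpha:          # sufficiency constraint of (8)
            continue
        c = cost(a_current, a_hat)
        if c < best_cost:
            best, best_cost = a_hat, c
    return best, best_cost

# e.g., for Maeve (hypothetical domains and estimator):
# recourse({"savings": "<100DM", "amount": 1275},
#          {"savings": ["<100DM", "100-500DM", ">500DM"], "amount": [1275, 3000, 5000]},
#          cost=lambda a, b: sum(a[key] != b[key] for key in a),
#          estimate_suf=my_sufficiency_estimator)
```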
4 Properties and Algorithms

In this section, we study properties of the explanation scores of Section 3 and establish conditions under which they can be bounded or estimated from historical data (Section 4.1). We then develop an algorithm for solving the optimization problem for computing counterfactual recourse (Section 4.2).
4.1 Computing Explanation Scores

Recall from Section 2 that if the underlying PCM is fully specified, i.e., the structural equations and the exogenous variables are observed, then counterfactual queries, and hence the explanation scores, can be computed via Equation (3). However, in many applications, PCMs are not fully observed, and one must estimate explanation scores from data. First, we prove the following bounds on the explanation scores, computed for a set of attributes X.
Proposition 4.1. Given a PCM ⟨M, Pr(u)⟩ with a corresponding causal DAG G, an algorithm f : Dom(I) → Dom(O), and a set of attributes X ⊆ V − {O} with two sets of attribute values x, x' ∈ Dom(X), if K consists of non-descendants of X, then the following bounds hold:

max{0, (Pr(o, x | k) + Pr(o, x' | k) − Pr(o | do(x'), k)) / Pr(o, x | k)} ≤ NEC_x(k) ≤ min{(Pr(o' | do(x'), k) − Pr(o', x' | k)) / Pr(o, x | k), 1}    (9)

max{0, (Pr(o', x | k) + Pr(o', x' | k) − Pr(o' | do(x), k)) / Pr(o', x' | k)} ≤ SUF_x(k) ≤ min{(Pr(o | do(x), k) − Pr(o, x | k)) / Pr(o', x' | k), 1}    (10)

max{0, Pr(o | do(x), k) − Pr(o | do(x'), k)} ≤ NeSUF_x(k) ≤ min{Pr(o | do(x), k), Pr(o' | do(x'), k)}    (11)
Proof. We prove the bounds in (9); (10) and (11) are proved similarly. The following equations are obtained from the law of total probability:

Pr(o'_{X←x}, x, k) = Pr(o'_{X←x}, o'_{X←x'}, x, k) + Pr(o'_{X←x}, o_{X←x'}, x, k)    (12)
Pr(o'_{X←x'}, x, k) = Pr(o'_{X←x'}, o'_{X←x}, x, k) + Pr(o'_{X←x'}, o_{X←x}, x, k)    (13)
Pr(o'_{X←x'}, k) = Pr(o'_{X←x'}, x, k) + Pr(o'_{X←x'}, x', k) + Σ_{x''∈Dom(X)−{x,x'}} Pr(o'_{X←x'}, x'', k)    (14)

By rearranging (12) and (13), we obtain the following equality:

Pr(o'_{X←x'}, o_{X←x}, x, k) = Pr(o'_{X←x'}, x, k) − Pr(o'_{X←x}, x, k) + Pr(o'_{X←x}, o_{X←x'}, x, k)    (15)

The following bounds on the LHS of (15) are obtained from the Fréchet bound:⁶

LHS ≥ Pr(o'_{X←x'}, k) − Pr(o', x', k) − Pr(o'_{X←x}, x, k) − Σ_{x''∈Dom(X)−{x,x'}} Pr(o'_{X←x'}, x'', k)    (16)
      (from Eq. (14) and (2), lower-bounding Pr(o'_{X←x}, o_{X←x'}, x, k))
    ≥ Pr(o'_{X←x'}, k) − Pr(o', x, k) − Pr(o', x', k) − Pr(k) + Pr(x, k) + Pr(x', k)    (17)
      (from Eq. (2) and upper-bounding each Pr(o'_{X←x'}, x'', k) by Pr(x'', k))
LHS ≤ Pr(o'_{X←x'}, k) − Pr(o', x', k)    (18)
      (from Eq. (14) and (2), upper-bounding Pr(o'_{X←x}, o_{X←x'}, x, k))

Equation (9) is obtained by dividing (16), (17) and (18) by Pr(o, x, k), applying the consistency rule (2), and using the fact that since K consists of non-descendants of X, the intervention X ← x' does not change K; hence, Pr(o_{X←x'} | k) = Pr(o | do(x'), k).
Proposition 4.1 shows that the explanation scores can be bounded whenever interventional queries of the form Pr(o | do(x), k) can be estimated from historical data using the underlying causal diagram G (cf. Section 2). The next proposition further shows that if the algorithm is monotone relative to x, x' ∈ Dom(X), i.e., if x > x' implies O_{X←x} ≥ O_{X←x'},⁷ then the exact values of the explanation scores can be computed from data. (In case the ordering between x and x' is not known a priori (e.g., for categorical values), we infer it by comparing the output of the algorithm for x and x'.)
⁶ Fréchet bounds: max{0, Σ_{x∈x} Pr(x) − (|x| − 1)} ≤ Pr(x) ≤ min_{x∈x} Pr(x).
⁷ Monotonicity expresses the assumption that changing X from x' to x cannot change the algorithm's decision from positive to negative; increasing X always helps.
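For illustration (ours), once the interventional quantity Pr(o | do(x), k) and the observational probabilities have been estimated, the bounds of Proposition 4.1 reduce to simple arithmetic; the sketch below evaluates the sufficiency bound (10) for a binary decision, with hypothetical input values.

```python
def suf_bounds(p_op_x, p_op_xp, p_o_x, p_o_do_x):
    """Bounds (10) on SUF_x(k) for a binary decision O.

    p_op_x   = Pr(o', x  | k)     p_op_xp  = Pr(o', x' | k)
    p_o_x    = Pr(o,  x  | k)     p_o_do_x = Pr(o | do(x), k)
    """
    p_op_do_x = 1.0 - p_o_do_x                      # Pr(o' | do(x), k)
    lower = max(0.0, (p_op_x + p_op_xp - p_op_do_x) / p_op_xp)
    upper = min((p_o_do_x - p_o_x) / p_op_xp, 1.0)
    return lower, upper

# e.g., suf_bounds(0.20, 0.30, 0.35, 0.55)  ->  (0.166..., 0.666...)
```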
Proposition 4.2. Given a causal diagram G, if the decision-making algorithm f : Dom(I) → Dom(O) is monotone relative to x, x' ∈ Dom(X) and if there exists a set of variables C ⊆ V − (K ∪ {X}) such that C ∪ K satisfies the backdoor criterion relative to X and I in G, the following holds:

NEC_x(k) = (Σ_{c∈Dom(C)} Pr(o' | c, x', k) Pr(c | x, k) − Pr(o' | x, k)) / Pr(o | x, k)    (19)

SUF_x(k) = (Σ_{c∈Dom(C)} Pr(o | c, x, k) Pr(c | x', k) − Pr(o | x', k)) / Pr(o' | x', k)    (20)

NeSUF_x(k) = Σ_{c∈Dom(C)} (Pr(o | x, c, k) − Pr(o | x', c, k)) Pr(c | k)    (21)
Proof. Here, we only prove (19); the proofs of (20) and (21) are similar. Notice that monotonicity implies Pr(o'_{X←x}, o_{X←x'}, x, k) = 0. Also note that if C ∪ K satisfies the backdoor criterion, then the following independence, known as conditional ignorability, holds: (O_{X←x} ⊥⊥ X | C ∪ K) [70, Theorem 4.3.1]. We show (19) in the following steps:

NEC_x(k) = Pr(o'_{X←x'}, x, o, k) / Pr(o, x, k)
         = (Pr(o'_{X←x'} | x, k) − Pr(o' | x, k)) / Pr(o | x, k)    (22)
           (from Eq. (15), (2) and monotonicity)
         = (Σ_{c∈Dom(C)} Pr(o'_{X←x'} | c, x, k) Pr(c | x, k) − Pr(o' | x, k)) / Pr(o | x, k)
         = (Σ_{c∈Dom(C)} Pr(o' | c, x', k) Pr(c | x, k) − Pr(o' | x, k)) / Pr(o | x, k)
           (from conditional ignorability and Eq. (2))
Therefore, Proposition 4.2 facilitates bounding and estimating explanation scores from historical data when the underlying probabilistic causal models are not fully specified but background knowledge of the causal diagram is available (see Section 6 for a discussion of the absence of causal diagrams). A sketch of such an estimator is given below; we then establish the following connection between the explanation scores (Proposition 4.3).
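The sketch below (ours, simplified relative to LEWIS's implementation, which fits regression models for the conditional probabilities) evaluates the estimators in Equations (19)-(21) over a data frame of historical records whose outcome column O holds the black box's predictions; the raw empirical frequencies and column names are placeholders.

```python
import pandas as pd

def _p(df, event_col, event_val, cond):
    """Empirical Pr(event_col = event_val | cond) within df."""
    sub = df
    for col, val in cond.items():
        sub = sub[sub[col] == val]
    return (sub[event_col] == event_val).mean() if len(sub) else 0.0

def explanation_scores(df, x_col, x, x_prime, c_cols, o_col="O", k=None):
    """NEC, SUF and NeSUF of x vs. x' via the backdoor formulas (19)-(21)."""
    dfk = df
    for col, val in (k or {}).items():              # restrict to the context k
        dfk = dfk[dfk[col] == val]

    def adjusted(o_val, x_cond, weight_x=None):
        # sum_c Pr(O = o_val | c, X = x_cond, k) * Pr(c | X = weight_x, k)
        w = dfk if weight_x is None else dfk[dfk[x_col] == weight_x]
        total = 0.0
        for c_vals, grp in w.groupby(c_cols):
            if not isinstance(c_vals, tuple):
                c_vals = (c_vals,)
            cond = dict(zip(c_cols, c_vals), **{x_col: x_cond})
            total += _p(dfk, o_col, o_val, cond) * len(grp) / len(w)
        return total

    nec = ((adjusted(0, x_prime, weight_x=x) - _p(dfk, o_col, 0, {x_col: x}))
           / _p(dfk, o_col, 1, {x_col: x}))
    suf = ((adjusted(1, x, weight_x=x_prime) - _p(dfk, o_col, 1, {x_col: x_prime}))
           / _p(dfk, o_col, 0, {x_col: x_prime}))
    nesuf = adjusted(1, x) - adjusted(1, x_prime)
    return nec, suf, nesuf
```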
Proposition 4.3. The explanation scores are related through the following inequality; for a binary X, the inequality becomes an equality:

NeSUF_x(k) ≤ Pr(o, x | k) NEC_x(k) + Pr(o', x' | k) SUF_x(k) + 1 − Pr(x | k) − Pr(x' | k)    (23)
Proof. The inequality is obtained from the law of total probability, the consistency rule in (2), and the Fréchet bound, as shown in the following steps:

NeSUF_x(k) = Pr(o_{X←x}, o'_{X←x'}, k) / Pr(k)
           = (1/Pr(k)) (Pr(o_{X←x}, o'_{X←x'}, x, k) + Pr(o_{X←x}, o'_{X←x'}, x', k) + Σ_{x''∈Dom(X)−{x,x'}} Pr(o_{X←x}, o'_{X←x'}, x'', k))
           = Pr(o, x | k) NEC_x(k) + Pr(o', x' | k) SUF_x(k) + (1/Pr(k)) Σ_{x''∈Dom(X)−{x,x'}} Pr(o_{X←x}, o'_{X←x'}, x'', k)    (from consistency (2))
           ≤ Pr(o, x | k) NEC_x(k) + Pr(o', x' | k) SUF_x(k) + 1 − Pr(x | k) − Pr(x' | k)    (from the Fréchet bound)
Therefore, for binary attributes, the necessity and sufficiency score can be seen as a weighted sum of the necessity and sufficiency scores. Furthermore, the lower bound for the necessity and sufficiency score in Equation (11) is called the (conditional) causal effect of X on O [68]. Hence, if the causal effect of X on the algorithm's decision is non-zero, then so is the necessity and sufficiency score (for a binary X, it is implied by (23) that at least one of the sufficiency and necessity scores must also be non-zero). The following proposition shows the converse.
Proposition 4.4. Given a PCM ⟨M, Pr(u)⟩ with a corresponding causal DAG G, an algorithm f : Dom(I) → Dom(O) and an attribute X ∈ V, if O is a non-descendant of X, i.e., there is no causal path from X to O, then for all x, x' ∈ Dom(X) and all contexts k ∈ Dom(K), where K ⊆ V − {X, O}, it holds that NEC_x(k) = SUF_x(k) = NeSUF_x(k) = 0.

Proof. Since O is a non-descendant of X in the causal diagram G, its potential outcomes are invariant to interventions on X; the implication from Equation (3) is that for all x, x' ∈ Dom(X) and for any set of attributes e, Pr(o_{X←x'} | e) = Pr(o_{X←x} | e). This equality and the consistency rule (2) imply that NEC_x(k) = Pr(o'_{X←x'} | x, o, k) = Pr(o'_{X←x} | x, o, k) = Pr(o' | x, o, k) = 0. NeSUF_x(k) = 0 and SUF_x(k) = 0 can be proved similarly.
Extensions to multi-class classification and regression. For multi-valued outcomes, i.e., Dom(O) = {o_1, . . . , o_γ}, we assume an ordering of the values o_1 > . . . > o_γ such that o_i > o_j implies that o_i is more desirable than o_j. This assumption holds in tasks where certain outcomes are favored over others and for real-valued outcomes that have a natural ordering of values. We partition Dom(O) into sets O^< and O^≥, where O^< denotes the set of values less than o and O^≥ denotes the set of values greater than or equal to o. Note that we do not require a strict ordering of the values and can simply partition them into favorable and unfavorable. In these settings, we redefine the explanation scores with respect to each outcome value o. For example, the necessity score is defined as the probability that the outcome O changes from a value greater than or equal to o to a value lower than o upon the intervention X ← x':

NEC_x^{x'}(k, o) ≝ Pr(O_{X←x'} ∈ O^< | x, O ∈ O^≥, k)

The other two scores can be extended in a similar fashion. Our propositions extend to these settings and can be directly used to evaluate the explanation scores using observational data.
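A small sketch (ours) of this extension: the multi-valued outcome is binarized against a reference value o (O^≥ vs. O^<), after which any of the binary estimators above can be reused unchanged; the ordering and column names are hypothetical.

```python
def binarize_outcome(df, o_col, o_ref, ordering):
    """Map O to 1 if O lies in O^>= (at least as desirable as o_ref), else 0.

    `ordering` lists Dom(O) from most to least desirable (o_1 > ... > o_gamma),
    and `df` is a pandas DataFrame of historical records.
    """
    rank = {val: i for i, val in enumerate(ordering)}
    out = df.copy()
    out[o_col] = (out[o_col].map(rank) <= rank[o_ref]).astype(int)
    return out

# e.g., for Dom(O) = {approve, refer, reject} and reference outcome "refer":
# df_bin = binarize_outcome(df, "O", "refer", ["approve", "refer", "reject"])
```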
5 Experiments
This section presents experiments that evaluate the effectiveness of LEWIS. We answer the following questions. Q1: What is the end-to-end performance of LEWIS in terms of gaining insight into black-box machine learning algorithms? How does the performance of LEWIS change with varying machine learning algorithms? Q2: How does LEWIS compare to state-of-the-art methods in XAI? Q3: To what extent are the explanations and recourse options generated by LEWIS correct?
5.1 Datasets
We considered four black-box machine learning algorithms: a random forest classifier [90], random forest regression [90], XGBoost [91], and a feed-forward neural network [89]. We used the causal diagrams presented in [10] for the Adult and German datasets and in [65] for COMPAS. For the Drug dataset, the attributes Country, Age, Gender and Ethnicity are root nodes that affect the outcome and the other attributes; the outcome is also affected by the other attributes. We implemented our scores and the recourse algorithm in Python. We split each dataset into training and test data, learned a black-box algorithm (a random forest classifier unless stated otherwise) over the training data, and estimated the conditional probabilities in (19)-(21) by regressing over test data predictions. We report the explanation scores for each dataset under different scenarios. To present local explanations, we report the positive and negative contributions of an attribute value toward the current outcome (e.g., in Figure 7, bars to the left (right) represent negative (positive) contributions of an attribute value). To recommend recourse to individuals who receive a negative decision, we generate a set of actions with minimal cost that, if taken, can change the algorithm's decision for the individual in the future with a user-defined probability threshold α.
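A sketch (ours) of the pipeline just described: split the data, train the black box, replace the outcome column with the black box's predictions on the held-out split, and feed the result to the score estimators; the dataset, column names and estimator call are placeholders.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def score_black_box(df, feature_cols, label_col):
    train, test = train_test_split(df, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(random_state=0)
    clf.fit(train[feature_cols], train[label_col])
    # Explanation scores are computed over the black box's own predictions.
    scored = test.copy()
    scored["O"] = clf.predict(test[feature_cols])
    return clf, scored

# clf, scored = score_black_box(german_df, feature_cols, "credit_risk")
# nec, suf, nesuf = explanation_scores(scored, "status", x, x_prime, c_cols)
```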
5.3 End-to-End Performance
In the following experiments, we present the local, contextual and global explanations and recourse options generated
by L EWIS. In the absence of ground truth, we discuss the coherence of our results with intuitions from existing
literature. Table 2 reports the running time of L EWIS for computing explanations and recourse.
German. Consider the attributes status and credit history in Figure 3a. Their near-perfect sufficiency scores
indicate their high importance toward a positive outcome at the population level. For individuals for whom the algo-
rithm generated a negative outcome, an increase in their credit history or maintaining above the recommended daily
minimum in checking accounts (status) is more likely to result in a positive decision compared to other attributes
such as housing or age. These scores and the low necessity scores of attributes are aligned with our intuition about
good credit risks: (a) good credit history and continued good status of checking accounts add to the credibility of
individuals in repaying their loans, and (b) multiple attributes favor good credit and a reduction in any single attribute
is less likely to overturn the decision.
In Figure 4a, we present contextual explanations that capture the effect of intervening on status in different age groups. We observe that increasing the status of checking account from < 0 DM to > 200 DM is more likely to
reverse the algorithm’s decision for older individuals (as seen in their higher sufficiency score compared to younger
individuals). This behavior can be attributed to the fact that along with checking account status, loan-granting decisions
depend on variables such as credit history, which typically has good standing for the average older individual. For
younger individuals, even though the status of their checking accounts may have been raised, their loans might still be
rejected due to the lack of a credible credit history.
We report the local explanations generated by LEWIS in Figure 5. In the real world, younger individuals and individuals with inadequate employment experience or an insufficient daily minimum amount in checking accounts are less likely to be considered good credit risks. This observation is evidenced in the negative contribution of status, age and employment for the negative-outcome example. For the positive-outcome example, the current attribute values contribute toward the favorable outcome. Since increasing any of them is unlikely to further improve the outcome, the values do not have a negative contribution. Figure 1 presents an example actionable recourse scenario, suggesting changes to savings, credit amount and purpose that improve the chances of approval.
Adult. Several studies [88, 98] have analyzed the impact of gender and age in this dataset. The dataset has been shown
to be inconsistent: income attributes for married individuals report household income, and there are more married
males in the dataset indicating a favorable bias toward males [77]. We, therefore, expect age to be a necessary cause
for higher income, but it may not be sufficient since increasing age does not imply that an individual is married. This
intuition is substantiated by the high necessity and low sufficiency scores of age in Figure 3b. Furthermore, as shown
in Figure 4b, changing marital status to a higher value has a greater effect on older than on younger individuals; this
effect can be attributed to the fact that compared to early-career individuals, mid-career individuals typically contribute
more to joint household income. Consequently, for an individual with a negative outcome (Figure 6), marital status and
age contribute toward the negative outcome. For an individual with a positive outcome, changing any attribute value
is less likely to improve the outcome. However, increasing working hours will further the favorable outcome with a
higher probability. We calculated the recourse for the individual with negative outcome and identified that increasing
the hours to more than 42 would result in a high-income prediction.
[Figure 4: LEWIS's contextual explanations show the effect of intervening on an attribute over different sub-populations. (a) Effect of status on different age groups (German). (b) Effect of marital on different age groups (Adult). (c) Effect of prior count on race (COMPAS). (d) Effect of juvenile crime on race (COMPAS). Each panel plots the Nec, Suf and NeSuf scores.]
[Figure 5: LEWIS's local explanations (German). Figure 6: LEWIS's local explanations (Adult). Each figure shows a negative and a positive output example.]
COMPAS. We compare the global explanation scores generated by LEWIS for the COMPAS software used in courts (Figure 3c). The highest score of priors ct validates the insights of previous studies [1, 80] that the number of prior crimes is one of the most important factors determining chances of recidivism. Figures 4c and 4d present the effect
of intervening on prior crime count and juvenile crime count, respectively, on the software score (note that for
these explanations, we use the prediction scores from the COMPAS software, not the classifier output). We observe
that both attributes have a higher sufficiency for Black compared to White individuals, indicating that an increase in prior crimes and juvenile crimes is more detrimental for Blacks than for Whites. A reduction in these crimes, on the other hand, benefits Whites more than Blacks, thereby validating the inherent bias in COMPAS scores. We did not perform recourse analysis as the attributes describe past crimes and, therefore, are not actionable.
[Figure 7: LEWIS's local explanations for the Drug dataset, showing negative and positive contributions of attribute values for a negative and a positive output example.]
[Figure 8: Global explanation scores generated by LEWIS for (a) XGBoost and (b) a feed-forward neural network on the Adult dataset, compared with the rankings from SHAP and Feat where applicable.]
Drug. This dataset has been studied to understand the variation in drug-use patterns across demographics and the effectiveness of various sensation-measurement features toward predicting drug usage. Figure 3d compares the global scores with respect to the outcome that the drug was used at least once in a lifetime. Previous studies [21] have found
that consumption of the particular drug is common in certain countries, as substantiated by the high necessity and
sufficiency scores of country. Furthermore, intuitively, individuals with a higher level of education are more likely to
be aware of the effects of drug abuse and hence, less likely to indulge in its consumption. This intuition is supported
by the observation in Figure 7a: a higher education level contributes towards the negative drug consumption outcome,
and in Figure 7b: a lower education level contributes positively toward the drug consumption outcome. We observe
similar conclusions for the explanations with respect to a different outcome such as drug used in the last decade.
Generalizability of LEWIS to black-box algorithms. In Figure 8, we present the global explanations generated by LEWIS for black-box algorithms that are harder to interpret and are likely to violate the monotonicity assumption, such as XGBoost and feed-forward neural networks, and report the necessity and sufficiency score for each classifier. For ease of deploying neural networks, we conducted this set of experiments on Adult, which is our largest dataset. We observed that different classifiers rank attributes differently depending upon the attributes they deem important. For example, the neural network learns class as the most important attribute. Since country and sex have a causal effect on class, LEWIS ranks these three attributes higher than others (see Section 5.4 for a detailed interpretation of the results).
Key takeaways. (1) The explanations generated by LEWIS capture causal dependencies between attributes and are applicable to any black-box algorithm. (2) LEWIS has proved effective in determining attributes causally responsible for a favorable outcome. (3) Its contextual explanations, which show the effect of particular interventions on sub-populations, are aligned with previous studies. (4) The local explanations offer fine-grained insights into the contribution of attribute values toward the outcome of an individual. (5) For individuals with an unfavorable outcome, whenever applicable, LEWIS provides recourse in the form of actionable interventions.
5.4 Comparing LEWIS to Other Approaches

We compared the global and local explanations generated by LEWIS to existing approaches used for interpreting ML algorithms: SHAP [55], LIME [72] and feature importance (Feat) [9]. SHAP explains the difference between a prediction and the global average prediction, LIME explains the difference from a local average prediction, and Feat measures the increase in an algorithm's prediction error after permuting an attribute's values. LIME provides local explanations, Feat provides global explanations, and SHAP generates both global and local explanations. LIME and SHAP provide the marginal contribution of an attribute to the classifier prediction and are not directly comparable to LEWIS's probabilistic scores. However, since all the methods measure the importance of attributes in classifier predictions, we report the relative ranking of attributes based on their normalized scores, and present a subset of attributes ranked high by any of the methods. For LEWIS, we report the maximum NeSUF_x score of an attribute over all of its value pairs. We also compared the recourse generated by LEWIS with LinearIP. (We contacted the authors of [41] but do not use their technique in the evaluation since it does not work for categorical actionable variables.) We used open-source implementations of the respective techniques.
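A sketch (ours) of how the reported ranking can be produced: each attribute is scored by its maximum NeSUF over all value pairs (using, e.g., the estimator sketched in Section 4.1) and attributes are sorted by that score; the per-attribute backdoor sets in c_cols_for and the estimator signature are placeholders.

```python
def rank_attributes(df, attr_domains, c_cols_for, estimator, k=None):
    """Rank attributes by their maximum NeSUF score over all value pairs."""
    best = {}
    for attr, domain in attr_domains.items():
        score = 0.0
        for x in domain:
            for x_prime in domain:
                if x == x_prime:
                    continue
                _, _, nesuf = estimator(df, attr, x, x_prime,
                                        c_cols_for[attr], k=k)
                score = max(score, nesuf)
        best[attr] = score
    return sorted(best, key=best.get, reverse=True)

# ranking = rank_attributes(scored, attr_domains, c_cols_for,
#                           estimator=explanation_scores)
```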
German. In Figure 9a, note that housing is ranked higher by LEWIS than by Feat and SHAP. The difference lies in the data: housing=own is highly correlated with a positive outcome. However, due to a skewed distribution (there are ∼10% of instances where housing=own), random permutations of housing do not generate new instances, and Feat is unable to identify it as an important attribute. LEWIS uses the underlying causal graph to capture the causal relationship between the two attributes.
[Figure 9: Relative rankings of attributes by SHAP, Feat, and LEWIS global explanations on (a) German, (b) Adult, (c) COMPAS, and (d) Drug.]
[Figure 10: Local explanation rankings by SHAP, LIME, and LEWIS for a negative and a positive output example on German (a, b) and Adult (c, d).]
In Figures 10a and 10b, we report the rankings obtained by LIME, SHAP and LEWIS on two instances that respectively have negative and positive predicted outcomes. Age and account status have a high negative contribution toward the outcome in Figure 10a, indicating that an increase in either is likely to reverse the decision. Intuitively, with age, continued employment and improved account status, individuals tend to have better savings, credit history, housing, etc., which, in turn, contribute toward a positive outcome. LEWIS's ranking captures this causal dependency between the attributes, which is recorded by neither SHAP nor LIME.
To compare the recourse generated by LEWIS and LinearIP, we tested them on the example for Maeve in Figure 1. While both methods identify the same solution for small thresholds, LinearIP did not return any solution for a success threshold > 0.8. In contrast to LEWIS, which generalizes to black-box algorithms, LinearIP depends on linear classifiers and offers recommendations that do not account for the causal relationships between attributes.
Adult. In Figure 9b, the ranking of attributes generated by LEWIS and Feat matches observations in prior literature that consider occupation, education and marital status to be the most important attributes. However, SHAP picks up on the correlation of age with marital status and occupation (older individuals are more likely to be married and have better jobs), and ranks it higher. The rankings are similar for XGBoost (Figure 8a) and random forest (Figure 9b)
but different for the neural network (Figure 8b). We investigated the outputs and observed that the predictions of the neural network differ from those of random forest and XGBoost for more than 20% of the test samples, leading to a varied ranking of attributes. Additionally, the class of an individual is ranked as important by this classifier. Since country and sex have a causal impact on class, this justifies their high ranks as generated by LEWIS. In Figure 8b, we do not report the scores for Feat as it does not support neural networks.
In Figures 10c and 10d, we compare LEWIS with the local explanation methods LIME and SHAP. Consistent with existing studies on this dataset, LEWIS recognizes the negative contribution of unmarried marital status and the positive contribution of sex=male toward the negative outcome. For the positive-outcome example, LEWIS identifies that age, sex and country have a high positive contribution toward the outcome due to their causal impact on attributes such as occupation and marital status (ranked higher by SHAP and LIME). We also observed that the results of SHAP are not stable across different iterations.
COMPAS. Since COMPAS scores were calculated based on criminal history and indications of juvenile delinquency [1], the higher ranking of juvenile crime history by LEWIS in Figure 9c is justified. Note that bias penetrated the system due to the correlation between demographic and non-demographic attributes. SHAP and Feat capture this correlation and rank age higher than juvenile crime history.
Drug. Figure 9d shows that all techniques produce a similar ordering of attributes, with country and age being most crucial for the desired outcome. Comparing the local explanations of LEWIS with SHAP and LIME (Figure 7), we observe that LEWIS correctly identifies the negative contribution of higher education toward a negative drug-consumption prediction and the positive contribution of a lower level of education toward a positive drug-consumption prediction.
5.5 Correctness of LEWIS's Explanations

Since ground truth is not available in real-world data, we evaluate the correctness of LEWIS on the German-Syn dataset, a synthetic version of the German dataset generated from a known causal model.
Correctness of estimated scores. In Figure 11a, we compare the global explanation scores of different variables with ground-truth necessity and sufficiency scores estimated using Pearl's three-step procedure discussed in Equation (3) (Section 2). We present the comparison for a non-linear regression-based black-box algorithm with respect to the outcome o = 0.5. The average global explanation scores returned by LEWIS are consistently similar to the ground-truth estimates, thereby validating the correctness of Proposition 4.2. SHAP and Feat capture the correlation between the input and output attributes, and rank Status higher than Age and Sex, which are assigned scores close to 0. These attributes do not directly impact the output but indirectly impact it through Status and Saving. This experiment validates the ability of LEWIS to capture causal effects between different attributes, estimate explanation scores accurately, and present actionable insights, as compared to SHAP and Feat. To understand the effect of the number of samples on the scores estimated by LEWIS, we compare the NeSUF scores of status for different sample sizes in Figure 11b. We observe that the variance in the estimates is reduced with an increase in sample size, and the scores converge to the ground-truth estimates for larger samples.
Robustness to violation of monotonicity. To evaluate the impact of non-monotonicity on the explanation scores generated by LEWIS, we changed the structural equations of the causal graph of German-Syn to simulate a non-monotonic effect of Age on the prediction attribute. This data was used to train random forest and XGBoost classifiers. We measured the monotonicity violation as Λ_viol = Pr[o'_{X←x} | o, x']. Note that Λ_viol = 0 implies monotonicity, and a higher Λ_viol denotes a higher violation of monotonicity. We observed that the scores estimated by LEWIS differ from the ground-truth estimates by less than 5% as long as the monotonicity violation is less than 0.25. Furthermore, the relative ranking of the attributes remains consistent with the ground-truth ranking calculated using Equation (3). This experiment demonstrates that the explanations generated by LEWIS are robust to slight violations of monotonicity.
Recourse analysis. We sampled 1,000 random instances that received negative outcomes and generated recourse for them (sufficiency threshold α = 0.9) using LEWIS. Each unit change in an attribute value was assigned unit cost. The output was evaluated with respect to the ground-truth sufficiency and the cost of the returned actions. In all instances, LEWIS's output achieved more than 0.9 sufficiency at the optimal cost. This experiment validates the optimality of the IP formulation in generating effective recourse. To further test the scalability of LEWIS, we considered a causal graph with 100 variables and increased the number of actionable variables from 5 to 100. The number of constraints grew linearly from 6 to 101 (one for each actionable variable plus one for the sufficiency constraint), and the running time increased from 1.65 seconds to 8.35 seconds, demonstrating LEWIS's scalability to larger inputs.
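To illustrate the shape of such an integer program (one constraint per actionable variable plus a single sufficiency constraint), the sketch below uses the PuLP library with made-up candidate actions, costs, and a naively additive stand-in for the sufficiency constraint; LEWIS's actual formulation derives this constraint from the sufficiency score and is not assumed to be additive.

```python
# Minimal sketch (illustrative, not LEWIS's exact IP): choose discrete actions over
# actionable variables to minimise total cost subject to a linearised sufficiency
# constraint. One choice constraint per actionable variable + one sufficiency row.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

# Hypothetical inputs: per-variable candidate changes with (cost, assumed
# contribution to the sufficiency score of the resulting intervention).
actions = {
    "credit_amount": {"keep": (0, 0.00), "raise_to_3000": (2, 0.35)},
    "savings":       {"keep": (0, 0.00), "above_100DM":   (1, 0.40)},
    "month":         {"keep": (0, 0.00), "reduce_to_6":   (1, 0.25)},
}
alpha = 0.9  # sufficiency threshold

prob = LpProblem("recourse", LpMinimize)
z = {(v, a): LpVariable(f"z_{v}_{a}", cat=LpBinary)
     for v, opts in actions.items() for a in opts}

# Objective: total cost of the selected changes.
prob += lpSum(actions[v][a][0] * z[v, a] for (v, a) in z)

# One constraint per actionable variable: pick exactly one option for it.
for v, opts in actions.items():
    prob += lpSum(z[v, a] for a in opts) == 1

# One sufficiency constraint (additivity assumed here purely for illustration).
prob += lpSum(actions[v][a][1] * z[v, a] for (v, a) in z) >= alpha

prob.solve()
chosen = [(v, a) for (v, a) in z if value(z[v, a]) == 1 and a != "keep"]
print("recommended actions:", chosen, "| total cost:", value(prob.objective))
```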
Figure 11: Comparing with ground truth. (a) Quality of the estimates: global explanation scores of Age, Saving, Status, Sex, and Housing for LEWIS, SHAP, and Feat against the ground truth. (b) Effect of sample size on error: the NeSuf score of Status for sample sizes from 1K to 100K.

6 Related Work and Discussion

Our research is mainly related to XAI work in quantifying feature importance and counterfactual explanations.
Quantifying feature importance. Due to its strong axiomatic guarantees, methods based on Shapley values are
emerging as the de facto approach for quantifying feature influence [50, 86, 54, 53, 14, 58, 24, 2]. However, several
practical and epistemological issues have been identified with these methods. These issues arise primarily because ex-
isting proposals for quantifying the marginal influence of an attribute do not have any causal interpretation in general
and, therefore, can lead to incorrect and misleading explanations [58, 44, 24]. Another popular method for generat-
ing local explanations is LIME (Local Interpretable Model-agnostic Explanations [72], which trains an interpretable
classifier (such as linear regression) on an instance obtained by perturbing that instance to be explained around its
neighborhood. Several issues with LIME have also been identified in the literature, including its lack of human inter-
pretability, its sensitivity to the choice of local perturbation, and its vulnerability to adversarial attacks [4, 54, 62, 84].
Unlike existing methods, our proposal offers the following advantages. (1) It is grounded in causality and counterfac-
tual reasoning, captures insights from the theoretical foundation of explanations in philosophy, epistemology and social
science, and can provide provably correct explanations. It has been argued that humans are selective about explanations
and, depending on the context, certain contrasts are more meaningful than others [60, 19]. The notions of necessity
and sufficiency have been shown to be strong criteria for preferred explanatory causes [29, 75, 30, 87, 74, 11, 67]. (2) It accounts for the indirect influence of attributes on an algorithm's decisions; the problem of quantifying indirect influence has received scant attention in the XAI literature (see [3] for a non-causality-based approach). (3) It builds upon scores that are customizable and can therefore generate explanations at the global, contextual, and local levels. (4) It can audit black-box algorithms using only historical data on their inputs and outputs.
Counterfactual explanations. Our work is also related to a line of research that leverages counterfactuals to explain
ML algorithm predictions [96, 47, 40, 92, 56, 64]. In this context, the biggest challenge is generating explanations that
follow natural laws and are feasible and actionable in the real world. Recent work attempts to address feasibility using ad hoc constraints [92, 64, 20, 39, 17, 94, 52]. However, it has been argued that feasibility is fundamentally a causal concept [7, 56, 40]. A few attempts have been made to develop causality-based approaches that generate actionable recourse, but they rely on the strong assumption that the underlying probabilistic causal model is fully specified or can be learned from data [56, 40, 41]. Our framework extends this line of work by (1) formally defining feasibility in terms of
probabilistic contrastive counterfactuals, and (2) providing a theoretical justification for taking a fully non-parametric
approach for computing contrastive counterfactuals from historical data, thereby making no assumptions about the
internals of the decision-making algorithm and the structural equations in the underlying probabilistic causal models.
The IP formulation we use to generate actionable recourse is similar to that of [92], with the difference that their approach uses classifier parameters to bound the change in prediction, whereas ours relies on a sufficiency-score-based constraint. Our formulation is therefore not only causal but also independent of the internals of the black-box algorithm.
Logic-based methods. Our work shares some similarities with recent work in XAI that employs tools from logic-
based diagnosis and operates with the logical representations of ML algorithms [83, 38, 12]. In this context, the
fundamental concepts of prime implicate/implicant are closely related to sufficient and necessary causation when the underlying causal model is a logical circuit [37, 16, 34, 13]. It can be shown that the notion of sufficient/necessary explanations proposed in [83] translates to explanations in terms of a set of attributes that have a sufficiency/necessity
score of 1. However, these methods can generate explanations only in terms of a set of attributes, are intractable in
model-agnostic settings, fail to account for the causal interaction between attributes, and cannot go beyond determin-
istic algorithms.
Algorithmic fairness. The critical role of causality and background knowledge is recognized and acknowledged in
the algorithmic fairness literature [45, 43, 65, 76, 25, 79, 81, 78]. In this context, contrastive counterfactuals have been
used to capture individual-level fairness [46]. It is easy to show that the notion of counterfactual fairness in [46] can be captured by the explanation scores introduced in this paper: an algorithm is counterfactually fair w.r.t. a protected attribute if the sufficiency score and the necessity score of that attribute are both zero. Hence, LEWIS
is useful for reasoning about individual-level fairness and discrimination.
Orthogonal to our work, strategic classification is concerned with devising techniques that are robust to manipulation and gaming; recent literature has focused on studying the causal implications of such behavior [59]. In the future, we plan to incorporate such techniques to make our system robust to gaming. The metrics we introduce here for quantifying the necessity, the sufficiency, and the necessity-and-sufficiency of an algorithm's input for its decision are adopted from the literature on the probability of causation [29, 75, 30, 87, 74, 11, 67]. The results developed in Section 4.1 generalize and subsume earlier results from [87, 67] and substantially simplify their proofs.
Assumptions and limitations. Our framework relies on two main assumptions to estimate and bound explanation
scores, namely, the availability of (1) data that is a representative sample of the underlying population of interest, and
(2) knowledge of the underlying causal diagram. Dealing with non-representative samples goes beyond the scope of
this paper, but there are standard approaches that can be adopted (see, e.g., [6]). Furthermore, LEWIS is designed to work with any level of user background knowledge. If no background knowledge is provided, LEWIS assumes no-confounding, i.e., $\Pr(o \mid do(x), k) = \Pr(o \mid x, k)$, and monotonicity. Under these assumptions, the necessity score and the sufficiency score, respectively, become $\frac{\Pr(o' \mid x', k) - \Pr(o' \mid x, k)}{\Pr(o \mid x, k)}$ and $\frac{\Pr(o \mid x, k) - \Pr(o \mid x', k)}{\Pr(o' \mid x', k)}$. The former can be seen as
a group-level attributable fraction, which is widely used in epidemiology as a measure of the proportion of cases
attributed to a particular risk factor [71]; the latter can be seen as a group-level relative risk, which is widely used in
epidemiology to measure the risk of contracting a disease in a group exposed to a risk factor [42]. When computed for individuals, these quantities can be interpreted as proportional to the difference in the rate of positive/negative algorithmic decisions among individuals that are similar on all attributes except X. In other words, the quantities measure the correlation between X and the algorithm's decisions across similar individuals. This correlation can be interpreted causally only under the no-confounding and monotonicity assumptions. Nonetheless, quantifying the local influence of an attribute by measuring its correlation with an algorithm's decision across similar individuals underpins most existing methods for generating local explanations, such as Shapley-value-based methods [54, 2], feature importance [86], and LIME [72]. These approaches differ in how they measure this correlation.
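These simplified scores reduce to plug-in functionals of two conditional probabilities. The following minimal sketch (with hypothetical column names, and with the context k left implicit, i.e., the data is assumed to be pre-filtered to a fixed context) shows the corresponding estimators.

```python
import pandas as pd

def simplified_scores(df, attr, x, x_prime, outcome="o"):
    """Plug-in estimates of the simplified necessity and sufficiency scores."""
    p_o_x  = df.loc[df[attr] == x, outcome].mean()        # Pr(o | x)
    p_o_xp = df.loc[df[attr] == x_prime, outcome].mean()  # Pr(o | x')
    necessity   = (p_o_x - p_o_xp) / p_o_x                # = [Pr(o'|x') - Pr(o'|x)] / Pr(o|x)
    sufficiency = (p_o_x - p_o_xp) / (1 - p_o_xp)         # = [Pr(o|x) - Pr(o|x')] / Pr(o'|x')
    return necessity, sufficiency

# Tiny illustrative sample (values are made up).
df = pd.DataFrame({"X": [1, 1, 1, 1, 0, 0, 0, 0],
                   "o": [1, 1, 1, 0, 1, 0, 0, 0]})
print(simplified_scores(df, "X", x=1, x_prime=0))   # -> (0.666..., 0.666...)
```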
In principle, background knowledge on underlying causal models is required to generate effective and actionable
explanations. While this may be considered a limitation of our approach, we argue that all existing XAI methods
either explicitly or implicitly make causal assumptions (such as those mentioned above in addition to feature indepen-
dence and the possibility of simulating interventional distributions by perturbing data or using marginal distributions).
Hence, our framework replaces assumptions that are unrealistic with assumptions about the underlying causal diagram
that need not be perfect to obtain valuable insights, can be validated using historical data and background knowl-
edge [67], and can be learned from a mixture of historical and interventional data [27]. As argued above, in the worst
case, our assumptions about generating local explanations are similar to those of existing work. Nevertheless, we
show empirically in Section 5 that our methods are robust to slight violations of underlying assumptions and generate
insights considerably beyond state-of-the-art methods in XAI.
References
[1] Machine bias. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, 2016.
[2] Kjersti Aas, Martin Jullum, and Anders Løland. Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. arXiv preprint arXiv:1903.10464, 2019.
[3] Philip Adler, Casey Falk, Sorelle A Friedler, Tionney Nix, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith,
and Suresh Venkatasubramanian. Auditing black-box models for indirect influence. Knowledge and Information
Systems, 54(1):95–122, 2018.
[4] David Alvarez-Melis and Tommi S Jaakkola. On the robustness of interpretability methods. arXiv preprint
arXiv:1806.08049, 2018.
[5] Daniel W Apley and Jingyu Zhu. Visualizing the effects of predictor variables in black box supervised learning
models. arXiv preprint arXiv:1612.08468, 2016.
[6] Elias Bareinboim and Judea Pearl. Controlling selection bias in causal inference. In Artificial Intelligence and
Statistics, pages 100–108, 2012.
[7] Solon Barocas, Andrew D Selbst, and Manish Raghavan. The hidden assumptions behind counterfactual expla-
nations and principal reasons. In Proceedings of the 2020 Conference on Fairness, Accountability, and Trans-
parency, pages 80–89, 2020.
[8] Richard Berk. Machine Learning Risk Assessments in Criminal Justice Settings. Springer, 2019.
[9] Leo Breiman. Random forests. Mach. Learn., 45(1):5–32, October 2001.
[10] Silvia Chiappa. Path-specific counterfactual fairness. In The Thirty-Third AAAI Conference on Artificial Intel-
ligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019,
The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii,
USA, January 27 - February 1, 2019, pages 7801–7808. AAAI Press, 2019.
[11] Louis Anthony Cox Jr. Probability of causation and the attributable proportion risk. Risk Analysis, 4(3):221–230,
1984.
[12] Adnan Darwiche and Auguste Hirth. On the reasons behind decisions. arXiv preprint arXiv:2002.09284, 2020.
[13] Adnan Darwiche and Judea Pearl. Symbolic causal networks. In AAAI, pages 238–244, 1994.
[14] Anupam Datta, Shayak Sen, and Yair Zick. Algorithmic transparency via quantitative input influence: Theory
and experiments with learning systems. In 2016 IEEE symposium on security and privacy (SP), pages 598–617.
IEEE, 2016.
[15] Maartje MA De Graaf and Bertram F Malle. How people explain action (and autonomous intelligent systems
should too). In 2017 AAAI Fall Symposium Series, 2017.
[16] Johan De Kleer, Alan K Mackworth, and Raymond Reiter. Characterizing diagnoses and systems. Artificial
intelligence, 56(2-3):197–222, 1992.
[17] Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan Shanmugam, and Payel
Das. Explanations based on the missing: Towards contrastive explanations with pertinent negatives. In Advances
in Neural Information Processing Systems, pages 592–603, 2018.
[18] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.
[19] Curt J Ducasse. On the nature and the observability of the causal relation. The Journal of Philosophy, 23(3):57–
68, 1926.
[20] Anna Fariha, Ashish Tiwari, Arjun Radhakrishna, and Sumit Gulwani. Extune: Explaining tuple non-
conformance. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data,
SIGMOD ’20, page 2741–2744, New York, NY, USA, 2020. Association for Computing Machinery.
[21] E. Fehrman, A. K. Muhammad, E. M. Mirkes, V. Egan, and A. N. Gorban. The five factor model of personality
and evaluation of drug consumption risk, 2017.
[22] Aaron Fisher, Cynthia Rudin, and Francesca Dominici. Model class reliance: Variable importance measures
for any machine learning model class, from the “rashomon” perspective. arXiv preprint arXiv:1801.01489, 68,
2018.
[23] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages
1189–1232, 2001.
[24] Christopher Frye, Ilya Feige, and Colin Rowat. Asymmetric shapley values: incorporating causal knowledge
into model-agnostic explainability. arXiv preprint arXiv:1910.06358, 2019.
[25] Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou. Fairness testing: testing software for discrimination. In
Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pages 498–510. ACM,
2017.
[26] Tobias Gerstenberg, Noah D Goodman, David A Lagnado, and Joshua B Tenenbaum. How, whether, why:
Causal judgments as counterfactual contrasts. In CogSci, 2015.
[27] Clark Glymour, Kun Zhang, and Peter Spirtes. Review of causal discovery methods based on graphical models.
Frontiers in genetics, 10:524, 2019.
[28] Alex Goldstein, Adam Kapelner, Justin Bleich, and Emil Pitkin. Peeking inside the black box: Visualizing
statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical
Statistics, 24(1):44–65, 2015.
[29] Sander Greenland. Relation of probability of causation to relative risk and doubling dose: a methodologic error
that has become a social problem. American journal of public health, 89(8):1166–1169, 1999.
[30] Sander Greenland and James M Robins. Epidemiology, justice, and the probability of causation. Jurimetrics,
40:321, 1999.
[31] Brandon M Greenwell, Bradley C Boehmke, and Andrew J McCarthy. A simple and effective model-based
variable importance measure. arXiv preprint arXiv:1805.04755, 2018.
[32] Eric Grynaviski. Contrasts, counterfactuals, and causes. European Journal of International Relations, 19(4):823–
846, 2013.
[33] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. A
survey of methods for explaining black box models. ACM computing surveys (CSUR), 51(5):1–42, 2018.
[34] Joseph Y Halpern and Judea Pearl. Causes and explanations: A structural-model approach. Part II: Explanations. The British journal for the philosophy of science, 56(4):889–911, 2005.
[35] Giles Hooker. Discovering additive structure in black box functions. In Proceedings of the tenth ACM SIGKDD
international conference on Knowledge discovery and data mining, pages 575–580, 2004.
[36] Giles Hooker and Lucas Mentch. Please stop permuting features: An explanation and alternatives. arXiv preprint
arXiv:1905.03151, 2019.
[37] Mark Hopkins and Judea Pearl. Clarifying the usage of structural models for commonsense causal reasoning.
In Proceedings of the AAAI Spring Symposium on Logical Formalizations of Commonsense Reasoning, pages
83–89. AAAI Press Menlo Park, CA, 2003.
[38] Alexey Ignatiev. Towards trustable explainable ai. In 29th International Joint Conference on Artificial Intelli-
gence, pages 5154–5158, 2020.
[39] Shalmali Joshi, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. Towards real-
istic individual recourse and actionable explanations in black-box decision making systems. arXiv preprint
arXiv:1907.09615, 2019.
[40] Amir-Hossein Karimi, Gilles Barthe, Borja Balle, and Isabel Valera. Model-agnostic counterfactual explanations for consequential decisions. arXiv preprint arXiv:1905.11190, 2019.
[41] Amir-Hossein Karimi, Julius von Kügelgen, Bernhard Schölkopf, and Isabel Valera. Algorithmic recourse under
imperfect causal knowledge: a probabilistic approach. arXiv preprint arXiv:2006.06831, 2020.
[42] Muin J Khoury, W Dana Flanders, Sander Greenland, and Myron J Adams. On the measurement of susceptibility
in epidemiologic studies. American Journal of Epidemiology, 129(1):183–190, 1989.
[43] Niki Kilbertus, Mateo Rojas Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard
Schölkopf. Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing
Systems, pages 656–666, 2017.
[44] I Elizabeth Kumar, Suresh Venkatasubramanian, Carlos Scheidegger, and Sorelle Friedler. Problems with
shapley-value-based explanations as feature importance measures. arXiv preprint arXiv:2002.11097, 2020.
[45] Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural
Information Processing Systems, pages 4069–4079, 2017.
[46] Matt J. Kusner, Joshua R. Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In NIPS, pages
4069–4079, 2017.
[47] Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, and Marcin Detyniecki. Inverse
classification for comparison-based interpretability in machine learning. arXiv preprint arXiv:1712.08443, 2017.
[48] David K Lewis. Causal explanation. 1986.
[49] M. Lichman. UCI machine learning repository, 2013.
[50] Stan Lipovetsky and Michael Conklin. Analysis of regression in game theory approach. Applied Stochastic
Models in Business and Industry, 17(4):319–330, 2001.
[51] Peter Lipton. Contrastive explanation. Royal Institute of Philosophy Supplement, 27:247–266, 1990.
[52] Shusen Liu, Bhavya Kailkhura, Donald Loveland, and Yong Han. Generative counterfactual introspection for
explainable deep learning. arXiv preprint arXiv:1907.03077, 2019.
[53] Scott M Lundberg, Gabriel G Erion, and Su-In Lee. Consistent individualized feature attribution for tree ensem-
bles. arXiv preprint arXiv:1802.03888, 2018.
[54] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in neural
information processing systems, pages 4765–4774, 2017.
[55] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In NIPS, pages 4765–
4774, 2017.
[56] Divyat Mahajan, Chenhao Tan, and Amit Sharma. Preserving causal constraints in counterfactual explanations
for machine learning classifiers. arXiv preprint arXiv:1912.03277, 2019.
[57] David R Mandel. Counterfactual and causal explanation. Routledge research international series in social
psychology. The Psychology of Counterfactual Thinking, pages 11–27, 2005.
[58] Luke Merrick and Ankur Taly. The explanation game: Explaining machine learning models with cooperative
game theory. arXiv preprint arXiv:1909.08128, 2019.
[59] John Miller, Smitha Milli, and Moritz Hardt. Strategic classification is causal modeling in disguise. In Hal Daumé
III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119
of Proceedings of Machine Learning Research, pages 6917–6926. PMLR, 13–18 Jul 2020.
[60] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–
38, 2019.
[61] Brent Mittelstadt, Chris Russell, and Sandra Wachter. Explaining explanations in ai. In Proceedings of the
conference on fairness, accountability, and transparency, pages 279–288, 2019.
[62] Christoph Molnar. Interpretable Machine Learning. Lulu. com, 2020.
[63] Adam Morton. Contrastive knowledge. Contrastivism in philosophy, pages 101–115, 2013.
[64] Ramaravind K Mothilal, Amit Sharma, and Chenhao Tan. Explaining machine learning classifiers through di-
verse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and
Transparency, pages 607–617, 2020.
[65] Razieh Nabi and Ilya Shpitser. Fair inference on outcomes. In Proceedings of the... AAAI Conference on Artificial
Intelligence. AAAI Conference on Artificial Intelligence, volume 2018, page 1931. NIH Public Access, 2018.
[66] Judea Pearl. Direct and indirect effects. In Jack S. Breese and Daphne Koller, editors, UAI ’01: Proceedings
of the 17th Conference in Uncertainty in Artificial Intelligence, University of Washington, Seattle, Washington,
USA, August 2-5, 2001, pages 411–420. Morgan Kaufmann, 2001.
[67] Judea Pearl. Causality. Cambridge university press, 2009.
[68] Judea Pearl. Detecting latent heterogeneity. Sociological Methods & Research, 46(3):370–389, 2017.
[69] Judea Pearl. The seven tools of causal inference, with reflections on machine learning. Communications of the
ACM, 62(3):54–60, 2019.
[70] Judea Pearl, Madelyn Glymour, and Nicholas P Jewell. Causal inference in statistics: A primer. John Wiley &
Sons, 2016.
[71] Charles Poole. A history of the population attributable fraction and related measures. Annals of epidemiology,
25(3):147–154, 2015.
[72] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016.
[73] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High-precision model-agnostic explanations.
In AAAI, volume 18, pages 1527–1535, 2018.
[74] David W Robertson. Common sense of cause in fact. Tex. L. Rev., 75:1765, 1996.
[75] James Robins and Sander Greenland. The probability of causation under a stochastic model for individual risk.
Biometrics, pages 1125–1138, 1989.
[76] Chris Russell, Matt J Kusner, Joshua Loftus, and Ricardo Silva. When worlds collide: integrating different
counterfactual assumptions in fairness. In Advances in Neural Information Processing Systems, pages 6414–
6423, 2017.
[77] Babak Salimi, Johannes Gehrke, and Dan Suciu. Bias in OLAP queries: Detection, explanation, and removal. In
Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Hous-
ton, TX, USA, June 10-15, 2018, pages 1021–1035, 2018.
[78] Babak Salimi, Bill Howe, and Dan Suciu. Database repair meets algorithmic fairness. ACM SIGMOD Record,
49(1):34–41, 2020.
[79] Babak Salimi, Harsh Parikh, Moe Kayali, Lise Getoor, Sudeepa Roy, and Dan Suciu. Causal relational learning.
In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pages 241–256,
2020.
[80] Babak Salimi, Luke Rodriguez, Bill Howe, and Dan Suciu. Capuchin: Causal database repair for algorithmic
fairness. arXiv preprint arXiv:1902.08283, 2019.
[81] Babak Salimi, Luke Rodriguez, Bill Howe, and Dan Suciu. Interventional fairness: Causal database repair
for algorithmic fairness. In Proceedings of the 2019 International Conference on Management of Data, pages
793–810. ACM, 2019.
[82] Andrew D Selbst and Solon Barocas. The intuitive appeal of explainable machines. Fordham L. Rev., 87:1085,
2018.
[83] Andy Shih, Arthur Choi, and Adnan Darwiche. A symbolic approach to explaining bayesian network classifiers.
arXiv preprint arXiv:1805.03364, 2018.
[84] Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. Fooling lime and shap:
Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI,
Ethics, and Society, pages 180–186, 2020.
[85] Kacper Sokol and Peter A Flach. Counterfactual explanations of machine learning predictions: opportunities and
challenges for ai safety. In SafeAI@ AAAI, 2019.
[86] Erik Štrumbelj and Igor Kononenko. Explaining prediction models and individual predictions with feature con-
tributions. Knowledge and information systems, 41(3):647–665, 2014.
[87] Jin Tian and Judea Pearl. Probabilities of causation: Bounds and identification. Annals of Mathematics and
Artificial Intelligence, 28(1-4):287–313, 2000.
[88] Florian Tramèr, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Jean-Pierre Hubaux, Mathias Humbert, Ari
Juels, and Huang Lin. Fairtest: Discovering unwarranted associations in data-driven applications. In IEEE
European Symposium on Security and Privacy (EuroS&P). IEEE, 2017.
[89] Fastai neural network. https://docs.fast.ai/tabular.learner.htm.
[90] sklearn python library. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html.
[91] XGBoost. https://xgboost.readthedocs.io/en/latest/.
[92] Berk Ustun, Alexander Spangher, and Yang Liu. Actionable recourse in linear classification. In Proceedings of
the Conference on Fairness, Accountability, and Transparency, pages 10–19, 2019.
[93] Jennifer Valentino-Devries, Jeremy Singer-Vine, and Ashkan Soltani. Websites vary prices, deals based on users’
information. Wall Street Journal, 10:60–68, 2012.
[94] Arnaud Van Looveren and Janis Klaise. Interpretable counterfactual explanations guided by prototypes. arXiv
preprint arXiv:1907.02584, 2019.
[95] Suresh Venkatasubramanian and Mark Alfano. The philosophical basis of algorithmic recourse. In Proceedings
of the 2020 Conference on Fairness, Accountability, and Transparency, pages 284–293, 2020.
[96] Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black
box: Automated decisions and the gdpr. Harv. JL & Tech., 31:841, 2017.
[97] James Woodward. Making things happen: A theory of causal explanation. Oxford university press, 2005.
[98] Indre Zliobaite, Faisal Kamiran, and Toon Calders. Handling conditional discrimination. In Proceedings of
the 2011 IEEE 11th International Conference on Data Mining, ICDM ’11, page 992–1001, USA, 2011. IEEE
Computer Society.