
Interpreting Black-box Machine Learning Models

for High Dimensional Datasets


Md. Rezaul Karim∗† , Md Shajalal†‡ , Alexander Gra߆∗ , Till Döhmen† , Sisay Adugna Chala†∗ ,
Alexander Boden†¶ , Christian Beecks§† , and Stefan Decker∗†
∗ Computer Science 5 - Information Systems and Databases, RWTH Aachen University, Germany
† Fraunhofer - Institute for Applied Information Technology FIT, Germany
‡ University of Siegen, Germany
§ University of Hagen, Germany
¶ Bonn-Rhein-Sieg University of Applied Sciences, Germany
arXiv:2208.13405v4 [cs.LG] 21 Nov 2023

Abstract—Many datasets are of increasingly high dimensionality, where a large number of features could be irrelevant to the learning task. The inclusion of such features would not only introduce unwanted noise but also increase computational complexity. Deep neural networks (DNNs) outperform machine learning (ML) algorithms in a variety of applications due to their effectiveness in modelling complex problems and handling high-dimensional datasets. However, due to non-linearity and higher-order feature interactions, DNN models are unavoidably opaque, making them black-box methods. In contrast, an interpretable model can identify statistically significant features and explain the way they affect the model's outcome. In this paper¹, we propose a novel method to improve the interpretability of black-box models in the case of high-dimensional datasets. First, a black-box model is trained on the full feature space and learns useful embeddings on which the classification is performed. To decompose the inner principles of the black-box and to identify the top-k important features (global explainability), probing and perturbing techniques are applied. An interpretable surrogate model is then trained on the top-k feature space to approximate the black-box. Finally, decision rules and counterfactuals are derived from the surrogate to provide local decisions. Our approach outperforms tabular learners, e.g., TabNet and XGBoost, and SHAP-based interpretability techniques when tested on a number of datasets having dimensionality between 54 and 20,531².

Index Terms—Curse of dimensionality, Black-box models, Interpretability, Attention mechanism, Model surrogation.

¹ This paper is accepted and included in the proceedings of the 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA'2023).
² GitHub: https://github.com/rezacsedu/DeepExplainHidim

I. INTRODUCTION

High availability and easy access to large datasets, AI accelerators, and state-of-the-art machine learning (ML) and deep learning algorithms have paved the way for performing predictive modelling at scale. However, in the case of high-dimensional datasets (e.g., omics), the feature space increases exponentially. Principal component analysis (PCA) and isometric feature mapping (Isomap) are widely used to tackle the curse of dimensionality [1]. Although they preserve inter-point distances, they are fundamentally limited to linear embeddings and tend to lose useful information, which makes them less effective for dimensionality reduction [2]. The inclusion of a large number of irrelevant features not only introduces unwanted noise but also increases computational complexity, as the data becomes sparser. With increased modelling complexity involving hundreds of features and their interactions, drawing a general conclusion or interpreting the black-box model's outcome becomes increasingly difficult, yet many approaches do not take into account understanding the inner structure of opaque models.

In contrast, DNNs benefit from higher pattern recognition capabilities when learning useful representations from such datasets. With multiple hidden layers and non-linear activation functions within layers, autoencoders (AEs) can model complex and higher-order feature interactions. Learning non-linear mappings allows embedding the input feature space into a lower-dimensional latent space. Such representations can be used for both supervised and unsupervised downstream tasks, and the embedding can capture contextual information of the data [3]. However, predictions from such a black-box model can neither be traced back to the input, nor is it clear why outputs are transformed in a certain way. This exposes even the most accurate model's inability to answer questions like "how and why are inputs ultimately mapped to certain decisions?". In sensitive areas like banking and healthcare, explainability and accountability are not only desirable properties of AI but also legal requirements – especially where AI would have a significant impact on human lives [4]. Therefore, legal landscapes are fast-moving in European and North American countries; e.g., the EU GDPR enforces that processing based on automated decision-making tools should be subject to suitable safeguards, including the "right to obtain an explanation of the decision reached after such assessment and to challenge the decision". Hence, how decisions are made should be as transparent as possible, in a faithful and interpretable manner.

Explainable AI (XAI), which gains a lot of attention from both academia and industry, aims to overcome the opaqueness of black-boxes and to bring transparency to AI systems. Model-specific and model-agnostic approaches covering local and global interpretability have emerged [5]. While local explanations focus on explaining individual predictions, global explanations explain the entire model behaviour using plots or decision sets.
Although an interpretable model can explain how it makes a prediction by exposing the important factors that influence its outcomes, interpretability comes at the cost of efficiency. Research has suggested learning an interpretable model to approximate a black-box globally in order to provide local explanations [6]. A surrogate model's input-output behaviour can be represented in a more human-interpretable form using decision rules (DRs). DRs containing antecedents (IF) and a consequent (THEN) provide more intuitive explanations³ than graph- or plot-based explications [6].

Further, humans tend to think in a counterfactual way by asking questions like "How would the prediction have been if input x had been different?"⁴. By using a set of rules and counterfactuals, it is possible to explain decisions directly to humans, with the ability to comprehend the underlying reason, so that users can focus on the learned knowledge⁵ without emphasising the underlying data representations. Keeping in mind the practical and legal consequences of using black-box models, we propose a novel method to improve the interpretability of black-box models for classification tasks. We hypothesize that: i) by decomposing the inner logic (e.g., the most important features), the opaqueness of a black-box can be mitigated by outlining the most (e.g., the top-k feature space) and least important features; ii) finding a sub-domain of the full feature space allows training a surrogate model that is sufficiently able to approximate the black-box model; and iii) a representative decision rule set can be generated with the surrogate, which can be used to sufficiently explain individual decisions in a human-interpretable way.

³ An example rule for a loan application denial could be "IF monthly income = 3000 AND credit rating history = BAD AND employment status = YES AND married = YES, THEN decision = DENY".
⁴ "What would have been the decision if my monthly income were higher?"
⁵ "Although you're employed, given your monthly income of 2,000 EUR and having a bad credit rating history, our model has denied your application, as we think you're unlikely to repay. Even though you have had a bad credit rating history, an increase in your monthly income of 1,000 EUR will definitely end up with acceptance, as you're already employed."

II. RELATED WORK

Existing interpretable ML methods can be categorized as either model-specific or model-agnostic, with a focus on local interpretability, global interpretability, or both. Local interpretable model-agnostic explanations (LIME) [7], model understanding through subspace explanations (MUSE) [8], SHapley Additive exPlanations (SHAP) [9], partial dependence plots (PDP), individual conditional expectation (ICE), permutation feature importance (PFI), and counterfactual explanations (CE) [5] are among them. These methods operate by approximating the outputs of an opaque model via tractable logic, such as game-theoretic Shapley values (SVs), or by locally approximating a complex or black-box model via a linear model [10]. Since these approaches do not take into account the inner structure of an opaque black-box model, probing, perturbing, attention mechanisms, sensitivity analysis (SA), saliency maps, and gradient-based attribution methods have been proposed to understand the underlying logic of complex models.

Saliency-map and gradient-based methods can identify relevant regions and assign importance to each feature, e.g., image pixels, where first-order gradient information of a black-box model is used to produce heatmaps indicating their relative importance. Gradient-weighted class activation mapping (Grad-CAM++) [11] and layer-wise relevance propagation (LRP) [12] are examples of this category, highlighting the relevant parts of the inputs, e.g., the image regions that caused a DNN's decision. Attention mechanisms are used in a variety of supervised and language modelling tasks, as they can detect larger subsets of features. The self-attention network (SAN) [13] was proposed to identify important features from tabular data. TabNet [14] uses sequential attention to choose a subset of semantically meaningful features to process at each decision step. It also visualizes the importance of features and how they are combined to quantify the contribution of each feature to the model, enabling local and global interpretability. SAN is found to be effective on datasets having a large number of features, while its performance degrades in the case of smaller datasets, indicating that not having enough data hinders distilling the relevant parts of the feature space [13]. Model interpretation strategies have also been proposed that involve training an inherently interpretable surrogate model to learn a locally faithful approximation of a black-box model [6]. Since an explanation relates the feature values of a sample to its prediction, rule-based explanations are easier for humans to understand. Anchor [15] is a rule-based method that extends LIME and provides explanations in the form of decision rules. Anchor computes rules by incrementally adding equality conditions to the antecedents while an estimate of the rule precision is above a threshold [16].

A drawback of rule-based explanations is overlapping and contradictory rules. Sequential covering (SC) and Bayesian rule lists (BRL) have been proposed to deal with these. SC iteratively learns single rules covering the training data rule-by-rule and removes the data points that are already covered by new rules, while SBRL combines pre-mined frequent patterns into a decision list using Bayesian statistics [6]. Local rule-based explanations (LORE) [16] were proposed to overcome these issues. LORE learns an interpretable model of a neighbourhood based on genetic algorithms; it derives explanations via the interpretable model and provides local explanations in the form of a decision rule and counterfactuals, which signify what feature-value changes may lead to a different outcome. LIME indicates where to look for a decision based on feature values, while the counterfactual rules of LORE signify minimal-change contexts for reversing the predictions.

III. METHODS

Each high-dimensional dataset has a large feature space. Therefore, we first train a black-box model to learn representations. Then, we classify the data points on their embedding space instead of the original feature space. To decompose the inner structure of the black-box, probing and perturbing techniques are applied to identify the top-k features that contribute most to the overall model's decision-making. An interpretable surrogate model is then built on the top-k features to approximate the black-box. Finally, decision rules and counterfactuals are generated from the surrogate to explain individual decisions.
Fig. 1: Workflow of our proposed approach (recreated based on Karim et al. [17])

A. Building black-box models

Figure 1 shows the workflow of our proposed approach for interpreting black-box models. The input X is first fed into a DNN to generate latent representations. It embeds the feature space into a lower-dimensional latent space, s.t. X is transformed with a non-linear mapping fΘ : X → Z, where Z ∈ R^K are the learned embeddings and K ≪ F. A fully-connected softmax layer is added on top of the DNN, forming a black-box classifier fb. To parameterize fb, we train a convolutional autoencoder (CAE). Its function approximation properties and feature learning capabilities help the CAE extract deep, high-quality features [3]. Further, since weights are shared among layers, CAEs have a locality-preserving capability and can reduce the number of parameters compared to other AEs.

A convolutional layer calculates feature maps (FMs) that are passed through max-pooling for downsampling by taking the maximum in each non-overlapping sub-region, which maps the input X into a lower-dimensional embedding space Z [3]:

Z = gϕ(X) = σ(W ⊘ X + b),   (1)

where the encoder g(·) is a sigmoid function parameterized by ϕ ∈ Θ, which includes a weight matrix W ∈ R^{p×q} and a bias vector b ∈ R^q (p and q are the numbers of input and hidden units), ⊘ is the convolution operation, Z are the latent variables, and σ is the exponential linear unit activation function. The decoder h(·) reconstructs the input X from the latent representation Z by applying unpooling and deconvolution, s.t. Z is mapped back to a reconstructed version X′ ≈ X as [3]:

X′ = hΘ(Z) = hΘ(gϕ(X)),   (2)

where h(·) is parameterized by (θ, ϕ) ∈ Θ, which are jointly learned to generate X′. This amounts to learning an identity function, i.e., X′ ≈ hθ(gϕ(X)). The mean squared error (MSE) measures the reconstruction loss Lr:

Lr(θ, ϕ) = (1/N) Σ_{i=1}^{N} (X − X′)² + λ‖W‖²₂,   (3)

where λ is the activity regularizer and W is a vector containing the network weights. Therefore, hθ(gϕ(X)) is equivalent to Ψ(W′ ∗ Z + b′) [3], which makes X′ = Ψ(W′ ⊙ Z + b′), where ⊙ is the transposed convolution operation, W′ are the decoder's weights, b′ is the bias vector, and Ψ is the sigmoid activation function. The unpooling is performed with switch variables [18] to remember the positions of the maximum values during the max-pooling operation. Within each neighbourhood of Z, both the value and the location of the maximum elements are recorded: pooled maps store the values, while switches record the locations. Z is fed into a fully-connected softmax layer for the classification, which maps a latent point zi to an output fb(zi) ↦ ŷi in the embedding space Z by optimizing the categorical cross-entropy (CE) loss (binary CE in the case of binary classification) during back-propagation. The reconstruction and CE losses of the CAE are then combined and optimized jointly [3]:

Lcae = Σ_{i=1}^{n} (αr Lr + αce Lce),   (4)

where αr and αce are the regularization weights for the reconstruction and CE loss functions, respectively.
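For concreteness, the following is a minimal Keras sketch (not the authors' released implementation) of a 1D convolutional autoencoder whose bottleneck feeds a softmax head and which is trained jointly on reconstruction and cross-entropy losses as in eq. (4); the layer sizes, the 512-feature input, and the loss weights alpha_r/alpha_ce are illustrative placeholders rather than the exact architecture used in the paper.

# Minimal sketch (assumed sizes, not the authors' exact architecture): a 1D
# convolutional autoencoder whose bottleneck Z feeds a softmax classifier,
# trained jointly on reconstruction (MSE) and cross-entropy, cf. eq. (4).
import tensorflow as tf
from tensorflow.keras import layers, Model

n_features, n_classes = 512, 33        # illustrative; real datasets range 54-20,531
alpha_r, alpha_ce = 1.0, 1.0           # regularization weights of eq. (4)

inp = layers.Input(shape=(n_features, 1))
x = layers.Conv1D(16, 9, padding="same", activation="elu")(inp)
x = layers.MaxPooling1D(4)(x)
x = layers.Conv1D(8, 9, padding="same", activation="elu")(x)
z = layers.MaxPooling1D(4, name="bottleneck")(x)        # latent maps Z

# Decoder: upsampling + convolution approximate unpooling/deconvolution
y = layers.UpSampling1D(4)(z)
y = layers.Conv1D(16, 9, padding="same", activation="elu")(y)
y = layers.UpSampling1D(4)(y)
recon = layers.Conv1D(1, 9, padding="same", activation="sigmoid", name="recon")(y)

# Fully-connected softmax head on the embedding -> black-box classifier f_b
cls = layers.Dense(n_classes, activation="softmax", name="cls")(layers.Flatten()(z))

cae = Model(inp, [recon, cls])
cae.compile(optimizer="adam",
            loss={"recon": "mse", "cls": "categorical_crossentropy"},
            loss_weights={"recon": alpha_r, "cls": alpha_ce})
# cae.fit(X[..., None], {"recon": X[..., None], "cls": Y_onehot}, epochs=50)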
B. Interpreting black-box models

We apply probing, perturbing, and model surrogation techniques to interpret the black-box model.

1) Probing with attention mechanism: The SANCAE architecture, shown in fig. 2, enables self-attention at the feature level. An attention layer is represented as [13]:

l2 = σ(W2 · α(W|F| · Ω(X) + bl1) + bl2),   (5)

where α is an activation function, bli is the layer-wise bias, and Ω is the first network layer, which maintains the connection with the input features X [13]:

Ω(X) = (1/k) ⊕_k [X ⊗ softmax(Wlatt^k X + blatt^k)],   (6)

where X is first used as input to a softmax-activated layer whose number of neurons is set to |F|, k is the number of attention heads representing relations between input features, and Wlatt^k is the set of weights of the respective attention heads.
Fig. 2: Schematic representation of SANCAE model (recreated based on Karim et al. [17])

On the other hand, the softmax function applied to the i-th element of a weight vector v is defined as follows [13]:

softmax(vi) = exp(vi) / Σ_{i=1}^{|F|} exp(vi),   (7)

where Wlatt^k ∈ R^{|F|×|F|}, v ∈ R^{|F|}, and latt denotes the attention layer in which the element-wise product with X is computed in the forward pass to predict the labels ŷ; two consecutive dense layers l1 and l2 contribute to the predictions, and ⊗ and ⊕ are the Hadamard product and the summation across the k heads. As Ω maintains a bijection between features and the attention heads' weights, the weights in the |F| × |F| matrix represent relations between features. We hypothesize that a global weight vector can be generated by applying attention to the encoder's bottleneck layer; this vector is then used to compute feature attributions. Unlike SAN, we apply attention to the embedding space (the encoder's deepest convolutional layer), which maintains connections between the latent features Z and can be defined as follows [13]:

Ω(Z) = (1/k) ⊕_k [Z ⊗ softmax(Wlatt^k Z + blatt^k)].   (8)

The embedding Z is used as the input to the softmax layer, in which the number of neurons is equal to the dimension of the embedding space. The softmax function applied to the i-th element of the weight vector vz is as follows [13]:

softmax(vzi) = exp(vzi) / Σ_{i=1}^{|Z|} exp(vzi),  vz ∈ R^{|Z|}.   (9)

Once the training is finished, the attention layer's weights are activated using softmax as follows [19]:

Rl = (1/k) ⊕_k softmax(diag(Wlatt^k)),   (10)

where Wlatt^k ∈ R^{|Z|×|Z|}. As the surrogate is used to provide local explanations, the top-k features are extracted as the diagonal of Wlatt^k and ranked w.r.t. their weights.
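The following is a small NumPy sketch of how eqs. (7)-(10) turn per-head attention weights into a global relevance vector and a ranking; the matrices W_latt below are random stand-ins for the weights read out of a trained attention layer, and k_heads and z_dim are illustrative values.

# Sketch of eqs. (8)-(10): aggregate per-head attention weights over the
# embedding space into a single relevance vector and rank its dimensions.
# W_latt is a random stand-in for the trained attention layer's weights.
import numpy as np

rng = np.random.default_rng(0)
k_heads, z_dim = 4, 64                      # illustrative sizes
W_latt = rng.normal(size=(k_heads, z_dim, z_dim))

def softmax(v):
    e = np.exp(v - v.max())                 # eqs. (7)/(9), numerically stable
    return e / e.sum()

# eq. (10): R_l = (1/k) * sum_k softmax(diag(W_latt^k))
R_l = np.mean([softmax(np.diag(W_latt[k])) for k in range(k_heads)], axis=0)

top_k = 10
ranking = np.argsort(R_l)[::-1][:top_k]     # indices of the top-k dimensions
print("top-k dimensions:", ranking)
print("relevance scores:", R_l[ranking])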
2) Perturbing with sensitivity analysis: We validate globally important features through SA. We change a feature's value while keeping the other features unchanged. If any change in its value significantly impacts the prediction, the feature is considered to have a high impact on the prediction. We create a new set X̂* by applying a w-perturbation over feature ai and measure its sensitivity at the global level. To measure the change in predictions, we observe the MSE between actual and predicted labels and compare the probability distributions over the classes (the two most probable classes in multi-class settings). The sensitivity S of a feature ai is the difference between the MSE on the original feature space X and on the sampled X̂*. However, since SA requires a large number of calculations (N × M, where N and M are the numbers of instances and features), we make minimal changes to the top-k features only in order to reduce the computational complexity.
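A hedged NumPy/scikit-learn sketch of this perturbation step follows: one feature is perturbed at a time while the others are held fixed, and sensitivity is scored as the change in prediction error. Here black_box stands for any fitted classifier exposing predict_proba, and the w-perturbation is simplified to additive Gaussian noise; both are assumptions for illustration.

# Sketch of the sensitivity analysis: perturb one feature at a time while
# keeping the others fixed, and measure how much the prediction error moves.
# `black_box` is any fitted model exposing predict_proba (an assumption).
import numpy as np
from sklearn.metrics import mean_squared_error

def sensitivity(black_box, X, y, feature_ids, w=0.5, seed=0):
    rng = np.random.default_rng(seed)
    base_mse = mean_squared_error(y, black_box.predict_proba(X).argmax(axis=1))
    scores = {}
    for i in feature_ids:                      # restrict to the top-k features
        X_hat = X.copy()
        X_hat[:, i] += rng.normal(0.0, w * X[:, i].std(), size=len(X))
        mse = mean_squared_error(y, black_box.predict_proba(X_hat).argmax(axis=1))
        scores[i] = abs(mse - base_mse)        # S(a_i): change w.r.t. original X
    return scores

# Example: scores = sensitivity(fb, X_test, y_test, feature_ids=top_k_features)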
3) Model surrogation: Model surrogation is a knowledge distillation process that finds a sub-domain of the feature space, thereby approximating the teacher via the student, under the constraint that the student is interpretable. Since the most important features are already identified by the black-box fb, we hypothesize that training a surrogate f on the top-k feature space is sufficient. As described in algorithm 1, we train f on the sampled data X* (a sub-feature-space containing only the important features) and the ground truths Y.

Since any interpretable model can be used for the function g [6], we train decision tree (DT), random forest (RF), and XGBoost classifiers (even though RF and XGBoost are complex tree ensembles and known to be black-boxes, DTs can be extracted from them, and the best DT estimator can be used for computing FI). A DT iteratively splits X* into multiple subsets w.r.t. threshold values of features at each node until a leaf node containing a decision is reached. The mean importance of a feature ai is computed by going through all splits for which ai was used and adding up how much it has improved the prediction in a child node Q w.r.t. the Gini impurity IGQ = Σ_{k=1}^{N} pk · (1 − pk), where pk is the proportion of instances having label y*k in Q. RF and XGBoost ensemble randomized predictions to get the final decisions.
Algorithm 1: Black-box model surrogation
Input: a simplified version X* of dataset D (e.g., the top-k feature space identified by fb), the black-box model fb, an interpretable model type t, and model parameters Θ*.
Output: a surrogate model f and its predictions Ŷ*test on the held-out test set.
  X*train, Y*train, X*test, Y*test ← TrainTestSplit(X*, Y)
  Xtrain, Ytrain, Xtest, Ytest ← TrainTestSplit(X, Y)
  clf ← Estimator(t, Θ*)                      // create the estimator
  for all batches in the training set ∈ X*train do
      f ← clf.fit(X*train, Y*train)           // train the surrogate
  end for
  M ← [f, fb]                                 // list of models; fb is trained on Xtrain
  for model ∈ M do                            // generate predictions
      Ŷtest ← fb.predict(Xtest)               // for the black-box
      Ŷ*test ← f.predict(X*test)              // for the surrogate
  end for
  return {fb, Ŷtest}, {f, Ŷ*test}
C. Feature impacts and decision rules

We assume the black-box fb has sufficient knowledge and the surrogate f has learned the mapping Y* = f(X*). We hypothesize that f is able to mimic fb. We compute permutation feature importance (PFI) for f as a view of global feature importance (GFI). However, since PFI does not necessarily reflect the intrinsic predictive value of a feature, features having lower importance for an under- or overfitted model could be important for a better-fitted model [20]. We therefore use SHAP to generate more consistent explanations. The SHAP importance for ai ∈ x is computed by comparing what f predicts with and without ai for all possible combinations of the M−1 other features (i.e., except for ai) w.r.t. the SV ϕi [20]. Since the order in which the features are observed by a model impacts its outcome, SVs explain the output of a function as the sum of the effects ϕi of each feature being observed, folded into a conditional expectation. If ai has zero effect on the prediction, an SV of 0 is expected; if two features contribute equally, their SVs are the same [9].

To compute GFI, the absolute SVs per feature are averaged across all instances. Then, to generate consistent GFI, we create a stacking ensemble of SVs by averaging the marginal outputs from the DT, XGBoost, and RF models. We derive decision rules from a root-leaf path in a DT: starting at the root and satisfying the split condition of each decision node, we traverse until a leaf node is reached (fig. 1 in the supplementary). Unlike in a DT, a decision can be reached through multiple rules with excessive lengths. Given the huge feature space, textual representations would obstruct human-interpretability, especially when the rule list is long. Therefore, to mitigate the issue of overlapping rules, we create an ordered list of inclusive rules based on SBRL.

Rules with low confidence are insignificant in discriminating classes and may not be useful in explaining the decisions. Therefore, we filter out rules that do not meet coverage, support, and confidence thresholds. Besides, we restrict the antecedents to be a conjunction of clauses (i.e., conditions on features ai). The output of each rule is a probability distribution (the probability that an instance satisfying the antecedent belongs to a class). Using SBRL, pre-mined frequent patterns are combined into a decision list R containing representative rules. Finally, the faithfulness is computed w.r.t. coverage, which maximizes the fidelity of the rule list. Similar to Grath et al. [21], we generate counterfactuals by calculating the smallest possible changes (∆x) to an input x s.t. the outcome flips from prediction y to y′.

IV. EXPERIMENTS

We evaluate our approach on a number of datasets for classification tasks. However, our approach is dataset-agnostic and can be applied to any tabular dataset. We implemented our methods in Python using scikit-learn, Keras, and PyTorch. To provide a fair comparison, we train TabNet and XGBoost classifiers, as they are effective for tabular datasets. We also train a multilayer perceptron (MLP) on the PCA projection space. We provide qualitative and quantitative evaluations of each model, covering local and global explanations. We report precision, recall, F1-score, and Matthews correlation coefficient (MCC) scores. We assess the quality of the rules w.r.t. support and fidelity. To assess how well f has replicated fb, the R-squared measure (R²) is calculated as the percentage of the variance of the predictions from fb captured by the surrogate itself and is used as an indicator of goodness-of-fit [3]:

R² = 1 − SSE*/SSE = 1 − [Σ_{i=1}^{N} (ŷi* − ŷi)²] / [Σ_{i=1}^{N} (ŷi − ȳ)²],   (11)

where ŷi* is the prediction of f, ŷi is the prediction of fb for X*, and SSE and SSE* are the sums of squared errors for f and fb, respectively [6]. To judge whether f can be used instead of fb:
• if R² is close to 1 (low error), the surrogate approximates the behaviour of the black-box model very well; hence, the surrogate model f can be used instead of the black-box model fb;
• if R² is close to 0 (high error), the surrogate fails to approximate the black-box and hence cannot replace fb.
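As a concrete illustration of the rule derivation in Section III-C and of eq. (11), the sketch below prints the IF/THEN rules of a fitted tree surrogate and computes R² between surrogate and black-box predictions; f, fb, X_test, X_star_test, and feature_names are assumed to come from the surrogation sketch above.

# Sketch tying Section III-C to eq. (11): read decision rules off the tree
# surrogate and measure how much of the black-box's behaviour it captures.
# `f`, `fb`, `X_test`, `X_star_test`, `feature_names` are assumed inputs.
from sklearn.metrics import r2_score
from sklearn.tree import export_text

# IF/THEN rules: every root-to-leaf path of the surrogate tree is one rule
print(export_text(f, feature_names=list(feature_names)))

# Eq. (11): R^2 of the surrogate's predictions against the black-box's
y_hat_bb = fb.predict(X_test)            # black-box on the full feature space
y_hat_sur = f.predict(X_star_test)       # surrogate on the top-k sub-space
r2 = r2_score(y_hat_bb, y_hat_sur)
print(f"R^2 (surrogate vs. black-box): {r2:.3f}")  # close to 1 => good proxy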
A. Datasets

We experimented on four datasets: i) gene expression data from the Pan Cancer Atlas project, having 20,531 features and covering 33 tumour types; ii) indoor localization (UJIndoorLoc) [22], having 523 variables; iii) health advice (https://github.com/itachi9604/healthcare-chatbot), having 123 variables; and iv) the forest cover type dataset [23], having 54 variables.

TABLE I: Performance of individual models
Model    | Dataset       | Precision | Recall | F1     | MCC
MLP-PCA  | Gene expr.    | 0.7745    | 0.7637 | 0.7553 | 0.7367
MLP-PCA  | UJIndoorLoc   | 0.8652    | 0.8543 | 0.8437 | 0.7741
MLP-PCA  | Health advice | 0.8743    | 0.8679 | 0.8564 | 0.8067
MLP-PCA  | Forest cover  | 0.7654    | 0.7547 | 0.7522 | 0.7126
XGBoost  | Gene expr.    | 0.8725    | 0.8623 | 0.8532 | 0.7851
XGBoost  | UJIndoorLoc   | 0.8964    | 0.8931 | 0.8836 | 0.7959
XGBoost  | Health advice | 0.9354    | 0.9301 | 0.9155 | 0.8211
XGBoost  | Forest cover  | 0.8382    | 0.8265 | 0.8184 | 0.7963
TabNet   | Gene expr.    | 0.9326    | 0.9276 | 0.9175 | 0.8221
TabNet   | UJIndoorLoc   | 0.9217    | 0.9105 | 0.9072 | 0.8051
TabNet   | Health advice | 0.9455    | 0.9317 | 0.9178 | 0.8235
TabNet   | Forest cover  | 0.8953    | 0.8879 | 0.8854 | 0.8057
SANCAE   | Gene expr.    | 0.9525    | 0.9442 | 0.9325 | 0.8353
SANCAE   | UJIndoorLoc   | 0.9357    | 0.9269 | 0.9175 | 0.8233
SANCAE   | Health advice | 0.9623    | 0.9538 | 0.9329 | 0.8451
SANCAE   | Forest cover  | 0.9112    | 0.9105 | 0.9023 | 0.8124

B. Model performance analyses

We report the performance of each model w.r.t. increasing latent dimensions in fig. 3. When the dimension increases, accuracy also increases and the inter-model difference reduces, until a certain point where accuracy decreases again. In the case of lower-dimensional datasets (e.g., health advice and forest cover), accuracy improves up to 45 to 55% of the projected dimension. However, embedding them into much lower dimensions loses information needed to correctly classify data points, yielding a significant accuracy drop. In the case of higher-dimensional datasets (e.g., GE, UJIndoorLoc), more dimensions bring more noise than information, which makes the classification harder (a model is no better than the baseline, e.g., 5% for GE and 9% for UJIndoorLoc). Projecting them into a 5 to 7% embedding dimension is unlikely to lose information.

MLP-PCA asymptotically yields the lowest accuracy across datasets (table I), while XGBoost-Isomap slightly outperformed MLP-PCA. As PCA features are projected onto an orthogonal basis, they are linearly uncorrelated; PCA is similar to a single-layered AE with a linear activation. Isomap learns a projection that preserves the intrinsic structure, but it fails to learn complex mappings. SANCAE and TabNet yielded comparable accuracy, as both models learn projections that preserve information relevant for the classification. However, SANCAE outperformed TabNet, as the CAE models non-linear interactions among a large number of features and generates classification-friendly representations. We investigate the precision plot and lift curve in fig. 6 and fig. 7: the former outlines the relation between the predicted probability (that an index belongs to the positive class) and the percentage of observed indices in the positive class (observations are binned into groups of roughly equal predicted probability and the percentage of positives is calculated per bin), while the latter shows the percentage of positive classes when observations with a score above the cutoff are selected vs. random selection. Besides, we observe the decision boundary (DB, a hyper-surface that partitions the feature space) in fig. 5. Each model classifies data points (only 5 classes are shown, as covering all 33 classes would be overwhelming) on one side of the DB as belonging to one class and all those on the other side as belonging to another class.

C. Performance of surrogate models

The fidelity and confidence of the rule set on the test sets are demonstrated in table II. The mean fidelity is shown in percentage and the standard deviations (SDs) over 5 runs are reported as ±. Fidelity levels above 80%, between 60% and 80%, and below 60% are considered high, medium, and low, respectively.

As for UJIndoorLoc, the XGBoost model achieved the highest fidelity and confidence scores of 90.25% and 89.15%, with SDs of 1.38% and 1.57%. The RF model performed moderately well, giving the second-highest scores of 88.11% and 90.25%, with SDs of 1.21% and 1.38%. As for health advice, XGBoost achieved the highest fidelity and confidence of 91.38% and 90.25%, with SDs of 1.65% and 1.42%, respectively. The RF model also performed moderately well, giving the second-highest fidelity and confidence of 90.11% and 89.45%, with slightly lower SDs of 1.81% and 1.35%.

As for forest cover type, the XGBoost model achieved the highest fidelity and confidence of 94.36% and 92.17%, with SDs of 1.35% and 1.34%. The RF model performed moderately well too, yielding the second-highest fidelity and confidence of 93.15% and 91.25%, with slightly lower SDs of 1.42% and 1.31%. As for gene expression, the XGBoost model achieved the highest fidelity and confidence of 93.45% and 91.37%, with SDs of 1.25% and 1.35%. The RF model also performed moderately well, giving the second-highest scores of 92.25% and 90.21%, with slightly lower SDs of 1.35% and 1.29%. The R² values for the surrogates are reported in table III. The R² for the XGBoost model is comparable to the best performing SANCAE as well as the TabNet model.

D. Global interpretability

Accurate identification of the most and least significant features helps understand their relevance w.r.t. certain classes. For example, biologically relevant genes provide insights into carcinogenesis, as they could be viewed as potential biomarkers for specific cancer types. However, providing global and local explanations for all datasets would be overwhelming, so we focus on the gene expression dataset. Both GFI and feature impacts are analysed to understand the model's behaviour. Common and important features (w.r.t. GFI) identified with SANCAE are reported, where GFI assigns a score to input features based on how useful they are at predicting a target class or all classes. However, unlike feature impact, feature importance does not indicate which features in a dataset have the greatest positive or negative effect on the outcomes.

Therefore, the global feature impacts, sorted from most to least important SHAP value, are shown in fig. 4 for the SANCAE model. SHAP gives slightly different views on feature impacts: SPRR1B, ADCY3, FAM50B, SEMA3E, SLN, HAGLROS, CXCL10, VPS9D1-AS1, TRIM17, CLTRN, APLP1, and CWH43 positively impact the prediction.
Fig. 3: Mean accuracy w.r.t. the relative dimension of the latent space across datasets. Shade indicates standard deviation. The baseline is obtained by training the TabNet model on the original feature space (i.e., 100% of the dimensions). Panels: (a) Gene expression, (b) Indoor localization, (c) Health advice, (d) Forest cover type.

Fig. 4: Global feature impacts, sorted from most to least important. Panels: (a) Gene expression, (b) Indoor localization, (c) Health advice, (d) Forest cover type.
TABLE II: Fidelity vs. confidence of rule sets for the surrogate models
Dataset           | DT Fidelity  | DT Confidence | RF Fidelity  | RF Confidence | XGBoost Fidelity | XGBoost Confidence
UJIndoorLoc       | 86.16 ± 1.72 | 85.37 ± 1.53  | 89.27 ± 1.46 | 88.11 ± 1.21  | 90.25 ± 1.38     | 89.15 ± 1.57
Health advice     | 88.35 ± 1.45 | 87.55 ± 1.85  | 90.11 ± 1.81 | 89.45 ± 1.35  | 91.38 ± 1.65     | 90.25 ± 1.42
Forest cover type | 90.23 ± 1.37 | 88.75 ± 1.32  | 93.15 ± 1.42 | 91.25 ± 1.31  | 94.36 ± 1.35     | 92.17 ± 1.34
Gene expression   | 91.27 ± 1.42 | 89.33 ± 1.25  | 92.25 ± 1.35 | 90.21 ± 1.29  | 93.45 ± 1.25     | 91.37 ± 1.35

Fig. 5: Decision boundaries for the XGBoost model across datasets for the top-2 features. Panels: (a) Gene expression, (b) UJIndoorLoc, (c) Health advice, (d) Forest cover type.

Fig. 6: Precision plot for the SANCAE model trained on GE dataset

This signifies that if the prediction is in favour of a cancer type (e.g., COAD), these variables will play a crucial role in maintaining this prediction. Conversely, TP53, CDS1, PCOLCE2, MGP, MTCO1P53, TFF3, AC026403-1, BRCA1, LAPTM5, SULT4A1, EN1, EFNB1, and GABRP have negative impacts on the prediction. This means that if the prediction is COAD and the values of these variables are increased, the final prediction is likely to flip to another cancer type.
Fig. 7: Lift curve for the SANCAE model trained on GE dataset

TABLE III: Percentage of variance (R²) of the surrogates
Dataset         | DT         | RF         | XGBoost
UJIndoorLoc     | 86.2 ± 1.7 | 89.3 ± 1.5 | 91.4 ± 1.5
Health advice   | 89.4 ± 1.5 | 92.1 ± 1.8 | 94.2 ± 1.7
Forest cover    | 90.3 ± 1.4 | 91.2 ± 1.4 | 94.3 ± 1.3
Gene expression | 88.3 ± 1.4 | 90.2 ± 1.3 | 93.3 ± 1.5

E. Local interpretability

First, we randomly pick a sample from the test set. Assuming XGBoost predicts the instance to be of the COAD cancer type, the contribution plot (fig. 7 in the supplementary) outlines how much contribution individual features had on this prediction. The features (genes) DNMT3A, SLC22A18, RB1, CDKN18, and MYB are the top-k features w.r.t. impact values, while the features CASP8 and MAP2K4 had negative contributions. Further, to quantitatively validate the impact of the top-k features and to assess feature-level relevance, we carry out a what-if analysis. As shown, the observation is of COAD with a probability of 55% and of the BRCA type with a probability of 29%. Features on the right side (i.e., TFAP2A, VPS9D1-AS1, MTND2P28, ADCY3, and FOXP4 are positive for the COAD class, where feature TFAP2A has the highest positive impact of 0.29) impact the prediction positively, while features on the left impact it negatively. The genes TFAP2A, VPS9D1-AS1, MTND2P28, ADCY3, FOXP4, GPRIN1, EFNB1, FABP4, MGP, AC020916-1, CDC7, CHADL, RPL10P6, OASL, and PRSS16 are most sensitive to changes, while SEMA4C, CWH43, HAGLROS, SEMA3E, and IVL are less sensitive to changes.

If we remove feature TFAP2A from the profile, we would expect the model to predict the observation as COAD with a probability of 26% (i.e., 55% − 29%). This will recourse the actual prediction to BRCA, in which the features IVL, PRSS16, EFNB1, and CWH43 are most important, having impacts of 0.23, 0.17, 0.123, and 0.07, respectively. These features not only reveal their relevance for this decision but also signify that removing them is likely to impact the final prediction. Further, we focus on local explanations for this prediction by connecting decision rules and counterfactuals with additive feature attributions (AFA) in fig. 8. While Anchor provides a single rule outlining which features impacted arriving at this decision, LIME generates AFA stating which features had positive and negative impacts. However, using decision rules and a set of counterfactuals, we show how the classifier could arrive at the same decision in multiple ways due to different negative or positive feature impacts.

V. CONCLUSION

In this paper, we proposed an efficient technique to improve the interpretability of complex black-box models trained on high-dimensional datasets. Our model surrogation strategy is equivalent to a knowledge distillation process for creating a simpler model. However, instead of training the student model on the teacher's predictions, we transferred the learned knowledge (e.g., the top-k or globally most and least important features) to the student and optimized an objective function. Further, the more trainable parameters a black-box model has, the bigger the model becomes, which makes deploying such a large model infeasible on resource-constrained devices (e.g., IoT devices with limited memory and low computing power). Moreover, the inference time of large models increases and ends up with poor response times due to network latency even when deployed in a cloud infrastructure, which is unacceptable in many real-time applications. We hope our model surrogation strategy will help create simpler and lighter models and improve interpretability in such situations.

Depending on the complexity of the modelling task, a surrogate model may not be able to fully capture a complex black-box model. Consequently, it may lead users to draw wrong conclusions (e.g., in healthcare) – especially if the knowledge distillation process is not properly evaluated and validated.
Fig. 8: Example of explaining single prediction using rules, counterfactuals, and additive feature attributions

In the future, we want to focus on other model compression techniques such as quantization (i.e., reducing the numerical precision of model parameters or weights) and pruning (e.g., removing less important parameters or weights).

ACKNOWLEDGEMENT

This paper is a collaborative effort, based on the PhD thesis [17] by the first author and the second author's work as part of the Marie Skłodowska-Curie project funded by the Horizon Europe 2020 research and innovation programme of the European Union under grant agreement no. 955422.

REFERENCES

[1] Q. Fournier and D. Aloise, "Empirical comparison between autoencoders and traditional dimensionality reduction methods," in 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE). IEEE, 2019, pp. 211–214.
[2] C. C. Aggarwal and C. K. Reddy, Data clustering: algorithms and applications. CRC Press, 2014.
[3] M. R. Karim, T. Islam, M. Cochez, D. Rebholz-Schuhmann, and S. Decker, "Explainable AI for Bioinformatics: Methods, Tools, and Applications," Briefings in Bioinformatics, 2023.
[4] M. E. Kaminski, "The right to explanation, explained," Berkeley Tech. LJ, vol. 34, p. 189, 2019.
[5] S. Wachter, B. Mittelstadt, and C. Russell, "Counterfactual explanations without opening the black box: Automated decisions and the GDPR," Harv. JL & Tech., vol. 31, p. 841, 2017.
[6] C. Molnar, Interpretable machine learning. Lulu.com, 2020.
[7] M. Ribeiro, S. Singh, and C. Guestrin, "Local interpretable model-agnostic explanations (LIME): An introduction," 2019.
[8] H. Lakkaraju, E. Kamar, R. Caruana, and J. Leskovec, "Faithful and customizable explanations of black box models," in Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2019, pp. 131–138.
[9] S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," in Advances in Neural Information Processing Systems, 2017, pp. 4765–4774.
[10] T. Miller, "Explanation in artificial intelligence: Insights from the social sciences," Artificial Intelligence, 2018.
[11] A. Chattopadhay and A. Sarkar, "Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks," in Conf. on Applications of Computer Vision (WACV). IEEE, 2018, pp. 839–847.
[12] S. Bach, A. Binder, G. Montavon, K.-R. Müller, and W. Samek, "On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation," PLoS ONE, vol. 10, no. 7, 2015.
[13] B. Škrlj, S. Džeroski, N. Lavrač, and M. Petkovič, "Feature importance estimation with self-attention networks," arXiv preprint arXiv:2002.04464, 2020.
[14] S. O. Arık and T. Pfister, "TabNet: Attentive interpretable tabular learning," in AAAI, vol. 35, 2021, pp. 6679–6687.
[15] M. T. Ribeiro, S. Singh, and C. Guestrin, "Anchors: High-precision model-agnostic explanations," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[16] R. Guidotti, A. Monreale, S. Ruggieri, D. Pedreschi, F. Turini, and F. Giannotti, "Local rule-based explanations of black box decision systems," arXiv preprint arXiv:1805.10820, 2018.
[17] M. R. Karim, D. Rebholz-Schuhmann, and S. Decker, "Interpreting black-box machine learning models with decision rules and knowledge graph reasoning," Aachen, Germany, June 2022. [Online]. Available: https://publications.rwth-aachen.de/record/850613
[18] M. D. Zeiler, G. W. Taylor, and R. Fergus, "Adaptive deconvolutional networks for mid and high level feature learning," in 2011 International Conference on Computer Vision. IEEE, 2011, pp. 2018–2025.
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, pp. 5998–6008, 2017.
[20] S. M. Lundberg and S.-I. Lee, "Consistent feature attribution for tree ensembles," arXiv preprint arXiv:1706.06060, 2017.
[21] R. M. Grath, L. Costabello, C. L. Van, P. Sweeney, F. Kamiab, Z. Shen, and F. Lecue, "Interpretable credit application predictions with counterfactual explanations," arXiv preprint arXiv:1811.05245, 2018.
[22] J. Torres-Sospedra, R. Montoliu, A. Martínez-Usó, T. J. Arnau, M. Benedito-Bordonau, and J. Huerta, "UJIIndoorLoc: A new multi-building and multi-floor database for WLAN fingerprint-based indoor localization problems," in 2014 International Conference on Indoor Positioning and Indoor Navigation (IPIN). IEEE, 2014, pp. 261–270.
[23] J. A. Blackard and D. J. Dean, "Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables," Computers and Electronics in Agriculture, vol. 24, no. 3, pp. 131–151, 1999.
