Using Explanations in Social Science to Evaluate 3 Explainable AI Algorithms — LIME, Counterfactual Visual Explanation and ACE

Tecklun Goh
Dec 24, 2020

Explainable AI algorithms explain an ML model's predictions to a human so as to gain that human's trust. On the subject of explaining to humans, the social sciences have conducted extensive research on what makes a complete explanation and how an explanation can be crafted so that humans can understand it.

This post is a summary of one of my research projects, which builds on findings in the social sciences to develop a framework for evaluating explainable AI algorithms. The framework is used to evaluate three state-of-the-art explainable AI algorithms (LIME, Counterfactual Visual Explanation, and Automatic Concept-Based Explanation) for image recognition ML models.

Comparison of LIME, Counterfactual Visual Explanation and Automatic Concept-Based Explanation (ACE) against the framework for evaluating explainable AI

The structure of this post is as follows:

  1. Section 1 (This section): An introduction to this post
  2. Sections 2–5: Brief primer on the major findings in social science, and how these findings could be consolidated to become a set of factors that explainable AI algorithms can be evaluated against
  3. Section 6: LIME (Local Interpretable Model-Agnostic Explanation) — Implementation Details, Results and Evaluation
  4. Section 7: Counterfactual Visual Explanation — Implementation Details, Results and Evaluation
  5. Section 8: ACE (Automatic Concept-based Explanation) — Implementation Details, Results and Evaluation
  6. Section 9: Conclusion with summary of findings

WHY do we need explanations

Machine learning (ML) is widely adopted in fields such as autonomous driving, medicine and cybersecurity, where safety and security are of paramount importance. For humans to deploy ML systems in safety- and security-critical applications, and to trust the decisions made by those systems, humans have a right to an explanation from the ML systems.

In the social sciences, researchers have for decades looked into frameworks of explanation to understand what constitutes a good explanation to a human. When considering how ML decisions and models can be explained to humans, the models and frameworks developed in the social sciences should be taken into consideration. Otherwise, approaching explainable AI purely from an ML perspective would be akin to the phenomenon of "the inmates running the asylum".

Two broad approaches to interpreting ML models can be identified: (a) interpret a transparent model, or (b) perform post-hoc interpretation of an opaque ML model. The former approach gives insight into the internals of the ML model to explain how it functions. However, revealing the internals of the model raises several concerns, such as intellectual property (IP) and privacy. The post-hoc approach reveals how the model behaves, without revealing how it functions.

Since humans are the subject of explainable AI, the effectiveness of an explainable AI algorithm should be evaluated against factors suggested by social science. This post summarizes the findings in social science to determine the factors that are essential to explainable AI, and evaluates three state-of-the-art post-hoc explainable AI algorithms against those factors.

HOW to Explain

T. Miller's "Explanation in Artificial Intelligence: Insights from the Social Sciences" presented three major findings with regard to explanation. These findings are illustrated using the fictitious conversation between a student and a teacher below.

Conversation between student and teacher to illustrate the “How” to explanation

Explanations are Contrastive

Explanations are provided relative to an implicit or explicit counterfactual case that did not occur. In the figure above, when the teacher replied "because he is wearing a police uniform", there was an implicit assumption that the explanation was requested to distinguish police officers from other occupations. When she replied "he has short, trim hair", she was explaining that the person is male, as opposed to the counterfactual case in which the person is female.

Contrastive explanations are more valuable and intuitive to a human, and they reduce the cognitive burden that an otherwise complete explanation would bring, just as we always skip the T&Cs of every software upgrade. In explainable AI, a counterfactual explanation is contrastive in nature and would be better received by the human receiving the explanation.

Explanations are Selected

Explanations are a subset selected from a possibly infinite set of explanations, based on certain cognitive biases. In the figure above, the teacher could have (but should not have) provided a definition of a policeman in response to the student's question, or described every detail observed in the picture.

Non-selected explanations are not intuitive to a human, and they dilute the more important details that would otherwise be crucial to the explanation. In selecting explanations, note that abnormal or unexpected observations (from the perspective of the human subject) tend to invite questions from humans. For explainable AI, explanations should be provided taking into account the cognitive load they require, and how much of that load the human can handle.

Explanations are Interactive

Explanations are provided as part of a social conversation or interaction. In the figure above, the student is satisfied with the explanation only after several interactions. If the teacher were to shut the conversation down after her first reply, the student would not be satisfied with the explanation.

One of the challenges of contrastive explanation is the number of implicit counterfactuals that could be assumed in the question (e.g. police-MAN vs police-WOMAN). Multiple interactions between the explainer and the explainee eliminate the wrong assumptions and converge towards the most salient contrastive explanation that would satisfy the human. It is through interaction that inconsistencies in the two parties' knowledge structures can be reconciled. In explainable AI, an interactive interface for explanation would therefore be preferred.

WHAT to Explain

Aristotle's Four Causes model provides a framework for giving a complete explanation to a "why" question. The four causes, and their relevance to explainable AI, are summarized in the table below.

Relevance of Aristotle's Four Causes model to Explainable AI

The conversation between the teacher and the student in the earlier figure illustrates some of the "what" explanations framed by Aristotle's Four Causes model:

Material Cause: The color and design of the uniform explain that the person is a police officer.

Formal Cause: The gun carried by the person explains that the person is a police officer. The short, trim hair is a property of a male, explaining that the person is a man.

Efficient Cause: The presence of another person being apprehended by the person in uniform explains that the uniformed person is a police officer.

Final Cause: Police officers arrest people, thus the act of the uniformed person arresting the other person explains that the former is a police officer.

WHAT, HOW and WHERE Framework for Explainable AI Algorithms

The "what" and "how" yield a set of factors, detailed in the figure below, that can be used to evaluate explainable AI algorithms.

While the "what" factors ensure that the explanations generated are complete and faithful to the ML model being explained, the "how" factors ensure that the explanations generated are interpretable to the human.

Also included in the figure is the "where" factor, which differentiates algorithms that provide local explanations (i.e. of an individual prediction) from those that provide global explanations (i.e. of the entire model).

These factors will be used to evaluate the three algorithms (LIME, Counterfactual Visual Explanation, and Automatic Concept-Based Explanation) detailed in the subsequent sections.

Factors to evaluate Explainable AI Algorithms

Local Interpretable Model-Agnostic Explanation — LIME

The authors of LIME assert that trust in ML behaviour is essential, as it is ultimately the human who decides whether to (a) take action based on the prediction of the ML model, and (b) deploy the ML model in a real-world application.

To address trust in ML predictions, Local Interpretable Model-Agnostic Explanation (LIME) is proposed: it uses an interpretable model representation to explain predictions from any ML model in a locally faithful manner. The explanations are contrastive in nature: the presence or absence of regions of an image is used to explain the ML prediction.

The authors also proposed Sub-modular Pick LIME (SP-LIME) to provide a global explanation of the ML model. SP-LIME selects and explains (using LIME) representative individual predictions in a non-redundant manner to provide a global understanding of the ML model.

LIME Implementation

LIME uses an interpretable model representation to explain predictions from any ML model in a locally faithful manner.

The equation used to derive an explanation with LIME is provided below.

LIME Equation

For an instance x whose prediction by the ML model is to be explained, the LIME equation aims to find an explanation model g, from the set of all possible explanations G, to explain the prediction, with two considerations (the objective is written out after this list):

  1. To ensure local fidelity by minimizing the loss L between the explanation model g and the model to be explained f, in the locality of the instance x, denoted by pi_x, and
  2. To ensure interpretability of the explanation by minimizing the complexity, denoted by Omega, of the explanation model g
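
Written out (following the notation of the LIME paper), the objective and the locally weighted square loss are:

```latex
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g),
\qquad
\mathcal{L}(f, g, \pi_x) = \sum_{z, z' \in \mathcal{Z}} \pi_x(z) \, \big(f(z) - g(z')\big)^2
```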

The paper proposed Algorithm 1, replicated below, to generate g (fixed to be a sparse linear explanation) to explain the prediction. A detailed illustration is provided in the subsequent figure to demonstrate how Algorithm 1 is applied to an image classification example, with a step-by-step explanation of each step.

Algorithm 1 to generate Sparse Linear Explanation using LIME
Illustration of Algorithm 1 on an image classification example
  1. An input instance x_1 (an image of a Labrador playing a guitar), which gives the output label y_1 = f(x_1) = "is Labrador", is the instance to be explained.
  2. x_1 is transformed into its interpretable data representation x_1', where d' is the set of superpixels, and the domain of x_1' is the presence/absence of each superpixel.
  3. x_1' is randomly perturbed (in both the number and the indices of its elements) to produce z_1', z_2' and z_3'. Note that z_1', z_2' and z_3' have varying numbers and indices of superpixels.
  4. The perturbed interpretable data representations are transformed back into the input feature domain as z_1, z_2 and z_3. pi_x(z) is calculated as pi_x(z) = exp(-D(x, z)² / sigma²), where D is the cosine distance between x and z, and sigma is the kernel width (default 25). z_1, z_2 and z_3 are used as input to the original classification model to generate their respective output labels: f(z_1), f(z_2) and f(z_3).
  5. z_1', z_2' and z_3' are also used as input to the explanation model g to produce the output labels g(z_1'), g(z_2') and g(z_3'). In the paper, G is the class of sparse linear models, thus g(z') = w_g · z'.
  6. The locally weighted (i.e. weighted by pi_x(z)) squared loss between the output labels from steps 4 and 5 is calculated.

Steps 1 to 4 above address Part A of Algorithm 1. Steps 5 and 6 are a subset of Part B of Algorithm 1, which fits a sparse linear model g using Z' as input and f(z) as output, subject to the condition Omega(g) that limits the number of interpretable components (i.e. the size of d') used for the explanation.

As solving the LIME equation directly is intractable, the paper solves for g(z') in a two-step approach, sketched in code after the list:

  1. Using ridge regression to select the first K interpretable components, followed by
  2. Using ridge regression again to learn the weights of the sparse linear model g, using the K interpretable components found in the earlier step
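
Below is a minimal numpy/scikit-learn sketch of the procedure above. It is not the official `lime` package API: `predict_fn`, the grey-out perturbation strategy, the kernel width and the regularization strengths are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_instance(image, segments, predict_fn, num_samples=1000,
                     num_features=5, kernel_width=0.25):
    """Sketch of LIME's sparse linear explanation for a single image.

    image      : H x W x 3 array, the instance x to explain
    segments   : H x W array of superpixel ids (interpretable representation x')
    predict_fn : callable mapping a batch of images to class probabilities
    """
    n_superpixels = segments.max() + 1

    # 1. Sample perturbations z' in the interpretable (binary) domain.
    z_prime = np.random.randint(0, 2, size=(num_samples, n_superpixels))
    z_prime[0, :] = 1                      # keep the unperturbed instance first

    # 2. Map each z' back to image space by greying out absent superpixels,
    #    then query the black-box model f on the perturbed images.
    perturbed = []
    for row in z_prime:
        img = image.copy()
        for sp in np.where(row == 0)[0]:
            img[segments == sp] = image.mean()
        perturbed.append(img)
    probs = predict_fn(np.stack(perturbed))            # f(z) for every sample

    # 3. Weight each sample by proximity to x: exponential kernel on the
    #    cosine distance between z' and the all-ones vector x'.
    cos = (z_prime @ z_prime[0]) / (
        np.linalg.norm(z_prime, axis=1) * np.linalg.norm(z_prime[0]) + 1e-12)
    weights = np.exp(-((1.0 - cos) ** 2) / kernel_width ** 2)   # pi_x(z)

    # 4. First ridge fit: pick the K superpixels with the largest weights.
    target = probs[:, probs[0].argmax()]               # probability of top class
    selector = Ridge(alpha=0.01).fit(z_prime, target, sample_weight=weights)
    top_k = np.argsort(np.abs(selector.coef_))[-num_features:]

    # 5. Second ridge fit: learn the sparse linear model g on those K features.
    g = Ridge(alpha=1.0).fit(z_prime[:, top_k], target, sample_weight=weights)
    return dict(zip(top_k.tolist(), g.coef_))          # superpixel id -> weight
```

The returned dictionary maps each of the K selected superpixels to its weight in g; superpixels with large positive weights are the ones highlighted as evidence for the predicted class.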

The example above illustrates the use of superpixels as the interpretable data representation to explain a prediction from an ML model trained for image classification. A bag of words can likewise be used as the interpretable data representation to explain a prediction from an ML model trained for text classification.

Sub-Modular Pick LIME — SP-LIME

While LIME provides an explanation for an individual prediction, it does not provide a global explanation of the ML model. The paper therefore also proposed SP-LIME, which selects and explains representative individual predictions in a non-redundant manner to provide a global understanding of the model, for the case of text classification.

However, it is to be noted that the pick-step (i.e. SP-LIME) for images is currently not provided by the authors.
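
For reference, the pick step described in the paper reduces to a greedy, coverage-maximizing selection over the matrix of explanation weights. A minimal sketch, with illustrative names, under the assumption that one LIME explanation has already been computed per instance:

```python
import numpy as np

def submodular_pick(W, budget):
    """Greedy pick step of SP-LIME (sketch).

    W      : n_instances x n_features matrix of explanation weights
             (one LIME explanation per row)
    budget : number of instances the user is willing to inspect
    """
    W = np.abs(W)
    importance = np.sqrt(W.sum(axis=0))       # global feature importance I_j
    covered = np.zeros(W.shape[1], dtype=bool)
    picked = []
    for _ in range(budget):
        # Coverage achieved if each candidate instance were added next.
        gains = np.array([((covered | (W[i] > 0)) * importance).sum()
                          for i in range(W.shape[0])])
        gains[np.array(picked, dtype=int)] = -np.inf   # never re-pick
        best = int(np.argmax(gains))
        picked.append(best)
        covered |= W[best] > 0
    return picked
```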

Results of LIME

The authors of LIME performed user trials on 27 test subjects to determine whether LIME produced explanations that gave users insight into whether the ML model could be trusted.

In an attempt to simulate an ML model overfitting to irrelevant concepts in the images, a logistic regression classifier was trained with 20 images of wolves and huskies. All wolf images in the 20 training images had snow in the background, while the husky images did not. A verification check on a separate set of 60 images confirmed that the classifier predicts all images with snow as wolves, and those without as huskies. A test set of 10 images was then used, in which 8 images were classified correctly and the other 2 were misclassified: an image of a wolf without a snowy background was classified as a husky, and an image of a husky with a snowy background was classified as a wolf (see the bottom-left image of the figure below).

Experiment of Husky Vs Wolf Image Classification

The results of the user trial are reproduced below.

User Trial Results for LIME

Even though the ML model was overfitting, 10 of the 27 participants initially trusted the bad model. Only when presented with the explanations from LIME (see the sets of superpixels in the earlier figure) did the number of participants who trusted the model drop to 3.

Before the LIME explanation, only 12 participants were able to tell that the model was erroneously using snow as a feature for its predictions. After the explanation, this number increased to 25.

Evaluation of LIME against proposed framework

The factors identified in the earlier section of this post are used to evaluate the explanation produced by LIME.

Evaluation of Explanation produced by LIME

Counterfactual Visual Explanation

The authors of counterfactual visual explanation assert that gaining insight into the decisions of ML models facilitates the design of better ML models, the evaluation of model fairness, and the establishment of human trust.

The paper proposes counterfactual visual explanation, in which regions of the input image are identified and swapped with regions of another image to produce a counterfactual explanation of the model's prediction. The explanations are contrastive in nature: the swapping of regions of an image is used to explain the ML prediction.

Implementation of Counterfactual Visual Explanation

Counterfactual Visual Explanation

The figure above shows an ML model trained for classification. For the input query image I, the ML model correctly predicts the output class c. A second input image, termed the distractor image I', is correctly predicted as output class c'.

Counterfactual visual explanation explains the ML model's prediction by providing insight into how spatial regions from the distractor image I' can be swapped into the query image I such that the ML model classifies the composite image I* with the label of the distractor image. The composite image is still largely the query image, with only minimal spatial regions taken from the distractor image.

Comparing the composite image that flipped the classification output against the query image shows which regions of the query image had to change for the classification to change. Contrastive explanation is provided in this way.

For the purpose of explanation and without loss of generality, counterfactual visual explanation decomposes the ML model to be explained into a spatial feature extractor f and a decision network g, as illustrated in the next figure. The output of the spatial feature extractor f has dimension h x w x d, where h and w are the spatial dimensions and d is the feature depth; it is then reshaped to dimension hw x d. The decision network g takes the output of f as input and produces log-probabilities over the output classes Y.

Decomposition of ML model to spatial feature extractor (f) and decision network (g)

Minimum-edit Counterfactual Problem

Formally, counterfactual visual explanation is performed using the equation below, posed as the minimum-edit counterfactual problem (see the subsequent figure). The objective of the minimum-edit counterfactual problem is to find the lowest number of edits, measured by the L1 norm of the gating vector a, such that the spatial features of the composite image are classified as the label of the distractor image.

Equation to perform Counterfactual Visual Explanation
Minimum-edit Counterfactual Problem
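
Reconstructed from the description above (keeping the paper's notation for f, g, the gating vector a and the permutation matrix P), the minimum-edit objective reads:

```latex
\min_{P,\,a} \; \|a\|_1
\quad \text{s.t.} \quad
c' = \arg\max \, g\big((\mathbf{1} - a) \circ f(I) + a \circ P\,f(I')\big),
\qquad a \in \{0,1\}^{hw}
```

where the minimization is over hw x hw permutation matrices P, and the element-wise gating runs over the hw spatial cells.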

The next figure illustrates the generation of the composite image I* when solving the minimum-edit counterfactual problem, with details below:

  1. A permutation matrix P is applied to the distractor image's spatial feature map f(I') to obtain a set of rearranged distractor features Pf(I').
  2. For k edits (i.e. swaps performed in the spatial feature map), a total of hw^(k+1) computations would be required. The figure below shows how the edits are performed based on the values in the permutation matrix.
  3. After obtaining the rearranged distractor features, the gating vector a, whose entries are either 1 or 0, determines the source of each of the hw spatial features of the composite image: a feature comes from the rearranged distractor features if a = 1, or from the original query image if a = 0.
Illustration of getting the composite image for solving minimum-edit counterfactual problem

The computational complexity of the minimum-edit counterfactual problem is O((hw)^{2+k}), which is exponential in the number of edits required. The authors of counterfactual visual explanation therefore proposed improvements to solve the problem with lower computational complexity.

Greedy Sequential Exhaustive Search

Greedy Sequential Search

Greedy sequential search is proposed by the authors to reduce the computational complexity to O(h²w²k). It can be summarized as follows, with a code sketch after the list:

  1. Swap a cell of the spatial feature map of the query image with a cell of the distractor image's feature map.
  2. Calculate the log-probability of classifying the composite image's spatial feature map as the label of the distractor image.
  3. Repeat steps 1 and 2 for all permutations (hw x hw) of cells.
  4. Repeat steps 1 to 3, excluding the already swapped cells, until the spatial feature map of the composite image is classified as the distractor class.
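
The following is a minimal sketch of the greedy search, assuming the feature maps have already been extracted and reshaped to hw x d and that `g` returns a vector of class log-probabilities; the names are illustrative, not the authors' code.

```python
import numpy as np

def greedy_sequential_search(query_feats, distractor_feats, g, target_class,
                             max_edits=10):
    """Sketch of greedy sequential search over spatial-feature swaps.

    query_feats, distractor_feats : hw x d arrays from the feature extractor f
    g            : callable mapping an hw x d feature map to class log-probs
    target_class : index of the distractor class c'
    """
    composite = query_feats.copy()
    hw = composite.shape[0]
    swapped = set()
    edits = []
    for _ in range(max_edits):
        best_swap, best_score = None, -np.inf
        # Try every (query cell i <- distractor cell j) swap not yet made.
        for i in range(hw):
            if i in swapped:
                continue
            for j in range(hw):
                trial = composite.copy()
                trial[i] = distractor_feats[j]
                score = g(trial)[target_class]
                if score > best_score:
                    best_swap, best_score = (i, j), score
        i, j = best_swap
        composite[i] = distractor_feats[j]        # commit the best single edit
        swapped.add(i)
        edits.append((i, j))
        if g(composite).argmax() == target_class:
            break        # composite is now classified as the distractor class
    return edits, composite
```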

Continuous Relaxation

Lastly, the authors loosen the restrictions on the gating vector a and the permutation matrix P to formulate the equation below, which can then be solved as an optimization problem using gradient descent.

Continuous Relaxation

In continuous relaxation, the gating vector a is relaxed to be non-negative and to sum to one, while the permutation matrix P is relaxed to be non-negative with each row P_i^T summing to one. a and P are reparameterized using a softmax function to ensure the 'sum to one' property, and the underlying parameters alpha and M are then optimized by gradient descent.
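
A sketch of the relaxed objective solved with PyTorch autograd, under the same assumptions as the previous sketch (pre-extracted hw x d feature tensors and a differentiable decision network `g`); the variable names and optimizer settings are illustrative.

```python
import torch

def continuous_relaxation(query_feats, distractor_feats, g, target_class,
                          steps=200, lr=0.1):
    """Sketch of the relaxed counterfactual objective, solved by gradient descent.

    query_feats, distractor_feats : hw x d tensors from the feature extractor f
    g            : differentiable decision network, hw x d features -> log-probs
    target_class : index of the distractor class c'
    """
    hw = query_feats.shape[0]
    alpha = torch.zeros(hw, requires_grad=True)      # reparameterizes gate a
    M = torch.zeros(hw, hw, requires_grad=True)      # reparameterizes matrix P
    opt = torch.optim.Adam([alpha, M], lr=lr)

    for _ in range(steps):
        a = torch.softmax(alpha, dim=0).unsqueeze(1) # a >= 0, sums to one
        P = torch.softmax(M, dim=1)                  # each row of P sums to one
        composite = (1 - a) * query_feats + a * (P @ distractor_feats)
        loss = -g(composite)[target_class]           # push composite towards c'
        opt.zero_grad()
        loss.backward()
        opt.step()

    return torch.softmax(alpha, dim=0).detach(), torch.softmax(M, dim=1).detach()
```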

Results

A user trial was performed with 26 students on a bird-species identification task. The users were trained to identify bird species by choosing the name of the bird upon viewing a picture of it. When the answer provided by the user was wrong, feedback was given to indicate the wrong answer. The feedback could come with a counterfactual explanation (i.e. the feedback explained which regions of the image to examine to identify the difference) or without one. After the training, the users were tested on identifying the names of the birds. The test accuracy achieved by students with and without counterfactual explanations was reported to be 78.77% and 71.09% respectively. The training, feedback and test interface is reproduced below.

Counterfactual Visual Explanation User Trial Interface

Evaluation

The factors identified in the earlier section of this post are used to evaluate the explanation produced by counterfactual visual explanation.

Evaluation of Explanation produced by counterfactual visual explanation

Automatic Concept-based Explanation (ACE)

The authors of ACE reference several research findings indicating that perturbation-based explanations are unreliable, vulnerable to input shift and susceptible to human confirmation bias.

As a consequence, ACE was proposed: it uses high-level human concepts, instead of features or pixels, for explanation. ACE automatically extracts visual concepts as explanations that are meaningful, coherent and salient to the human.

Implementation of ACE

ACE is implemented by the following three steps, summarized in the figure below.

  1. Multi-Resolution Segmentation of Images: ACE segments images that belong to the same class. Three levels of segmentation (i.e. different levels of granularity) are performed on the images to extract the different levels of concepts (textures, object parts and objects) required for subsequent processing. Simple Linear Iterative Clustering (SLIC) is used for the multi-resolution segmentation due to its low computational complexity of O(N), where N is the total number of pixels in the image.
  2. Clustering Similar Segments + Removing Outliers: Research findings from R. Zhang suggest that the final layers of state-of-the-art convolutional neural networks trained on large-scale datasets detect similar concepts to those that would be identified by humans. ACE uses these findings to cluster similar segments that relate to a consistent concept.
  3. Computing the Saliency of Concepts: The saliency of the concepts is computed based on the findings of B. Kim, so that the most salient concepts are provided as the explanation.
ACE Implementation

Multi-Resolution Segmentation of Images

Simple Linear Iterative Clustering (SLIC) uses an expectation-maximization-style process to segment an image into superpixels. The algorithm is reproduced below, with an illustrated example in the subsequent figure.

SLIC Superpixel Segmentation
SLIC Superpixel Segmentation Example
  1. SLIC is to be performed on a picture of 81 pixels (9 pixels wide x 9 pixels high) containing a green triangle, with colors defined in CIELAB space.
  2. If k is set to 9 (i.e. the segmentation should generate a total of 9 superpixels), S is calculated as sqrt(81/9) = 3, giving the 3-by-3 grid that represents the separation of the initial cluster centers.
  3. The cluster centers are moved to the lowest-gradient point in their neighbourhood (i.e. away from object edges and noisy pixels). The first two cluster centers, k_0 and k_1, are selected as shown in the second picture above.
  4. Each cluster center checks the pixels within the 2S x 2S region around it. See the red and green squares in the second picture for the regions checked by cluster centers k_0 and k_1 respectively.
  5. Assignment: SLIC then assigns each pixel to the nearest cluster center, measured by a combination of color distance in LAB space (d_c) and spatial proximity (d_s). The red and green shaded regions in the third picture show the result of the assignment step.
  6. Update: The cluster centers are recomputed from the pixels assigned to them. The last picture shows the recalculated cluster centers. On the next assignment step, the entire green triangle will be segmented properly.

ACE uses SLIC to perform multi-resolution segmentation (with the number of cluster centers k = 15, 50 and 80) on each image, so that different levels of concepts are available for the clustering step that follows.
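
The multi-resolution segmentation step can be reproduced with the SLIC implementation in scikit-image; in the sketch below, only the three `n_segments` values come from the paper, while the other parameters are assumptions.

```python
from skimage.segmentation import slic

def multi_resolution_segments(image, resolutions=(15, 50, 80)):
    """Segment one image at several granularities, as in ACE's first step.

    image : H x W x 3 float array in [0, 1]
    Returns a list of (H, W) label maps, one per resolution.
    """
    return [slic(image, n_segments=k, compactness=20, sigma=1.0)
            for k in resolutions]

# Each superpixel is then cropped, resized to the classifier's input size,
# and passed through the network to obtain its activation at the chosen layer.
```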

Clustering Similar Segments + Remove Outliers

R. Zhang and co-authors performed a large number of user trials on a two-alternative forced choice (2AFC) test, in which two different distortions were applied to a reference image x to produce Patch 0, x_0, and Patch 1, x_1. Users were then asked to select, from x_0 and x_1, the patch most similar to the reference image. Several examples of the reference image and distorted patches are provided below.

2AFC Experiment Examples

The set of distortions investigated by R. Zhang and co-authors is summarized below.

Distortions Investigated by authors

The human 2AFC results were then checked against the similarity computed by various algorithms, including the cosine distance between the deep features of deep neural networks trained on large-scale datasets. The results, reproduced in the figure below, show that the deep features of a neural network measure similarity between concepts almost as a human would.

Performance of algorithms

ACE builds upon the findings discussed above and uses the Euclidean distance between activations of the "mixed_8" layer of an Inception-V3 network, trained on ImageNet, to measure the similarity between the segments identified by SLIC. K-means clustering is then performed, and outliers are removed based on their large Euclidean distance from the cluster centers.
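
A minimal sketch of this clustering step, assuming `segment_features` holds the layer activations of the resized segments (one row per segment); the number of clusters and the outlier threshold are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_segments(segment_features, n_concepts=25, keep_fraction=0.7):
    """Group perceptually similar segments into candidate concepts (sketch).

    segment_features : n_segments x d array of layer activations, one per segment
    n_concepts       : number of candidate concepts (clusters)
    keep_fraction    : fraction of each cluster kept after outlier removal
    """
    km = KMeans(n_clusters=n_concepts, n_init=10).fit(segment_features)
    concepts = []
    for c in range(n_concepts):
        idx = np.where(km.labels_ == c)[0]
        # Distance of each member to its cluster centre; drop the farthest ones.
        dists = np.linalg.norm(segment_features[idx] - km.cluster_centers_[c],
                               axis=1)
        keep = idx[np.argsort(dists)[:int(len(idx) * keep_fraction)]]
        concepts.append(keep)
    return concepts      # list of index arrays, one per candidate concept
```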

Computing Saliency of Concepts

Testing with Concept Activation Vectors (TCAV) is used by ACE to compute the saliency of the identified concepts, producing an importance ranking of the concepts for each output class.

The illustration in the figure below, together with the methods described earlier, is used to explain how TCAV generates the saliency of each concept.

Illustration of TCAV

In TCAV, a set of examples representing the concept (see the stripes in (a)) and random examples that do not represent the concept (see the random examples in (a)) are fed as input to the network, along with labelled training examples of the studied class (see the zebras in (b)). For ACE, the concepts are the activations at the output of the "mixed_8" layer of the Inception-V3 architecture (similar to the output of f_l in (c)), while the examples that represent a concept are the input segments that generate high activations for that high-level concept.

For each identified concept, a binary linear classifier is trained to distinguish the concept from non-concepts (see (d)). The Concept Activation Vector (CAV) is the vector orthogonal to the decision boundary of this binary classifier (see vector v_c^l in (d)).

For each class k and each identified concept C, the conceptual sensitivity used to calculate the TCAV score determines the importance of concept C for class k.
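
A minimal sketch of the TCAV computation described above, assuming the layer activations and a gradient function for the class logit are available from the underlying framework; the classifier choice and names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    """CAV: normal to a hyperplane separating concept vs. random activations."""
    X = np.concatenate([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_.ravel()
    return v / np.linalg.norm(v)

def tcav_score(cav, grad_fn, class_examples):
    """Fraction of class-k examples with positive conceptual sensitivity.

    grad_fn : callable returning d(logit of class k) / d(layer activation)
              for one example, flattened to the same shape as the CAV
    """
    sensitivities = [float(np.dot(grad_fn(x), cav)) for x in class_examples]
    return float(np.mean(np.array(sensitivities) > 0))
```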

Results of ACE

Some of the concepts identified by ACE on the Inception-V3 trained on the ImageNet dataset are provided below.

Concepts Identified by ACE

User trials were conducted by the authors to quantify the effectiveness of ACE. In each trial, 6 pictures, one of which was conceptually different from the other 5, were presented to the user, whose task was to identify the conceptually different image. The results show that, over a total of 30 experiments, each with 6 pictures, users were able to identify the conceptually different picture in 97% (14.6 out of 15) and 99% (14.9 out of 15) of the experiments with the hand-labelled and ACE-labelled datasets respectively.

Identification of Intruder Concept

The authors of ACE further looked into smallest sufficient concepts (SSC), the smallest set of concepts that, when inserted, is sufficient to predict the target class, and smallest destroying concepts (SDC), the smallest set of concepts that, when removed, is sufficient to make a prediction incorrect. The results are reproduced below.

Prediction Accuracy with Insertion / Deletion of SSC / SDC

The results of the SSC and SDC provide contrastive explanation for a prediction (i.e. local prediction, not global model explanation).

Evaluation of ACE

The factors identified in the earlier section of this post are used to evaluate the explanation produced by ACE.

Evaluation of Explanation produced by ACE

Conclusion

The performance of the three explainable AI algorithms is summarized below.

Performance of Selected Explainable AI Algorithms

In terms of the interpretability (the "how") of explanations, the figure above shows that all three reviewed papers provide contrastive and selective explanations, which are easily interpretable by humans. However, none of them is interactive. This could be a future direction for explainable AI algorithms, as interaction between the algorithm and the user would remove any wrong implicit assumptions the algorithm holds about the user, so that the explanation can be directed to address the user's queries.

In terms of the fidelity (the "what") of explanations, ACE performed better than the other algorithms, as the identified high-level concepts were able to provide explanations from the perspective of the material, formal and efficient causes. It is, however, noted that none of the algorithms is advanced enough to provide an explanation based on the final cause, i.e. to infer an explanation from the purpose or final form of the item. This could require a heterogeneous explainable AI structure that captures the purpose or goal of each detectable object.

While global explanations would be preferred, none of the identified algorithms was able to provide a global explanation of the model. On the local-global continuum of explanation, ACE fared better, as it was able to provide explanations at the class level instead of at the level of individual predictions.

This post builds upon the well-researched aspects of explanation in the social sciences to infer a framework of factors that can be used to evaluate existing explainable AI algorithms. The framework also indicates a direction forward for explainable AI algorithms.


Tecklun Goh

Defence Engineer currently pursuing PhD in Computer Science. Research interest in the area of AI for wearable systems.