AI Systems Landscape

Explainable AI (XAI) — Interactive Architecture Chart

A comprehensive interactive exploration of Explainable AI — the explanation pipeline, 8-layer stack, explanation methods, SHAP/LIME, mechanistic interpretability, benchmarks, market data, and more.

~56 min read · Interactive Reference

Hameem M Mahdi, B.S.C.S., M.S.E., Ph.D. · 2026

Senior Principal Applied Scientist | Private Equity Leader | AI Innovative Solutions


The Explanation Pipeline

Explainable AI follows a five-step pipeline from raw input through to human-readable explanations. Click each step to learn more.

INPUT
Features / Data
TRAINED MODEL
Black Box
RAW OUTPUT
Prediction
EXPLANATION ENGINE
SHAP, LIME, Grad-CAM
HUMAN-READABLE
Explanation


Explore how data flows through the explanation pipeline — from raw features to final human-understandable explanations.

Did You Know?

1. SHAP values are grounded in cooperative game theory — specifically Shapley values from 1953.

2. The EU AI Act (2024) mandates explainability for all high-risk AI systems deployed in Europe.

3. LIME (Local Interpretable Model-agnostic Explanations) can explain any black-box model's individual predictions.

Knowledge Check

Test your understanding — select the best answer for each question.

Q1. What do SHAP values measure?

Q2. What is the difference between global and local explainability?

Q3. Which regulation mandates AI explainability for high-risk systems?

The 8-Layer XAI Stack

Explainable AI is organised into eight architectural layers. Click any layer to expand details.

XAI Sub-Types

Eleven principal sub-types of explainability, organised by Scope, Format, and Access level.

Core Explanation Methods

Five families of explanation methods spanning inherently interpretable models to mechanistic interpretability.

XAI Tools & Frameworks

The definitive toolkit for building explainable AI systems — from SHAP to cloud-native platforms.

Tool Creator Key Capabilities
SHAP Lundberg Shapley values; KernelSHAP, TreeSHAP, DeepSHAP; theoretically grounded
LIME Ribeiro et al. Local perturbation-based; tabular, text, image
Captum Meta / PyTorch Comprehensive attribution; integrated gradients, SHAP, LRP
InterpretML Microsoft EBM + dashboard; glass-box models
Alibi Seldon Counterfactual explanations, anchors, trust scores
AIX360 IBM Prototypes, contrastive explanations, Boolean rules
What-If Tool Google Interactive visual; TensorBoard; fairness
Responsible AI Toolbox Microsoft Error analysis, fairness, interpretation
SageMaker Clarify AWS Bias detection, SHAP-based attributions
Azure Responsible AI Microsoft Interpretability, fairness, error analysis

Use Cases

Where Explainable AI delivers real-world value — from regulated finance to life-critical healthcare.

Benchmarks & Evaluation

Quantitative measures of explanation quality and impact on human decision-making.

Explanation Quality Metrics

User Impact

Market Data

XAI market size, adoption metrics, and projected growth through 2028.

XAI Market Snapshot (2024)

Market Growth 2024 → 2028 (CAGR 21%)

Risks & Challenges

Key risks and open challenges in the Explainable AI ecosystem.

Glossary

Key terms and concepts in Explainable AI.

Visual Infographics

Animation infographics for Explainable AI (XAI) — overview and full technology stack.

Regulation


Regulation & Governance

Regulatory Landscape

Regulation XAI Relevance
EU AI Act (2024) High-risk AI systems must provide "sufficient transparency to enable users to interpret the system's output and use it appropriately"; requires technical documentation of model behaviour
GDPR (2018) — Article 22 Data subjects have the right not to be subject to solely automated decisions; organisations must provide "meaningful information about the logic involved"
ECOA / Reg B (US) Creditors must provide "specific reasons" for adverse credit actions — directly requires explainable credit models
SR 11-7 (US Fed) Model Risk Management guidance — requires model documentation, validation, and explanation
FDA AI/ML Guidance (US) AI in medical devices requires transparency about model behaviour and decision-making
EU MDR (Medical Devices) Clinical decision support software must be transparent and understandable
UK FCA / PRA Financial regulators require firms to explain algorithmic decisions and demonstrate model governance
Singapore MAS FEAT Fairness, Ethics, Accountability, and Transparency framework for AI in financial services

Governance Best Practices

Practice Description
Model Cards Structured documentation of model purpose, performance, limitations, and intended use (Mitchell et al., 2019)
Datasheets for Datasets Documentation of dataset provenance, composition, collection process, and intended use (Gebru et al., 2021)
Explanation Logging Store explanations alongside predictions for audit and compliance
Explanation Review Process Human review of explanations for high-stakes decisions
Regular Explanation Audits Periodic assessment of explanation fidelity, stability, and alignment with domain knowledge
Fairness-Explainability Integration Use XAI to identify and mitigate bias; SHAP for protected attribute analysis
Tiered Explanations Provide different explanation depths for different audiences (consumer, business, technical, regulatory)

Deep Dives


Model-Agnostic Explanation Methods — Deep Dive

SHAP (SHapley Additive exPlanations)

Aspect Detail
Foundation Shapley values from cooperative game theory (Shapley, 1953); applied to ML by Lundberg & Lee (2017)
Core Idea Each feature is a "player" in a cooperative game; the prediction is the "payout"; Shapley values assign each player a fair contribution
Mathematical Property The only method satisfying local accuracy, missingness, and consistency — axiomatically the "fairest" attribution
Variants KernelSHAP: model-agnostic, perturbation-based — TreeSHAP: exact for tree models (O(TLD²)) — DeepSHAP: DeepLIFT + Shapley for neural nets — FastSHAP: amortised SHAP via a learned explainer network
Global Explanations Aggregate local SHAP values across the dataset: mean absolute SHAP = global feature importance
Interaction Values SHAP interaction values decompose Shapley values into main and interaction effects
Strengths Theoretically grounded; consistent; both local and global; works on any model
Limitations Computationally expensive for large models (exponential in exact form; KernelSHAP is approximate); baseline choice matters; can be slow for real-time applications
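The coalition game described above can be computed exactly for a tiny model. Below is a minimal pure-Python sketch (illustrative only, not the shap library): each feature is a "player", a coalition's value is the model's output with absent features set to a baseline, and the toy model and inputs are invented.

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    """Exact Shapley values for model f at input x against a baseline.

    v(S) = f(z), where z takes x's values on coalition S and the
    baseline's values elsewhere.  For each feature i:
    phi_i = sum over S not containing i of
            |S|! * (n - |S| - 1)! / n!  *  (v(S + {i}) - v(S))
    """
    n = len(x)
    players = list(range(n))

    def v(S):
        z = [x[i] if i in S else baseline[i] for i in players]
        return f(z)

    phi = []
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for size in range(n):
            for S in combinations(others, size):
                S = set(S)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (v(S | {i}) - v(S))
        phi.append(total)
    return phi

# Toy "model": a nonlinear function of three features (invented).
f = lambda z: z[0] * z[1] + 2.0 * z[2]
x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]

phi = exact_shapley(f, x, baseline)
# Local accuracy: attributions sum to f(x) - f(baseline).
assert abs(sum(phi) - (f(x) - f(baseline))) < 1e-9
```

The exponential loop over coalitions is exactly why the table lists computational cost as a limitation — KernelSHAP and TreeSHAP exist to approximate or shortcut this sum.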

LIME (Local Interpretable Model-agnostic Explanations)

Aspect Detail
Introduced Ribeiro et al. (2016)
Core Idea For a given prediction, perturb the input to generate a local neighbourhood; train a simple interpretable model (linear, decision tree) on the perturbation-prediction pairs; the interpretable model explains the local behaviour
Process 1. Select instance → 2. Perturb input → 3. Get model predictions for perturbations → 4. Weight by proximity → 5. Fit sparse linear model → 6. Report coefficients as explanations
Strengths Model-agnostic; intuitive; flexible — works on tabular, text, and image data
Limitations Explanations can be unstable (different runs → different explanations); fidelity to the original model varies; neighbourhood definition is subjective
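The six-step process above can be sketched in a few lines. This is a minimal one-feature illustration, not the lime library — the black-box function, kernel width, and sample count are all invented: perturb around the instance, weight by proximity, and fit a weighted linear model whose slope is the local explanation.

```python
import math
import random

random.seed(0)

# Black-box model we want to explain locally (invented for illustration).
f = lambda x: x * x

x0 = 3.0      # 1. the instance to explain
width = 0.5   # kernel width for the proximity weighting (a design choice)

# 2. Perturb the input around x0 and 3. query the black box.
samples = [x0 + random.gauss(0.0, 1.0) for _ in range(500)]
preds = [f(x) for x in samples]

# 4. Weight each perturbation by its proximity to x0 (RBF kernel).
weights = [math.exp(-((x - x0) ** 2) / (2 * width ** 2)) for x in samples]

# 5. Fit a weighted linear model y ~ a + b*x (closed-form weighted least squares).
W = sum(weights)
xbar = sum(w * x for w, x in zip(weights, samples)) / W
ybar = sum(w * y for w, y in zip(weights, preds)) / W
b = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(weights, samples, preds)) \
    / sum(w * (x - xbar) ** 2 for w, x in zip(weights, samples))

# 6. Report the coefficient: the local slope approximates f'(x0) = 2 * x0 = 6.
```

Re-running with a different seed gives a slightly different slope — a miniature demonstration of the instability listed in the limitations row.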

Counterfactual Explanations

Aspect Detail
Core Idea "What is the smallest change to the input that would result in a different prediction?"
Example "Your loan was denied. If your income were £5,000 higher and you had no outstanding debts, the loan would have been approved."
Strengths Actionable — tells users what to change; intuitive; does not require model internals
Limitations Multiple valid counterfactuals may exist; some changes may be infeasible (cannot change age); need to constrain to plausible changes
Methods Wachter et al. (2017); DiCE (Diverse Counterfactual Explanations, Microsoft); FACE (Feasible Actionable Counterfactual Explanations)
Legal Relevance GDPR's Right to Explanation has been interpreted to require counterfactual-style explanations
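A crude sketch of the core idea, assuming a toy logistic "credit model" whose coefficients are invented: search for the smallest income increase that flips the decision. Production methods such as DiCE optimise distance and plausibility jointly rather than stepping a single feature.

```python
import math

# Toy credit model (coefficients invented): approval probability from
# income and outstanding debt, both in thousands of pounds.
def approve_prob(income, debt):
    z = 0.08 * income - 0.15 * debt - 3.0
    return 1.0 / (1.0 + math.exp(-z))

applicant = {"income": 30.0, "debt": 10.0}
assert approve_prob(**applicant) < 0.5   # the loan is denied

# Counterfactual search: smallest income increase (in 0.5k steps) that
# flips the decision, holding debt fixed.
income = applicant["income"]
while approve_prob(income, applicant["debt"]) < 0.5:
    income += 0.5

print(f"Counterfactual: raise income from {applicant['income']}k to {income}k")
```

Even this toy exposes the limitations in the table: stepping debt down instead would yield a different, equally valid counterfactual, and nothing constrains the change to be feasible for the applicant.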

Model-Specific Explanation Methods — Deep Dive

Grad-CAM (Gradient-weighted Class Activation Mapping)

Aspect Detail
Applies To Convolutional Neural Networks (CNNs) for image tasks
Core Idea Compute gradients of the target class score with respect to the feature maps of the last convolutional layer; use the gradient magnitude as weights to produce a heatmap
Output A coarse spatial heatmap highlighting the regions of the input image that were most important for the prediction
Variants Grad-CAM++ (improved multi-object localisation), Score-CAM (gradient-free), Layer-CAM (finer spatial resolution)
Strengths Fast; no retraining; easy to visualise; widely adopted in medical imaging
Limitations Coarse resolution (limited to the last conv layer); may miss fine-grained features; class-specific only
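The weighting scheme in the table can be shown on invented numbers. In this toy sketch the activations and gradients are made up — in practice both come from a forward and backward pass through a real CNN: global-average-pool the gradients to get per-channel weights, then take a ReLU-ed weighted sum of the feature maps.

```python
# Two invented 2x2 feature maps A_k from the "last conv layer", and the
# gradients dY_c/dA_k of the target class score w.r.t. each map.
activations = [
    [[1.0, 0.5], [0.0, 2.0]],      # feature map 0
    [[0.2, 1.5], [1.0, 0.0]],      # feature map 1
]
gradients = [
    [[0.4, 0.4], [0.4, 0.4]],      # gradients for map 0
    [[-0.2, -0.2], [-0.2, -0.2]],  # gradients for map 1
]

# 1. Channel weights: global-average-pool each gradient map.
weights = [sum(sum(row) for row in g) / 4.0 for g in gradients]

# 2. Weighted sum of feature maps, then ReLU -> coarse importance heatmap.
heatmap = [
    [max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
     for j in range(2)]
    for i in range(2)
]
```

The 2x2 output illustrates the coarseness limitation directly: the heatmap has the spatial resolution of the last convolutional layer, not of the input image.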

Integrated Gradients

Aspect Detail
Introduced Sundararajan et al. (2017)
Core Idea Accumulate gradients along a straight-line path from a baseline input (e.g., black image, zero vector) to the actual input
Mathematical Property Satisfies sensitivity (if a feature changes the prediction, it gets non-zero attribution) and implementation invariance (same function → same attributions)
Strengths Theoretically grounded; works on any differentiable model; no retraining
Limitations Baseline choice affects results; path integration requires many steps (100+); can be noisy for high-dimensional inputs
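Because the core idea is a path integral, it can be demonstrated without a deep-learning framework. Here is a sketch for a toy differentiable function with a hand-coded analytic gradient (function, inputs, and step count all invented for illustration):

```python
# Integrated gradients for f(x1, x2) = x1 * x2, whose gradient is known
# analytically: df/dx1 = x2, df/dx2 = x1.  Baseline = the zero vector.
def f(x):
    return x[0] * x[1]

def grad_f(x):
    return [x[1], x[0]]

def integrated_gradients(x, baseline, steps=1000):
    """Riemann (midpoint) approximation of the path integral along the
    straight line from baseline to x:
    IG_i = (x_i - x'_i) * integral over a in [0,1] of df/dx_i(x' + a(x - x'))."""
    n = len(x)
    acc = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [baseline[i] + alpha * (x[i] - baseline[i]) for i in range(n)]
        g = grad_f(point)
        for i in range(n):
            acc[i] += g[i] / steps
    return [(x[i] - baseline[i]) * acc[i] for i in range(n)]

x, base = [2.0, 3.0], [0.0, 0.0]
ig = integrated_gradients(x, base)
# Completeness: attributions sum to f(x) - f(baseline) = 6.
assert abs(sum(ig) - (f(x) - f(base))) < 1e-6
```

Swapping the zero baseline for any other point changes the attributions — the baseline-sensitivity limitation in the table, reproduced in miniature.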

Mechanistic Interpretability (Circuit Analysis)

Aspect Detail
What It Is A research approach aiming to reverse-engineer the computational mechanisms (circuits) inside neural networks — understanding not just what features matter but how the model processes information internally
Key Techniques Activation patching, causal tracing, sparse autoencoders for feature discovery, circuit identification
Notable Work Anthropic's "Towards Monosemanticity" (2023), "Scaling Monosemanticity" (2024); Neel Nanda's TransformerLens; Chris Olah's "Zoom In"
Goal Move from "what features are important" to "how does the model compute its answer" — true understanding of model internals
Maturity Research-stage; most work on small Transformers; scaling to production LLMs is an active frontier
Significance If successful, mechanistic interpretability could provide definitive answers about model safety, bias, and behaviour — far beyond attribution methods

Concept-Based Explanations

Aspect Detail
What It Is Explanations at the level of human-meaningful concepts (e.g., "stripes", "wings", "loop shape") rather than raw pixel or feature values
TCAV Testing with Concept Activation Vectors (Kim et al., 2018) — tests how sensitive a model's predictions are to the presence of a human-defined concept
How It Works Train a linear classifier to separate activations corresponding to a concept from random activations; use the classifier's direction as a "concept vector"
Strengths Human-meaningful; bridges the gap between model internals and human understanding
Limitations Requires labelled concept datasets; concept definitions can be ambiguous
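A stripped-down TCAV-style sketch on invented 3-dimensional "activations". One loud simplification: the concept vector here is a difference of means rather than the trained linear classifier's normal described above — a common shortcut, not the paper's exact method. The gradients are likewise invented; in practice they come from backpropagation through the layers above.

```python
# Activations at some layer for concept examples (e.g. "stripes") vs.
# random examples -- all values invented for illustration.
concept_acts = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.1, 0.0, 0.0]]
random_acts  = [[0.0, 0.5, 0.4], [0.1, 0.6, 0.5], [0.0, 0.4, 0.6]]

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Concept activation vector: direction separating concept from random.
cav = [c - r for c, r in zip(mean(concept_acts), mean(random_acts))]

# Gradients of the class score w.r.t. the activations for three test inputs.
grads = [[0.8, -0.1, 0.0], [0.5, 0.0, -0.2], [-0.3, 0.4, 0.1]]

# TCAV score: fraction of inputs whose directional derivative along the
# concept vector is positive -- how often the concept pushes the class up.
dot = lambda a, b: sum(x * y for x, y in zip(a, b))
tcav_score = sum(1 for g in grads if dot(g, cav) > 0) / len(grads)
```

A score near 1 would mean the concept consistently increases the class score; near 0, that it consistently suppresses it.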

Explainability for Large Language Models & Foundation Models

Unique Challenges

Challenge Description
Scale LLMs have billions of parameters; traditional attribution methods are computationally prohibitive
Generative Output LLMs generate sequences, not single predictions — explaining why each token was generated is complex
Emergent Behaviour Capabilities emerge at scale (in-context learning, reasoning) that are not present in smaller models
Multimodal Inputs Foundation models increasingly handle text, images, and audio — explanation must span modalities
Prompt Sensitivity Small changes in prompts can dramatically change outputs — explanations must account for prompt influence
Black-Box API Access Many LLMs are available only via API — no access to weights, gradients, or activations

Current Approaches for LLM Explainability

Approach Description
Chain-of-Thought (CoT) Prompting the model to "explain its reasoning" step-by-step — generates a reasoning trace before the answer
Self-Consistency Sampling multiple CoT reasoning paths and checking agreement — higher consistency suggests more reliable reasoning
Attention Visualisation Visualising attention patterns across heads and layers; useful for understanding token dependencies
Probing Training simple classifiers on intermediate representations to discover what information is encoded at each layer
Mechanistic Interpretability Reverse-engineering circuits and features inside Transformer models (Anthropic, EleutherAI, DeepMind)
Logit Lens / Tuned Lens Projecting intermediate hidden states to the vocabulary to trace how the model's "opinion" evolves layer by layer
Sparse Autoencoders (SAEs) Decomposing neuron activations into interpretable features using sparse dictionaries — Anthropic's monosemanticity research
Retrieval Attribution For RAG systems: showing which retrieved documents influenced the response
Faithfulness Evaluation Testing whether CoT explanations actually reflect the model's computation — or are post-hoc rationalisations
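The logit-lens row can be illustrated with toy numbers — the vocabulary, unembedding matrix, and hidden states below are all invented, standing in for a real Transformer's residual stream: project each layer's state onto the vocabulary and watch the top token change layer by layer.

```python
# Toy logit-lens: project intermediate hidden states through the
# unembedding matrix and trace how the model's "opinion" evolves.
vocab = ["cat", "dog", "car"]
unembed = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [0.5, 0.5]}

hidden_states = [   # residual-stream state after each layer (invented)
    [0.2, 0.1],     # layer 0: weakly prefers "cat"
    [0.1, 0.9],     # layer 1: swings towards "dog"
    [0.0, 2.0],     # layer 2 (final): confidently "dog"
]

dot = lambda a, b: sum(x * y for x, y in zip(a, b))

trace = []
for h in hidden_states:
    logits = {tok: dot(h, unembed[tok]) for tok in vocab}
    trace.append(max(logits, key=logits.get))   # top token at this layer
```

The trace shows the prediction forming gradually rather than appearing only at the output — the observation the tuned-lens line of work refines with learned per-layer projections.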

Critical Caveat: CoT explanations may not be faithful. Recent research (Turpin et al., 2023; Lanham et al., 2023) shows that LLMs' self-explanations often do not accurately reflect the true factors driving their predictions. Chain-of-thought is a useful tool but should not be treated as ground-truth explanation.


Overview


Definition & Core Concept

Explainable AI (XAI) is the set of methods, techniques, and design principles that enable humans to understand why an AI system made a particular decision, how it arrived at that decision, and what factors influenced it. XAI bridges the gap between the predictive power of complex AI models and the human need for transparency, trust, and accountability.

The fundamental tension in modern AI is the accuracy-interpretability trade-off: the most accurate models (deep neural networks, large ensembles, LLMs) are often the least interpretable, while the most interpretable models (linear regression, decision trees) are often less powerful. XAI exists to resolve this tension — either by designing inherently interpretable models or by building post-hoc explanation tools around opaque ones.

XAI is not a standalone AI type — like privacy-preserving AI, it is a cross-cutting discipline that applies to any AI system. You can explain a predictive model, a generative model, a recommender system, or an autonomous agent. The techniques differ, but the goal is the same: make AI understandable to humans.

Dimension Detail
Core Capability Makes AI decisions understandable, interpretable, and transparent to humans
How It Works Inherently interpretable models, post-hoc explanation methods (SHAP, LIME, attention, gradients), counterfactual analysis, concept-based explanations
What It Produces Feature importance scores, attribution maps, counterfactual explanations, concept-level reasoning, natural language explanations
Key Differentiator Does not replace AI models — augments them with human-understandable explanations and accountability

Explainable AI vs. Other AI Types

AI Type What It Does Example
Explainable AI (XAI) Makes AI decisions understandable to humans SHAP explaining a loan denial
Agentic AI Pursues goals autonomously with tools, memory, and planning Research agent, coding agent
Analytical AI Extracts insights from data Anomaly detector, clustering
Autonomous AI (Non-Agentic) Operates independently within fixed boundaries without human input Autopilot, auto-scaling, algorithmic trading
Bayesian / Probabilistic AI Reasons under uncertainty using probability distributions Clinical trial analysis, A/B testing, risk modelling
Cognitive / Neuro-Symbolic AI Combines neural learning with symbolic reasoning LLM + knowledge graph, physics-informed neural net
Conversational AI Manages multi-turn dialogue between humans and machines Customer service chatbot, voice assistant
Evolutionary / Genetic AI Optimises solutions through population-based search inspired by natural selection Neural architecture search, logistics scheduling
Generative AI Creates new content from learned patterns LLM, image generator
Multimodal Perception AI Fuses vision, language, audio, and other modalities GPT-4o processing image + text, AV sensor fusion
Optimisation / Operations Research AI Finds optimal solutions to constrained mathematical problems Vehicle routing, supply chain planning, scheduling
Physical / Embodied AI Acts in the physical world through sensors and actuators Autonomous vehicle, robot arm, drone
Predictive / Discriminative AI Classifies or forecasts from data Fraud detection model
Privacy-Preserving AI Trains and runs AI without exposing raw data Federated learning, differential privacy
Reactive AI Responds to current input with no memory or learning Thermostat, ABS braking system
Recommendation / Retrieval AI Surfaces relevant items from large catalogues based on user signals Netflix suggestions, Google Search, Spotify playlists
Reinforcement Learning AI Learns optimal behaviour from reward signals via trial and error AlphaGo, robotic locomotion, RLHF
Scientific / Simulation AI Solves scientific problems and models physical systems AlphaFold, climate simulation, molecular dynamics
Symbolic / Rule-Based AI Reasons over explicit rules and knowledge to derive conclusions Medical expert system, legal reasoning engine

Key Distinction: XAI Is a Lens, Not a Model. XAI is not a type of AI model — it is a set of techniques applied to AI models. Any model can be made more explainable; the question is how much interpretability is needed and what explanations are appropriate.

Key Distinction: Interpretability vs. Explainability. Interpretability means a model is inherently understandable by design (a decision tree, a linear model). Explainability means post-hoc techniques are applied to make an opaque model understandable. XAI encompasses both.

Key Distinction from Debugging. XAI overlaps with model debugging but extends beyond it. Debugging asks "why is the model wrong?" XAI asks "why did the model make this decision?" — for both correct and incorrect predictions.