A comprehensive interactive exploration of Bayesian AI — the inference pipeline, 8-layer stack, inference methods, probabilistic programming, benchmarks, market data, and more.
~48 min read · Interactive Reference

Bayesian inference follows a principled cycle: encode prior beliefs, observe data, compute the posterior, predict, decide, and update.
+------------------------------------------------------------------------+
|                      BAYESIAN INFERENCE PIPELINE                       |
|                                                                        |
|  1. PRIOR              2. LIKELIHOOD            3. POSTERIOR           |
|  ----------------      ----------------         ----------------       |
|  Encode existing       Define how the data      Combine prior and      |
|  knowledge as a        is generated given       likelihood via Bayes'  |
|  probability           the model parameters     theorem to get updated |
|  distribution                                   beliefs                |
|                                                                        |
|  4. PREDICTION         5. DECISION              6. UPDATE              |
|  ----------------      ----------------         ----------------       |
|  Integrate over        Make optimal decisions   Incorporate new data   |
|  posterior to          that account for         and update posterior   |
|  produce predictive    uncertainty              continuously           |
|  distribution                                                          |
+------------------------------------------------------------------------+
$$P(\theta | D) = \frac{P(D | \theta) \cdot P(\theta)}{P(D)}$$
| Component | Name | Meaning |
|---|---|---|
| $P(\theta \mid D)$ | Posterior | Updated belief about parameters after observing data |
| $P(D \mid \theta)$ | Likelihood | Probability of the observed data given specific parameter values |
| $P(\theta)$ | Prior | Initial belief about parameters before seeing data |
| $P(D)$ | Evidence | Normalising constant; total probability of the data under all possible parameters |
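When the prior is conjugate to the likelihood, this update has a closed form. A minimal sketch of the Beta-Binomial case (the prior and the coin-flip data are illustrative):

```python
def beta_binomial_update(alpha, beta, heads, tails):
    """Conjugate update: a Beta(alpha, beta) prior and a Binomial
    likelihood give a Beta(alpha + heads, beta + tails) posterior."""
    return alpha + heads, beta + tails

# Prior belief Beta(2, 2), weakly centred on 0.5; observe 7 heads, 3 tails.
a_post, b_post = beta_binomial_update(2, 2, 7, 3)
posterior_mean = a_post / (a_post + b_post)   # (2 + 7) / (2 + 2 + 7 + 3)
print(a_post, b_post, round(posterior_mean, 3))   # 9 5 0.643
```

Note that the evidence $P(D)$ never has to be computed explicitly here; conjugacy absorbs it into the normalisation of the Beta posterior.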
| Step | What Happens |
|---|---|
| Specify the Generative Model | Define how the data is generated: likelihood function + prior distributions over parameters |
| Observe Data | Collect observed data to condition on |
| Compute the Posterior | Use MCMC, variational inference, or analytical solutions to approximate the posterior |
| Posterior Predictive Check | Generate predictions from the posterior and compare to observed data to validate the model |
| Predict | For new inputs, integrate over the posterior to produce predictive distributions with uncertainty |
| Decide | Make decisions that account for uncertainty — risk-averse or risk-neutral depending on context |
| Update | As new data arrives, the posterior becomes the new prior; repeat the cycle |
| Parameter | What It Controls |
|---|---|
| Prior Distribution | Encodes existing knowledge; informative priors help with small data; vague priors let data speak |
| Likelihood Function | Specifies the data-generating process (normal, Poisson, binomial, etc.) |
| MCMC Chains / Samples | Number of posterior samples; more = better approximation but slower |
| Warm-up / Burn-in | Initial MCMC samples discarded before the chain converges to the stationary distribution |
| Variational Family | Choice of approximate posterior distributions for variational inference |
| Credible Interval Width | Coverage level of the reported posterior interval (e.g., 95%); a reporting choice rather than a model parameter |
Bayesian optimisation typically finds good hyperparameters in far fewer evaluations than grid search — often an order of magnitude fewer when each evaluation is an expensive training run.
Gaussian processes provide not just point predictions but a calibrated uncertainty estimate at every input.
Probabilistic programming languages like Stan and Pyro can express virtually any statistical model as code.
The stack is ordered from problem formulation (bottom) to decision-making (top).
| Layer | Name | Role | Key Technologies |
|---|---|---|---|
| 8 | Decision Layer | Convert posterior predictions into optimal decisions under uncertainty | Decision theory, utility functions, risk analysis |
| 7 | Prediction & Uncertainty | Generate predictive distributions with calibrated uncertainty estimates | Posterior predictive, credible intervals, HPD intervals |
| 6 | Model Checking | Validate model fit via posterior predictive checks and diagnostics | LOO-CV, WAIC, R-hat, ESS, divergence checks |
| 5 | Inference Engine | Compute or approximate the posterior distribution | MCMC (NUTS), VI (ADVI), Laplace, EP |
| 4 | Model Specification | Define the generative model: likelihood + priors + structure | Probabilistic programming languages |
| 3 | Prior Knowledge | Encode domain expertise as informative priors | Expert elicitation, prior predictive checks |
| 2 | Data Layer | Collect and preprocess observed data | Pandas, Arrow, database queries |
| 1 | Problem Formulation | Define the scientific or business question as a probabilistic model | Domain expertise, causal DAGs |
The major families of Bayesian inference methods, each with distinct trade-offs between exactness, scalability, and flexibility.
| Sub-Type | Core Mechanism | Typical Applications |
|---|---|---|
| Exact Bayesian Inference | Analytically compute the posterior (conjugate priors) | Simple models, Bayesian linear regression, Beta-Binomial |
| MCMC-Based Inference | Sample from the posterior via Markov chains | Complex hierarchical models, clinical trials, spatial models |
| Variational Inference | Optimise an approximate posterior | Large-scale models, Bayesian deep learning, topic models |
| Gaussian Process Inference | Non-parametric function-space inference | Bayesian optimisation, geostatistics, surrogate modelling |
| Bayesian Network Inference | Propagate beliefs through graphical models | Diagnosis, risk analysis, causal reasoning |
| Bayesian Deep Learning | Place distributions over neural network weights | Uncertainty-aware predictions, active learning, safety-critical AI |
| Bayesian Nonparametric Inference | Infinite-dimensional models that grow with data | Flexible clustering, topic discovery, density estimation |
| Approximate Bayesian Computation (ABC) | Likelihood-free inference via simulation | Population genetics, ecology, models with intractable likelihoods |
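ABC's rejection scheme is simple enough to sketch directly. A toy example inferring a coin's bias without ever evaluating a likelihood (the prior, data, and exact-match acceptance rule are illustrative; real ABC uses summary statistics and a tolerance):

```python
import random

def abc_rejection(observed_heads, n_flips, n_sims=20_000, seed=0):
    """Likelihood-free inference: draw theta from the prior, simulate
    data, and keep theta only when the simulation matches the observation."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.random()                         # Uniform(0, 1) prior
        heads = sum(rng.random() < theta for _ in range(n_flips))
        if heads == observed_heads:                  # exact-match ABC
            accepted.append(theta)
    return accepted

samples = abc_rejection(observed_heads=7, n_flips=10)
mean = sum(samples) / len(samples)
print(round(mean, 2))   # close to the exact Beta(8, 4) posterior mean of 0.667
```

The accepted `theta` values are approximate posterior samples; here they can be checked against the exact conjugate answer, which real ABC applications never have.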
The foundational probabilistic model architectures that underpin Bayesian AI systems across domains.
| Aspect | Detail |
|---|---|
| Core Mechanism | Directed acyclic graph (DAG) where nodes are random variables and edges represent conditional dependencies |
| Key Advantage | Encodes causal and conditional relationships explicitly; supports exact or approximate inference |
| Used For | Medical diagnosis, risk analysis, fault detection, gene regulatory networks, causal discovery |
| Key Limitation | Structure learning is NP-hard; exact inference is computationally expensive for large networks |
| Key Implementations | pgmpy, bnlearn (R), Pomegranate, BayesiaLab |
| Aspect | Detail |
|---|---|
| Core Mechanism | A non-parametric Bayesian model that defines a distribution over functions; predictions include uncertainty bands |
| Key Advantage | Naturally produces calibrated uncertainty estimates; works well with small datasets |
| Used For | Bayesian optimisation, geostatistics (kriging), surrogate modelling, time-series forecasting |
| Key Limitation | Cubic computational complexity O(n^3); does not scale to large datasets without approximation |
| Scalable Variants | Sparse GPs (inducing points), GPyTorch (GPU-accelerated), Variational GPs |
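To make the mechanism concrete, here is a from-scratch sketch of exact GP regression on a toy 1-D dataset (the data points, zero-mean prior, kernel length-scale, and diagonal jitter are all illustrative choices):

```python
import math

def rbf(x1, x2, length_scale=1.0):
    """Squared-exponential (RBF) kernel."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting (tiny systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(xs, ys, x_star, jitter=1e-6):
    """GP posterior mean and variance at x_star (zero-mean prior, RBF
    kernel); the diagonal jitter is for numerical stability."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(x, x_star) for x in xs]
    alpha = solve(K, ys)                               # K^{-1} y
    mean = sum(k_star[i] * alpha[i] for i in range(n))
    v = solve(K, k_star)                               # K^{-1} k_*
    var = rbf(x_star, x_star) - sum(k_star[i] * v[i] for i in range(n))
    return mean, var

xs, ys = [-1.0, 0.0, 1.0], [0.2, 0.9, 0.1]
m, v = gp_predict(xs, ys, 0.0)
print(round(m, 3))   # interpolates the training point: 0.9, with ~zero variance
```

The $O(n^3)$ cost is visible in the linear solve against the full kernel matrix; sparse and variational GPs exist precisely to avoid it.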
| Aspect | Detail |
|---|---|
| Core Mechanism | Draw samples from the posterior distribution by constructing a Markov chain that converges to it |
| Key Algorithms | Metropolis-Hastings, Gibbs Sampling, Hamiltonian Monte Carlo (HMC), NUTS (No-U-Turn Sampler) |
| Why It Matters | The gold-standard for posterior inference when analytical solutions are intractable |
| Key Advantage | Asymptotically exact; applicable to arbitrarily complex models |
| Key Limitation | Computationally expensive; convergence can be slow for high-dimensional models |
| Modern Standard | NUTS (Hoffman & Gelman, 2014); used in Stan, PyMC, NumPyro — self-tuning HMC |
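The core sampling idea fits in a few lines. A toy random-walk Metropolis sampler targeting a standard Normal (the target and tuning constants are illustrative; production systems use gradient-based samplers like NUTS):

```python
import math
import random

def metropolis(log_post, n_samples, x0=0.0, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' ~ N(x, step) and accept with
    probability min(1, p(x') / p(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        x_new = x + rng.gauss(0.0, step)
        if rng.random() < math.exp(min(0.0, log_post(x_new) - log_post(x))):
            x = x_new
        samples.append(x)
    return samples

# Toy target: a standard Normal posterior, given by its log density.
draws = metropolis(lambda x: -0.5 * x * x, 20_000)
kept = draws[5_000:]          # discard warm-up / burn-in
mean = sum(kept) / len(kept)
print(round(mean, 2))         # close to the true posterior mean of 0
```

Only the log posterior up to a constant is needed — the evidence term cancels in the acceptance ratio, which is what makes MCMC applicable to intractable models.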
| Aspect | Detail |
|---|---|
| Core Mechanism | Approximate the posterior by optimising a simpler distribution to be as close as possible (minimise KL divergence) |
| Key Algorithms | Mean-field VI, Stochastic VI (SVI), Automatic Differentiation VI (ADVI), Normalising Flows |
| Key Advantage | Much faster than MCMC; scales to large datasets and complex models; compatible with deep learning |
| Key Limitation | Approximate — may underestimate posterior uncertainty, especially with simple variational families |
| Used In | Bayesian deep learning, topic modelling (LDA), large-scale probabilistic models |
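A toy illustration of the optimisation view: fit q = N(m, s²) to a known Normal posterior by gradient descent on the closed-form KL divergence. (Real ADVI estimates ELBO gradients by Monte Carlo; here the variational family contains the exact posterior, so the fit is perfect.)

```python
import math

def fit_gaussian_vi(mu_p, sigma_p, steps=2000, lr=0.05):
    """Fit q = N(m, s^2) to p = N(mu_p, sigma_p^2) by gradient descent on
    the closed-form KL(q || p); the optimum is m = mu_p, s = sigma_p."""
    m, rho = 0.0, 0.0                     # rho = log s keeps s positive
    for _ in range(steps):
        s = math.exp(rho)
        grad_m = (m - mu_p) / sigma_p ** 2          # dKL/dm
        grad_rho = -1.0 + s ** 2 / sigma_p ** 2     # dKL/d(log s)
        m -= lr * grad_m
        rho -= lr * grad_rho
    return m, math.exp(rho)

m, s = fit_gaussian_vi(2.0, 0.5)
print(round(m, 3), round(s, 3))   # converges to (2.0, 0.5)
```

When the true posterior lies outside the variational family, this same reverse-KL objective is what drives VI to underestimate posterior spread.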
| Aspect | Detail |
|---|---|
| Core Mechanism | Place priors on regression coefficients; derive posterior distributions over coefficients |
| Key Advantage | Fully interpretable; uncertainty over each coefficient; natural regularisation via priors |
| Used For | Clinical trials, epidemiology, A/B testing, risk scoring, causal effect estimation |
| Key Libraries | Stan, PyMC, brms (R), rstanarm (R) |
| Aspect | Detail |
|---|---|
| Core Mechanism | Model sequential data as transitions between hidden states with observable emissions |
| Key Algorithms | Forward-backward algorithm, Viterbi algorithm, Baum-Welch (EM for HMMs) |
| Used For | Speech recognition (legacy), gene finding, financial regime detection, activity recognition |
| Aspect | Detail |
|---|---|
| Core Mechanism | Models whose complexity grows with data; the number of parameters is not fixed a priori |
| Key Models | Dirichlet Process Mixture Models, Gaussian Process regression, Indian Buffet Process, Beta Process |
| Key Advantage | Automatically infers the number of clusters, factors, or components from the data |
| Used For | Clustering with unknown number of clusters, topic modelling, density estimation |
Industry-leading probabilistic programming frameworks and Bayesian optimisation tools powering modern Bayesian AI.
| Framework | Language | Highlights |
|---|---|---|
| Stan | C++ / R / Python | Industry gold-standard MCMC; NUTS sampler; CmdStan, RStan, PyStan |
| PyMC v5 | Python | Intuitive API; MCMC + VI; ArviZ integration; strong community |
| NumPyro | Python/JAX | Fast GPU-accelerated inference; composable with JAX |
| Pyro | Python/PyTorch | Deep probabilistic programming; SVI; flexible |
| TensorFlow Probability | Python/TF | Probabilistic layers; MCMC + VI; Keras integration |
| Turing.jl | Julia | Composable; fast MCMC; differential equation support |
| brms | R | High-level R formula syntax for Bayesian models via Stan backend |
| Library | Language | Highlights |
|---|---|---|
| BoTorch (Meta) | Python/PyTorch | Modular BO framework; built on GPyTorch; supports multi-objective BO |
| Ax (Meta) | Python | Adaptive experimentation platform; uses BoTorch; A/B testing + BO |
| Optuna | Python | Hyperparameter optimisation; supports Bayesian (TPE) + pruning |
| Hyperopt | Python | Tree-structured Parzen Estimator (TPE) based BO |
| Google Vizier | Python | Google's BO service; now open-sourced |
| Dragonfly | Python | Scalable BO with multi-fidelity and multi-objective support |
| Trieste | Python/TF | BO library from Secondmind; supports batch and constrained optimisation |
| Library | Language | Highlights |
|---|---|---|
| GPyTorch | Python/PyTorch | Scalable GPs; GPU-accelerated; variational and exact GP inference |
| GPflow | Python/TF | GP library on TensorFlow; variational GPs; multi-output GPs |
| scikit-learn GPs | Python | Built-in GP regression and classification; good baseline |
| GPy | Python | Comprehensive GP library from Sheffield; multi-output, sparse GPs |
| Tool | Highlights |
|---|---|
| ArviZ | Bayesian analysis and visualisation; trace plots, posterior plots, LOO-CV, R-hat, ESS |
| bayesplot (R) | Visualisation for Bayesian workflows; posterior, prior/posterior comparison, MCMC diagnostics |
| ShinyStan | Interactive browser-based MCMC diagnostics for Stan models |
Bayesian AI powers uncertainty-aware decision-making across healthcare, finance, technology, science, manufacturing, and analytics.
| Use Case | Description | Key Examples |
|---|---|---|
| Clinical Trial Design | Bayesian adaptive trial designs that update dosing and allocation as data arrives | Berry Consultants, FACTS, Cytel |
| Bayesian Meta-Analysis | Combine results across multiple studies with proper uncertainty | Cochrane Reviews, brms, Stan |
| Disease Progression Modelling | Probabilistic model of disease trajectory to guide treatment decisions | Alzheimer's progression models, oncology DPMs |
| Drug Dose-Response | Model dose-response curves with uncertainty to set safe effective doses | CRM (Continual Reassessment Method) |
| Epidemiological Modelling | Bayesian SIR/SEIR models for pandemic tracking and forecasting | COVID-19 models (Imperial College, IHME) |
| Use Case | Description | Key Examples |
|---|---|---|
| Risk Modelling | Bayesian estimation of tail risks with full posterior uncertainty | VaR modelling, operational risk, credit risk |
| A/B Testing & Experimentation | Bayesian A/B testing with early stopping and continuous monitoring | VWO, Optimizely (Bayesian mode), Eppo |
| Actuarial Modelling | Bayesian credibility theory for insurance pricing with limited claims data | Bayesian claim frequency/severity models |
| Portfolio Optimisation | Bayesian estimates of expected returns with uncertainty propagation | Black-Litterman model, Bayesian mean-variance |
| Fraud Detection Under Uncertainty | Flag suspicious activity with calibrated confidence scores | Bayesian anomaly detection |
| Use Case | Description | Key Examples |
|---|---|---|
| Bayesian A/B Testing | Measure experiment outcomes with credible intervals; early stopping rules | Google, Netflix, Microsoft experimentation |
| Hyperparameter Optimisation | Find optimal ML hyperparameters with minimal compute budget | Optuna, BoTorch, Google Vizier, SigOpt |
| Anomaly Detection | Bayesian changepoint detection and outlier scoring with uncertainty | Bayesian Online Changepoint Detection (BOCPD) |
| Demand Forecasting | Probabilistic demand forecasts with full prediction intervals | Prophet (Meta), PyMC time series |
| Natural Language Processing | Topic modelling (LDA), Bayesian sentiment analysis, uncertainty in NLP | Latent Dirichlet Allocation, Bayesian NLP |
| Use Case | Description | Key Examples |
|---|---|---|
| Geostatistics & Spatial Modelling | Gaussian Process kriging for spatial prediction with uncertainty | Mining, environmental monitoring, agriculture |
| Astrophysics | Bayesian inference on cosmological parameters from observational data | Planck CMB analysis, gravitational wave PE |
| Particle Physics | Statistical inference on particle properties with systematic uncertainty | CERN ATLAS/CMS analyses |
| Robotics & Control | Bayesian state estimation and model-based control with uncertainty | Kalman filters, Bayesian SLAM |
| Materials Discovery | Bayesian optimisation for materials with expensive experimental evaluations | Self-driving laboratories, BO for materials |
Key model comparison metrics and inference diagnostic targets used to evaluate Bayesian models.
| Metric | What It Measures | When to Use |
|---|---|---|
| LOO-CV (Leave-One-Out Cross-Validation) | Predictive accuracy estimated by leaving out one observation at a time | Gold-standard for Bayesian model comparison |
| WAIC (Widely Applicable Information Criterion) | Bayesian generalisation of AIC; estimates predictive accuracy | General model comparison |
| Bayes Factor | Ratio of evidence for two competing models | Hypothesis testing |
| Log Predictive Density | Log probability of held-out data under the posterior predictive distribution | Predictive quality evaluation |
| DIC (Deviance Information Criterion) | Bayesian model comparison metric based on effective number of parameters | Hierarchical models (use LOO-CV if possible) |
| Diagnostic | What It Checks | Target Value |
|---|---|---|
| R-hat (Gelman-Rubin) | Convergence of MCMC chains; compares between-chain and within-chain variance | < 1.01 (chains have converged) |
| Effective Sample Size (ESS) | Number of effectively independent posterior samples | > 400 per parameter (minimum) |
| Divergences | HMC/NUTS integration failures indicating problematic posterior geometry | 0 divergences (any divergence is a warning) |
| Trace Plots | Visual inspection of MCMC chain mixing and stationarity | Chains should look like "hairy caterpillars" |
| Energy Plot | Diagnoses HMC sampling efficiency | Marginal and transition energy should match |
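The R-hat diagnostic is simple to compute from raw chains. A sketch of the basic (unsplit) Gelman-Rubin statistic on synthetic chains; modern tools such as ArviZ use a refined split-R-hat with rank normalisation:

```python
import random
import statistics

def rhat(chains):
    """Basic (unsplit) Gelman-Rubin R-hat: compares between-chain
    variance B with the mean within-chain variance W."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    W = statistics.fmean(statistics.variance(c) for c in chains)
    B = n * statistics.variance(means)
    var_hat = (n - 1) / n * W + B / n    # pooled posterior variance estimate
    return (var_hat / W) ** 0.5

rng = random.Random(0)
# Four chains sampling the same target: R-hat should be ~1.
good = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# Two pairs of chains stuck in different modes: R-hat far above 1.
bad = [[rng.gauss(m, 1) for _ in range(1000)] for m in (0, 0, 5, 5)]
print(round(rhat(good), 2), round(rhat(bad), 2))   # near 1.0 vs. well above 1
```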
| Metric | What It Measures |
|---|---|
| Calibration Plot | Whether predicted probabilities match observed frequencies |
| Expected Calibration Error (ECE) | Average gap between predicted confidence and actual accuracy |
| Coverage of Credible Intervals | Whether the stated 95% interval actually contains 95% of observed values |
| Continuous Ranked Probability Score (CRPS) | Measures quality of the full predictive distribution against observed outcomes |
| Prediction Interval Width | Sharpness of uncertainty estimates; narrower is better (if well-calibrated) |
Current market size and projected growth for Bayesian and probabilistic AI segments.
| Metric | Value | Source / Notes |
|---|---|---|
| Global Bayesian / Probabilistic ML Market (2024) | ~$1.8 billion | Includes probabilistic modelling, Bayesian analytics, and BO platforms |
| Projected Market Size (2030) | ~$6.2 billion | CAGR ~23%; driven by clinical trials, uncertainty-aware AI, and BO |
| A/B Testing / Experimentation Market (2024) | ~$1.2 billion | Bayesian A/B testing growing as share of overall experimentation market |
| Key Verticals | Pharma, finance, tech (experimentation), defence | Strongest where uncertainty quantification is legally or operationally required |
| Stan User Base | 100,000+ users globally | Most widely adopted probabilistic programming language |
| Segment | Leaders | Challengers |
|---|---|---|
| Probabilistic Programming | Stan, PyMC, NumPyro | Pyro, Turing.jl, TensorFlow Probability |
| Bayesian Optimisation | BoTorch/Ax (Meta), Optuna, Google Vizier | SigOpt (Intel), Dragonfly, Hyperopt |
| Bayesian A/B Testing | Optimizely, VWO, Eppo, Statsig | LaunchDarkly, GrowthBook |
| Gaussian Processes | GPyTorch, GPflow, GPy | scikit-learn GPs, BoTorch |
| Clinical Trial Software | Berry Consultants (FACTS), Cytel, Medidata | Adaptive trials via Stan, brms |
Key risks and limitations facing Bayesian AI adoption and deployment.
| Limitation | Description |
|---|---|
| Computational Cost | MCMC for complex models can be extremely slow; posterior inference scales poorly with data size and dimensions |
| Prior Sensitivity | Results can be heavily influenced by prior choices, especially with small datasets |
| Model Misspecification | If the generative model is wrong, Bayesian updating may concentrate on the wrong region of parameter space |
| Approximate Inference Errors | Variational inference can underestimate uncertainty; MCMC may not converge |
| Scalability | Exact Bayesian inference is intractable for large neural networks and massive datasets |
| Expertise Required | Designing good Bayesian models requires strong statistical expertise |
| Conjugacy Constraints | Analytical solutions exist only for conjugate prior-likelihood pairs; general models require numerical methods |
| Identifiability Issues | Complex models may have multiple parameter configurations that explain the data equally well |
| Risk | Description |
|---|---|
| Prior Subjectivity | Critics argue Bayesian priors introduce subjectivity; defenders argue priors encode real knowledge |
| Credible vs. Confidence Intervals | Bayesian credible intervals have different interpretations than frequentist confidence intervals |
| Overconfident Posteriors | Poor model specification or overly tight priors can produce posteriors that are unjustifiably narrow |
| Communication Complexity | Probabilistic results are harder to communicate to non-technical stakeholders than point estimates |
Explore how this system type connects to others in the AI landscape: Predictive / Discriminative AI, Analytical AI, Explainable AI (XAI), Federated / Privacy-Preserving AI, Scientific / Simulation AI.

Key terms in Bayesian and probabilistic AI.
| Term | Definition |
|---|---|
| Bayes' Theorem | The rule for updating beliefs: posterior is proportional to likelihood times prior |
| Bayesian Network | A directed acyclic graph encoding conditional dependencies between random variables |
| Bayesian Optimisation (BO) | Using a probabilistic surrogate model to efficiently optimise expensive black-box functions |
| Calibration | The degree to which predicted probabilities match observed frequencies |
| Conjugate Prior | A prior distribution that, when combined with a specific likelihood, yields a posterior in the same distribution family |
| Credible Interval | A Bayesian interval containing the parameter with a stated probability (e.g., 95%); differs from frequentist confidence interval |
| Dirichlet Process | A Bayesian nonparametric prior over distributions; enables mixture models with an unknown number of components |
| Evidence (Marginal Likelihood) | The probability of the observed data under the model, integrating over all parameter values |
| Gaussian Process (GP) | A non-parametric Bayesian model that defines a distribution over functions; predictions include uncertainty |
| Hamiltonian Monte Carlo (HMC) | An MCMC algorithm that uses gradient information to efficiently explore the posterior |
| Hidden Markov Model (HMM) | A probabilistic model for sequential data with hidden states and observable emissions |
| Hierarchical Model | A multi-level Bayesian model where parameters at one level are drawn from distributions at a higher level |
| Inference | The process of computing the posterior distribution from the prior and observed data |
| Likelihood | The probability of the observed data given specific parameter values; P(D given theta) |
| Markov Chain Monte Carlo (MCMC) | A family of algorithms that draw samples from the posterior distribution by constructing convergent Markov chains |
| NUTS (No-U-Turn Sampler) | A self-tuning variant of HMC; the default sampler in Stan and PyMC |
| Posterior Distribution | The updated probability distribution over parameters after observing data; the result of Bayesian inference |
| Posterior Predictive Distribution | The distribution of future observations predicted by the model after conditioning on observed data |
| Prior Distribution | The probability distribution representing beliefs about parameters before observing data |
| Probabilistic Programming | A programming paradigm where users specify probabilistic models as code and inference is automated |
| R-hat (Gelman-Rubin Statistic) | A convergence diagnostic comparing between-chain and within-chain variance; values near 1.0 indicate convergence |
| Surrogate Model | An approximate, cheap-to-evaluate model used in place of an expensive objective function (common in BO) |
| Variational Inference (VI) | An approximate inference method that optimises a simpler distribution to approximate the posterior |
Animation infographics for Bayesian / Probabilistic AI (2026): overview, and the full technology stack (Hardware → Compute → Data → Frameworks → Orchestration → Serving → Application).
Detailed reference content for regulation.
Bayesian methods are generally favoured by regulators because they provide transparent uncertainty quantification — a requirement in many high-stakes domains.
| Domain | Regulatory Relevance |
|---|---|
| Pharmaceuticals (FDA/EMA) | FDA has issued guidance supporting Bayesian adaptive trial designs; EMA recognises Bayesian methods |
| Medical Devices | Bayesian methods accepted for premarket submissions (510(k), PMA) with proper justification |
| Financial Services | Bayesian models used for stress testing, risk modelling; regulators require posterior uncertainty reporting |
| EU AI Act | Uncertainty quantification is aligned with transparency and robustness requirements for high-risk AI |
| Insurance / Actuarial | Bayesian credibility theory is a standard actuarial tool; regulators understand and accept it |
| Clinical Research | ICH E9(R1) addendum supports Bayesian estimands and frameworks for clinical trials |
Detailed reference content for deep dives.
Bayesian Deep Learning (BDL) integrates Bayesian uncertainty quantification into deep neural networks — enabling neural networks to express not just predictions, but how confident they are in those predictions.
| Approach | Method | Key Properties |
|---|---|---|
| MC Dropout | Use dropout at inference time; multiple forward passes approximate the posterior | Simple to implement; approximate uncertainty |
| Bayes By Backprop | Learn a distribution over weights using variational inference | Principled; more memory-intensive |
| Deep Ensembles | Train multiple independent networks; treat ensemble variance as uncertainty | Simple, effective, well-calibrated |
| Laplace Approximation | Fit a Gaussian to the posterior at the MAP estimate using the Hessian | Post-hoc; works with pre-trained models |
| Stochastic Weight Averaging (SWAG) | Approximate posterior using trajectory of SGD weights | Low cost; good uncertainty estimates |
| Neural Network Gaussian Processes (NNGP) | Interpret infinite-width neural networks as GPs | Theoretical connection; exact in the limit |
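The ensemble idea in miniature: train several models on perturbed data and read uncertainty off the spread of their predictions. Here bootstrap resampling of a linear model stands in for the independently initialised networks of real deep ensembles:

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def ensemble_predict(xs, ys, x_star, k=50, seed=0):
    """Train k models on bootstrap resamples; the spread of their
    predictions at x_star is the (epistemic) uncertainty estimate."""
    rng = random.Random(seed)
    preds = []
    for _ in range(k):
        idx = [rng.randrange(len(xs)) for _ in xs]
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a * x_star + b)
    mean = sum(preds) / k
    std = (sum((p - mean) ** 2 for p in preds) / k) ** 0.5
    return mean, std

rng = random.Random(1)
xs = [i / 10 for i in range(20)]                    # inputs in [0, 1.9]
ys = [2 * x + 1 + rng.gauss(0, 0.1) for x in xs]    # noisy line y = 2x + 1
m_in, s_in = ensemble_predict(xs, ys, 1.0)          # inside the data range
m_out, s_out = ensemble_predict(xs, ys, 10.0)       # far outside it
print(round(m_in, 1), s_out > s_in)                 # uncertainty grows off-data
```

The growing spread far from the training data is exactly the behaviour that makes ensemble uncertainty useful for out-of-distribution detection.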
| Application | How BDL Helps |
|---|---|
| Autonomous Driving | Detect when the perception model is uncertain and hand control back to the driver |
| Medical Diagnosis | Flag cases where the model is unsure, routing them to human expert review |
| Active Learning | Identify the most informative data points to label next, reducing labelling cost |
| Out-of-Distribution Detection | Detect inputs that are unlike the training data and should not be trusted |
| Calibrated Predictions | Ensure that predicted probabilities match real-world frequencies |
Probabilistic programming languages (PPLs) allow users to specify Bayesian models as programs and automatically perform inference — without needing to manually derive or implement inference algorithms.
| Language / Framework | Backend | Key Features |
|---|---|---|
| Stan | C++ | Gold-standard for MCMC (NUTS); R, Python, Julia interfaces; industry and academia |
| PyMC (v5) | PyTensor | Python-native; MCMC + VI; intuitive API; strong community |
| NumPyro | JAX | JAX-accelerated; fast MCMC and VI; GPU/TPU support; composable with JAX ecosystem |
| Pyro | PyTorch | Deep probabilistic programming; stochastic variational inference; GPU-accelerated |
| TensorFlow Probability | TensorFlow | Probabilistic layers for Keras; MCMC, VI, bijectors, distributions |
| Edward2 / Oryx | TensorFlow / JAX | Lightweight probabilistic programming; research-oriented |
| Turing.jl | Julia | Julia-native; composable; fast MCMC; excellent differential equation integration |
| Bean Machine (Meta) | PyTorch | Graph-based PPL for Bayesian modelling; automated inference |
| brms (R) | Stan | High-level R formula interface for Bayesian GLMs, GAMs, and multilevel models |
| BUGS / JAGS | Custom | Pioneering PPLs; still used in epidemiology and medical research |
| Step | What Happens |
|---|---|
| Define Priors | Specify prior distributions for each parameter based on domain knowledge |
| Define Likelihood | Specify the data-generating process (e.g., y ~ Normal(mu, sigma)) |
| Condition on Data | Provide observed data to the model |
| Run Inference | The PPL automatically runs MCMC, VI, or other algorithms to compute the posterior |
| Diagnose | Check convergence diagnostics: R-hat, ESS, divergences, trace plots |
| Posterior Predictive Check | Simulate data from the posterior and compare to actual observations |
| Report & Decide | Summarise posterior, compute credible intervals, and make decisions |
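The workflow above can be run end-to-end in a few lines, with a grid approximation standing in for MCMC or VI (a flat prior and a toy coin-flip dataset are assumed):

```python
import math

# The workflow in miniature: infer a coin's bias theta from 10 flips.
data = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]            # 7 heads, 3 tails

grid = [i / 1000 for i in range(1, 1000)]        # candidate theta values
log_prior = {t: 0.0 for t in grid}               # flat prior on (0, 1)

def log_lik(theta):                              # Bernoulli likelihood
    return sum(math.log(theta if y else 1 - theta) for y in data)

# "Run inference": posterior is proportional to prior times likelihood.
unnorm = {t: math.exp(log_prior[t] + log_lik(t)) for t in grid}
z = sum(unnorm.values())                         # the evidence P(D)
posterior = {t: p / z for t, p in unnorm.items()}

post_mean = sum(t * p for t, p in posterior.items())
print(round(post_mean, 3))    # matches the Beta(8, 4) mean 8/12 = 0.667
```

A PPL automates exactly these steps — the user writes only the prior and likelihood, and the framework supplies inference and diagnostics.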
Bayesian Optimisation (BO) uses a probabilistic surrogate model (typically a Gaussian Process) to efficiently find the optimum of expensive-to-evaluate black-box functions.
+------------------------------------------------------------------------+
|                       BAYESIAN OPTIMISATION LOOP                       |
|                                                                        |
|  1. FIT SURROGATE      2. ACQUISITION           3. EVALUATE            |
|  ----------------      ----------------         ----------------       |
|  Fit a GP (or other    Compute acquisition      Evaluate the true      |
|  surrogate) to all     function to find         objective at the       |
|  observations so far   most promising point     selected point         |
|                                                                        |
|  4. UPDATE             5. REPEAT                                       |
|  ----------------      ----------------                                |
|  Add new observation   Until budget                                    |
|  to dataset            exhausted                                       |
+------------------------------------------------------------------------+
| Component | Role | Common Choices |
|---|---|---|
| Surrogate Model | Approximate the objective function with uncertainty | Gaussian Process, Random Forest, Bayesian NN |
| Acquisition Function | Decide where to evaluate next, balancing exploration and exploitation | Expected Improvement (EI), UCB, Knowledge Gradient |
| Observation Model | Handle noise in objective function evaluations | Exact observations, noisy GP, heteroscedastic noise |
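The loop structure can be sketched in a few lines. A crude distance-based surrogate stands in for the GP here, and a UCB-style rule serves as the acquisition function (the objective, grid, and exploration weight are all illustrative):

```python
def toy_bayes_opt(f, n_iter=15):
    """The BO loop in miniature. The surrogate predicts the nearest
    observation's value, with uncertainty growing with the distance
    to the nearest observation."""
    candidates = [i / 100 for i in range(101)]   # search grid on [0, 1]
    X, Y = [0.0, 1.0], [f(0.0), f(1.0)]          # 1. initial design
    for _ in range(n_iter):
        def ucb(x):                              # 2. acquisition (UCB-style)
            d, y = min((abs(x - xi), yi) for xi, yi in zip(X, Y))
            return y + 2.0 * d                   # exploit + exploration bonus
        x_next = max(candidates, key=ucb)
        X.append(x_next)                         # 3. evaluate true objective
        Y.append(f(x_next))                      # 4. update the dataset
    y_best, x_best = max(zip(Y, X))
    return x_best, y_best

x_best, y_best = toy_bayes_opt(lambda x: -(x - 0.3) ** 2)
print(round(x_best, 2))   # lands near the optimum at x = 0.3
```

Real BO libraries replace the toy surrogate with a GP and the hand-rolled rule with acquisition functions such as Expected Improvement, but the fit → acquire → evaluate → update cycle is identical.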
| Application | Description | Key Tools |
|---|---|---|
| Hyperparameter Tuning | Find optimal ML hyperparameters with minimal training runs | Optuna, BoTorch, Hyperopt, Ax (Meta) |
| Drug Discovery | Optimise molecular properties with expensive wet-lab experiments | BoTorch, GPyOpt, ChemOS |
| Materials Design | Find materials with optimal properties (conductivity, strength) | BoTorch, Dragonfly, Emukit |
| A/B Testing / Experiment Design | Allocate experimental budget efficiently across variants | Ax (Meta), Adaptive Experimentation |
| Robotics & Control | Tune controller parameters with minimal real-world trials | BoTorch, Trieste, Safety-aware BO |
| Chip Design | Optimise VLSI placement and routing parameters | Google Vizier, BoTorch |
Detailed reference content for overview.
Bayesian and Probabilistic AI is the branch of artificial intelligence grounded in probability theory and Bayesian inference — representing knowledge as probability distributions, updating beliefs systematically as new data arrives (via Bayes' theorem), and producing predictions that carry explicit measures of uncertainty. Every output of a Bayesian system comes with a confidence interval, a credible interval, or a full posterior distribution — not just a point estimate.
This paradigm is fundamentally different from standard machine learning, which typically produces a single best prediction. A standard classifier says "this email is spam with 87% probability." A Bayesian system says "my belief that this email is spam is described by a distribution centred at 87%, and my uncertainty about that estimate is ±4%, given the data I have seen." This distinction is critical in high-stakes domains where knowing what the model does not know is as important as its predictions.
Bayesian AI has deep historical roots — Bayes' theorem was published in 1763 — and has been the dominant paradigm in fields such as clinical trial design, epidemiology, A/B testing, geostatistics, and signal processing for decades. Its resurgence in modern AI is driven by the need for uncertainty quantification in safety-critical applications, the development of probabilistic programming languages that make Bayesian modelling accessible, and the integration of Bayesian principles into deep learning.
| Dimension | Detail |
|---|---|
| Core Capability | Quantifies uncertainty — produces predictions as probability distributions, not point estimates |
| How It Works | Specify a prior; observe data; compute the posterior via Bayes' theorem; make predictions by integrating over the posterior |
| What It Produces | Posterior distributions, credible intervals, predictive distributions, optimal decisions under uncertainty |
| Key Differentiator | Every prediction carries an explicit measure of confidence; the model knows what it does not know |
| AI Type | What It Does | Example |
|---|---|---|
| Bayesian / Probabilistic AI | Reasons under uncertainty using probability distributions | Clinical trial analysis, A/B testing, risk modelling, uncertainty quantification |
| Agentic AI | Pursues goals autonomously using tools, memory, and planning | Research agent, coding agent, autonomous workflow |
| Analytical AI | Extracts insights and explanations from existing data | Dashboard, root-cause analysis, anomaly detection |
| Autonomous AI (Non-Agentic) | Operates independently within fixed boundaries without human input | Autopilot, auto-scaling, algorithmic trading |
| Cognitive / Neuro-Symbolic AI | Combines neural learning with symbolic reasoning | LLM + knowledge graph, physics-informed neural net |
| Conversational AI | Manages multi-turn dialogue between humans and machines | Customer service chatbot, voice assistant |
| Evolutionary / Genetic AI | Optimises solutions through population-based search inspired by natural selection | Neural architecture search, logistics scheduling |
| Explainable AI (XAI) | Makes AI decisions understandable to humans | SHAP explanations, LIME, Grad-CAM |
| Generative AI | Creates new original content from learned distributions | Write an essay, generate an image, synthesise a video |
| Multimodal Perception AI | Fuses vision, language, audio, and other modalities | GPT-4o processing image + text, AV sensor fusion |
| Optimisation / Operations Research AI | Finds optimal solutions to constrained mathematical problems | Vehicle routing, supply chain planning, scheduling |
| Physical / Embodied AI | Acts in the physical world through sensors and actuators | Autonomous vehicle, robot arm, drone |
| Predictive / Discriminative AI | Classifies or forecasts from historical patterns | Fraud score, churn probability, demand forecast |
| Privacy-Preserving AI | Trains and runs AI without exposing raw data | Federated hospital models, differential privacy |
| Reactive AI | Responds to current input with no learning or memory | Chess engine, rule-based spam filter |
| Recommendation / Retrieval AI | Surfaces relevant items from large catalogues based on user signals | Netflix suggestions, Google Search, Spotify playlists |
| Reinforcement Learning AI | Learns optimal behaviour from reward signals via trial and error | AlphaGo, robotic locomotion, RLHF |
| Scientific / Simulation AI | Solves scientific problems and models physical systems | AlphaFold, climate simulation, molecular dynamics |
| Symbolic / Rule-Based AI | Reasons over explicit rules and knowledge to derive conclusions | Medical expert system, legal reasoning engine |
Key Distinction from Predictive AI: Predictive AI typically produces a single best estimate (a point prediction). Bayesian AI produces a full probability distribution over outcomes — capturing not just the best guess, but exactly how uncertain that guess is.
Key Distinction from Explainable AI: XAI explains which features drove a decision. Bayesian AI quantifies how confident the model is in that decision and where more data would reduce uncertainty.