A comprehensive interactive exploration of Privacy-Preserving AI — the federated learning pipeline, 8-layer stack, privacy mechanisms, secure computation, benchmarks, market data, and more.
~55 min read · Interactive Reference

Privacy-preserving AI follows a circular pipeline in which local training and global aggregation repeat, without raw data ever leaving its source.
Explore how data owners collaboratively train a shared model without ever exchanging raw data — privacy is enforced at every step of the pipeline.
┌──────────────────────────────────────────────────────────────────────────┐
│ PRIVACY-PRESERVING AI — FEDERATED LEARNING EXAMPLE │
│ │
│ DATA OWNER A DATA OWNER B DATA OWNER C │
│ ────────────── ────────────── ────────────── │
│ Trains model Trains model Trains model │
│ locally on locally on locally on │
│ own data own data own data │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Local model Local model Local model │
│ update update update │
│ │ │ │ │
│ └───────────────────┼───────────────────┘ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ AGGREGATION SERVER │ │
│ │ Combines updates │ │
│ │ (e.g., FedAvg) │ │
│ │ NO RAW DATA SEEN │ │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ Updated global model │
│ sent back to all │
│ participants │
│ │
│ ──── RAW DATA NEVER LEAVES ITS OWNER — ONLY MODEL UPDATES SHARED ──── │
└──────────────────────────────────────────────────────────────────────────┘
| Step | What Happens |
|---|---|
| Data Remains Local | Each participant's data stays on their own infrastructure — never transferred, copied, or viewed by others |
| Local Training | Each participant trains a model (or computes gradients) on their local data using the current global model |
| Update Extraction | Model updates (gradients or weight deltas) are extracted — not the data itself |
| Privacy Enhancement | Optional: noise is added to updates (differential privacy), updates are encrypted (secure aggregation), or computation is performed in a TEE |
| Secure Aggregation | Updates from all participants are aggregated (averaged) without exposing any individual's update |
| Global Model Update | The aggregated update is applied to the global model |
| Distribution | The updated global model is sent back to all participants for the next round |
| Iteration | The process repeats until the model converges |
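The loop above can be sketched in a few lines of NumPy. This is a toy simulation (the client datasets, the linear model, and all variable names are invented for illustration), not a production framework — but it shows the key invariant: only weight deltas cross the client boundary, never `(X, y)`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each "data owner" holds a private (X, y) for a linear model y ≈ X @ w.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

w_global = np.zeros(2)                          # initialise global model

for round_ in range(20):                        # iteration: repeat until convergence
    updates = []
    for X, y in clients:                        # raw (X, y) never leaves this scope
        w = w_global.copy()                     # distribution: client receives global model
        for _ in range(5):                      # local epochs
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= 0.1 * grad                     # local training
        updates.append(w - w_global)            # update extraction: only the delta is shared
    w_global += np.mean(updates, axis=0)        # aggregation (here: a plain FedAvg mean)

print(w_global)  # converges towards true_w
```

In a real deployment the "Privacy Enhancement" step would clip and noise each delta before it is appended, and the server-side mean would be computed under secure aggregation.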
| Parameter | What It Controls |
|---|---|
| Privacy Budget (ε) | Differential privacy epsilon — lower = stronger privacy, higher noise, less accuracy |
| Number of Participants | How many data owners contribute — more participants improve both privacy and model quality |
| Communication Rounds | Number of federated training rounds — more rounds improve convergence |
| Local Epochs | Number of training passes each participant performs locally before sending updates |
| Aggregation Strategy | How updates are combined — FedAvg, FedProx, FedMA, weighted average |
| Clipping Norm | Bound on the maximum size of individual gradient updates (for differential privacy) |
| Encryption Scheme | Type of encryption used for secure computation — HE, MPC, TEE |
| Noise Mechanism | Type of noise added for differential privacy — Gaussian, Laplace |
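Two of these parameters — the clipping norm and the noise mechanism — interact directly: noise is scaled to the clipping bound so that no single participant's update can dominate. A minimal sketch of that sanitisation step (the function name and default values are illustrative, loosely following the DP-SGD recipe):

```python
import numpy as np

def sanitise_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise
    whose scale is tied to the clipping bound (DP-SGD style, sketched)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))  # bound influence
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

u = np.array([3.0, 4.0])   # norm 5 -> rescaled down to the clipping bound
sanitised = sanitise_update(u, clip_norm=1.0, noise_multiplier=0.0)
print(np.linalg.norm(sanitised))   # clipped to norm 1.0 (noise disabled here)
```

The privacy budget ε consumed per round then depends on the noise multiplier, the sampling rate, and the number of rounds, tracked by an accountant.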
Google's Gboard keyboard was one of the first mass-deployed federated learning systems, running on billions of devices.
Differential privacy adds calibrated noise to data — Apple uses it to collect usage statistics from 1.5 billion devices.
Federated learning can reduce data transfer by 100x compared to centralised training approaches.
Federated and privacy-preserving AI is organised into eight architectural layers from data ingestion to governance.
| Layer | What It Covers |
|---|---|
| 1. Data Layer | Local data stores, data governance policies, data classification, consent management |
| 2. Privacy Mechanisms | Differential privacy, secure aggregation, HE, MPC, TEE, k-anonymity, synthetic data |
| 3. Federated Training | Local training, gradient computation, communication protocols, aggregation algorithms |
| 4. Aggregation & Coordination | Central server, peer-to-peer coordination, asynchronous aggregation, model merging |
| 5. Model Layer | Global model, personalised local models, model compression for edge deployment |
| 6. Inference & Serving | Private inference (encrypted), on-device inference, confidential computing |
| 7. Audit & Verification | Privacy budget tracking, membership inference testing, differential privacy verification |
| 8. Governance & Compliance | Data processing agreements, regulatory compliance (GDPR, HIPAA), consent management, audit logs |
Principal sub-types spanning federated learning topologies, mathematical privacy guarantees, cryptographic computation, hardware isolation, and synthetic data generation.
| Type | Description | Example |
|---|---|---|
| Cross-Device Federated | Training across millions of edge devices (phones, IoT); tiny data per device | Google Keyboard (Gboard) next-word prediction |
| Cross-Silo Federated | Training across a small number of organisations (5–100); large data per silo | Hospitals training a shared cancer model |
| Vertical Federated | Participants have different features for the same individuals (e.g., bank + telecom) | Joint credit scoring across institutions |
| Horizontal Federated | Participants have the same features but different individuals | Same hospital network across regions |
| Type | Privacy Guarantee | Performance Impact |
|---|---|---|
| Differential Privacy | Mathematical bound on individual information leakage (ε-DP) | Accuracy loss from noise injection |
| Homomorphic Encryption | Cryptographic — data never decrypted during computation | 1,000x–1,000,000x slowdown |
| Secure Multi-Party Computation | Cryptographic — no party sees another's input | High communication overhead |
| Trusted Execution Environments | Hardware isolation — data decrypted only inside enclave | Near-native performance |
| Synthetic Data | Statistical — no individual's actual data in the output | Quality loss; privacy depends on generation method |
| Federated Learning (alone) | Architectural — raw data doesn't leave the device | Gradients can still leak information |
Six foundational architectural families that underpin all privacy-preserving AI systems.
| Aspect | Detail |
|---|---|
| Core Idea | Train a shared model across multiple decentralised data holders; each trains locally; only model updates are shared |
| Introduced | Google (2016) — McMahan et al., "Communication-Efficient Learning of Deep Networks from Decentralized Data" |
| FedAvg | Foundation algorithm: each client trains locally, sends updates; server averages them |
| Cross-Device FL | Training across millions of edge devices (smartphones, IoT) — small data per device, high participant count |
| Cross-Silo FL | Training across organisations (hospitals, banks) — larger data per participant, smaller participant count |
| Strengths | Data never leaves the device; scales to millions of participants; enables cross-organisational collaboration |
| Weaknesses | Communication overhead; non-IID data challenges; model poisoning risk; gradients can leak information |
| Aspect | Detail |
|---|---|
| Core Idea | Add carefully calibrated mathematical noise to data, queries, or model updates — guaranteeing that the output is statistically nearly identical whether any single individual's data is included or not |
| Formal Guarantee | For any individual: the probability of any output changes by at most a factor of e^ε whether their data is included or not |
| Privacy Budget (ε) | Lower ε = stronger privacy (more noise); ε < 1 is strong; ε > 10 is weak |
| Strengths | Mathematical privacy guarantee; composable — total privacy loss across multiple analyses is provably bounded |
| Weaknesses | Accuracy-privacy trade-off — stronger privacy requires more noise, degrading utility |
| Used By | Apple (emoji suggestions), Google (Chrome, Android), US Census Bureau (2020 Census) |
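The mechanism is easy to state concretely for a count query. A count has sensitivity 1 (adding or removing one person changes it by at most 1), so adding Laplace noise with scale 1/ε satisfies ε-DP. A minimal sketch (the data and function name are invented for illustration):

```python
import numpy as np

def private_count(values, predicate, epsilon, rng=None):
    """ε-DP count via the Laplace mechanism: a count query has
    sensitivity 1, so Laplace noise with scale 1/ε gives ε-DP."""
    rng = rng or np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(scale=1.0 / epsilon)

ages = [34, 29, 41, 56, 23, 38]
print(private_count(ages, lambda a: a >= 30, epsilon=1.0))  # true count 4, plus noise
```

Lowering ε increases the noise scale 1/ε, making any individual's presence harder to detect at the cost of a less accurate count — the accuracy-privacy trade-off in miniature.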
| Aspect | Detail |
|---|---|
| Core Idea | Encrypt data in a way that allows computation on the ciphertext — the result, when decrypted, is the same as if the computation had been performed on the plaintext |
| Types | Partially HE (supports one operation), Somewhat HE (limited operations), Fully HE (FHE — arbitrary computation) |
| Strengths | Data is never decrypted during computation; strongest privacy guarantee |
| Weaknesses | Extremely computationally expensive — 1,000x–1,000,000x slower than plaintext computation; improving but still impractical for large models |
| Key Schemes | BFV, BGV, CKKS (approximate FHE for ML), TFHE |
| Used In | Privacy-preserving inference, encrypted database queries, secure voting |
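The homomorphic property itself is easy to demonstrate with textbook RSA, which is multiplicatively homomorphic: multiplying two ciphertexts yields an encryption of the product of the plaintexts. This is a toy (unpadded RSA with tiny primes is completely insecure, and it is not one of the lattice schemes like BFV or CKKS used in practice), but it shows computation on ciphertext:

```python
# Toy multiplicative homomorphism via textbook RSA — illustration only.
p, q = 61, 53
n = p * q                      # modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (Python 3.8+ modular inverse)

def enc(m): return pow(m, e, n)
def dec(c): return pow(c, d, n)

a, b = 7, 6
product_ct = (enc(a) * enc(b)) % n   # multiply the ciphertexts...
print(dec(product_ct))               # ...and the decryption is a * b = 42
```

Fully homomorphic schemes extend this idea to both addition and multiplication (and hence arbitrary circuits), which is where the large performance overheads come from.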
| Aspect | Detail |
|---|---|
| Core Idea | Multiple parties jointly compute a function over their inputs while keeping each party's input private from all others |
| How It Works | Data is split into secret shares distributed across parties; computation proceeds on shares; no single party can reconstruct any other party's data |
| Protocols | Garbled circuits, secret sharing (Shamir, additive), oblivious transfer |
| Strengths | Strong privacy guarantees; more practical than FHE for many applications |
| Weaknesses | High communication overhead; requires synchronous online parties; latency |
| Used In | Private set intersection (ad measurement), private auctions, cross-organisational analytics |
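Additive secret sharing, the workhorse behind many of these protocols, can be sketched in a few lines: each input is split into random shares that sum to it modulo a prime, and share-holders add their shares locally, so the sum is computed without anyone seeing either input. (The modulus and party count here are arbitrary illustrative choices.)

```python
import random

P = 2**61 - 1  # prime modulus for share arithmetic

def share(secret, n_parties=3):
    """Split a secret into n additive shares that sum to it mod P;
    any subset of fewer than n shares reveals nothing about the secret."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

a, b = 120, 45                      # two parties' private inputs
a_shares, b_shares = share(a), share(b)

# Each share-holder adds its two shares locally — no one sees a or b...
sum_shares = [(x + y) % P for x, y in zip(a_shares, b_shares)]

# ...yet recombining the summed shares yields a + b.
print(sum(sum_shares) % P)  # 165
```

Multiplication on shares is what requires the heavier machinery (Beaver triples, garbled circuits) and drives the communication overhead noted above.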
| Aspect | Detail |
|---|---|
| Core Idea | Hardware-based isolation — computation occurs in a secure enclave that even the machine's owner cannot inspect |
| How It Works | Data is encrypted in transit and at rest; decrypted only inside the TEE; results are encrypted before leaving |
| Hardware | Intel SGX, AMD SEV, ARM TrustZone, NVIDIA Confidential Computing |
| Strengths | Near-native performance; practical for real workloads; hardware root of trust |
| Weaknesses | Side-channel attacks have been demonstrated (Spectre, Foreshadow); trust in hardware vendor |
| Used In | Confidential cloud computing (Azure Confidential, GCP Confidential), key management, private inference |
| Aspect | Detail |
|---|---|
| Core Idea | Generate artificial data that preserves the statistical properties of real data without containing any actual individual's information |
| Techniques | GANs, VAEs, diffusion models trained on real data; DP-SGD to ensure the generator doesn't memorise individuals |
| Strengths | Shareable, usable for development and testing; no privacy restrictions on synthetic data |
| Weaknesses | Quality depends on the generator; may miss tail distributions; privacy guarantee depends on generation method |
| Used In | Healthcare data sharing, financial ML development, software testing |
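The simplest possible generator makes the core idea concrete: fit summary statistics of the real data, then sample fresh records from the fitted model. This sketch fits only a mean and covariance (far weaker than the GANs, VAEs, or diffusion models listed above, and with no DP guarantee on its own); the "age/income" data is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for real sensitive data: 1,000 correlated (age, income) records.
real = rng.multivariate_normal(mean=[40, 55_000],
                               cov=[[100, 30_000], [30_000, 4e8]],
                               size=1000)

# Fit the first two moments, then resample from the fitted distribution.
mu = real.mean(axis=0)
cov = np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mu, cov, size=1000)

# The synthetic sample preserves aggregate structure but contains no real record.
print(np.round(synthetic.mean(axis=0) - mu, 1))  # per-column mean gap, small
```

Production systems replace the moment-matching step with a learned generator trained under DP-SGD, precisely because a generator that fits the data too closely can memorise individuals.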
The definitive toolkit for building federated and privacy-preserving AI systems — from FL frameworks to homomorphic encryption libraries.
| Tool | Creator | Key Capabilities |
|---|---|---|
| Flower (flwr) | Flower Labs | Framework-agnostic FL; supports PyTorch, TensorFlow, JAX |
| PySyft | OpenMined | FL + DP + MPC toolkit; remote data science |
| TensorFlow Federated | Google | FL with strong differential privacy integration |
| NVIDIA FLARE | NVIDIA | Enterprise FL; healthcare-focused; provisioning & security |
| Opacus | Meta | DP-SGD for PyTorch; per-sample gradient clipping |
| Google DP Library | Google | Core DP mechanisms; production-grade C++ with Python bindings |
| OpenDP | Harvard / Microsoft | Modular DP framework; composable privacy guarantees |
| Microsoft SEAL | Microsoft | Homomorphic encryption library; BFV & CKKS schemes |
| OpenFHE | Open-source | Comprehensive FHE library; BGV, BFV, CKKS, TFHE |
| CrypTen | Meta | MPC for PyTorch via secret sharing; ML-friendly API |
| Azure Confidential Computing | Microsoft | Intel SGX & AMD SEV; hardware-isolated enclaves |
| AWS Nitro Enclaves | AWS | Isolated compute environments; attestation-based trust |
| Platform | Provider | Deployment | Highlights |
|---|---|---|---|
| Flower (flwr) | Flower Labs (open-source) | Open-Source (any OS; Python 3.8+; CPU or GPU; distributed across any cloud / on-prem nodes) | Framework-agnostic FL; supports PyTorch, TensorFlow, JAX; growing ecosystem |
| PySyft | OpenMined (open-source) | Open-Source (any OS; Python 3.9+; CPU or GPU) | Privacy-preserving ML; FL + DP + MPC; Python-native |
| TensorFlow Federated (TFF) | Google (open-source) | Open-Source (any OS; Python 3.9+; CPU or GPU or TPU; CUDA 11.8+ for GPU) | FL with TensorFlow; strong DP integration; simulation and deployment |
| NVIDIA FLARE | NVIDIA (open-source) | Open-Source (Linux; Python 3.8+; NVIDIA GPU recommended; Docker/K8s for enterprise) | Enterprise FL; supports custom aggregation; healthcare focus |
| IBM FL | IBM (open-source) | Open-Source (Linux; Python 3.8+; CPU or GPU; IBM Cloud Pak compatible) | Enterprise federated learning platform; multi-framework support |
| Substra | Owkin (open-source) | Open-Source (Linux; K8s cluster; Docker; Python 3.9+) | FL for healthcare; focus on traceability and governance |
| FedML | FedML (open-source) | Open-Source / Cloud (any OS; Python 3.8+; CPU or GPU; FedML Cloud on AWS) | Cross-device and cross-silo FL; spans edge, cloud, and on-premise |
| Federated AI Technology Enabler (FATE) | WeBank (open-source) | Open-Source (Linux; Docker/K8s; Python 3.8+; multi-node cluster) | Industrial FL platform; supports horizontal, vertical, and transfer FL |
| Library | Provider | Deployment | Highlights |
|---|---|---|---|
| Opacus | Meta (open-source / PyTorch) | Open-Source (any OS; Python 3.8+; PyTorch; CPU or NVIDIA GPU) | DP-SGD for PyTorch model training; the go-to for DP deep learning |
| TensorFlow Privacy | Google (open-source) | Open-Source (any OS; Python 3.8+; TensorFlow; CPU or GPU or TPU) | DP-SGD for TensorFlow models |
| Google DP Library | Google (open-source, C++) | Open-Source (any OS; C++ / Java / Go; CPU-only) | Core DP mechanisms (Laplace, Gaussian, count, mean, quantile); production-grade |
| OpenDP | Harvard/Microsoft (open-source) | Open-Source (any OS; Rust core + Python bindings; CPU-only) | Modular DP framework; composable privacy guarantees |
| Tumult Analytics | Tumult Labs | Cloud (Tumult SaaS on AWS; integrates with Spark on Databricks / EMR) | Commercial DP analytics platform; used by the US Census Bureau |
| PipelineDP | OpenMined / Google (open-source) | Open-Source (any OS; Python 3.8+; runs on Apache Spark or Beam; CPU-only) | DP for Apache Spark and Beam pipelines |
| Platform | Provider | Deployment | Highlights |
|---|---|---|---|
| Microsoft SEAL | Microsoft (open-source) | Open-Source (any OS; C++ / .NET; CPU-only; high-memory recommended) | Homomorphic encryption library; BFV and CKKS schemes |
| OpenFHE | DARPA-funded (open-source) | Open-Source (Linux/macOS/Windows; C++17; CPU-only; high-memory recommended) | Comprehensive FHE library; BGV, BFV, CKKS, TFHE |
| Concrete ML | Zama (open-source) | Open-Source (Linux/macOS; Python 3.8+; CPU-only; Rust compiler required) | ML inference on FHE; compile scikit-learn/PyTorch models to FHE |
| MP-SPDZ | Open-source | Open-Source (Linux/macOS; C++; Python 3.8+; multi-node network for MPC) | Multi-party computation framework; supports many protocols |
| ABY / ABY3 | Academic (open-source) | Open-Source (Linux; C++; multi-node network required) | MPC framework supporting arithmetic, Boolean, and Yao sharing |
| CrypTen | Meta (open-source) | Open-Source (Linux; Python 3.8+; PyTorch; CPU or GPU; multi-process) | MPC for PyTorch; private ML through secret sharing |
| Platform | Provider | Deployment | Highlights |
|---|---|---|---|
| Azure Confidential Computing | Microsoft | Cloud (Azure — DCsv3/DCdsv3 VMs with Intel SGX; ACC VMs with AMD SEV-SNP) | TEE-based (Intel SGX, AMD SEV); confidential VMs and containers |
| GCP Confidential Computing | Google | Cloud (GCP — N2D Confidential VMs with AMD SEV; Confidential GKE nodes) | Confidential VMs (AMD SEV); Confidential GKE |
| AWS Nitro Enclaves | Amazon | Cloud (AWS — EC2 instances with Nitro Enclaves enabled) | Isolated compute environments on EC2; no persistent storage or network |
| NVIDIA Confidential Computing | NVIDIA | Cloud (AWS P5 / Azure ND H100; NVIDIA H100 GPU with confidential compute mode) | GPU TEE; H100 confidential computing for AI inference |
Where federated and privacy-preserving AI delivers real-world value — from healthcare to financial crime prevention.
| Use Case | Description | Key Examples |
|---|---|---|
| Multi-Hospital Model Training | FL across hospitals to build shared models without sharing patient data | HealthChain, Owkin, NVIDIA Clara FL |
| Rare Disease Research | FL enables combining data across institutions for diseases too rare for any single institution | Federated tumour segmentation (FeTS) |
| Drug Discovery | Secure computation for cross-pharma compound screening without IP exposure | MELLODDY consortium (10 pharma companies) |
| Genomics | Privacy-preserving genome-wide association studies | Secure GWAS via MPC |
| EHR Analytics | Differentially private analytics across electronic health record systems | HIPAA-compliant DP analytics |
| Use Case | Description | Key Examples |
|---|---|---|
| Cross-Bank Fraud Detection | FL or MPC to train shared fraud models without sharing customer transaction data | Singapore's FATE project, FinTech collaborations |
| AML Collaboration | PSI and MPC for anti-money laundering across institutions | SWIFT collaborative analytics |
| Credit Scoring | Vertical FL combining bank and telecom data for improved credit scoring | WeBank FATE deployments |
| Risk Analytics | DP for aggregate risk reporting without exposing individual positions | Regulatory reporting |
| Use Case | Description | Key Examples |
|---|---|---|
| Mobile Keyboard Prediction | Cross-device FL for next-word prediction without sending typed text to servers | Google Gboard, Apple QuickType |
| Voice Assistant Training | FL for speech recognition improvement without centralising voice recordings | Apple Siri, Google Assistant |
| Browser Telemetry | Local DP for aggregate usage statistics without tracking individual users | Google RAPPOR (Chrome) |
| Ad Measurement | PSI and aggregated reporting for ad conversion without cross-site tracking | Google Privacy Sandbox, Apple SKAdNetwork |
| Recommendation Systems | FL for personalised recommendations without centralising user history | Research prototypes; growing commercial interest |
| Use Case | Description | Key Examples |
|---|---|---|
| Census & Statistical Agencies | DP for publishing accurate statistics while protecting individual respondents | US Census Bureau 2020 (DP), Eurostat |
| Cross-Agency Intelligence | MPC for joint analytics across intelligence agencies without sharing classified data | Research prototypes |
| Tax Administration | Privacy-preserving analytics for tax compliance without exposing individual returns | Government pilot programmes |
| Use Case | Description | Key Examples |
|---|---|---|
| Network Optimisation | FL for optimising mobile network performance using data from user devices | Ericsson, Nokia research |
| Joint Fraud Detection | MPC for collaborative fraud detection across telecom operators | EU research consortia |
| Use Case | Description | Key Examples |
|---|---|---|
| Federated Driving Model Training | FL across vehicle fleets to improve driving models without centralising driving data | BMW, Mercedes, Stellantis research |
| V2X Privacy | Privacy-preserving vehicle-to-everything communication | Connected vehicle research |
Quantitative measures of privacy strength and federated learning performance.
| Metric | What It Measures |
|---|---|
| Privacy Budget (ε) | Total differential privacy epsilon consumed; lower = stronger privacy |
| δ (Delta) | Probability of privacy guarantee failure; should be negligible (e.g., 1/n²) |
| Membership Inference Success Rate | Can an attacker determine if a specific individual was in the training data? Lower = better |
| Attribute Inference Risk | Can an attacker infer sensitive attributes from the model? Lower = better |
| Model Inversion Risk | Can an attacker reconstruct training data from the model? Lower = better |
| k-Anonymity Level | Minimum number of individuals sharing each combination of quasi-identifiers |
| l-Diversity | Diversity of sensitive values within each equivalence class |
| Metric | What It Measures |
|---|---|
| Model Accuracy vs. Centralised Baseline | How much accuracy is lost compared to a model trained on centralised data |
| Privacy-Utility Tradeoff Curve | Accuracy as a function of ε — characterises how much privacy costs in utility |
| Convergence Rounds | Number of federated rounds needed to reach target accuracy |
| Per-Client Accuracy | Individual client's local model performance — measures fairness across participants |
| Statistical Utility | For DP queries: error between privatised and true statistics |
| Metric | What It Measures |
|---|---|
| Communication Cost | Total data transferred between clients and server during training |
| Computation Overhead | Additional compute cost vs. non-private training (HE can be 1,000x+) |
| Training Time | Wall-clock time including communication rounds and local computation |
| Encryption Throughput | Operations per second for HE or MPC workloads |
| Latency | End-to-end time for a single private inference |
Market sizing and growth projections for federated learning and privacy-enhancing technologies.
| Metric | Value | Source / Notes |
|---|---|---|
| Federated Learning Market (2024) | ~$195 million | MarketsandMarkets; projected ~$420M by 2028 |
| Privacy-Enhancing Technologies (PET) Market (2024) | ~$2.2 billion | Includes DP, HE, MPC, TEE, FL, synthetic data |
| Confidential Computing Market (2024) | ~$4.8 billion | Azure, GCP, AWS, NVIDIA; growing rapidly |
| Organisations Deploying FL (2024) | ~12% of large enterprises (pilot or production) | Gartner; concentrated in healthcare and finance |
| Organisations Using DP (2024) | ~18% of large tech companies; <5% of non-tech enterprises | Apple, Google, Meta lead; enterprise adoption nascent |
| Trend | Description |
|---|---|
| Regulatory Acceleration | New privacy regulations (GDPR enforcement, US state laws, PIPL) are driving urgent adoption |
| Healthcare Leading | Healthcare is the top vertical for FL deployment — driven by strict privacy requirements and distributed data |
| Confidential Computing Mainstream | Azure, GCP, and AWS all offer confidential computing; GPU TEE emerging |
| FHE Approaching Practicality | Dramatic improvements in FHE performance (Zama Concrete, Intel HE-Transformer) — still 100x+ overhead |
| Synthetic Data Growing | Synthetic data emerging as a simpler alternative for development and testing use cases |
| Privacy as Competitive Advantage | Apple and others marketing privacy as a product differentiator |
| Cross-Industry Data Collaborations | Consortia forming for federated fraud detection, drug discovery, smart manufacturing |
Key risks and open challenges facing federated and privacy-preserving AI deployments.
| Limitation | Description |
|---|---|
| Privacy-Utility Tradeoff | Stronger privacy guarantees require more noise or slower computation, reducing model utility |
| Computational Overhead | HE and MPC introduce massive compute costs — orders of magnitude slower than plaintext |
| Communication Cost | FL requires multiple communication rounds; bandwidth-intensive for large models |
| Non-IID Data | Federated data is often non-identically distributed across clients, degrading model quality |
| Complexity | Implementing privacy-preserving AI correctly is technically extremely difficult |
| Incomplete Protection | FL alone does not prevent all privacy attacks — gradient inversion, membership inference remain risks |
| Scalability | MPC communication grows with the number of parties; FHE is computationally expensive for large models |
| Maturity | Many techniques are still research-grade; production deployments are limited in scope |
| Attack | Description | Mitigation |
|---|---|---|
| Gradient Inversion | Reconstruct training data from shared gradient updates | Secure aggregation + differential privacy |
| Membership Inference | Determine whether a specific individual's data was used in training | DP training (DP-SGD); regularisation |
| Model Inversion | Reconstruct representative training examples from the trained model | DP; output perturbation; access controls |
| Attribute Inference | Infer sensitive attributes from model predictions | DP; limit model output precision |
| Model Poisoning | Malicious participants send corrupted updates to degrade or backdoor the model | Byzantine-robust aggregation; anomaly detection |
| Data Poisoning | Malicious participants include bad data to influence model behaviour | Data quality checks; robust training |
| Side-Channel Attacks | Extract information from TEE hardware through timing, power, or electromagnetic signals | Hardware hardening; constant-time algorithms |
| Criterion | Why Privacy-Preserving AI Excels |
|---|---|
| Legal Requirements | When regulations (GDPR, HIPAA, CCPA) prohibit data centralisation |
| Cross-Organisation Collaboration | When multiple organisations want to train jointly without sharing proprietary data |
| Sensitive Data | When data is inherently sensitive (medical, financial, personal) |
| Competitive Concerns | When participants are competitors who cannot share raw data |
| Jurisdictional Constraints | When data cannot cross borders due to data sovereignty laws |
| Trust Deficit | When data owners do not trust each other or a central party |
Essential terminology for federated and privacy-preserving AI.
| Term | Definition |
|---|---|
| Central DP | Differential privacy applied by a trusted central server that has access to raw data and adds noise before publishing results |
| Confidential Computing | Hardware-based isolation that protects data during processing using Trusted Execution Environments |
| Cross-Device Federated Learning | FL where participants are millions of edge devices (smartphones, IoT) with small local datasets |
| Cross-Silo Federated Learning | FL where participants are organisations (hospitals, banks) with large local datasets |
| Data Minimisation | GDPR principle: collect and process only the data strictly necessary for the stated purpose |
| Differential Privacy (DP) | A mathematical framework guaranteeing that the output of a computation is nearly the same whether any individual's data is included or not |
| DP-SGD | Differentially Private Stochastic Gradient Descent — training ML models with per-sample gradient clipping and noise addition |
| Epsilon (ε) | The privacy loss parameter in differential privacy; lower ε = stronger privacy |
| FedAvg | Federated Averaging — the foundational FL algorithm: local training + weighted averaging of model updates |
| Federated Learning (FL) | Training a shared model across decentralised data holders; data never leaves the device |
| Fully Homomorphic Encryption (FHE) | Encryption that allows arbitrary computation on ciphertext; the holy grail of encrypted computation |
| Gradient Inversion Attack | An attack that reconstructs training data from shared model gradients |
| Horizontal Federated Learning | FL where participants have the same features but different data records (same columns, different rows) |
| Homomorphic Encryption (HE) | Encryption that allows computation on ciphertext — the decrypted result equals the computation on plaintext |
| k-Anonymity | Privacy property ensuring each record is indistinguishable from at least k-1 other records |
| Local DP | Differential privacy applied on each individual's device before any data is sent — strongest trust model |
| Membership Inference Attack | An attack that determines whether a specific individual's data was used to train a model |
| Model Inversion Attack | An attack that reconstructs representative training examples from model outputs |
| Model Poisoning | A malicious participant sends corrupted model updates to degrade or backdoor the federated model |
| Non-IID Data | Non-Independent and Identically Distributed — data distributions differ across federated clients |
| Oblivious Transfer | A cryptographic protocol where a receiver obtains one of several items from a sender without the sender knowing which |
| Privacy Budget | The total allowable privacy loss (ε) across all computations; once exhausted, no further private computations are permitted |
| Privacy-Enhancing Technologies (PET) | Umbrella term for technologies that protect privacy: DP, HE, MPC, FL, TEE, synthetic data |
| Private Set Intersection (PSI) | A protocol where two parties learn which elements appear in both their sets without revealing anything else |
| Secure Aggregation | A protocol ensuring the server can only see the aggregate of client updates, not individual updates |
| Secure Multi-Party Computation (MPC) | Protocols allowing multiple parties to jointly compute a function while keeping each party's input private |
| Secret Sharing | Splitting data into shares distributed across parties; no single party can reconstruct the data alone |
| Sensitivity | In DP: the maximum change in a query's output when one individual's data is added or removed |
| Synthetic Data | Artificially generated data that preserves statistical properties of real data without containing actual individuals |
| TEE (Trusted Execution Environment) | A secure area within a processor that ensures code and data are protected in use |
| Vertical Federated Learning | FL where participants have different features for the same individuals (same rows, different columns) |
[Animation infographics (2026): Federated / Privacy-Preserving AI overview, and the full technology stack — Hardware → Compute → Data → Frameworks → Orchestration → Serving → Application.]
Detailed reference: the regulations driving adoption of privacy-preserving AI.
| Regulation | How It Drives Privacy-Preserving AI |
|---|---|
| GDPR (EU) | Data minimisation, purpose limitation, and cross-border transfer restrictions — FL and DP help comply |
| HIPAA (US) | Protected health information cannot be shared without consent — FL enables multi-hospital research |
| CCPA/CPRA (California) | Consumer privacy rights — DP enables analytics while respecting opt-out |
| Data Sovereignty Laws | Many countries require data to stay within borders — FL avoids cross-border data transfer |
| EU Data Act (2024) | Regulates data sharing and access — privacy-preserving techniques enable compliant data collaboration |
| US Executive Order on AI (2023) | Promotes privacy-preserving AI research and development |
| PIPL (China) | Personal Information Protection Law — strict data localisation and consent requirements |
| Practice | Description |
|---|---|
| Privacy Budget Management | Track total differential privacy epsilon consumed; set a maximum budget; halt when exhausted |
| Participant Agreements | Formal data processing agreements defining each party's responsibilities, data types, and usage restrictions |
| Audit Trail | Record all training rounds, participants, aggregation events, and privacy budget consumption |
| Independent Verification | Third-party audit of privacy guarantees and implementation correctness |
| Attack Testing | Regular testing of membership inference, gradient inversion, and other privacy attacks |
| Transparency Reports | Publish what data types, what ε values, and what techniques are used |
| Consent Management | Ensure all data subjects have consented to the specific federated use |
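The privacy-budget row above can be sketched as a simple accountant. This toy `PrivacyBudget` class (the name and methods are illustrative, not from any particular library) assumes basic sequential composition — ε costs simply add up — whereas production systems typically use tighter advanced or Rényi composition:

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, max_epsilon: float):
        self.max_epsilon = max_epsilon
        self.spent = 0.0
        self.log = []   # audit trail of (query, epsilon) entries

    def charge(self, query: str, epsilon: float) -> None:
        """Record a query's epsilon cost; refuse it if the budget would be exceeded."""
        if self.spent + epsilon > self.max_epsilon:
            raise RuntimeError(
                f"Budget exhausted: {self.spent:.2f} spent, "
                f"{epsilon:.2f} requested, {self.max_epsilon:.2f} max"
            )
        self.spent += epsilon
        self.log.append((query, epsilon))

budget = PrivacyBudget(max_epsilon=1.0)
budget.charge("count query", 0.3)
budget.charge("histogram", 0.5)
print(budget.spent)
try:
    budget.charge("average", 0.4)   # would exceed 1.0 -> refused
except RuntimeError as e:
    print("halted:", e)
```

The same log doubles as the audit trail from the table: every round or query is recorded with its ε cost, so a third party can verify total consumption.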
Detailed reference content for deep dives.
| Step | Description |
|---|---|
| 1. Initialise | Server initialises global model w₀ |
| 2. Select Clients | Each round, server selects a subset of K clients |
| 3. Distribute | Server sends current global model wₜ to selected clients |
| 4. Local Training | Each client k trains on local data for E epochs; produces local model wₜᵏ |
| 5. Upload | Each client sends model update (wₜᵏ - wₜ) to server |
| 6. Aggregate | Server computes the weighted average wₜ₊₁ = wₜ + Σ (nₖ/n) · (wₜᵏ − wₜ) — equivalently, averaging the client models directly: wₜ₊₁ = Σ (nₖ/n) · wₜᵏ |
| 7. Repeat | Return to step 2; continue for T rounds |
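The seven steps above can be sketched in a few lines. This toy version treats the model as a flat parameter vector, selects every client each round, and uses a made-up squared-distance loss per client purely to have something to train — real systems plug in full neural networks, client sampling, and the privacy layers described elsewhere in this section:

```python
def local_train(w, data, lr=0.1, epochs=5):
    """Step 4: a client runs E epochs of gradient descent on its own data.
    Toy objective: squared distance from w to each local data point."""
    w = list(w)
    for _ in range(epochs):
        for x in data:
            for i in range(len(w)):
                w[i] -= lr * 2.0 * (w[i] - x[i])   # d/dw (w - x)^2
    return w

def fedavg(clients, rounds=10):
    dim = len(clients[0][0])
    w = [0.0] * dim                                   # step 1: initialise w0
    n = sum(len(d) for d in clients)                  # total examples across clients
    for _ in range(rounds):                           # step 7: repeat for T rounds
        local_models = [local_train(w, d) for d in clients]   # steps 2-5
        # step 6: weighted average  w_{t+1} = sum_k (n_k / n) * w_t^k
        w = [sum((len(d) / n) * wk[i] for wk, d in zip(local_models, clients))
             for i in range(dim)]
    return w

# Three clients whose data sit at different points; FedAvg converges to the
# data-weighted centre without any client revealing its points to the others.
clients = [[(0.0, 0.0)] * 2, [(1.0, 1.0)] * 2, [(2.0, 2.0)] * 2]
print([round(v, 3) for v in fedavg(clients)])   # [1.0, 1.0]
```

Note that only `local_models` (the updates) ever reach the aggregation step — `clients`' raw data stays inside `local_train`, mirroring the pipeline diagram.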
| Challenge | Description | Solution |
|---|---|---|
| Non-IID Data | Each client's data is not identically distributed — different label distributions, data quality, volume | FedProx, SCAFFOLD, personalisation layers |
| Communication Cost | Sending model updates over limited bandwidth (especially mobile) | Gradient compression, quantisation, sparse updates |
| System Heterogeneity | Clients have different compute capabilities, network speeds, and availability | Asynchronous FL, client selection strategies |
| Privacy Leakage | Model gradients can leak information about training data (gradient inversion attacks) | Secure aggregation + differential privacy |
| Model Poisoning | Malicious clients send corrupted updates to degrade the global model | Byzantine-robust aggregation (Krum, Trimmed Mean) |
| Free-Riding | Clients benefit from the global model without contributing genuine updates | Contribution measurement, incentive mechanisms |
| Fairness | Global model may perform well on majority data but poorly on minority clients | Fair aggregation, personalisation |
| Variant | Description |
|---|---|
| FedAvg | Baseline: average model weights across clients |
| FedProx | Adds a proximal term to handle non-IID data and system heterogeneity |
| FedMA | Matches and merges neurons across client models for heterogeneous architectures |
| SCAFFOLD | Uses control variates to reduce client drift in non-IID settings |
| Per-FedAvg | Personalised federated averaging — learn a global model that can be quickly fine-tuned to each client |
| FedBN | Keeps batch normalisation layers local to each client for domain adaptation |
| Split Learning | Model is split between client and server; each trains their portion; reduces compute on client |
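FedProx's proximal term from the table amounts to a one-line change in the client's update: the local objective becomes Fₖ(w) + (μ/2)·‖w − w_global‖², so each gradient step is pulled back toward the current global model. The sketch below shows a single schematic SGD step on a parameter vector; `grad_fk` stands in for whatever local loss gradient the client computes, and the μ and learning-rate values are illustrative:

```python
def fedprox_step(w, w_global, grad_fk, mu=0.1, lr=0.01):
    """One local SGD step with the FedProx proximal term.

    Plain FedAvg would apply grad_fk alone; FedProx adds mu * (w - w_global),
    the gradient of (mu/2) * ||w - w_global||^2, which limits client drift
    on non-IID data.
    """
    return [wi - lr * (g + mu * (wi - gi))
            for wi, g, gi in zip(w, grad_fk(w), w_global)]

# With a zero local gradient, the step moves w strictly toward w_global.
w = fedprox_step([1.0, 2.0], [0.0, 0.0], grad_fk=lambda w: [0.0, 0.0])
print(w)
```

Setting μ = 0 recovers a plain FedAvg local step, which is why FedProx is usually described as a strict generalisation of FedAvg.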
A randomised mechanism M satisfies (ε, δ)-differential privacy if for all datasets D₁ and D₂ differing on a single individual and for all possible outputs S:
P[M(D₁) ∈ S] ≤ e^ε · P[M(D₂) ∈ S] + δ
In plain language: whether or not any single individual's data is included in the dataset, the output of the computation is nearly the same.
In plain language: whether or not any single individual's data is included in the dataset, the probability of seeing any particular output is almost unchanged — so the output reveals almost nothing about that individual.
| Mechanism | How It Works | Used For |
|---|---|---|
| Laplace Mechanism | Adds Laplace noise calibrated to the query sensitivity | Numeric queries (count, sum, average) |
| Gaussian Mechanism | Adds Gaussian noise; used with (ε, δ)-DP | Numeric queries; more flexible than Laplace |
| Exponential Mechanism | Selects an output from a set with probability proportional to a quality score | Non-numeric outputs (selection queries) |
| DP-SGD | Clips per-sample gradients and adds Gaussian noise during model training | Training ML models with differential privacy |
| RAPPOR | Randomised Aggregatable Privacy-Preserving Ordinal Response — local DP for frequency estimation | Google Chrome telemetry |
| Private Selection / Sparse Vector Technique | Privately answers threshold queries using minimal privacy budget | Multiple queries with limited budget |
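The Laplace mechanism from the table is simple enough to write out in full. For a counting query the sensitivity is 1 (adding or removing one person changes the count by at most 1), and adding Laplace noise with scale sensitivity/ε satisfies ε-DP; the sampling uses the standard inverse-transform formula, and the example values (true count 42, ε = 0.5) are arbitrary:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return true_value + Laplace(0, sensitivity / epsilon) noise.

    Satisfies epsilon-DP for any query whose output changes by at most
    `sensitivity` when one individual's record is added or removed.
    """
    scale = sensitivity / epsilon
    u = random.random() - 0.5   # uniform on [-0.5, 0.5)
    # Inverse-transform sampling of Laplace(0, scale)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# A private count: true answer 42, sensitivity 1, epsilon = 0.5.
noisy = laplace_mechanism(42, sensitivity=1.0, epsilon=0.5)
print(round(noisy, 2))
```

The trade-off in the deployments table below is visible directly in `scale = sensitivity / epsilon`: a smaller ε (stronger privacy) means more noise and a less accurate answer.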
| Deployment | ε Value | Description |
|---|---|---|
| Apple (iOS) | ε = 1–8 per day (estimated) | Local DP for emoji, QuickType, Health, and Safari suggestions |
| Google (RAPPOR) | ε = 1–9 per report | Local DP for Chrome usage statistics |
| US Census Bureau (2020) | ε ≈ 19.6 (total budget) | DP applied to 2020 Census redistricting data |
| Meta | ε varies by use case | DP for analytics and ad measurement |
| LinkedIn | ε varies | DP for talent insights and analytics |
┌──────────────────────────────────────────────────────────────────────────┐
│ HOMOMORPHIC ENCRYPTION PIPELINE │
│ │
│ DATA OWNER CLOUD / SERVER DATA OWNER │
│ ────────────── ────────────── ────────────── │
│ Encrypt data Compute on Decrypt │
│ with HE key encrypted data result with │
│ (addition, mult) private key │
│ │
│ Plaintext: x Ciphertext: E(x) Plaintext: f(x) │
│ → Encrypt(x) ────────► Eval(f, E(x)) ────────► Decrypt(E(f(x))) │
│ = E(f(x)) = f(x) │
│ │
│ ──── SERVER NEVER SEES PLAINTEXT DATA OR RESULT ────── │
└──────────────────────────────────────────────────────────────────────────┘
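The pipeline above can be demonstrated with a textbook Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so the server computes on data it cannot read. The hard-coded primes below are absurdly small and for illustration only — real deployments use 2048-bit moduli and hardened libraries, and modern FHE schemes (CKKS, BFV) support multiplication as well as addition:

```python
import math
import random

# --- Toy Paillier keypair (tiny primes; illustration only) ---
p, q = 293, 433
n = p * q                       # public modulus
n2 = n * n
g = n + 1                       # standard generator choice
lam = math.lcm(p - 1, q - 1)    # private key component lambda
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)   # private key component mu

def encrypt(m: int) -> int:
    """Data owner: E(m) = g^m * r^n mod n^2 with fresh randomness r."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    """Data owner: recover m with the private key (lam, mu)."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Server side: multiplying ciphertexts adds the plaintexts underneath --
# the server never sees a, b, or the sum.
a, b = 1234, 5678
c_sum = (encrypt(a) * encrypt(b)) % n2
print(decrypt(c_sum))   # 6912 == 1234 + 5678
```

This mirrors the diagram exactly: `encrypt` happens at the data owner, the ciphertext multiplication is the server's `Eval`, and only the key holder can run `decrypt`.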
| Protocol Family | How It Works | Strengths |
|---|---|---|
| Secret Sharing | Each party holds a "share" of the data; computation proceeds on shares; result is reconstructed by combining shares | Efficient for arithmetic circuits; low per-operation cost |
| Garbled Circuits | One party "garbles" a Boolean circuit; the other party evaluates it using oblivious transfer | General-purpose; any function can be computed |
| Oblivious Transfer | A protocol where a sender has multiple messages; the receiver gets exactly one without the sender knowing which | Foundational building block for garbled circuits |
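Secret sharing from the table can be shown in a few lines: additive sharing over a prime field splits a value into random-looking shares that sum back to the secret, and parties add their shares locally to compute a sum without anyone revealing an input. This is only the arithmetic-circuit building block — real MPC frameworks add multiplication triples, malicious security, and networking — and the prime and party counts below are arbitrary choices for the demo:

```python
import random

P = 2**61 - 1   # prime modulus for the field

def share(secret: int, n_parties: int = 3):
    """Split `secret` into n additive shares; any n-1 of them look uniform."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two parties' salaries, secret-shared across three compute servers.
alice, bob = 72_000, 88_000
sa, sb = share(alice), share(bob)

# Each server adds the two shares it holds -- a purely local computation.
sum_shares = [(x + y) % P for x, y in zip(sa, sb)]
print(reconstruct(sum_shares))   # 160000, yet no server saw either salary
```

Addition of shares is free precisely because the scheme is linear — which is why the table notes a "low per-operation cost" for arithmetic circuits.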
| Aspect | Detail |
|---|---|
| What It Is | Two parties each have a set of items; they learn only which items appear in both sets (intersection) — nothing else |
| Why It Matters | Enables ad measurement (did users who saw ads also buy?), contact matching, and fraud detection across organisations without sharing customer lists |
| Used By | Google (ads conversion), Meta (ad measurement), Apple (password breach monitoring), financial crime detection collaborations |
| Protocol | Typically based on Diffie-Hellman key exchange, oblivious PRF, or Bloom filter techniques |
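The Diffie-Hellman flavour mentioned in the table can be sketched end to end: each party hashes its items into a group, blinds them with a private exponent, and the parties compare doubly blinded values — equal items collide, everything else stays hidden. This assumes an honest-but-curious setting; the Mersenne prime below stands in for the large prime-order groups or elliptic curves a real deployment would use, and the email addresses are made up:

```python
import hashlib
import math
import random

P = 2**127 - 1   # Mersenne prime; toy group for illustration

def h(item: str) -> int:
    """Hash an item into Z*_P (never zero)."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % (P - 1) + 1

def keygen() -> int:
    """Random exponent invertible mod P-1, so blinding is a bijection."""
    while True:
        k = random.randrange(3, P - 1)
        if math.gcd(k, P - 1) == 1:
            return k

alice_set = {"ann@x.com", "bob@x.com", "carol@x.com"}
bob_set = {"bob@x.com", "dave@x.com", "carol@x.com"}
a, b = keygen(), keygen()

# Round 1: each party blinds its own hashed items and sends them across.
alice_out = {x: pow(h(x), a, P) for x in alice_set}
bob_out = {pow(h(y), b, P) for y in bob_set}

# Round 2: Bob blinds Alice's values again, and Alice does the same to Bob's.
alice_double = {x: pow(v, b, P) for x, v in alice_out.items()}   # H(x)^(ab)
bob_double = {pow(v, a, P) for v in bob_out}                     # H(y)^(ba)

# Matching double-blinded values reveal only the intersection.
print(sorted(x for x, v in alice_double.items() if v in bob_double))
# ['bob@x.com', 'carol@x.com']
```

The key fact is commutativity — (H(x)ᵃ)ᵇ = (H(x)ᵇ)ᵃ — so both parties arrive at the same value exactly when they hold the same item, which is all that either side learns.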
Detailed reference content for overview.
Federated and Privacy-Preserving AI encompasses a family of techniques that enable AI training, inference, and analysis without centralising or exposing raw data. Instead of moving data to a central server, these techniques either bring the computation to the data (federated learning), add mathematical privacy guarantees (differential privacy), or perform computation on encrypted data (secure computation).
This is not a single AI type in the way that generative or predictive AI is — it is a cross-cutting set of techniques and architectures that can be applied to almost any AI system to protect data privacy. Federated learning can train a generative model; differential privacy can protect a predictive model; and secure computation can enable analytical AI across organisations.
The motivation is clear: the most valuable AI would learn from the world's most sensitive data — medical records, financial transactions, personal communications, government intelligence. But centralising this data is often legally prohibited (GDPR, HIPAA), commercially impossible (competitors won't share data), or ethically unacceptable. Privacy-preserving AI resolves this tension by enabling the learning without the sharing.
| Dimension | Detail |
|---|---|
| Core Capability | Protects — enables AI training and inference without exposing, centralising, or compromising the privacy of underlying data |
| How It Works | Federated learning, differential privacy, homomorphic encryption, secure multi-party computation, trusted execution environments |
| What It Produces | Privacy-preserving models, encrypted inferences, mathematically private statistics, cross-organisational insights |
| Key Differentiator | Data never leaves its source — models come to data, not data to models; privacy is mathematical, not just policy |
| AI Type | What It Does | Example |
|---|---|---|
| Privacy-Preserving AI | Trains and runs AI without exposing raw data | Federated model training across hospitals |
| Agentic AI | Pursues goals autonomously with tools, memory, and planning | Research agent, coding agent |
| Analytical AI | Extracts insights and explanations from data | BI dashboards, anomaly detection |
| Autonomous AI (Non-Agentic) | Operates independently within fixed boundaries without human input | Autopilot, auto-scaling, algorithmic trading |
| Bayesian / Probabilistic AI | Reasons under uncertainty using probability distributions | Clinical trial analysis, A/B testing, risk modelling |
| Cognitive / Neuro-Symbolic AI | Combines neural learning with symbolic reasoning | LLM + knowledge graph, physics-informed neural net |
| Conversational AI | Manages multi-turn dialogue with users | Chatbot, voice assistant |
| Evolutionary / Genetic AI | Optimises solutions through population-based search inspired by natural selection | Neural architecture search, logistics scheduling |
| Explainable AI (XAI) | Makes AI decisions understandable to humans | SHAP explanations, LIME, Grad-CAM |
| Generative AI | Creates new content from learned patterns | Text generation, image synthesis |
| Multimodal Perception AI | Fuses vision, language, audio, and other modalities | GPT-4o processing image + text, AV sensor fusion |
| Optimisation / Operations Research AI | Finds optimal solutions to constrained mathematical problems | Vehicle routing, supply chain planning, scheduling |
| Physical / Embodied AI | Acts in the physical world through sensors and actuators | Autonomous vehicle, robot arm, drone |
| Predictive / Discriminative AI | Classifies or forecasts from historical data | Fraud detection, disease prediction |
| Reactive AI | Responds to current input with no memory or learning | Thermostat, ABS braking system |
| Recommendation / Retrieval AI | Surfaces relevant items from large catalogues based on user signals | Netflix suggestions, Google Search, Spotify playlists |
| Reinforcement Learning AI | Learns optimal behaviour from reward signals via trial and error | AlphaGo, robotic locomotion, RLHF |
| Scientific / Simulation AI | Solves scientific problems and models physical systems | AlphaFold, climate simulation, molecular dynamics |
| Symbolic / Rule-Based AI | Reasons over explicit rules and knowledge to derive conclusions | Medical expert system, legal reasoning engine |
Key Distinction: Cross-Cutting Technique, Not Independent Type. Privacy-Preserving AI is not a standalone AI type — it is a set of techniques applied to other AI types. You can have a privacy-preserving predictive model, a federated generative model, or a differentially private analytical system.
Key Distinction from Standard Centralised AI: Standard AI centralises all training data on one server. Privacy-preserving AI keeps data distributed; only model updates, encrypted computations, or noisy aggregates are shared.
Key Distinction from Anonymisation: Traditional anonymisation de-identifies data before sharing. Privacy-preserving AI goes further — data is never shared at all, or computation occurs on encrypted data with mathematical guarantees.