A comprehensive interactive exploration of Generative AI — the generation pipeline, 8-layer stack, output modalities, foundation models, developer tools, benchmarks, market data, and more.
~55 min read · Interactive Reference

Generative AI follows a six-stage pipeline from raw data to novel content generation.
Generative AI systems follow a common high-level pipeline:
┌─────────────────────────────────────────────────────────────────┐
│ GENERATIVE AI PIPELINE │
│ │
│ 1. DATA 2. PRE-TRAINING 3. FINE-TUNING │
│ ───────── ───────────────── ───────────── │
│ Collect & Train on massive Specialise on │
│ clean vast unlabelled corpus domain data / │
│ datasets (self-supervised) RLHF alignment │
│ │
│ 4. INFERENCE 5. PROMPTING 6. OUTPUT │
│ ───────────── ───────────────── ────────── │
│ Deploy model User provides Model generates │
│ for generation context/instruction novel content │
└─────────────────────────────────────────────────────────────────┘
| Step | What Happens |
|---|---|
| Tokenisation | Input text is broken into tokens (sub-word units) the model can process |
| Encoding | Tokens are converted into numerical vector representations |
| Attention | The model weighs relationships between all tokens simultaneously (self-attention) |
| Forward Pass | Representations flow through transformer layers, building rich contextual understanding |
| Decoding | The model predicts the next token based on all prior context, one at a time |
| Sampling | Temperature, top-p, and top-k parameters control the randomness of generation |
| Output | Generated tokens are decoded back into human-readable content |
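The decoding and output steps above form an autoregressive loop: the model repeatedly consumes all prior tokens and appends its prediction. A minimal, model-agnostic sketch (the `toy_model` stand-in is purely hypothetical; real models return logits over a large vocabulary):

```python
def generate(model, prompt_ids, max_tokens=16, eos_id=0):
    """Autoregressive decoding: repeatedly feed all prior tokens back in
    and append the model's next-token prediction (greedy argmax here)."""
    ids = list(prompt_ids)
    for _ in range(max_tokens):
        logits = model(ids)                  # forward pass over the full context
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        if next_id == eos_id:                # stop on the end-of-sequence token
            break
        ids.append(next_id)
    return ids

def toy_model(ids, vocab=5):
    """Hypothetical stand-in model: highest logit always goes to (last + 1)."""
    logits = [0.0] * vocab
    logits[(ids[-1] + 1) % vocab] = 1.0
    return logits

out = generate(toy_model, [1], max_tokens=3)   # [1, 2, 3, 4]
```

Greedy argmax is the simplest decoding rule; the sampling parameters below replace it with a controlled random draw.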
| Parameter | What It Controls |
|---|---|
| Temperature | Randomness of output — higher = more creative, lower = more deterministic |
| Top-p (Nucleus Sampling) | Considers only the smallest set of tokens whose cumulative probability exceeds p |
| Top-k | Limits token selection to the k most probable next tokens |
| Max Tokens | Maximum length of the generated output |
| System Prompt | Instructions that shape the model's persona, behaviour, and constraints |
| Context Window | Maximum number of tokens the model can process at once |
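Temperature, top-k, and top-p act as a filtering chain over the model's next-token distribution. A minimal NumPy sketch, illustrative only (production inference stacks apply the same logic on the accelerator):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Apply temperature, then top-k, then top-p filtering to raw logits
    and draw one token id: a minimal sketch of common sampling controls."""
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)

    # Softmax to a probability distribution over the vocabulary
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_k is not None:                  # keep only the k most probable tokens
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)

    if top_p is not None:                  # keep the smallest nucleus exceeding p
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask

    probs /= probs.sum()                   # renormalise after filtering
    return int(rng.choice(len(probs), p=probs))
```

With a very low temperature or `top_k=1`, the draw collapses to the most probable token; raising temperature flattens the distribution and increases variety.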
GPT-4 was trained on an estimated 13 trillion tokens — roughly 10 million books' worth of text.
Diffusion models generate images by reversing a noise-addition process over ~1,000 denoising steps.
The first transformer paper ('Attention Is All You Need', 2017) has been cited over 130,000 times.
Quick self-check:

Q1. Which architecture is the foundation of most large language models (LLMs)? — The transformer.
Q2. What technique aligns LLM outputs with human preferences? — Reinforcement learning from human feedback (RLHF).
Q3. What do diffusion models learn to reverse during image generation? — A gradual noise-addition (forward diffusion) process.
The eight-layer stack runs from foundation models at the base up to applications and safety tooling.
| Layer | What It Covers |
|---|---|
| 1. Foundation Models | The large pre-trained models at the core |
| 2. Training Infrastructure | GPU/TPU compute, distributed training, data pipelines |
| 3. Model APIs & Hosting | Access to models via API or managed endpoints |
| 4. Developer Frameworks | SDKs, orchestration libraries, prompt management |
| 5. Fine-Tuning & Alignment | Domain adaptation, RLHF, instruction tuning |
| 6. Retrieval & Memory | RAG systems, vector databases, context management |
| 7. Applications & Products | Consumer apps, enterprise software, vertical solutions |
| 8. Observability & Safety | Monitoring, guardrails, evaluation, red-teaming |
Generative AI produces content across an expanding set of modalities — from text to molecules.
| Capability | Description |
|---|---|
| Long-Form Writing | Essays, reports, articles, books, scripts |
| Conversational AI | Multi-turn dialogue, Q&A, customer support |
| Summarisation | Condense documents, meeting notes, research papers |
| Translation | Cross-language content transformation |
| Code Generation | Write, explain, debug, and refactor code |
| Reasoning | Multi-step problem solving, logic, mathematics |
| Information Extraction | Pull structured data from unstructured text |
| Classification | Categorise, label, and score text |
| Capability | Description | Key Tools |
|---|---|---|
| Text-to-Image | Generate images from text descriptions | DALL·E 3, Midjourney, Stable Diffusion, Imagen 3 |
| Image-to-Image | Transform an image using text or another image | ControlNet, InstructPix2Pix, Adobe Firefly |
| Inpainting | Fill or replace specific regions of an image | DALL·E 3, Stable Diffusion Inpainting |
| Outpainting | Extend image content beyond its borders | Adobe Generative Fill, DALL·E 3 |
| Style Transfer | Apply artistic styles to photographs | StyleGAN, Neural Style Transfer |
| Image Upscaling | Enhance resolution and detail | Topaz Gigapixel, Magnific AI |
| Background Removal | Automatically remove or replace backgrounds | Adobe Firefly, Remove.bg AI |
| Face / Portrait Generation | Generate realistic human faces | StyleGAN3, Midjourney |
| Product Image Generation | Create product images for e-commerce | Pebblely, Claid.ai |
| Capability | Description | Key Tools |
|---|---|---|
| Text-to-Video | Generate video clips from text prompts | Sora (OpenAI), Veo 3 (Google), Kling |
| Image-to-Video | Animate a static image into video | Runway Gen-3, Pika, Stable Video Diffusion |
| Video-to-Video | Transform or stylise existing video | Runway Gen-3 Transform, Topaz Video AI |
| Video Extension | Extend short clips forward or backward | Runway, Pika |
| Lip Sync | Sync AI-generated speech to video faces | HeyGen, Sync.so, D-ID |
| AI Avatar Video | Talking head videos from text scripts | Synthesia, HeyGen, Colossyan |
| Video Upscaling / Enhancement | Improve resolution and frame rate | Topaz Video AI, NVIDIA RTX VSR |
| Capability | Description | Key Tools |
|---|---|---|
| Text-to-Speech (TTS) | Convert text to natural-sounding voice | ElevenLabs, OpenAI TTS, Amazon Polly |
| Voice Cloning | Clone a specific person's voice from samples | ElevenLabs, Resemble AI, Play.ht |
| Speech-to-Speech | Transform voice style or accent | ElevenLabs Voice Changer, RVC |
| Conversational Voice AI | Real-time spoken dialogue with AI | OpenAI Realtime API, Hume AI |
| Sound Effect Generation | Generate custom sound effects from text | ElevenLabs Sound Effects, Adobe Project Sound |
| AI Dubbing | Automatically dub video into other languages | ElevenLabs Dubbing, HeyGen |
| Capability | Description | Key Tools |
|---|---|---|
| Text-to-Music | Generate songs or tracks from text descriptions | Suno, Udio, MusicLM (Google) |
| Instrumental Generation | Generate backing tracks and instrumentals | Soundraw, Boomy, Beatoven.ai |
| Music Continuation | Extend or complete an existing musical piece | Meta MusicGen, AudioCraft |
| Stem Separation | Isolate vocals, drums, bass from mixed audio | Lalal.ai, Spleeter (Deezer) |
| Lyrics Generation | Write song lyrics to a style or theme | Suno, Udio, ChatGPT |
| AI Mastering | Automatically master audio tracks | LANDR, Dolby.io AI Mastering |
| Capability | Description | Key Tools |
|---|---|---|
| Code Autocomplete | Real-time inline code suggestions | GitHub Copilot, Cursor, Tabnine |
| Code Generation from Spec | Generate functions/classes from natural language | ChatGPT, Claude, Gemini |
| Code Explanation | Explain what a code snippet does | GitHub Copilot Chat, Cursor |
| Code Debugging | Identify and fix bugs | Claude Code, Cursor, Devin |
| Code Refactoring | Improve code quality and structure | GitHub Copilot, Windsurf |
| Test Generation | Automatically write unit and integration tests | GitHub Copilot, Codium AI |
| Documentation Generation | Write docs, comments, and READMEs | Mintlify, GitHub Copilot |
| Full Application Generation | Build entire apps from prompts | Lovable, Bolt.new, v0 (Vercel) |
| Capability | Description | Key Tools |
|---|---|---|
| Text-to-3D | Generate 3D objects from text descriptions | DreamFusion, Magic3D, Shap-E |
| Image-to-3D | Convert 2D images to 3D models | Luma AI, TripoSR, Zero-1-to-3 |
| 3D Scene Generation | Generate full 3D environments | NVIDIA GET3D, SyncDreamer |
| Avatar / Character Generation | Generate 3D digital humans | Ready Player Me, Reallusion AI |
| CAD / Blueprint Generation | Engineering and architectural designs | Maket, ArkoAI, Autodesk AI |
| NeRF Scene Capture | Reconstruct 3D from 2D images | Luma AI NeRF, Instant-NGP (NVIDIA) |
| Capability | Description | Key Tools |
|---|---|---|
| Tabular Data Synthesis | Create realistic synthetic datasets | Gretel, Mostly AI, CTGAN |
| Text Data Augmentation | Generate training examples for NLP | Gretel, DataIQ |
| Synthetic Image Data | Generate labelled images for CV training | Rendered.ai, NVIDIA Omniverse Replicator |
| Privacy-Safe Data | Generate GDPR/HIPAA-safe synthetic records | Mostly AI, Syntho, Tonic.ai |
| Capability | Description | Key Tools |
|---|---|---|
| Drug Candidate Generation | Design novel drug molecules | Insilico Medicine, Atomwise, Schrödinger |
| Protein Structure Generation | Design novel proteins with target properties | ProteinMPNN, RFdiffusion |
| Material Design | Generate novel material structures | GNoME (Google), MatGen |
| DNA / Gene Sequence Generation | Design synthetic genetic sequences | Nucleotide Transformer, Evo |
| System | What It Combines | Examples |
|---|---|---|
| Text + Image | Generate text and images together | GPT-4o, Gemini 1.5 Pro |
| Text + Audio | Generate text and voice simultaneously | GPT-4o Realtime, Gemini Live |
| Text + Video + Audio | Full audio-visual content generation | Sora with audio, Veo 3 |
| Text + Code + Execution | Generate, run, and reason over code | ChatGPT Code Interpreter |
| Any-to-Any | Process and generate any modality | GPT-4o, Gemini 2.0, Grok |
The fundamental model architectures that power generative AI across all modalities.
The dominant architecture underlying almost all modern Generative AI.
| Aspect | Detail |
|---|---|
| Introduced | "Attention Is All You Need" — Vaswani et al., Google, 2017 |
| Core Mechanism | Self-attention: every token attends to every other token simultaneously |
| Why It Dominates | Parallelisable (fast to train), scales with data and parameters, captures long-range dependencies |
| Used For | LLMs, text-to-image, text-to-video, multimodal models |
| Variants | Encoder-only (BERT), Decoder-only (GPT), Encoder-Decoder (T5, BART) |
Key Transformer Innovations:
| Innovation | What It Enables |
|---|---|
| Scaled Dot-Product Attention | Core mechanism for token relationship modelling |
| Multi-Head Attention | Attends to multiple aspects of relationships simultaneously |
| Positional Encoding | Injects sequence order information |
| Flash Attention | Memory-efficient attention; enables longer context windows |
| Mixture of Experts (MoE) | Activates only a subset of parameters per token; enables massive models efficiently |
| Rotary Position Embedding (RoPE) | Better handling of long-context documents |
| Group Query Attention (GQA) | Reduces memory footprint during inference |
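The core operation behind every innovation in this table is small enough to write out. A minimal single-head sketch of scaled dot-product attention, following the Vaswani et al. formulation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
    the core operation of the transformer (Vaswani et al. 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # token-pair affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted mix of values

# Three tokens with d_k = 4: every output row blends all value vectors.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
```

Multi-head attention runs this in parallel over several learned projections of Q, K, and V; Flash Attention and GQA change how (not what) this computation is performed.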
The state-of-the-art architecture for image, video, and audio generation.
| Aspect | Detail |
|---|---|
| Core Mechanism | Learns to reverse a gradual noise-adding process to recover clean data |
| Forward Process | Add Gaussian noise to training data over T steps until it becomes pure noise |
| Reverse Process | Neural network learns to predict and remove noise at each step |
| Why It Dominates Images | Produces high-quality, diverse, controllable outputs; better than GANs at scale |
| Used For | Text-to-image, text-to-video, audio synthesis, molecule generation |
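The forward process has a convenient closed form: you can jump straight to any noise level t without simulating every intermediate step. A toy sketch of the DDPM forward process with the linear schedule from Ho et al. 2020 (array sizes here are illustrative):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng=None):
    """DDPM forward process in closed form: x_t = sqrt(alpha_bar_t) * x0
    + sqrt(1 - alpha_bar_t) * noise, where alpha_bar_t = prod(1 - beta_i)."""
    rng = rng or np.random.default_rng(0)
    alpha_bar = np.cumprod(1.0 - betas)[t]   # cumulative signal retained at step t
    noise = rng.normal(size=np.shape(x0))
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

T = 1000
betas = np.linspace(1e-4, 0.02, T)           # linear noise schedule (Ho et al. 2020)
x0 = np.ones(8)                              # toy "clean" sample
x_early = forward_diffuse(x0, t=10, betas=betas)    # still close to x0
x_late = forward_diffuse(x0, t=T - 1, betas=betas)  # essentially pure Gaussian noise
```

Training teaches a network to predict the added noise at each t; generation then runs this process in reverse, starting from pure noise.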
Key Diffusion Variants:
| Variant | Description | Examples |
|---|---|---|
| DDPM | Denoising Diffusion Probabilistic Models — original formulation | Ho et al. 2020 |
| DDIM | Deterministic sampling; much faster inference | Enabled faster generation |
| Latent Diffusion (LDM) | Operates in compressed latent space; efficient | Stable Diffusion, DALL·E 3 |
| Score-Based Models | Alternative formulation using score functions | Song et al. |
| Flow Matching | Faster training convergence; increasingly adopted | Stable Diffusion 3, Flux |
| Consistency Models | Single-step generation; very fast | OpenAI Consistency Models |
The pioneering architecture for synthetic image generation (2014–2022 era).
| Aspect | Detail |
|---|---|
| Introduced | Goodfellow et al., 2014 |
| Core Mechanism | Generator and Discriminator compete in a minimax game |
| Generator | Produces fake data trying to fool the discriminator |
| Discriminator | Tries to distinguish real data from generated fakes |
| Training Goal | Nash equilibrium where generator produces indistinguishable fakes |
| Current Status | Largely superseded by diffusion models for images; still used for specific tasks |
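The minimax game can be made concrete through the two loss functions. A sketch using the non-saturating generator loss from the original paper (a working GAN also needs the two networks and an alternating optimisation loop, omitted here):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy the discriminator minimises:
    -E[log D(x)] - E[log(1 - D(G(z)))]."""
    return float(-np.mean(np.log(d_real) + np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    """Non-saturating generator loss (Goodfellow et al. 2014):
    -E[log D(G(z))], pushing the discriminator to call fakes real."""
    return float(-np.mean(np.log(d_fake)))

# At the Nash equilibrium D outputs 0.5 everywhere, so its loss is 2*ln(2).
d_eq = np.full(4, 0.5)
eq_loss = discriminator_loss(d_eq, d_eq)
```

When the discriminator can no longer do better than 0.5 on every sample, the generator's outputs are statistically indistinguishable from real data, which is the training goal stated above.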
Key GAN Variants:
| Variant | Contribution |
|---|---|
| DCGAN | Deep convolutional GAN; stable training |
| StyleGAN / StyleGAN2 | High-fidelity face synthesis; style-based control |
| CycleGAN | Unpaired image-to-image translation |
| BigGAN | Large-scale class-conditional generation |
| Pix2Pix | Paired image translation |
| Aspect | Detail |
|---|---|
| Core Mechanism | Encodes inputs to a probabilistic latent space; decodes samples into outputs |
| Key Property | Smooth, continuous latent space enables interpolation between outputs |
| Used For | Tabular data generation, molecule generation, anomaly detection, latent space for diffusion |
| Advantage | Stable training; interpretable latent representations |
| Limitation | Blurrier outputs than diffusion or GANs for images |
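The probabilistic encoding works through the reparameterisation trick, with a KL term pulling the latent distribution toward a standard normal, which is what makes the latent space smooth enough to interpolate. A minimal sketch of these two pieces:

```python
import numpy as np

def reparameterize(mu, log_var, rng=None):
    """Reparameterisation trick: z = mu + sigma * eps keeps sampling
    differentiable with respect to the encoder outputs mu and log_var."""
    rng = rng or np.random.default_rng(0)
    eps = rng.normal(size=np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(q(z|x) || N(0, I)): the regulariser that keeps the latent space
    smooth and continuous (zero when the posterior is already N(0, I))."""
    return -0.5 * float(np.sum(1 + log_var - mu**2 - np.exp(log_var)))
```

The full VAE objective adds a reconstruction loss between the decoder's output and the input; the KL term alone would collapse the latent code to pure noise.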
| Architecture | What It Does |
|---|---|
| NeRF | Represents a 3D scene as a neural function; enables novel view synthesis |
| 3D Gaussian Splatting | Faster 3D scene representation; real-time rendering |
| Point-E / Shap-E | OpenAI's 3D object generation from text or images |
| DreamFusion | Text-to-3D using diffusion models as a prior |
| Architecture | What It Does | Used In |
|---|---|---|
| Neural TTS (Tacotron, VITS) | Text-to-spectrogram-to-audio pipeline | ElevenLabs, Amazon Polly |
| Vocoders (WaveNet, HiFi-GAN) | Neural audio synthesis from spectrograms | Speech synthesis |
| Codec Language Models (EnCodec) | Audio as discrete tokens; enables LLM-style generation | Meta AudioCraft, MusicGen |
| Diffusion for Audio (AudioLDM) | Latent diffusion applied to audio spectrograms | AudioCraft, Stable Audio |
| Architecture | Best For | Training Stability | Output Quality | Speed |
|---|---|---|---|---|
| Transformer (LLM) | Text, multimodal | High | Excellent | Fast inference |
| Diffusion Models | Images, video, audio | High | Excellent | Moderate (improving) |
| GANs | Faces, style transfer | Low (mode collapse) | Good | Very fast |
| VAEs | Latent spaces, tabular | High | Moderate | Fast |
| NeRF / 3DGS | 3D scenes | High | Excellent | Slow (improving) |
| Codec LMs | Audio, music | High | Very Good | Moderate |
The leading frontier foundation models driving the generative AI ecosystem in 2026.
| Model | Organisation | Key Highlights |
|---|---|---|
| GPT-4o / GPT-5 | OpenAI | Multimodal; leading general reasoning |
| Claude 4 Opus / Sonnet | Anthropic | #1 SWE-Bench 77.2%; 200K context |
| Gemini 3 Pro / Flash | Google DeepMind | 1M+ token context; strong video |
| Llama 4 Scout / Maverick | Meta | Open-weight MoE; up to 10M-token context (Scout) and 400B total parameters (Maverick) |
| Mistral Large | Mistral AI | European frontier; multilingual |
| DeepSeek R1 / V3 | DeepSeek | MIT licensed; reasoning; trained for under $6M |
| Model Family | Key Models | Strengths | Context Window |
|---|---|---|---|
| GPT-5 Series | GPT-5, GPT-5.1 | General reasoning, coding, multimodal | 1M tokens |
| o-Series (Reasoning) | o3, o3-mini, o4 | Deep multi-step reasoning, mathematics, science | 200K tokens |
| GPT-4o | GPT-4o, GPT-4o mini | Multimodal; real-time voice and vision | 128K tokens |
| Sora | Sora (video) | Text-to-video; cinematic quality | — |
| DALL·E | DALL·E 3 | Text-to-image; integrated in ChatGPT | — |
| Codex CLI | GPT-5.3-Codex | SWE-Bench Pro SOTA; coding agent | — |
| Whisper | Whisper v3 | Open-source ASR; 99-language support | — |
Products: ChatGPT (700M+ weekly active users), ChatGPT API, OpenAI Platform, Operator
| Model Family | Key Models | Strengths | Context Window |
|---|---|---|---|
| Claude 4 Series | Claude 4 Opus, Claude 4 Sonnet | Coding (#1 SWE-Bench 77.2%), reasoning, writing | 200K tokens |
| Claude 3.7 Series | Claude 3.7 Sonnet | Hybrid reasoning; computer use | 200K tokens |
| Claude 3.5 Series | Claude 3.5 Haiku | Fast, efficient; excellent for tasks | 200K tokens |
Products: Claude.ai, Claude API, Claude Code (CLI), Computer Use
| Model Family | Key Models | Strengths | Context Window |
|---|---|---|---|
| Gemini 3 | Gemini 3 Pro, Gemini 3 Flash | Multimodal reasoning; replaces Ultra tier | 1M tokens |
| Gemini 2.0 | Gemini 2.0 Flash, Flash Thinking | Speed + reasoning combination | 1M tokens |
| Gemini 1.5 | Gemini 1.5 Pro | Longest context; multimodal | 10M tokens |
| Imagen 3 | Imagen 3 | Google's flagship text-to-image | — |
| Veo 3 | Veo 3 | Text-to-video with native audio | — |
| Lyria | Lyria | Music generation | — |
Products: Gemini (app), Google AI Studio, Vertex AI, NotebookLM, Project Mariner, Jules
| Model | Strengths | Notes |
|---|---|---|
| Mistral Large | Reasoning, multilingual | European frontier model |
| Mistral Small | Fast, efficient | Cost-effective API |
| Mixtral 8x22B | MoE architecture; strong open-weight model | Community favourite |
| Codestral | Code generation | Developer-focused |
| Le Chat | Consumer product with agents | Gmail/Calendar integration |
| Model | Strengths | Notes |
|---|---|---|
| Grok 3 | Real-time X/Twitter data access; strong reasoning | Integrated into X platform |
| Grok 3 Mini | Efficient reasoning model | Cost-optimised |
| Aurora | Image generation | Integrated in X |
| Model | Strengths | Notes |
|---|---|---|
| Command R+ | Enterprise RAG; 128K context | 100+ language support |
| Embed 3 | Multilingual embeddings | Industry-leading retrieval |
| Rerank 3 | Semantic reranking for search | Production RAG pipelines |
| Model | Parameters | Highlights |
|---|---|---|
| Llama 4 Scout | 17B active (109B total MoE) | 10M context window; on-device capable |
| Llama 4 Maverick | 17B active (400B total MoE) | Outperforms GPT-4o on most benchmarks |
| Llama 4 Behemoth | 288B active (2T total MoE) | Frontier-class; in training |
| Llama 3.3 | 70B | Best open-weight 70B model |
| Llama 3.2 | 1B, 3B, 11B, 90B | On-device and multimodal variants |
License: Llama Community License (open-weight; usage restrictions at scale)
| Model | Parameters | Highlights |
|---|---|---|
| DeepSeek R1 | 671B (MoE) | Reasoning model; MIT licensed; trained for ~$6M |
| DeepSeek V3 | 671B (MoE) | Strong general capabilities; competitive with GPT-4o |
| DeepSeek R1-Distill | 1.5B–70B | Distilled reasoning models for deployment |
| DeepSeek Coder V2 | 236B | Top open-source coding model |
License: MIT (fully open; commercial use allowed)
| Model | Parameters | Highlights |
|---|---|---|
| Qwen 2.5 | 0.5B–72B | 300M+ downloads; 100K+ derivatives on HuggingFace |
| Qwen 2.5 Coder | 7B–32B | Top-tier open-source coding model |
| Qwen 2.5 VL | 7B–72B | Vision-language; strong multimodal |
| QwQ-32B | 32B | DeepSeek R1-level reasoning; open-weight |
License: Apache 2.0 (fully open; commercial use allowed)
| Model | Parameters | Highlights |
|---|---|---|
| Gemma 3 | 1B–27B | Runs on consumer hardware; 128K context |
| Gemma 3 27B | 27B | Outperforms Llama 3 70B on many tasks |
| CodeGemma | 7B | Code-specialised Gemma variant |
| PaliGemma | 3B | Vision-language model; open-weight |
License: Gemma Terms of Use (permissive; commercial use allowed)
| Model | Parameters | Highlights |
|---|---|---|
| Mistral 7B | 7B | Original; outperformed LLaMA 2 13B |
| Mixtral 8x7B | 47B (MoE) | Apache 2.0; strong multilingual reasoning |
| Mixtral 8x22B | 141B (MoE) | Frontier-competitive; fully open |
| Model | Organisation | Highlights |
|---|---|---|
| Falcon 180B | TII (UAE) | Apache 2.0; 180B parameter open model |
| Phi-3 / Phi-4 | Microsoft | Small but powerful; excellent on-device models |
| StarCoder 2 | HuggingFace / ServiceNow | Open-source code model |
| Orca 3 | Microsoft | Distilled reasoning; instruction-following |
| Yi 1.5 | 01.AI (China) | Strong bilingual (EN/ZH) model |
| Command R (Open) | Cohere | Apache 2.0; RAG-optimised |
The frameworks, databases, and tooling that power generative AI application development.
| Framework | Focus |
|---|---|
| LangChain | Chain-based LLM orchestration & agents |
| LlamaIndex | Data ingestion, indexing & RAG pipelines |
| Haystack | End-to-end NLP & retrieval pipelines |
| DSPy | Programmatic prompt optimisation |
| Semantic Kernel | Microsoft SDK for AI orchestration |
| Instructor | Structured output extraction from LLMs |
| Database | Focus |
|---|---|
| Pinecone | Managed vector search; serverless |
| Weaviate | Open-source; hybrid search |
| Qdrant | Rust-powered; high performance |
| Chroma | Lightweight; developer-friendly |
| Milvus / Zilliz | Scalable; GPU-accelerated |
| Tool | Deployment | Highlights |
|---|---|---|
| Pinecone | Cloud (AWS, Azure, GCP — serverless or pod-based) | Most widely used managed vector DB; serverless option |
| Weaviate | Open-Source / Cloud (self-host on any infra; Weaviate Cloud on AWS / GCP) | Hybrid search; multi-tenancy; GraphQL API |
| Qdrant | Open-Source / Cloud (self-host Docker/K8s; Qdrant Cloud on AWS, GCP, Azure) | Rust-based; fast; payload filtering |
| Chroma | Open-Source (local; any machine with Python 3.8+) | Lightweight; developer-friendly; local-first |
| Milvus / Zilliz | Open-Source / Cloud (self-host K8s; Zilliz Cloud on AWS, GCP, Azure) | High-scale; GPU-accelerated similarity search |
| pgvector | Open-Source (any PostgreSQL host; AWS RDS, Azure, GCP Cloud SQL, Supabase) | Add vector search to existing Postgres databases |
| Redis Vector | Open-Source / Cloud (self-host; Redis Cloud on AWS, GCP, Azure) | Low-latency vector search in Redis |
| Elasticsearch | Open-Source / Cloud (self-host; Elastic Cloud on AWS, GCP, Azure) | Hybrid keyword + vector search at scale |
| MongoDB Atlas Vector | Cloud (AWS, Azure, GCP — via MongoDB Atlas) | Integrated vector search in MongoDB |
| Azure AI Search | Cloud (Azure) | Microsoft's managed hybrid search service |
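Under the hood, every entry in this table accelerates the same primitive: nearest-neighbour search over embedding vectors. A brute-force sketch of that operation (real databases replace the linear scan with approximate indexes such as HNSW or IVF):

```python
import numpy as np

def top_k_cosine(query, corpus, k=3):
    """Exact cosine-similarity search by linear scan; vector databases
    speed this up with approximate nearest-neighbour indexes."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q                           # one similarity score per document
    idx = np.argsort(sims)[::-1][:k]       # highest-scoring documents first
    return idx, sims[idx]

# Three toy 2-D "embeddings"; real systems use hundreds to thousands of dims.
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
idx, sims = top_k_cosine(np.array([1.0, 0.1]), corpus, k=2)
```

In a RAG pipeline, `corpus` holds embeddings of document chunks and `query` is the embedded user question; the top-k chunks are then pasted into the model's context.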
Real-world generative AI deployments transforming healthcare, legal, and scientific industries.
| Use Case | Description | Examples |
|---|---|---|
| Clinical Documentation | Auto-generate clinical notes from doctor-patient conversations | Nuance DAX, Suki AI, Abridge |
| Medical Coding & Billing | Extract ICD codes from clinical notes | Fathom Health, Optum AI |
| Radiology AI | Generate structured reports from medical images | Nuance PowerScribe AI, Aidoc |
| Drug Discovery | Generate and screen novel drug molecule candidates | Insilico Medicine, Schrödinger, Recursion |
| Patient Communication | Personalised patient education and discharge summaries | Nabla, Notable Health |
| Literature Summarisation | Summarise medical research papers | Consensus, Elicit, Semantic Scholar AI |
| Protein Design | Design novel therapeutic proteins | RFdiffusion, ProteinMPNN, AlphaFold 3 |
| Use Case | Description | Examples |
|---|---|---|
| Contract Review | Identify risks, obligations, and missing clauses | Harvey, Ironclad AI, Kira Systems |
| Contract Drafting | Generate first-draft contracts and legal documents | Harvey, Clio Duo |
| Legal Research | Search and summarise case law and statutes | Westlaw AI, LexisNexis AI, Casetext |
| Due Diligence | Review M&A documents at scale | Luminance, Kira, Relativity AI |
| Compliance Monitoring | Monitor communications for regulatory violations | Behavox, NICE Actimize AI |
| eDiscovery | Classify and review documents in litigation | Relativity aiR, Logikcull AI |
| Use Case | Description | Examples |
|---|---|---|
| Earnings Report Generation | Draft investor reports and commentary | Bloomberg AI, Morningstar AI |
| Financial Research | Summarise filings, news, market events | Bloomberg GPT, Kensho, AlphaSense |
| Fraud Detection Narrative | Explain fraud signals in natural language | Darktrace, Featurespace |
| Customer Communication | Generate personalised financial advice and alerts | Personetics, Kasisto |
| Code for Quant Models | Generate quantitative trading code | Kensho, internal AI at major banks |
| Regulatory Filing Drafts | Generate first drafts of regulatory submissions | Various internal LLM deployments |
| Use Case | Description | Examples |
|---|---|---|
| Personalised Tutoring | Adaptive 1-on-1 tutoring at scale | Khan Academy Khanmigo, Synthesis AI |
| Content Generation | Create lesson plans, quizzes, and worksheets | MagicSchool AI, Diffit, Curipod |
| Essay Feedback | Detailed writing feedback and coaching | Turnitin AI, EssayGrader, Writable |
| Language Learning | Conversational AI for language practice | Duolingo Max, Speak AI, Elsa |
| Accessibility | Auto-caption, translate, simplify complex content | Microsoft Accessibility AI |
| Use Case | Description | Examples |
|---|---|---|
| Ad Creative Generation | Generate images, copy, and video ads at scale | AdCreative.ai, Pencil, Persado |
| SEO Content | Generate optimised blog posts and landing pages | Jasper, Surfer SEO AI, Copy.ai |
| Personalisation at Scale | Customised email, SMS, push at individual level | Persado, Movable Ink, Brafton |
| Social Media Content | Generate posts, captions, hashtags | Lately AI, Publer AI, Hootsuite AI |
| Product Descriptions | E-commerce product copy at scale | Producti AI, Anyword |
| Campaign Analysis | Summarise campaign performance and suggest actions | Salesforce Einstein AI, HubSpot AI |
| Use Case | Description | Examples |
|---|---|---|
| Code Generation | Generate functions, classes, scripts | GitHub Copilot, Cursor, Claude Code |
| Code Review | Automated PR review; security and quality | GitHub Copilot for PRs, Coderabbit |
| Documentation | Auto-generate READMEs, docstrings, API docs | Mintlify, GitHub Copilot |
| Test Generation | Write unit, integration, and E2E tests | Codium AI, GitHub Copilot, Diffblue |
| Bug Fixing | Identify and patch bugs automatically | Devin, SWE-Agent, Claude Code |
| App Generation | Build full-stack apps from natural language | Lovable, Bolt.new, v0 |
| DevOps Automation | Generate IaC, CI/CD pipelines, Kubernetes configs | GitHub Copilot, Terraform AI, Pulumi AI |
| Use Case | Description | Examples |
|---|---|---|
| Scriptwriting | Generate screenplays, dialogue, story outlines | Sudowrite, ChatGPT, Claude |
| VFX & Visual Effects | AI-assisted compositing, upscaling, cleanup | Runway, Adobe Firefly Video, Topaz |
| Video Localisation | AI dubbing and lip-sync in 30+ languages | ElevenLabs Dubbing, HeyGen, Papercup |
| Music for Sync | Generate royalty-free music for film/TV | Suno, Udio, Soundraw |
| Game Asset Generation | Generate 2D/3D assets, textures, environments | Scenario.gg, Meshy, NVIDIA GET3D |
| Game NPC Dialogue | Dynamic AI-generated NPC conversations | Inworld AI, Convai, NVIDIA ACE |
| News Summarisation | Auto-summarise news for readers | Artifact, Perplexity, Briefing AI |
Key performance metrics and arena rankings for leading generative AI models.
| Benchmark | What It Tests | Notes |
|---|---|---|
| MMLU | 57-subject academic knowledge | Most widely cited general benchmark |
| GPQA Diamond | PhD-level science questions | Hard science reasoning ceiling test |
| HumanEval | Python coding; pass@1 accuracy | Standard code generation benchmark |
| SWE-Bench Verified | Real GitHub issue resolution | Gold standard for software engineering agents |
| MATH / AIME | Competition mathematics | Tests mathematical reasoning depth |
| BigBench Hard | 23 challenging reasoning tasks | Requires multi-step reasoning |
| HellaSwag | Commonsense NLI; sentence completion | Near-saturated by frontier models |
| TruthfulQA | Measures tendency to hallucinate | Truthfulness evaluation |
| IFEval | Instruction following evaluation | Tests prompt adherence |
| Arena (LMSYS Chatbot Arena) | Human preference via blind A/B battles | Most human-validated ranking |
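HumanEval's pass@1 metric in the table above is commonly computed with the unbiased pass@k estimator introduced alongside the Codex work (Chen et al., 2021):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator (Chen et al., 2021): the probability that
    at least one of k samples drawn from n generations (c correct) passes."""
    if n - c < k:
        return 1.0                 # not enough failing samples to fill k draws
    return 1.0 - comb(n - c, k) / comb(n, k)

p = pass_at_k(10, 3, 1)            # 10 samples, 3 correct: pass@1 = 0.3
```

Generating n > k samples and averaging this estimator over problems gives a lower-variance score than literally drawing k samples per problem.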
| Benchmark | What It Tests |
|---|---|
| MMMU | Massive Multi-discipline Multimodal Understanding |
| DocVQA | Document understanding and question answering |
| VQAv2 | Visual question answering |
| TextVQA | Reading text within images |
| EvalBench (Video) | Video understanding and temporal reasoning |
| AudioBench | Audio understanding and generation quality |
| CyberSecEval | Security and safety of code generation |
| MedQA / MedMCQA | Medical knowledge and clinical reasoning |
| Tool | Purpose |
|---|---|
| LangSmith | LLM evaluation pipelines; dataset management |
| Ragas | RAG pipeline evaluation; faithfulness, relevance |
| EleutherAI LM Eval Harness | Open-source benchmark runner |
| OpenAI Evals | Framework for creating custom evaluations |
| Braintrust | AI evaluation and testing platform |
| HumanEval+ | Extended coding benchmark with edge cases |
| MT-Bench | Multi-turn conversation quality evaluation |
Global generative AI market sizing, enterprise spend, and projected growth trajectory.
| Metric | Data |
|---|---|
| Global Gen AI Market (2024) | ~$67 billion |
| Projected Market (2030) | ~$1.3 trillion (Goldman Sachs) |
| CAGR | ~46% (2024–2030) |
| Enterprise AI Spend (2025) | >$200 billion globally |
| Metric | Data |
|---|---|
| ChatGPT Weekly Active Users (2025) | 700 million+ |
| GitHub Copilot Subscribers | 1.8 million+ developers |
| Midjourney Registered Users | 20 million+ |
| % Fortune 500 Using Gen AI | 92% (as of Q1 2025) |
| ElevenLabs API Calls / Month | 1 billion+ voice generations |
| HuggingFace Models | 1 million+ models; 500k+ datasets |
| Company | Funding Round | Amount |
|---|---|---|
| OpenAI | Cumulative | $57 billion+ |
| Anthropic | Series E | $7.3 billion (Amazon anchor) |
| xAI | Series B | $6 billion |
| Mistral AI | Series B | $1.1 billion |
| Cohere | Series D | $500 million |
| Stability AI | Various | $101 million |
| Runway | Series D | $308 million |
| ElevenLabs | Series B | $180 million |
| Perplexity | Series C | $500 million |
| Harvey AI | Series C | $300 million |
Critical limitations and failure modes to consider when deploying generative AI systems.
| Limitation | Description |
|---|---|
| Hallucination | Models confidently generate false information; a fundamental probabilistic limitation |
| Context Window Limits | Models can lose coherence over very long documents, even though windows now reach 10M tokens |
| Stale Knowledge | Training data has a cutoff date; models lack real-time awareness without tools |
| Reasoning Failures | Models can fail on seemingly simple logic, counting, and spatial reasoning tasks |
| Prompt Sensitivity | Small changes in phrasing can produce very different outputs |
| Lack of Causal Understanding | Models learn correlations, not causal mechanisms |
| Inconsistency | Same prompt can produce different outputs across runs |
| Benchmark Gaming | Models may be trained on benchmark data, inflating reported performance |
| Risk | Description |
|---|---|
| Disinformation & Deepfakes | Generate convincing fake news, images, audio, or video of real people |
| Phishing & Social Engineering | Generate highly personalised, convincing scam content at scale |
| Synthetic CSAM | Image/video generation models can be misused to generate illegal content |
| Automated Cyberattacks | Generate malware, exploit code, phishing pages |
| Mass Manipulation | Generate targeted propaganda at individual or societal scale |
| Identity Fraud | Voice cloning and face generation for impersonation |
| Academic Fraud | Ghostwriting essays, research papers, exam submissions |
| IP Violation | Models trained on copyrighted content; unclear legal status of outputs |
| Issue | Description |
|---|---|
| Training Data Bias | Models reflect societal biases present in internet-scale training data |
| Demographic Representation | Under/over-representation of genders, ethnicities, cultures in image generation |
| Language Bias | Models perform significantly better in English than other languages |
| Occupational Stereotyping | Associate certain professions with specific genders or ethnicities by default |
| Cultural Bias | Western-centric worldviews baked into responses and values |
| Amplification | AI may amplify existing human biases at scale |
| Concern | Data |
|---|---|
| Training Energy | Training GPT-4 is estimated to have consumed 50+ GWh, roughly the annual electricity use of thousands of homes |
| Inference Energy | A ChatGPT query is estimated to use roughly 10x the energy of a traditional Google search |
| Water Cooling | Data centres use millions of litres of water for cooling |
| Carbon Footprint | Growing AI compute footprint contributes meaningfully to global carbon emissions |
| Hardware Waste | GPU production and replacement cycles create e-waste |
Mitigation Trends: Model efficiency improvements (MoE, quantisation), renewable energy commitments by hyperscalers, smaller specialised models for most tasks.
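One of the mitigation trends above, Mixture of Experts, saves compute by running only a few experts per token while the total parameter count grows. A minimal top-k routing sketch — `moe_forward`, the linear experts, and all dimensions here are illustrative, not a real framework API:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Minimal top-k Mixture-of-Experts routing for a single token vector x.

    gate_w  : (n_experts, d) router weights
    experts : list of per-expert functions (here simple linear maps)
    Only k experts run per token, which is why MoE models can grow total
    parameter count without growing per-token compute proportionally.
    """
    scores = gate_w @ x                         # router logits, one per expert
    top = np.argsort(scores)[-k:]               # indices of the k best experts
    weights = np.exp(scores[top] - scores[top].max())
    weights = weights / weights.sum()           # softmax over selected experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
gate_w = rng.normal(size=(n_experts, d))
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, M=M: M @ x for M in expert_mats]

x = rng.normal(size=d)
y = moe_forward(x, gate_w, experts, k=2)
```

With k=1 this collapses to picking the single best expert; production routers add load-balancing losses so tokens spread across experts.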
Explore how this system type connects to others in the AI landscape:
Agentic AI · Conversational AI · Multimodal Perception AI · Recommendation / Retrieval AI · Explainable AI (XAI)
Key terms and concepts in generative AI — searchable and always accessible.
| Term | Definition |
|---|---|
| Token | The basic unit of text a model processes; approximately 0.75 words on average |
| Context Window | Maximum number of tokens a model can process in a single interaction |
| Temperature | Controls output randomness; higher = more creative, lower = more deterministic |
| Top-p (Nucleus Sampling) | Selects from the smallest set of tokens whose cumulative probability exceeds p |
| Top-k | Limits token selection to the k most probable next tokens |
| Hallucination | When a model generates confident but factually incorrect information |
| Grounding | Connecting AI output to verified real-world facts, documents, or data sources |
| Fine-Tuning | Adapting a pre-trained model to a specific domain or task using additional training |
| LoRA | Low-Rank Adaptation — efficient fine-tuning by adding small trainable matrices to a frozen model |
| QLoRA | Quantised LoRA — fine-tune a 4-bit quantised model using LoRA; consumer GPU friendly |
| RLHF | Reinforcement Learning from Human Feedback — alignment technique using human preference labels |
| DPO | Direct Preference Optimisation — alignment without a separate reward model |
| Constitutional AI | Anthropic's method of using AI feedback guided by a written constitution for alignment |
| RAG | Retrieval-Augmented Generation — augment model generation with retrieved external documents |
| Embeddings | Dense vector representations of text or images that capture semantic meaning |
| Vector Database | Database optimised for storing and querying high-dimensional embedding vectors |
| Latent Space | The compressed internal representation a model learns during training |
| Diffusion Model | Architecture that generates outputs by learning to reverse a noise-adding process |
| MoE (Mixture of Experts) | Architecture where only a subset of model parameters activates per token; enables scale efficiency |
| Inference | Running a trained model on new input data to produce an output |
| Foundation Model | A large model trained on broad data at scale, adaptable to many downstream tasks |
| Transfer Learning | Applying knowledge learned in one domain to a different but related domain |
| Multimodal | Involving or processing more than one type of data modality (e.g., text + image + audio) |
| Alignment | Ensuring AI behaviour matches human values, intentions, and safety requirements |
| Emergent Behaviour | Capabilities that arise in large models that were not explicitly trained for |
| Prompt Injection | Malicious input designed to override a model's system prompt or instructions |
| Guardrails | Rules or filters applied at runtime to prevent harmful or off-policy model outputs |
| System Prompt | Instructions given to a model before the conversation begins to shape its behaviour |
| SFT (Supervised Fine-Tuning) | Fine-tuning a model on labelled (input, output) demonstration pairs |
| PEFT | Parameter-Efficient Fine-Tuning — umbrella of methods that fine-tune a small subset of weights |
| Speculative Decoding | Use a small draft model to propose tokens; large model verifies; speeds up inference 2–3x |
| Flash Attention | Memory-efficient attention algorithm enabling longer context windows and faster training |
| KV Cache | Key-Value cache that stores prior token computations to speed up autoregressive generation |
| Watermarking | Embedding imperceptible signals in AI-generated content to enable provenance verification |
| C2PA | Coalition for Content Provenance and Authenticity — open standard for AI content labelling |
| Benchmark Saturation | When models score so highly on a benchmark that it no longer meaningfully differentiates them |
| Distillation | Training a smaller "student" model to mimic the outputs of a larger "teacher" model |
| Quantisation | Reducing the numerical precision of model weights to reduce memory and increase inference speed |
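Several glossary entries — Embeddings, Vector Database, and RAG — fit together in one pattern: embed documents, rank them by similarity to the query, and feed the top hits to the model. A toy sketch, using a bag-of-words stand-in for a real embedding model (`bow_embed` and `retrieve` are illustrative names, not library functions):

```python
import numpy as np

def bow_embed(text, vocab):
    """Toy stand-in for a learned embedding model: a normalised bag-of-words
    vector over a fixed vocabulary. Real systems use a trained encoder."""
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n else vec

def retrieve(query, documents, vocab, k=2):
    """The retrieval step of RAG: rank documents by cosine similarity of
    embeddings and return the top-k to prepend to the generation prompt."""
    q = bow_embed(query, vocab)
    return sorted(documents, key=lambda d: -float(bow_embed(d, vocab) @ q))[:k]

docs = [
    "lora adds small trainable matrices to a frozen model",
    "diffusion models generate by reversing a noise process",
    "temperature controls the randomness of sampling",
]
vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d.split()}))}
top = retrieve("how does lora adapt a frozen model", docs, vocab, k=1)
```

A vector database replaces the `sorted(...)` scan with an approximate nearest-neighbour index so the same ranking scales to millions of documents.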
Animation infographics for Generative AI, 2026 — an overview and the full technology stack (Hardware → Compute → Data → Frameworks → Orchestration → Serving → Application).
Detailed reference content for regulation.
| Risk Tier | Description | Examples |
|---|---|---|
| Unacceptable Risk | Banned outright | Social scoring, subliminal manipulation, real-time biometric surveillance |
| High Risk | Strict requirements: conformity assessments, documentation, human oversight | CV screening, credit scoring, medical devices, critical infrastructure |
| Limited Risk | Transparency obligations | Chatbots must disclose AI nature; deepfakes must be labelled |
| Minimal Risk | No requirements | Spam filters, AI in video games, generative content tools |
| GPAI Models | General-purpose AI model requirements | Transparency, copyright compliance, safety testing for frontier models |
Key Provisions for Generative AI:
| Initiative | Status |
|---|---|
| Executive Order on AI (Oct 2023) | Broad safety, security, and trust directives for federal AI |
| NIST AI RMF | Voluntary AI Risk Management Framework; widely adopted |
| State-Level Laws | California (SB 1047 vetoed; AB 2013 passed), Colorado AI Act |
| FTC Guidance | Unfair and deceptive practices rules applied to AI |
| FDA AI/ML Guidance | Regulation of AI-enabled medical devices |
| Jurisdiction | Approach |
|---|---|
| EU | Risk-based regulation; legally binding; AI Act in force |
| UK | Pro-innovation; sector-led; no AI-specific law; AI Safety Institute |
| USA | Sector-specific; voluntary frameworks; state-level laws emerging |
| China | Algorithmic recommendation rules; deep synthesis (deepfake) rules; interim generative AI measures |
| Canada | AIDA (Artificial Intelligence and Data Act) — in progress |
| Brazil | AI Bill — in legislative process |
| India | Advisory-based; no binding AI law yet |
| Japan | Principle-based; light-touch; "AI-friendly" positioning |
| Initiative | Details |
|---|---|
| C2PA (Content Credentials) | Open standard for content provenance metadata; Adobe, Microsoft, OpenAI, Google |
| SynthID (Google DeepMind) | Imperceptible watermarking for AI-generated images, audio, video, and text |
| DALL·E Watermarking | OpenAI embeds C2PA metadata in all DALL·E outputs |
| EU AI Act Requirement | AI-generated content must be labelled as such |
| Platform Policies | Meta, YouTube, TikTok all require disclosure of AI-generated content |
Detailed reference content for training.
| Technique | Description |
|---|---|
| Next Token Prediction | Train model to predict the next token given all prior tokens (GPT-style) |
| Masked Language Modelling | Randomly mask tokens; train to predict them (BERT-style) |
| Contrastive Learning | Train model to align related pairs (image + text in CLIP) |
| Denoising | Train model to reconstruct original data from corrupted versions |
| Mixture of Experts (MoE) | Route tokens to specialised sub-networks; scale efficiently |
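The Next Token Prediction row is, concretely, a cross-entropy loss on sequences shifted by one position. A minimal numpy sketch — the perfectly confident `logits` at the end are contrived for illustration:

```python
import numpy as np

def next_token_loss(logits, token_ids):
    """Average cross-entropy for next-token prediction (GPT-style).

    logits    : (seq_len, vocab) scores the model assigns at each position
    token_ids : (seq_len,) the actual sequence of token ids
    Position t's logits are scored against token t+1 — the shift by one
    that turns any plain corpus into (input, target) training pairs.
    """
    logits = np.asarray(logits, dtype=np.float64)
    inputs, targets = logits[:-1], np.asarray(token_ids)[1:]
    # log-softmax over the vocabulary, numerically stabilised
    z = inputs - inputs.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

# A model that puts all its weight on the correct next token has ~zero loss;
# uniform logits give log(vocab_size).
vocab, seq = 5, [0, 3, 1, 4]
confident = np.full((len(seq), vocab), -10.0)
for t in range(len(seq) - 1):
    confident[t, seq[t + 1]] = 10.0
loss = next_token_loss(confident, seq)
```

Masked language modelling (BERT-style) uses the same cross-entropy but scores randomly masked positions instead of the shifted sequence.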
| Method | Description | When to Use |
|---|---|---|
| Full Fine-Tuning | Update all model weights on domain data | Small models; maximum domain performance |
| LoRA (Low-Rank Adaptation) | Add trainable low-rank matrices; freeze base weights | Most fine-tuning scenarios; cost-efficient |
| QLoRA | Quantised LoRA; 4-bit base model + LoRA adapters | Consumer GPU fine-tuning (24GB VRAM) |
| Prefix Tuning | Prepend trainable tokens to input; keep model frozen | Style and tone adaptation |
| Adapter Layers | Insert small trainable modules between frozen layers | Multi-task adaptation |
| PEFT (Parameter-Efficient FT) | Umbrella of LoRA, adapters, prefix; HuggingFace library | All efficient fine-tuning |
| Instruction Fine-Tuning | Train on (instruction, response) pairs | Making models follow instructions |
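The LoRA row above can be made concrete: freeze the pre-trained weight W and learn only a low-rank update B·A, scaled by alpha/r. A numpy sketch under illustrative dimensions (real adapters attach to the attention projections of a transformer):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16       # rank r << d; alpha is a scaling knob

W = rng.normal(size=(d_out, d_in))          # frozen pre-trained weight (never updated)
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialised
                                            # so the adapter starts as a no-op

def lora_forward(x):
    """y = W x + (alpha/r) * B A x — base output plus the low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# Before any training, B = 0, so LoRA output equals the frozen model's output.
delta_params = A.size + B.size              # 2*r*d trainable vs d*d frozen
```

Here only 1,024 adapter parameters train against 4,096 frozen base weights; at LLM scale the same ratio is what makes LoRA cost-efficient.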
| Technique | Description | Used In |
|---|---|---|
| RLHF | Human labellers rank outputs; train a reward model; optimise with PPO | ChatGPT, Claude, Gemini |
| RLAIF | Use AI instead of humans to generate preference labels | Constitutional AI (Anthropic) |
| DPO (Direct Preference Optimisation) | Train directly on preference pairs; no reward model needed | LLaMA 3, Mistral, most open models |
| Constitutional AI (CAI) | AI critiques and revises its own outputs against a constitution | Claude (Anthropic) |
| ORPO | Combines SFT and preference learning in one step | Efficient alignment |
| PPO | Proximal Policy Optimisation; core RL algorithm for RLHF | Original ChatGPT training |
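The DPO row can be written down directly: the loss is a logistic function of how much the policy prefers the chosen response over the rejected one, relative to a frozen reference model — no separate reward model needed. A sketch for a single preference pair (the log-probabilities below are made-up numbers):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimisation loss for one preference pair.

    logp_* : policy log-probability of the chosen/rejected response
    ref_*  : the same quantities under the frozen reference model
    The policy is rewarded for raising chosen responses (relative to the
    reference) more than rejected ones.
    """
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))))  # -log sigmoid(margin)

# Policy identical to the reference: margin 0, loss = log 2.
base = dpo_loss(-5.0, -6.0, -5.0, -6.0)
# Policy that has shifted towards the chosen response: lower loss.
improved = dpo_loss(-4.0, -7.0, -5.0, -6.0)
```

Training simply minimises this loss over a dataset of (chosen, rejected) pairs with gradient descent, which is why DPO is cheaper to run than the full RLHF pipeline.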
| Method | Description | Benefit |
|---|---|---|
| INT8 Quantisation | 8-bit integer weights instead of 32-bit float | 4x memory reduction |
| INT4 / GPTQ | 4-bit quantisation; minimal quality loss | 8x memory reduction |
| GGUF (llama.cpp) | Format for running quantised models locally | CPU/GPU inference on consumer hardware |
| AWQ | Activation-aware weight quantisation; better quality | Deployment on edge devices |
| Speculative Decoding | Small draft model proposes tokens; large model verifies | 2-3x faster inference |
| Flash Attention 2/3 | Memory-efficient attention computation | Longer contexts; faster training |
| KV Cache | Cache key-value pairs from previous tokens | Faster multi-turn inference |
| Continuous Batching | Process multiple requests simultaneously | Higher throughput in serving |
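The INT8 row can be sketched as symmetric per-tensor quantisation: a single scale maps the weight range onto [-127, 127]. A minimal illustration (real deployments add per-channel scales and calibration, omitted here):

```python
import numpy as np

def quantise_int8(weights):
    """Symmetric per-tensor INT8 quantisation: one scale for the whole tensor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q, scale):
    """Recover approximate float weights; error is bounded by scale / 2."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(256, 256)).astype(np.float32)
q, scale = quantise_int8(w)

memory_ratio = w.nbytes / q.nbytes                        # 4x smaller (float32 -> int8)
max_error = float(np.abs(w - dequantise(q, scale)).max())  # rounding error only
```

INT4 schemes like GPTQ and AWQ push the same idea further with group-wise scales, trading a little more error for the 8x reduction listed above.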
Detailed reference content for enterprise.
| Platform | Provider | Key Capabilities |
|---|---|---|
| Azure OpenAI Service | Microsoft | GPT-5, o3, DALL·E, Whisper via Azure; enterprise SLAs |
| Google Vertex AI | Google | Gemini, Imagen, PaLM; MLOps; Model Garden |
| AWS Bedrock | Amazon | Claude, Llama, Titan, Mistral; multi-model; RAG |
| AWS SageMaker | Amazon | Custom model training, fine-tuning, deployment |
| IBM watsonx.ai | IBM | Granite models; enterprise governance; OpenScale |
| Oracle OCI AI | Oracle | Database-native AI; Cohere integration |
| Platform | Provider | Highlights |
|---|---|---|
| Microsoft Copilot 365 | Microsoft | AI across Word, Excel, Teams, Outlook, PowerPoint |
| Salesforce Einstein | Salesforce | CRM-native AI; Agentforce agents |
| ServiceNow AI | ServiceNow | IT, HR, and customer service workflow AI |
| Workday AI | Workday | HR, finance, and planning AI |
| SAP Joule | SAP | Copilot across SAP ERP ecosystem |
| Adobe Firefly Enterprise | Adobe | Brand-safe generative AI for creative workflows |
| Box AI | Box | Document intelligence; summarisation; Q&A |
| Slack AI | Salesforce | Thread summarisation; search; workflow AI |
| Zoom AI Companion | Zoom | Meeting summarisation; smart compose; coaching |
| Tool | Purpose |
|---|---|
| LiteLLM | Universal LLM proxy; route between 100+ models |
| Kong AI Gateway | Enterprise API gateway for LLM traffic |
| Portkey | AI gateway; fallbacks, retries, cost control |
| Martian | Intelligent LLM routing based on task type and cost |
| Not Diamond | Automatic best-model selection per query |
Detailed reference content for consumer tools.
| Product | Provider | Highlights |
|---|---|---|
| ChatGPT | OpenAI | 700M+ weekly users; GPT-5, o3; multimodal; tool use |
| Claude.ai | Anthropic | Claude 4 Opus/Sonnet; 200K context; best for writing and coding |
| Gemini | Google | Gemini 2.0/3; integrates with Google Workspace |
| Copilot | Microsoft | GPT-5 powered; integrated across Microsoft 365 |
| Le Chat | Mistral | European alternative; fast inference; Gmail integration |
| Grok | xAI | Real-time X/Twitter data; Grok 3 reasoning |
| Perplexity | Perplexity AI | Web-grounded answers; citations; research assistant |
| You.com | You.com | Search + AI assistant with app integrations |
| HuggingChat | HuggingFace | Open-source models; free; no login required |
| Product | Provider | Highlights |
|---|---|---|
| Midjourney | Midjourney | Most aesthetically refined; v7; subscription-based |
| DALL·E 3 | OpenAI | Integrated in ChatGPT; prompt adherence; inpainting |
| Stable Diffusion | Stability AI | Open-source; fully customisable; runs locally |
| Adobe Firefly | Adobe | Commercially safe; integrated in Photoshop/Illustrator |
| Imagen 3 | Google | Google's highest-quality text-to-image model |
| Ideogram | Ideogram | Excellent text rendering within images |
| Flux | Black Forest Labs | Open-weight; state-of-the-art quality; fast |
| Leonardo.ai | Leonardo | Game asset and concept art generation |
| Canva AI | Canva | Magic Generate; design-integrated image generation |
| Product | Provider | Highlights |
|---|---|---|
| Sora | OpenAI | 1080p; up to 60s; cinematic quality |
| Veo 3 | Google | Native audio generation; YouTube integration |
| Runway Gen-3 Alpha | Runway | Professional VFX-grade; image-to-video |
| Kling 2.0 | Kuaishou | High-fidelity motion; strong physics simulation |
| Pika 2.0 | Pika Labs | Fast generation; scene modification features |
| HeyGen | HeyGen | Avatar video; AI dubbing; lip sync |
| Synthesia | Synthesia | Enterprise avatar video for training and comms |
| Luma Dream Machine | Luma AI | Fast; smooth motion; 3D-grounded generation |
| Product | Provider | Highlights |
|---|---|---|
| ElevenLabs | ElevenLabs | Best-in-class TTS, voice cloning, dubbing |
| OpenAI Voice | OpenAI | Natural realtime conversational voice in ChatGPT |
| Suno | Suno | Full song generation from text; v4 model |
| Udio | Udio | Music generation; style control; 3-minute tracks |
| Descript | Descript | AI podcast and video editing; voice cloning |
| Adobe Podcast | Adobe | AI audio enhancement and transcription |
| Murf | Murf | Professional TTS for presentations and e-learning |
| Play.ht | Play.ht | TTS API; voice cloning; 900+ voices |
| Product | Provider | Highlights |
|---|---|---|
| Notion AI | Notion | Integrated writing assistant; summarisation; Q&A |
| Jasper | Jasper | Marketing copy; brand voice training |
| Copy.ai | Copy.ai | Marketing and sales content generation |
| Grammarly | Grammarly | AI writing assistant; rewriting; tone adjustment |
| Writesonic | Writesonic | Blog posts, ads, product descriptions |
| Sudowrite | Sudowrite | AI for fiction and creative writing |
| Hemingway Editor AI | Hemingway | Clarity and readability scoring with suggestions |
| Product | Provider | Highlights |
|---|---|---|
| GitHub Copilot | GitHub / OpenAI | #1 coding assistant; GPT-4o + Claude 4 |
| Cursor | Cursor | AI-native IDE; multi-file editing; composer mode |
| Windsurf | Codeium | Agent-native IDE; Cascade multi-file agent |
| Bolt.new | StackBlitz | Build full-stack web apps in browser from prompts |
| Lovable | Lovable | Generate full React apps from natural language |
| v0 | Vercel | Generate and edit UI components with AI |
| Replit Agent | Replit | Build and deploy apps in natural language |
| Claude Code | Anthropic | CLI coding agent; top SWE-Bench performance |
| Devin | Cognition | Autonomous software engineering agent |
Detailed reference content for overview.
Generative AI is the branch of artificial intelligence focused on systems that can produce new content — text, images, video, audio, music, code, 3D models, molecules, and more — that did not exist before the generation event.
| Dimension | Detail |
|---|---|
| Core Capability | Creates — does not just classify, predict, retrieve, or respond with pre-written text |
| How It Learns | Learns the statistical patterns and distributions of massive training datasets |
| What It Produces | Novel outputs that are plausible, coherent, and contextually appropriate |
| Key Differentiator | Output is generative, not extractive — the model synthesises, it does not copy |
| AI Type | What It Does | Example |
|---|---|---|
| Generative AI | Creates new original content from learned distributions | Write an essay, generate an image, synthesise a video |
| Agentic AI | Pursues goals autonomously using tools, memory, and planning | Research agent, coding agent, autonomous workflow |
| Analytical AI | Extracts insights and explanations from existing data | Dashboard, root-cause analysis, anomaly detection |
| Autonomous AI (Non-Agentic) | Operates independently within fixed boundaries without human input | Autopilot, auto-scaling, algorithmic trading |
| Bayesian / Probabilistic AI | Reasons under uncertainty using probability distributions | Clinical trial analysis, A/B testing, risk modelling |
| Cognitive / Neuro-Symbolic AI | Combines neural learning with symbolic reasoning | LLM + knowledge graph, physics-informed neural net |
| Conversational AI | Manages multi-turn dialogue between humans and machines | Customer service chatbot, voice assistant |
| Evolutionary / Genetic AI | Optimises solutions through population-based search inspired by natural selection | Neural architecture search, logistics scheduling |
| Explainable AI (XAI) | Makes AI decisions understandable to humans | SHAP explanations, LIME, Grad-CAM |
| Multimodal Perception AI | Fuses vision, language, audio, and other modalities | GPT-4o processing image + text, AV sensor fusion |
| Optimisation / Operations Research AI | Finds optimal solutions to constrained mathematical problems | Vehicle routing, supply chain planning, scheduling |
| Physical / Embodied AI | Acts in the physical world through sensors and actuators | Autonomous vehicle, robot arm, drone |
| Predictive / Discriminative AI | Classifies or forecasts from historical patterns | Spam filter, credit score, churn prediction |
| Privacy-Preserving AI | Trains and runs AI without exposing raw data | Federated hospital models, differential privacy |
| Reactive AI | Responds to current inputs with no memory or learning | Chess engine move evaluation, thermostat |
| Recommendation / Retrieval AI | Surfaces relevant items from large catalogues based on user signals | Netflix suggestions, Google Search, Spotify playlists |
| Reinforcement Learning AI | Learns optimal behaviour from reward signals via trial and error | AlphaGo, robotic locomotion, RLHF |
| Scientific / Simulation AI | Solves scientific problems and models physical systems | AlphaFold, climate simulation, molecular dynamics |
| Symbolic / Rule-Based AI | Reasons over explicit rules and knowledge to derive conclusions | Medical expert system, legal reasoning engine |