A comprehensive interactive exploration of Scientific AI — the discovery pipeline, 8-layer stack, AI for protein folding, drug discovery, climate, genomics, digital twins, benchmarks, market data, and more.
~64 min read · Interactive Reference

The six-step cycle that Scientific AI follows — from hypothesis to validated insight, iterating continuously.
Scientific AI follows a structured pipeline from scientific question to validated discovery:
┌──────────────────────────────────────────────────────────────────────┐
│ SCIENTIFIC AI PIPELINE │
│ │
│ 1. FORMULATE 2. DATA & PRIOR 3. MODEL DESIGN │
│ ───────────── ────────────── ────────────── │
│ Define the Gather experimental Choose architecture │
│ scientific data, simulations, informed by domain │
│ question or and physical laws; physics: PINNs, GNNs, │
│ hypothesis encode constraints neural operators, etc. │
│ │
│ 4. TRAIN 5. PREDICT / 6. VALIDATE │
│ ───────────── SIMULATE ────────────── │
│ Train model on ────────────── Compare predictions │
│ scientific data Generate predictions against experimental │
│ with physics or run simulations results; peer review; │
│ constraints at accelerated speed domain expert evaluation │
│ │
│ ──────── FEEDBACK LOOP: EXPERIMENTAL VALIDATION → REFINEMENT ─── │
└──────────────────────────────────────────────────────────────────────┘
| Step | What Happens |
|---|---|
| Problem Formulation | Define the scientific question: predict a protein's 3D structure, discover a stable material, forecast weather 10 days ahead |
| Data Collection | Gather experimental data (crystallography, assays, sensor readings), simulation outputs, and published literature |
| Physics / Domain Encoding | Encode known physical laws, conservation principles, symmetries, and boundary conditions as inductive biases or constraints |
| Architecture Selection | Choose model architecture suited to the domain: GNNs for molecular graphs, PINNs for PDEs, neural operators for simulations |
| Training | Train on scientific datasets with physics-informed loss functions, equivariance constraints, and domain-specific augmentation |
| Prediction / Simulation | Generate scientific predictions — molecular structures, material properties, weather fields — at speeds far exceeding traditional simulation |
| Experimental Validation | Compare AI predictions against wet-lab experiments, physical measurements, or high-fidelity numerical simulations |
| Iterative Refinement | Use experimental results to improve the model; active learning selects the most informative new experiments |
| Publication & Deployment | Validated models are published, open-sourced, or deployed for production scientific workflows |
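The train-predict-validate-refine loop above can be sketched in a few lines of Python. This is a toy illustration, not a real pipeline: the "model" is an exact least-squares line fit, `run_experiment` is a hypothetical stand-in for a wet-lab measurement, and active learning simply queries the candidate farthest from existing data.

```python
import random

# Toy ground truth standing in for a wet-lab experiment: y = 2x + 1.
def run_experiment(x):
    return 2.0 * x + 1.0

def fit_line(xs, ys):
    # "Training": exact least-squares fit of y = a*x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

xs = [0.0, 1.0]                          # 1-2. formulate + initial data
ys = [run_experiment(x) for x in xs]

for _ in range(3):                       # feedback loop
    a, b = fit_line(xs, ys)              # 4. train
    candidates = [random.uniform(0, 10) for _ in range(20)]
    # Active learning: query where the model is least constrained
    # (here, the candidate farthest from all existing data points).
    x_new = max(candidates, key=lambda c: min(abs(c - x) for x in xs))
    y_pred = a * x_new + b               # 5. predict
    y_true = run_experiment(x_new)       # 6. validate
    xs.append(x_new)                     # refinement: fold the result back in
    ys.append(y_true)

print(round(a, 3), round(b, 3))          # recovered slope and intercept
```

Because the toy experiment is noiseless, the loop recovers the true slope and intercept exactly; in practice each cycle shrinks uncertainty rather than eliminating it.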
| Principle | What It Means |
|---|---|
| Physics-Informed Learning | Embed known physical laws (conservation, symmetry, boundary conditions) directly into the model architecture or loss function |
| Equivariance / Invariance | Ensure model predictions respect physical symmetries — e.g., rotating a molecule should not change its predicted energy |
| Data Efficiency | Scientific data is often scarce and expensive; models must learn effectively from limited experimental data |
| Uncertainty Quantification | Scientific predictions must include confidence intervals and uncertainty estimates — not just point predictions |
| Transferability | Models trained on one set of molecules/materials/conditions should generalise to unseen ones |
| Interpretability | Scientists need to understand why the model made a prediction — not just what it predicted |
| Reproducibility | Results must be reproducible; models and data must be openly documented and shareable |
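Uncertainty quantification is often the easiest principle to retrofit: train an ensemble and report the spread, not just the mean. A stdlib-only sketch, where each hypothetical "model" stands in for a network trained on a different bootstrap sample:

```python
import math, random

random.seed(0)

# Hypothetical ensemble: the same predictor plus a fixed per-model bias,
# mimicking networks trained on different data subsets.
def make_model():
    bias = random.gauss(0.0, 0.1)
    return lambda x: 2.0 * x + 1.0 + bias

ensemble = [make_model() for _ in range(50)]

def predict_with_uncertainty(x):
    # Report the ensemble mean and its spread, not a point estimate.
    preds = [m(x) for m in ensemble]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean, math.sqrt(var)

mean, sigma = predict_with_uncertainty(3.0)
print(f"{mean:.2f} +/- {sigma:.2f}")
```

The one-sigma spread is what a scientist can act on: a prediction of 7.0 +/- 0.1 and one of 7.0 +/- 5.0 call for very different follow-up experiments.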
AlphaFold2 predicted the 3D structure of virtually every known protein (~200 million) in under a year.
Once trained, physics-informed surrogates and neural operators can solve PDEs orders of magnitude faster than traditional numerical methods.
AI-driven climate models have reduced simulation time from months to hours for certain scenarios.
The complete Scientific AI architecture, from foundational problem definition to peer-reviewed publication.
Scientific papers, curated databases, model zoos, and reproducibility artifacts. Includes preprint servers (arXiv, bioRxiv), structured databases (PDB, ChEMBL), trained model checkpoints, and community benchmarks that feed the next cycle of discovery.
Error bars, confidence intervals, ablation studies, and physical consistency checks. Ensures predictions respect known laws (energy conservation, symmetry), quantifies epistemic and aleatoric uncertainty, and catches out-of-distribution failures before deployment.
Forward simulation, virtual screening, and surrogate inference. The model generates predictions — protein 3D structures, molecular binding affinities, weather forecasts, or material properties — orders of magnitude faster than traditional simulation.
Training GNNs, transformers, diffusion models, neural operators, and equivariant neural networks on scientific data. Incorporates physics-informed loss functions, multi-task objectives, and curriculum learning strategies tailored to scientific domains.
Encoding scientific entities into ML-friendly formats: molecular graphs (atoms as nodes, bonds as edges), protein sequences (amino acid tokens), 3D coordinates, voxel grids, point clouds, and spectral representations. The choice of representation profoundly shapes model capability.
Experimental databases (PDB for proteins, ChEMBL for bioactivity, Materials Project for crystals), simulation-generated datasets (DFT calculations, MD trajectories), and synthetic data augmentation. Data quality and coverage are critical bottlenecks.
Physical laws, symmetry constraints, conservation equations, and domain-specific priors. Newtonian mechanics, quantum mechanics, thermodynamics, and Maxwell's equations serve as inductive biases that constrain the solution space and improve generalisation.
The foundational question: predict a drug target's 3D structure, discover a new battery material, project climate change under emissions scenarios, determine a protein's function from sequence, or prove a mathematical theorem. Everything else is built to answer this.
| Layer | What It Covers |
|---|---|
| 1. Scientific Data Sources | Experimental databases (PDB, ChEMBL, Materials Project), simulation archives, sensor data, literature, genomic databases |
| 2. Data Processing & Representation | Molecular graphs, point clouds, voxel grids, spectral representations, SMILES/SELFIES encoding, sequence tokenisation |
| 3. Physics & Domain Encoding | Physical laws, conservation constraints, symmetry groups, boundary conditions, thermodynamic rules, domain ontologies |
| 4. Model Architecture | PINNs, GNNs, neural operators, equivariant networks, diffusion models, foundation models, transformers |
| 5. Training & Optimisation | Physics-informed losses, multi-task training, active learning, transfer learning, self-supervised pre-training |
| 6. Prediction & Simulation | Inference engines, uncertainty quantification, ensemble methods, real-time simulation, inverse design |
| 7. Validation & Experiment | Experimental validation, benchmarking against numerical solvers, wet-lab verification, peer review, ablation studies |
| 8. Deployment & Integration | Lab automation integration, digital twin platforms, simulation-as-a-service, scientific workflow orchestration |
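Layer 2's representation choice can be made concrete with a minimal molecular graph: atoms as nodes, bonds as edges. This is an illustrative sketch only; real pipelines build these graphs with tools like RDKit or PyTorch Geometric.

```python
# Ethanol's heavy atoms (C-C-O) as a minimal molecular graph.
atoms = ["C", "C", "O"]          # node features: element symbols
bonds = [(0, 1), (1, 2)]         # edge list: which atoms are bonded

# Adjacency list, the form message-passing networks consume.
adj = {i: [] for i in range(len(atoms))}
for i, j in bonds:
    adj[i].append(j)
    adj[j].append(i)

degrees = {i: len(adj[i]) for i in adj}   # simplest structural feature
print(degrees)  # -> {0: 1, 1: 2, 2: 1}
```

From here, richer node features (atomic number, charge, hybridisation) and edge features (bond order, length) are attached before the graph is handed to layer 4's architectures.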
AlphaFold 2/3, ESMFold, RoseTTAFold — predicting 3D protein structures from amino acid sequences. AlphaFold has predicted 200M+ structures, revolutionising biology. Enables drug target identification, enzyme engineering, and understanding of disease mechanisms.
Molecular generation, ADMET property prediction, molecular docking, and lead optimisation. Companies like Recursion and Insilico Medicine use AI to reduce drug candidate identification from years to weeks. Generative models design novel molecules with desired properties.
Property prediction, crystal structure generation, and inverse design. Google DeepMind's GNoME discovered 2.2 million new stable crystals — equivalent to roughly 800 years of prior discovery. Applications span battery electrodes, solar cells, catalysts, and superconductor candidates.
ML weather forecasting (GraphCast, Pangu-Weather) matches traditional NWP models at a fraction of compute. Carbon cycle modelling, ocean dynamics, wildfire prediction, and climate projection under emissions scenarios. Critical for adaptation planning.
Variant effect prediction, gene expression modelling, single-cell analysis. Models like Evo (a 7B-parameter DNA foundation model), scGPT, and Enformer predict regulatory effects from sequence. Enables precision medicine and understanding of genetic disease.
AlphaProof, FunSearch — AI systems that prove theorems, discover algorithms, and solve combinatorial problems. Integration with formal proof assistants (Lean 4, Coq) enables verified mathematics. Silver-medal IMO performance was reached in 2024, with gold-medal-level results following in 2025.
Galaxy formation simulation, gravitational wave detection, dark matter mapping, and exoplanet discovery. ML accelerates N-body simulations by 1000×, classifies transient events in real time, and reconstructs cosmic structure from survey data.
Physics-informed virtual replicas of physical systems, continuously updated with real sensor data. Siemens Xcelerator, NVIDIA Omniverse — used for manufacturing optimisation, predictive maintenance, smart cities, and aerospace design validation.
AI systems that predict the 3D structures of biological macromolecules — proteins, nucleic acids, and their complexes.
| Aspect | Detail |
|---|---|
| Core Problem | Predicting how a linear sequence of amino acids folds into a 3D structure that determines biological function |
| Why It Matters | Protein structure determines function; knowing structure accelerates drug design, enzyme engineering, and disease understanding |
| Key Breakthrough | AlphaFold 2 (2020) solved the 50-year protein folding problem; AlphaFold 3 (2024) extended to complexes |
| Techniques | Evoformer (attention on MSAs + pair representations), SE(3)-equivariant structure modules, diffusion-based generation |
| Key Tools | AlphaFold 3, ESMFold, RoseTTAFold, OpenFold, ColabFold, RFdiffusion (protein design) |
AI systems that design, screen, and optimise drug candidates computationally — reducing the time and cost of bringing a drug to market.
| Aspect | Detail |
|---|---|
| Core Problem | Finding molecules that bind to a target protein, have drug-like properties, are synthesisable, and are safe — a multi-objective search in vast chemical space |
| Traditional Pipeline | 10–15 years, $1–2 billion to bring one drug to market; >90% failure rate in clinical trials |
| AI-Accelerated Pipeline | AI compresses candidate identification from years to weeks; reduces wet-lab experiments by pre-screening computationally |
| Techniques | Virtual screening, molecular docking (DiffDock), de novo molecule generation, ADMET prediction, retrosynthesis planning |
| Key Tools | Insilico Medicine, Atomwise, Recursion, Schrödinger, BenevolentAI, Exscientia, Relay Therapeutics |
AI systems that discover new materials with desired properties — predicting stability, conductivity, strength, and other material characteristics.
| Aspect | Detail |
|---|---|
| Core Problem | Searching the vast space of possible elemental combinations and crystal structures for materials with target properties |
| Why It Matters | New materials drive advances in batteries, semiconductors, superconductors, catalysts, and construction |
| Key Breakthrough | GNoME (Google DeepMind, 2023) discovered 2.2 million new stable crystal structures — more than all prior human discoveries combined |
| Techniques | GNNs on crystal graphs, formation energy prediction, stability classification, generative crystal structure design |
| Key Tools | GNoME, Materials Project, AFLOW, JARVIS (NIST), Open Catalyst, M3GNet, CHGNet, MatterGen |
AI systems that forecast weather and model climate systems — achieving unprecedented speed and accuracy compared to traditional numerical weather prediction.
| Aspect | Detail |
|---|---|
| Core Problem | Solving the Navier-Stokes equations governing atmospheric fluid dynamics at global scale — computationally expensive for traditional numerical solvers |
| Key Breakthrough | GraphCast (DeepMind, 2023) produced more accurate 10-day global forecasts than ECMWF's HRES model in under 1 minute on a single TPU |
| Why It Matters | Faster, cheaper forecasting saves lives (extreme weather warnings), optimises energy grids, and improves agricultural planning |
| Techniques | GNNs on mesh grids, neural operators (FNO), vision transformers on atmospheric fields, ensemble probabilistic forecasting |
| Key Tools | GraphCast, GenCast, Pangu-Weather, FourCastNet, ClimaX, Aurora, NVIDIA Earth-2 |
AI systems that analyse DNA, RNA, and protein sequences — predicting gene function, variant effects, and regulatory elements.
| Aspect | Detail |
|---|---|
| Core Problem | Understanding the functional implications of the 3 billion base pairs in the human genome and their variants |
| Why It Matters | Enables personalised medicine, disease risk prediction, gene therapy design, and agricultural biotechnology |
| Techniques | DNA/RNA language models, variant effect prediction, gene expression modelling, CRISPR guide design |
| Key Tools | DeepVariant, Enformer, Nucleotide Transformer, Evo, scGPT (single-cell), DNABERT-2 |
AI systems that prove mathematical theorems, discover conjectures, and solve formal reasoning problems.
| Aspect | Detail |
|---|---|
| Core Problem | Formal mathematical reasoning — proving theorems in proof assistants (Lean, Isabelle, Coq) and solving competition-level math problems |
| Key Breakthrough | AlphaProof (DeepMind, 2024) solved IMO competition problems at silver medal level using formal proof search |
| Techniques | Neural-guided proof search, RL for tactic selection, LLM-generated proof sketches, formal verification |
| Key Tools | AlphaProof, AlphaGeometry 2, Lean 4 + AI, LEGO-Prover, DeepSeek-Prover, miniF2F benchmark |
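The formal setting these systems target can be made concrete with a short Lean 4 proof. This is a hand-written illustration (assuming Mathlib's `obtain` tactic and the `omega` linear-arithmetic decision procedure), not AlphaProof output: the AI's job is to produce scripts like this, and the proof kernel's job is to check every step mechanically.

```lean
-- Evenness phrased explicitly: the sum of two even naturals is even.
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k := by
  obtain ⟨m, hm⟩ := ha        -- a = 2 * m
  obtain ⟨n, hn⟩ := hb        -- b = 2 * n
  exact ⟨m + n, by omega⟩     -- a + b = 2 * (m + n), by linear arithmetic
```

A proof accepted by the kernel is correct by construction, which is why neural-guided search over such scripts yields verified rather than merely plausible mathematics.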
AI systems that analyse telescope data, particle collider outputs, and cosmological simulations at scales impossible for human analysis.
| Aspect | Detail |
|---|---|
| Core Problem | Processing petabytes of observational data from telescopes, satellites, and particle accelerators to detect rare events and discover new physics |
| Why It Matters | Enables detection of gravitational waves, exoplanet discovery, dark matter searches, and new particle identification |
| Techniques | CNNs for image classification, GNNs for particle tracking, anomaly detection in detector data, simulation-based inference |
| Key Tools | LIGO AI (gravitational waves), Euclid AI (cosmology), CERN ML (particle physics), Rubin Observatory pipeline |
AI systems that create real-time virtual replicas of physical systems — enabling simulation, monitoring, and optimisation of infrastructure, factories, and products.
| Aspect | Detail |
|---|---|
| Core Problem | Traditional engineering simulations (CFD, FEA, multi-body dynamics) are too slow for real-time monitoring and iterative design exploration |
| How AI Helps | Neural surrogate models replace or accelerate expensive simulations; digital twins combine sensor data with simulation for real-time state estimation |
| Techniques | Neural surrogates, reduced-order models, physics-informed ML, real-time sensor fusion, differentiable simulation |
| Key Tools | NVIDIA Omniverse, Siemens Xcelerator, Ansys SimAI, Azure Digital Twins, GE Digital Twins |
Message-passing on molecular and material graphs where atoms are nodes and bonds are edges. Key models: SchNet (continuous-filter convolutions), DimeNet (directional message passing), EGNN (equivariant updates). Foundation of molecular property prediction.
E(3)-equivariant architectures that respect rotation, translation, and reflection symmetries of 3D space. SE(3)-Transformers, MACE, NequIP — produce physically consistent predictions regardless of molecular orientation. Essential for force fields and 3D generation.
Learn mappings between function spaces to solve PDEs. The Fourier Neural Operator (FNO) learns in spectral space for weather, fluid dynamics, and material stress. Orders of magnitude faster than finite-element solvers for forward simulation.
Generate molecules, proteins, and materials by learning to reverse a noise process. RFdiffusion designs novel protein structures, EDM generates 3D molecules. Enables exploration of vast chemical and structural spaces with physical constraints.
Embed physical laws (PDEs, conservation equations) directly in the loss function. Solve differential equations without mesh generation, enforce boundary conditions, and blend sparse experimental data with known physics. Used in fluid dynamics, heat transfer, and structural mechanics.
Protein language models (ESM-2, up to 15B parameters), genomic foundation models (Evo, 7B), and chemical transformers. Pre-trained on massive biological/chemical corpora, fine-tuned for downstream tasks: sequence-to-function, property prediction, variant effect.
Operate on manifolds, meshes, point clouds, and fiber bundles. Gauge equivariant CNNs, mesh transformers, and surface networks process non-Euclidean data from molecular surfaces, protein interfaces, and geographic terrains with principled geometric priors.
The foundational architecture for embedding physical laws directly into neural network training.
| Aspect | Detail |
|---|---|
| Core Mechanism | Train a neural network to satisfy a Partial Differential Equation (PDE) by adding the PDE residual as a term in the loss function |
| How It Works | Network predicts the solution field; the physics loss penalises violations of governing equations at collocation points |
| Key Advantage | Can solve PDEs without mesh generation or numerical discretisation; works with sparse or noisy data |
| Limitations | Training can be slow to converge for stiff or complex PDEs; spectral bias towards smooth solutions |
| Used For | Fluid dynamics, heat transfer, structural mechanics, electromagnetic fields, geophysics |
PINN Loss Function Structure:
| Loss Component | What It Penalises |
|---|---|
| Data Loss | Mismatch between network predictions and observed experimental / simulation data |
| PDE Residual Loss | Violation of the governing partial differential equations at sampled collocation points |
| Boundary Condition Loss | Violation of prescribed boundary conditions (Dirichlet, Neumann, periodic) |
| Initial Condition Loss | Violation of prescribed initial conditions for time-dependent problems |
| Regularisation Loss | Standard weight regularisation to prevent overfitting |
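The composite loss above can be sketched for a 1D Poisson problem. This is a deliberately minimal stand-in: the "network" is a one-parameter ansatz u(x) = c·sin(πx), derivatives come from finite differences, optimisation is a grid scan, and the data and initial-condition terms are omitted. Real PINNs use a neural network, autodiff, and gradient descent.

```python
import math

# Toy PINN-style loss for u''(x) = f(x) on [0, 1], with
# f(x) = -pi^2 * sin(pi*x) and boundary conditions u(0) = u(1) = 0.
def u(x, c):
    return c * math.sin(math.pi * x)

def pinn_loss(c, n=50, h=1e-4):
    xs = [(i + 0.5) / n for i in range(n)]          # collocation points
    residual = 0.0
    for x in xs:
        # PDE residual loss: penalise u'' - f at each collocation point.
        u_xx = (u(x + h, c) - 2 * u(x, c) + u(x - h, c)) / h**2
        fx = -math.pi**2 * math.sin(math.pi * x)
        residual += (u_xx - fx) ** 2
    # Boundary condition loss: penalise deviation from u(0) = u(1) = 0.
    bc = u(0.0, c) ** 2 + u(1.0, c) ** 2
    return residual / n + bc

# Minimise over c with a simple grid scan; the exact solution has c = 1.
loss, c_best = min((pinn_loss(c / 100), c / 100) for c in range(201))
print(c_best)  # -> 1.0
```

The key point survives the simplification: the loss vanishes only when the candidate solution satisfies both the governing equation at the collocation points and the boundary conditions.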
The dominant architecture for molecular, materials, and relational scientific data.
| Aspect | Detail |
|---|---|
| Core Mechanism | Represent scientific entities (atoms, residues, particles) as nodes and their interactions (bonds, forces) as edges in a graph |
| How It Works | Message-passing layers propagate information between connected nodes; each node updates its representation based on its neighbours |
| Key Advantage | Naturally handles variable-sized, irregular structures; respects the relational structure of molecules, crystals, and proteins |
| Used For | Molecular property prediction, protein structure, materials discovery, particle physics, weather forecasting |
Key GNN Architectures for Science:
| Architecture | Description | Key Application |
|---|---|---|
| SchNet | Continuous-filter convolutional layers on atomic distances | Molecular energy and force prediction |
| DimeNet / DimeNet++ | Directional message passing using bond angles and distances | Molecular property prediction |
| PaiNN | Equivariant message passing with vector features | Forces and energy with rotational equivariance |
| EGNN | Equivariant Graph Neural Networks; coordinate-aware | Molecular dynamics, protein modelling |
| NequIP | E(3)-equivariant neural network interatomic potentials | High-accuracy molecular dynamics |
| MACE | Multi-body equivariant message passing | Materials science, catalysis |
| GemNet | Geometric message passing with triplet interactions | Molecular energy surfaces at scale |
| Graphormer | Transformer applied to graph-structured data | Molecular property prediction (OGB benchmarks) |
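All of the architectures above refine one core operation: a message-passing update. A stdlib sketch on a toy three-atom graph, with plain sum aggregation (real GNNs apply learned transforms before and after aggregating, over many layers):

```python
# One round of sum-aggregation message passing.
# Toy graph: ethanol heavy atoms C-C-O, one-hot element features.
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}  # C, C, O
edges = [(0, 1), (1, 2)]

neighbours = {i: [] for i in features}
for i, j in edges:
    neighbours[i].append(j)
    neighbours[j].append(i)

def message_pass(h):
    # Each node's new state = its own state + the sum of its
    # neighbours' states (learned transforms omitted for clarity).
    out = {}
    for i in h:
        agg = [sum(col) for col in zip(*(h[j] for j in neighbours[i]))]
        out[i] = [a + b for a, b in zip(h[i], agg)]
    return out

h = message_pass(features)
print(h)
```

After one round, the central carbon "knows" it is bonded to both a carbon and an oxygen; stacking rounds propagates information across ever-larger neighbourhoods, which is how molecular context reaches every atom.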
Learn mappings between function spaces — enabling AI to solve entire families of PDEs, not just individual instances.
| Aspect | Detail |
|---|---|
| Core Mechanism | Learn the operator that maps input functions (initial/boundary conditions, forcing terms) to solution functions |
| Key Advantage | Once trained, can solve a new PDE instance in a single forward pass — orders of magnitude faster than traditional solvers |
| Difference from PINNs | PINNs solve a single PDE instance; neural operators learn the solution operator for a family of PDEs |
| Used For | Weather forecasting, fluid simulation, structural analysis, climate modelling, engineering design |
Key Neural Operator Architectures:
| Architecture | Description | Key Application |
|---|---|---|
| Fourier Neural Operator (FNO) | Learns in Fourier space; efficient for periodic and regular domains | Fluid dynamics, weather, turbulence |
| DeepONet | Branch-trunk architecture; branch encodes input function, trunk encodes query point | General PDE solving; multi-physics |
| U-NO | U-Net style neural operator with skip connections | High-resolution PDE solutions |
| GNOT | General Neural Operator Transformer | Multi-physics problems with irregular geometries |
| Factorised FNO (F-FNO) | Memory-efficient FNO with factorised spectral layers | Large-scale 3D simulation |
| Geo-FNO | FNO extended to non-uniform, irregular geometries | Real-world engineering simulations |
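The FNO's signature layer — transform to frequency space, scale a handful of low modes with learned weights, transform back — can be sketched with a naive DFT. This is illustrative only: real FNOs use FFTs, complex-valued learned weight tensors per channel, and a pointwise path alongside the spectral one.

```python
import cmath, math

def dft(x):
    # Naive discrete Fourier transform (production code uses an FFT).
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def spectral_layer(x, weights, modes=4):
    # FNO-style spectral convolution: keep the lowest `modes` frequencies,
    # scale each by a learned weight, zero the rest, transform back.
    X = dft(x)
    out = [0j] * len(X)
    for k in range(modes):
        out[k] = weights[k] * X[k]
        if k > 0:  # mirror the conjugate mode so the output stays real
            out[-k] = weights[k].conjugate() * X[-k]
    return idft(out)

n = 32
signal = [math.sin(2 * math.pi * t / n) + 0.3 * math.sin(14 * math.pi * t / n)
          for t in range(n)]                 # low + high frequency content
y = spectral_layer(signal, weights=[1 + 0j] * 4)
# With identity weights the layer is a low-pass filter: the k=1 sine
# passes through while the k=7 component is truncated away.
```

Truncating to a fixed number of modes is what makes the operator resolution-independent: the same learned weights apply whether the input is sampled on 32 or 32,000 grid points.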
Architectures that respect physical symmetries by construction — ensuring predictions are consistent under rotations, translations, and reflections.
| Aspect | Detail |
|---|---|
| Core Mechanism | Network layers are mathematically constrained to be equivariant under the symmetry group of the problem (e.g., SE(3), E(3), SO(3)) |
| Why It Matters | Physical systems obey symmetries — rotating a molecule shouldn't change its energy. Equivariant networks guarantee this by design |
| Key Advantage | Better data efficiency, improved generalisation, and physically consistent predictions compared to unconstrained architectures |
| Used For | Molecular dynamics, protein structure prediction, materials science, particle physics |
Key Equivariant Architectures:
| Architecture | Symmetry Group | Application |
|---|---|---|
| SE(3)-Transformers | SE(3) — rotation + translation | Protein structure, molecular dynamics |
| Tensor Field Networks | SO(3) — rotation | Atomic property prediction |
| e3nn | E(3) — rotation, translation, reflection | General-purpose equivariant networks |
| NequIP | E(3) | Interatomic potentials, materials |
| MACE | E(3) | Multi-body molecular interactions |
| Cormorant | SO(3) | Molecular property prediction |
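The guarantee these architectures provide can be checked directly: any quantity computed purely from interatomic distances cannot change under a rigid rotation. A toy pairwise potential (hypothetical geometry and energy function, for illustration):

```python
import math

def energy(coords):
    # Toy pairwise potential: sum of 1/r over all atom pairs.
    # Depends only on distances, so it is rotation-invariant by design,
    # the property NequIP, MACE, etc. bake into every layer.
    e = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            e += 1.0 / math.dist(coords[i], coords[j])
    return e

def rotate_z(coords, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
e0 = energy(water)
e1 = energy(rotate_z(water, 1.234))
print(abs(e0 - e1) < 1e-9)  # True: rotation leaves the energy unchanged
```

An unconstrained network must learn this invariance from data augmentation; an equivariant one gets it for free, which is where the data-efficiency gains come from.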
Makes entire simulation pipelines differentiable — enabling gradient-based optimisation through physics simulations.
| Aspect | Detail |
|---|---|
| Core Mechanism | Implement physics simulators using differentiable programming frameworks so gradients can flow through the simulation |
| Key Advantage | Enables end-to-end optimisation of design parameters, control policies, and material properties through the simulator |
| How It Works | Forward pass runs the simulation; backward pass computes gradients of the output with respect to input parameters |
| Used For | Robot design optimisation, aerodynamic shape optimisation, material design, soft body simulation, fluid control |
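The forward/backward structure can be shown with the simplest possible "simulator": projectile range as a function of launch angle. The backward pass is written by hand here; the whole point of the frameworks listed below is that they derive it automatically for simulators with millions of state variables.

```python
import math

G, V = 9.81, 20.0   # gravity (m/s^2), launch speed (m/s)

def simulate(theta):
    # Forward pass: projectile range on flat ground.
    return V**2 * math.sin(2 * theta) / G

def simulate_grad(theta):
    # Backward pass, hand-derived here; differentiable-simulation
    # frameworks (JAX-MD, DiffTaichi, Warp) produce this automatically.
    return 2 * V**2 * math.cos(2 * theta) / G

# Gradient ascent on the launch angle to maximise range.
theta = 0.2
for _ in range(200):
    theta += 0.005 * simulate_grad(theta)

print(round(math.degrees(theta), 1))  # -> 45.0, the known optimum
```

Swap the launch angle for an airfoil shape or a material parameter and the same pattern gives end-to-end design optimisation through the physics.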
Key Differentiable Simulation Frameworks:
| Framework | Description | Domain |
|---|---|---|
| JAX-MD | Molecular dynamics in JAX; fully differentiable | Molecular simulation, materials |
| DiffTaichi | Differentiable physical simulation framework | Fluid, soft body, rigid body |
| Warp (NVIDIA) | High-performance differentiable simulation | Robotics, physics, cloth |
| Brax (Google) | Differentiable rigid body physics in JAX | Robot learning, locomotion |
| PhiFlow | Differentiable fluid simulation | CFD, fluid dynamics research |
| TorchDiffEq | Differentiable ODE solvers in PyTorch | Neural ODEs, scientific modelling |
Adapted from general-purpose generative architectures to design and discover new molecules, materials, and structures.
| Model Type | Scientific Application | Examples |
|---|---|---|
| Diffusion Models | Molecule generation, protein design, crystal structure generation | DiffDock, RFdiffusion, CDVAE |
| Variational Autoencoders (VAEs) | Molecular generation, latent space exploration of chemical space | Junction Tree VAE, MolVAE |
| Flow-Based Models | Boltzmann distribution sampling, molecular conformer generation | Boltzmann Generators, E-NFs |
| Autoregressive Models | Sequential molecule generation, protein sequence design | ProtGPT2, ChemGPT, xTrimoPGLM |
| GANs | Molecular graph generation, goal-directed sequence generation | MolGAN, ORGAN |
| Reinforcement Learning | Goal-directed molecular design, optimising drug-like properties | REINVENT, MolDQN |
Large-scale pre-trained models adapted for scientific domains — analogous to LLMs but for molecules, proteins, and physical systems.
| Model | Domain | Description |
|---|---|---|
| AlphaFold 3 | Structural Biology | Predicts 3D structures of proteins, nucleic acids, and their complexes |
| ESM-2 / ESMFold | Protein Science | Meta's protein language model; predicts structure from sequence |
| Uni-Mol | Molecular Science | 3D molecular pre-training for property prediction and generation |
| MatterGen | Materials Science | Microsoft's generative model for novel stable materials |
| Open Catalyst Models | Catalysis | Meta's models for predicting catalyst-adsorbate interactions |
| GenCast | Weather | DeepMind's probabilistic weather forecasting model |
| GraphCast | Weather | DeepMind's deterministic 10-day global weather forecasting model |
| Pangu-Weather | Weather | Huawei's weather forecasting foundation model |
| Aurora | Earth System | Microsoft's foundation model for atmospheric science |
| ClimaX | Climate | Microsoft's climate and weather foundation model |
| GNoME | Materials | Google DeepMind's model discovering 2.2M new stable crystals |
| Nucleotide Transformer | Genomics | InstaDeep/NVIDIA's DNA/RNA language model |
| AlphaProof | Mathematics | DeepMind's formal mathematical reasoning system |
| AlphaGeometry 2 | Mathematics | DeepMind's geometry theorem prover |
| Tool | Provider | Focus |
|---|---|---|
| AlphaFold | Google DeepMind | Protein structure prediction; 200M+ structures in public database |
| RoseTTAFold | Baker Lab / UW | Open-source protein structure; 3-track architecture |
| OpenFold | Open-source | Trainable, open AlphaFold implementation for research |
| RDKit | Open-source | Cheminformatics; molecular descriptors, fingerprints, reactions |
| PyG (PyTorch Geometric) | PyG Team | GNN library; molecular graphs, materials, social networks |
| JAX / JAX-MD | Google | Accelerated scientific computing; molecular dynamics simulations |
| NVIDIA Modulus | NVIDIA | Physics-informed AI; PINNs, FNO; digital twin development |
| DeepChem | Open-source | ML for drug discovery; MoleculeNet benchmarks; featurisers |
| Open Catalyst Project | Meta | Catalyst discovery; OC20/OC22 datasets; GNN models |
| GraphCast | Google DeepMind | ML weather forecasting; 10-day forecast in 1 minute |
| Pangu-Weather | Huawei | Transformer-based global weather prediction |
| GROMACS + ML | Open-source | Molecular dynamics with ML force fields |
| Lean 4 | Microsoft Research / Lean FRO | Interactive theorem prover; formal math verification |
| Siemens Xcelerator | Siemens | Industrial digital twin platform; Simcenter |
| Framework | Provider / Community | Deployment | Highlights |
|---|---|---|---|
| PyTorch Geometric (PyG) | PyG Team | Open-Source (any OS; Python 3.8+; PyTorch; NVIDIA GPU recommended; CUDA 11.8+) | GNN library for molecular and scientific graph data; widely adopted |
| DGL (Deep Graph Library) | Amazon / community | Open-Source (any OS; Python 3.8+; PyTorch/TensorFlow/MXNet; NVIDIA GPU recommended) | Scalable GNN framework; molecular, material, and biological applications |
| JAX | Google | Open-Source (any OS; Python 3.9+; NVIDIA GPU or TPU; XLA-accelerated) | Functional, composable, accelerated NumPy; ideal for scientific computing and differentiable simulation |
| e3nn | Community | Open-Source (any OS; Python 3.8+; PyTorch; CPU or NVIDIA GPU) | E(3)-equivariant neural network library; foundational for molecular and materials AI |
| DeepChem | Community (open-source) | Open-Source (any OS; Python 3.8+; CPU or NVIDIA GPU) | Python library for drug discovery; molecular featurisation, models, and datasets |
| RDKit | Community (open-source) | Open-Source (any OS; Python 3.8+ or C++; CPU-only) | Cheminformatics toolkit; molecular representation, fingerprints, and property calculation |
| Open Babel | Community (open-source) | Open-Source (any OS; C++; CPU-only) | Chemical file format conversion and molecular manipulation |
| SciML (Julia) | Julia community | Open-Source (any OS; Julia 1.9+; CPU or NVIDIA GPU) | Scientific Machine Learning ecosystem; PINNs, neural ODEs, neural operators |
| NVIDIA Modulus | NVIDIA | Open-Source (Linux; Python 3.10+; NVIDIA GPU — A100/H100 recommended; CUDA 12+) | Physics-informed deep learning framework; PINNs, neural operators, domain-specific models |
| DeepXDE | Community (open-source) | Open-Source (any OS; Python 3.8+; PyTorch/TensorFlow/JAX backend; CPU or NVIDIA GPU) | PINNs and neural operator library; supports PyTorch, TensorFlow, JAX backends |
| Platform | Provider | Deployment | Highlights |
|---|---|---|---|
| Schrödinger Suite | Schrödinger | On-Prem (Linux/Windows; x86; NVIDIA GPU for FEP+) / Cloud (AWS, GCP, Azure via Schrödinger Cloud) | Physics-based + ML molecular modelling; FEP+, Glide docking, AutoQSAR |
| OpenMM | Stanford / community | Open-Source (any OS; Python 3.9+; NVIDIA GPU or AMD GPU via OpenCL; CUDA 11+) | GPU-accelerated molecular dynamics; Python API; ML force fields |
| GROMACS | Community (open-source) | Open-Source (Linux/macOS; C; NVIDIA GPU recommended; runs on HPC clusters) | High-performance molecular dynamics; widely used in academia |
| Amber | UC San Francisco | On-Prem (Linux; Fortran/C; NVIDIA GPU for pmemd.cuda; HPC clusters) | Molecular dynamics; drug design; free energy calculations |
| ASE (Atomic Simulation Environment) | Community (open-source) | Open-Source (any OS; Python 3.8+; CPU-only; integrates with DFT codes) | Python library for atomistic simulations; integrates with ML potentials |
| AiZynthFinder | AstraZeneca (open-source) | Open-Source (any OS; Python 3.8+; CPU-only) | AI-powered retrosynthesis planning |
| TorchDrug | Community (open-source) | Open-Source (any OS; Python 3.8+; PyTorch; NVIDIA GPU recommended) | PyTorch-based drug discovery library; molecular generation, property prediction |
| Therapeutics Data Commons (TDC) | Harvard (open-source) | Open-Source (any OS; Python 3.8+; CPU-only) | Standardised datasets and benchmarks for drug discovery AI |
| Platform | Provider | Deployment | Highlights |
|---|---|---|---|
| NVIDIA Earth-2 | NVIDIA | Cloud (NVIDIA DGX Cloud on AWS / Azure / Oracle Cloud; NVIDIA GPU — H100) | Digital twin of Earth; weather simulation; FourCastNet + neural operators |
| Google DeepMind Weather | Google DeepMind | Cloud (GCP — TPU for training; Vertex AI for inference) | GraphCast, GenCast; state-of-the-art weather forecasting |
| Huawei Pangu-Weather | Huawei | Cloud (Huawei Cloud; NVIDIA GPU for training) | 3D transformer weather forecasting; competitive with ECMWF |
| ECMWF AI Integration | ECMWF | On-Prem (ECMWF HPC — Atos supercomputer; NVIDIA GPU clusters) / Cloud (European Weather Cloud) | Integrating ML into operational numerical weather prediction |
| Microsoft ClimaX / Aurora | Microsoft | Cloud (Azure — NVIDIA GPU VMs for training and inference) | Foundation models for climate and atmospheric science |
| WeatherBench 2 | Google Research | Open-Source (any OS; Python; data hosted on GCS) | Standardised benchmark for weather forecasting AI |
| Platform | Provider | Deployment | Highlights |
|---|---|---|---|
| Materials Project | Lawrence Berkeley National Lab | Open-Source (web-hosted; API access; data on GCP; Python client — mp-api) | Open database of computed material properties; 150K+ materials |
| AFLOW | Duke University | Open-Source (web-hosted; REST API; Linux HPC for workflows) | Automatic Framework for Materials Discovery; databases and workflows |
| JARVIS (NIST) | NIST | Open-Source (web-hosted; Python 3.8+; data download + local compute) | Joint Automated Repository for Various Integrated Simulations; DFT + ML data |
| Open Catalyst Project | Meta AI | Open-Source (Linux; Python 3.9+; PyTorch; NVIDIA GPU — A100 for training; datasets on S3) | Large-scale dataset and models for catalyst discovery |
| Matminer | Community (open-source) | Open-Source (any OS; Python 3.8+; CPU-only) | Python library for mining Materials Project and other databases |
| NOMAD | EU (open-source) | Open-Source (web-hosted; REST API; data hosted on MPCDF — Max Planck HPC) | Novel Materials Discovery repository; computational materials data |
| Platform | Provider | Deployment | Highlights |
|---|---|---|---|
| AlphaFold Database | DeepMind / EMBL-EBI | Cloud (GCP; freely accessible web API) | 200M+ predicted protein structures; freely accessible |
| UniProt | UniProt Consortium | Cloud (EMBL-EBI infrastructure; freely accessible) | Comprehensive protein sequence and function database |
| NCBI / GenBank | NIH | Cloud (NIH data centres; freely accessible) | Primary genomic sequence database |
| Terra (Broad Institute) | Broad Institute / Verily | Cloud (GCP — Google Cloud platform) | Cloud-based genomics analysis platform |
| DNAnexus | DNAnexus | Cloud (AWS / Azure) | Enterprise genomics data analysis platform |
| Galaxy | Community (open-source) | Open-Source / Cloud (self-host Linux server; usegalaxy.org on cloud infrastructure) | Web-based genomics and bioinformatics workflow platform |
AlphaFold predicted 200M+ protein structures, covering nearly every known protein. This breakthrough — recognised with the 2024 Nobel Prize in Chemistry — enables rapid drug target identification, enzyme engineering, and understanding of disease mechanisms at atomic resolution. Isomorphic Labs now applies this to drug design.
AI virtual screening reduces candidate molecule identification from years to weeks. Generative models design novel drug-like molecules, while ADMET prediction filters for drug-likeness early. Recursion Pharmaceuticals and Isomorphic Labs have multiple AI-discovered candidates in clinical trials. Cost per candidate reduced by 10–100×.
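Early drug-likeness filtering of the kind described above can be as simple as a rule-based screen. A minimal sketch of a Lipinski rule-of-five filter over precomputed descriptors (the `mw`, `logp`, `hbd`, `hba` field names are illustrative assumptions; real pipelines compute such descriptors with a cheminformatics toolkit like RDKit):

```python
def lipinski_pass(props: dict) -> bool:
    """Lipinski rule-of-five screen on precomputed molecular
    descriptors: molecular weight <= 500 Da, logP <= 5,
    at most 5 H-bond donors, at most 10 H-bond acceptors."""
    return (props["mw"] <= 500
            and props["logp"] <= 5
            and props["hbd"] <= 5
            and props["hba"] <= 10)

# Filter a toy candidate list on the four descriptors.
candidates = [
    {"name": "cand-1", "mw": 342.4, "logp": 2.1, "hbd": 2, "hba": 5},
    {"name": "cand-2", "mw": 612.8, "logp": 6.3, "hbd": 4, "hba": 9},
]
survivors = [c["name"] for c in candidates if lipinski_pass(c)]
```

Learned ADMET models replace these hard thresholds with predicted absorption, metabolism, and toxicity scores, but the filter-early pattern is the same.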
GraphCast matches the European Centre for Medium-Range Weather Forecasts (ECMWF) at a fraction of compute — producing a 10-day global forecast in under 1 minute vs. hours on a supercomputer. Pangu-Weather and FourCastNet show similar results. Enables rapid ensemble forecasting, improved hurricane tracking, and real-time severe weather alerts.
Google DeepMind's GNoME discovered 2.2 million new stable crystal structures — 800× the number previously known to science. These include candidates for next-generation batteries, solar cells, catalysts, and superconductors. AI-guided synthesis is now validating these predictions in the lab, with 736 structures independently confirmed.
Variant effect prediction models guide precision medicine by scoring the pathogenicity of genetic mutations. AI predicts splice-site disruptions, promoter activity, enhancer interactions, and gene expression from DNA sequence alone. Enables clinical diagnosis of rare diseases, pharmacogenomics, and CRISPR target selection.
Siemens Xcelerator and NVIDIA Omniverse power real-time virtual replicas of factories, power plants, and entire cities. Continuous sensor data feeds physics-informed AI models for predictive maintenance (30% downtime reduction), process optimisation, and "what-if" scenario analysis without disrupting physical operations.
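The predictive-maintenance loop described above boils down to comparing live sensor readings against what the twin expects. A deliberately minimal sketch (the window size and 3-sigma threshold are illustrative assumptions, not values from any vendor's product):

```python
import statistics

def drift_alarm(readings, window=5, threshold=3.0):
    """Toy digital-twin health check: flag any reading that deviates
    from the trailing-window mean by more than `threshold` standard
    deviations. Production twins replace the window statistics with
    a physics-informed model of expected behaviour."""
    alarms = []
    for i in range(window, len(readings)):
        hist = readings[i - window:i]
        mu = statistics.mean(hist)
        sd = statistics.pstdev(hist) or 1e-9  # guard a flat window
        if abs(readings[i] - mu) > threshold * sd:
            alarms.append(i)
    return alarms
```

For example, a bearing-temperature stream that suddenly jumps from ~1.0 to 5.0 raises an alarm at the index of the jump, while a steady stream raises none.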
| Use Case | Description | Key Examples |
|---|---|---|
| Target Identification | AI identifies druggable protein targets from genomic and proteomic data | BenevolentAI, Insilico Medicine, Recursion |
| Virtual Screening | Screen billions of compounds in silico against a target protein | Atomwise AtomNet, Schrödinger, Relay Therapeutics |
| Lead Optimisation | Optimise drug candidates for potency, selectivity, and ADMET properties | Exscientia, Insilico Medicine, Schrödinger FEP+ |
| De Novo Drug Design | Generate entirely new drug-like molecules with desired properties | Insilico Chemistry42, Generate Biomedicines |
| Antibody Design | Design therapeutic antibodies using AI-guided methods | Absci, BigHat Biosciences, Nabla Bio |
| Clinical Trial Optimisation | Predict trial outcomes, optimise trial design, and identify patient cohorts | Unlearn.AI, Medidata AI, Veeva Vault |
| Drug Repurposing | Identify existing drugs that could treat new diseases | BenevolentAI (COVID-19), Recursion |
| Protein Engineering | Design proteins with novel functions for therapeutics or industrial use | RFdiffusion (Baker Lab), ProteinMPNN, Generate Biomedicines |
| Use Case | Description | Key Examples |
|---|---|---|
| Battery Material Discovery | Find new battery cathode/anode materials with higher energy density | GNoME, Materials Project, Carnegie Mellon AI |
| Solar Cell Optimisation | Discover and optimise novel photovoltaic materials | Perovskite discovery via ML; Stanford Materials AI |
| Carbon Capture | Identify materials and molecules for CO₂ capture and sequestration | Open Catalyst Project (Meta), materials screening |
| Grid Optimisation | Forecast renewable generation; optimise grid dispatch | DeepMind (Google data centre cooling), NVIDIA Earth-2 |
| Hydrogen Catalyst Discovery | Find efficient catalysts for hydrogen production | Open Catalyst, catalysis GNNs |
| Nuclear Fusion Plasma Control | Control plasma in tokamak fusion reactors via RL | DeepMind + SPC (Swiss Plasma Center) |
| Weather Forecasting for Energy | Predict wind and solar output for grid planning | GraphCast, Pangu-Weather, FourCastNet |
| Use Case | Description | Key Examples |
|---|---|---|
| Aerodynamic Design | AI-accelerated CFD surrogate models for aircraft and rocket design | NVIDIA Modulus, Cadence, Ansys SimAI |
| Structural Analysis | Neural surrogate for finite element analysis of airframes and engines | Ansys, Siemens Simcenter + AI |
| Satellite Orbit Prediction | Predict satellite trajectories and collision risks | LeoLabs, AGI, ESA AI |
| Materials for Extreme Conditions | Discover alloys and composites for high-temperature aerospace applications | GNoME, Materials Project, US DoE national labs |
| Flight Simulation | Real-time physics-based flight simulation with AI enhancement | NVIDIA Omniverse, Lockheed Martin digital twins |
| Use Case | Description | Key Examples |
|---|---|---|
| Crash Simulation | Neural surrogates for crash test simulation at 1000× speed | BMW + NVIDIA, Siemens Simcenter |
| Generative Design | AI generates optimised mechanical parts meeting specified constraints | Autodesk Fusion 360 + AI, nTopology |
| Process Simulation | Digital twin of manufacturing processes; casting, moulding, machining | Siemens Xcelerator, Dassault 3DEXPERIENCE |
| Battery Simulation | Simulate battery electrochemistry and thermal behaviour | Ansys Fluent + AI, Siemens BDS |
| Predictive Quality | Simulate quality outcomes before production; reduce scrap | Sight Machine, AspenTech, Siemens MindSphere |
| Use Case | Description | Key Examples |
|---|---|---|
| Structural Health Monitoring | Digital twin monitors bridges, dams, and buildings for structural integrity | Bentley iTwin, WSP Digital, Arup |
| Construction Simulation | Simulate construction schedules and logistics digitally before building | Autodesk Construction Cloud, Bentley SYNCHRO |
| Energy Performance Simulation | Simulate building energy consumption and optimise HVAC design | EnergyPlus + ML, IES VE, Autodesk Insight |
| Flood / Disaster Simulation | Model urban flooding, earthquake damage, and evacuation scenarios | NVIDIA Earth-2, Deltares, MIKE AI |
| Material Specification | AI recommends optimal concrete, steel, or composite specifications | Materials Informatics platforms |
| Use Case | Description | Key Examples |
|---|---|---|
| Crop Yield Prediction | Forecast yields using satellite imagery, weather data, and soil models | IBM Watson Agriculture (Climate Corp), Planet Labs |
| Precision Agriculture | Site-specific fertiliser, irrigation, and pesticide recommendations | John Deere AI, Blue River Technology |
| Genome-Assisted Breeding | Predict crop trait performance from genomic markers | CIMMYT, Bayer Crop Science AI |
| Climate Impact Modelling | Simulate crop performance under future climate scenarios | ClimaX, IIASA, FAO modelling tools |
| Use Case | Description | Key Examples |
|---|---|---|
| Portfolio Risk Simulation | Monte Carlo and AI-accelerated portfolio stress testing | BlackRock Aladdin, Bloomberg, QuantConnect |
| Climate Risk Modelling | Assess physical and transition climate risk for financial portfolios | Moody's ESG, MSCI Climate, S&P Trucost |
| Fraud Simulation | Simulate synthetic fraud patterns for model training | PayPal, Visa, Mastercard AI labs |
| Derivatives Pricing | Neural surrogate models for real-time derivatives valuation | JPMorgan Athena AI, Goldman Sachs Marquee |
| Benchmark | What It Evaluates | Key Metric |
|---|---|---|
| CASP (Critical Assessment of Structure Prediction) | Protein structure prediction accuracy | GDT-TS (Global Distance Test — Total Score); TM-score |
| CAMEO | Continuous automated model evaluation for protein structure | GDT-TS, lDDT (local Distance Difference Test) |
| PDB (Protein Data Bank) | Reference experimental structures for validation | Structures validated against X-ray, cryo-EM, NMR |
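GDT-TS, the headline CASP metric above, averages the fraction of residues within four distance cutoffs of the experimental structure. A minimal sketch given per-residue Cα distances after superposition (illustrative only — official CASP scoring searches over multiple superpositions):

```python
def gdt_ts(ca_distances):
    """GDT-TS from per-residue Calpha-Calpha distances (in angstroms)
    between a superimposed model and the experimental structure:
    mean fraction of residues within 1, 2, 4, and 8 A, times 100."""
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    n = len(ca_distances)
    fractions = [sum(d <= c for d in ca_distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```

A model with distances `[0.5, 1.5, 3.0, 9.0]` scores 56.25: fractions 1/4, 2/4, 3/4, and 3/4 across the four cutoffs.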
| Benchmark | What It Evaluates | Key Metric |
|---|---|---|
| MoleculeNet | Molecular property prediction across 17 datasets | ROC-AUC, RMSE depending on task |
| Open Graph Benchmark (OGB-Mol) | Large-scale molecular graph tasks | ROC-AUC (ogbg-molhiv, ogbg-molpcba) |
| TDC (Therapeutics Data Commons) | End-to-end drug discovery tasks: ADMET, docking, generation | Task-specific metrics; leaderboards |
| DOCKSTRING | Molecular docking and drug-likeness | Docking score + drug-likeness trade-off |
| GuacaMol | Molecular generation quality | Validity, uniqueness, novelty, KL divergence |
| MOSES | Molecular generation benchmark | FCD (Fréchet ChemNet Distance), SNN, Scaf |
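The validity / uniqueness / novelty triple used by GuacaMol and MOSES reduces to set arithmetic once a validity check and a canonical form are available. A sketch with a stand-in validity predicate (real benchmarks parse and canonicalise SMILES with RDKit; the lambda below is a placeholder):

```python
def generation_metrics(generated, training_set, is_valid):
    """Compute the standard molecular-generation triple:
    validity   = valid / generated
    uniqueness = distinct valid / valid
    novelty    = distinct valid not in training set / distinct valid"""
    valid = [s for s in generated if is_valid(s)]
    validity = len(valid) / len(generated)
    unique = set(valid)
    uniqueness = len(unique) / len(valid) if valid else 0.0
    novel = unique - set(training_set)
    novelty = len(novel) / len(unique) if unique else 0.0
    return validity, uniqueness, novelty

# Toy run: "X" marks an unparseable string.
v, u, n = generation_metrics(
    ["CCO", "CCO", "CCN", "XX"], {"CCO"}, lambda s: "X" not in s)
```

Distribution-level metrics such as FCD then compare learned embeddings of the generated and reference sets rather than exact strings.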
| Benchmark | What It Evaluates | Key Metric |
|---|---|---|
| MatBench | Material property prediction (13 tasks) | MAE on formation energy, band gap, etc. |
| Open Catalyst 2020/2022 (OC20/OC22) | Catalyst-adsorbate interaction prediction | Energy MAE, force MAE, position RMSE |
| Materials Project Validation | Predicted vs. experimental material properties | Formation energy error (meV/atom) |
| Benchmark | What It Evaluates | Key Metric |
|---|---|---|
| WeatherBench 2 | Global weather forecasting | RMSE on geopotential height (Z500), temperature (T850), precipitation |
| ECMWF Scorecard | Comparison against operational NWP | Anomaly correlation coefficient (ACC) |
| ClimateBench | Climate projection accuracy | RMSE on temperature and precipitation under forcing scenarios |
| Benchmark | What It Evaluates | Key Metric |
|---|---|---|
| MiniF2F | Formal theorem proving on Olympiad-style problems formalised across proof systems | % of problems proved |
| ProofNet | Undergraduate-level formal theorem proving | Proof success rate |
| IMO Problems | International Mathematical Olympiad competition problems | Number of problems solved; medal-equivalent score |
| MATH Benchmark | Competition-level maths (Hendrycks et al.) | Accuracy across algebra, geometry, number theory, etc. |
| GSM8K | Grade school maths reasoning | Accuracy on multi-step arithmetic word problems |
| Metric | What It Measures | Ideal Target |
|---|---|---|
| Prediction Accuracy | Agreement between AI prediction and experimental ground truth | Within experimental uncertainty |
| Speed-Up Factor | Time for AI prediction vs. traditional simulation or experiment | 100–1,000,000× depending on domain |
| Data Efficiency | Accuracy achieved per number of training examples | Maximise accuracy with minimal data |
| Uncertainty Calibration | Whether predicted confidence intervals match observed error rates | Well-calibrated; neither over- nor under-confident |
| Transferability | Performance on unseen molecules / materials / conditions | Generalise beyond training distribution |
| Physical Consistency | Whether predictions obey known physical laws (conservation, symmetry) | Zero violations of known physical constraints |
| Synthesisability (Molecules) | Whether generated molecules can actually be synthesised | SA Score; retrosynthesis feasibility |
| Experimental Validation Rate | % of AI predictions confirmed by wet-lab or physical experiment | >70% for actionable scientific candidates |
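Uncertainty calibration from the table can be checked directly: a well-calibrated model's k-sigma prediction intervals should cover the corresponding fraction of ground-truth values (about 95% for k = 1.96 under a Gaussian assumption). A sketch:

```python
def interval_coverage(preds, sigmas, truths, k=1.96):
    """Fraction of ground-truth values falling inside the predicted
    k-sigma intervals. Compare against the nominal coverage level:
    a large gap in either direction signals miscalibration."""
    hits = sum(abs(t - p) <= k * s
               for p, s, t in zip(preds, sigmas, truths))
    return hits / len(truths)
```

Coverage well below nominal means over-confidence (intervals too narrow); well above nominal means under-confidence (intervals too wide).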
| Metric | Value | Source / Notes |
|---|---|---|
| Global AI in Drug Discovery Market (2024) | ~$3.2 billion | Grand View Research; includes target ID, virtual screening, ADMET, generative chemistry |
| Projected Drug Discovery AI Market (2030) | ~$14.1 billion | CAGR ~28%; driven by clinical pipeline advancement and pharma AI adoption |
| Global Digital Twin Market (2024) | ~$17.5 billion | Includes manufacturing, energy, smart cities, healthcare |
| Projected Digital Twin Market (2030) | ~$110 billion | CAGR ~36.5%; driven by IoT, 5G, edge AI, and industrial metaverse |
| AI in Materials Science Market (2024) | ~$0.9 billion | Emerging market; growing rapidly with GNoME and similar breakthroughs |
| AI Weather Forecasting Market (2024) | ~$0.4 billion | Nascent; growing as GraphCast/GenCast approach operational deployment |
| % of Top-20 Pharma Companies Using AI for Discovery (2024) | 100% | McKinsey; all major pharma now have AI drug discovery programmes |
| Number of AI-Discovered Drugs in Clinical Trials (2024) | ~70+ | Insilico Medicine, Exscientia, Recursion among leaders |
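The projected growth rates in the table follow from the standard compound annual growth rate formula, CAGR = (end/start)^(1/years) − 1. A quick check against the drug discovery figures (2024 to 2030 is six years):

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by start/end market sizes."""
    return (end / start) ** (1 / years) - 1

# $3.2B -> $14.1B over 6 years implies roughly 28% per year.
drug_discovery_cagr = cagr(3.2, 14.1, 6)
```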
| Domain | Adoption Level | Key Drivers |
|---|---|---|
| Pharmaceuticals | High | Cost pressure ($1–2B per drug); pipeline attrition; competitive AI race |
| Materials Science | Medium–High | GNoME breakthrough; battery and semiconductor urgency; national lab investment |
| Weather & Climate | Medium | GraphCast/GenCast quality; operational integration challenges with NWP agencies |
| Genomics & Biology | High | AlphaFold impact; single-cell revolution; CRISPR; personalised medicine |
| Energy & Sustainability | Medium | Catalyst discovery; grid optimisation; regulatory ESG pressure |
| Aerospace & Automotive | Medium | Digital twin adoption; simulation speed demands; generative design |
| Mathematics | Low–Medium | AlphaProof nascent; formal verification community growing; niche but high impact |
| Astrophysics / HEP | Medium | Petabyte data volumes; CERN and telescope survey needs; well-funded |
| Driver | Description |
|---|---|
| AlphaFold Moment | AlphaFold's breakthrough catalysed adoption across all scientific AI; proved transformative impact is possible |
| Pharma R&D Cost Crisis | $1–2 billion and 10–15 years to develop a drug; AI promises 50–70% reduction in early-stage timelines |
| Climate Urgency | Demand for new materials (batteries, solar, carbon capture) and better climate models is existentially motivated |
| Compute Availability | Cloud GPU/TPU access democratises training of scientific AI models beyond elite institutional labs |
| Open-Source Models | AlphaFold DB, Open Catalyst, ESM-2 are freely available; lowering barriers to adoption dramatically |
| Foundational Model Transfer | Pre-trained scientific foundation models can be fine-tuned for specific tasks with limited data |
| National Lab Investment | US DoE, CERN, NIH, Wellcome Trust, and national agencies investing heavily in AI for science |
| Industrial Digital Twin Growth | Manufacturing, energy, and infrastructure sectors investing billions in real-time simulation |
| Use Case | Typical Impact | Source |
|---|---|---|
| Drug Discovery (Lead Identification) | 50–70% reduction in time to identify lead candidates | Insilico Medicine, Exscientia case studies |
| Protein Structure Prediction | From months (X-ray crystallography) to seconds (AlphaFold) | DeepMind; >200M structures in AlphaFold DB |
| Materials Discovery | 2.2M new stable crystals discovered by GNoME (more than all prior human discoveries) | Google DeepMind (2023) |
| Weather Forecasting | 10-day forecast in <1 minute vs. hours for ECMWF HRES; comparable or better accuracy | DeepMind GraphCast paper |
| Engineering Simulation | 1,000–10,000× speed-up with neural surrogates | NVIDIA Modulus case studies |
| Digital Twin (Manufacturing) | 20–30% reduction in unplanned downtime | Siemens, GE, NVIDIA digital twin deployments |
| Clinical Trial Optimisation | 10–30% reduction in trial duration through AI-optimised design | Unlearn.AI, Medidata case studies |
| Segment | Leaders | Challengers |
|---|---|---|
| Protein Structure & Design | DeepMind (AlphaFold), Baker Lab (RFdiffusion), Meta (ESMFold) | OpenFold, ColabFold, Generate Biomedicines |
| Drug Discovery AI | Insilico Medicine, Recursion, Exscientia, Schrödinger | Atomwise, BenevolentAI, Relay Therapeutics, Isomorphic Labs |
| Materials Science AI | Google DeepMind (GNoME), Microsoft (MatterGen), Meta (Open Catalyst) | Materials Project, NIST JARVIS, M3GNet |
| Weather & Climate AI | DeepMind (GraphCast/GenCast), NVIDIA (FourCastNet/Earth-2), Huawei (Pangu) | Microsoft (ClimaX/Aurora), ECMWF AI integration |
| Digital Twin Platforms | NVIDIA Omniverse, Siemens Xcelerator, Azure Digital Twins | AWS IoT TwinMaker, GE Digital, Dassault, Bentley, Ansys |
| Scientific ML Frameworks | PyTorch Geometric, JAX, NVIDIA Modulus | e3nn, DeepChem, SciML (Julia), DeepXDE |
| Genomics AI | Google (DeepVariant), Illumina DRAGEN, InstaDeep/NVIDIA | DNAnexus, Broad Institute (Terra), 10x Genomics |
| Mathematics AI | DeepMind (AlphaProof, AlphaGeometry) | Meta (Lean / HTPS), DeepSeek-Prover, Microsoft |
AI models generate plausible but physically impossible results — molecules that violate valency rules, protein structures with steric clashes, or materials with forbidden crystal symmetries. Without domain expertise in the loop, errors propagate into downstream research.
Complex multi-stage pipelines — data preprocessing, model training, hyperparameter tuning, post-processing — are notoriously hard to reproduce. Gaps in data versioning, random seed management, and environment specification undermine scientific rigour.
Models trained on known chemistry, physics, or biology often fail silently on novel regimes — new chemical scaffolds, extreme temperatures, or rare genomic variants. Extrapolation beyond training data is the Achilles' heel of data-driven science.
The same tools that accelerate drug discovery can be repurposed to design toxins, bioweapons, or novel pathogens. Molecular generation models require careful governance, access controls, and ethical review frameworks.
Training large scientific foundation models (ESM-2, Evo, GraphCast) requires massive GPU/TPU clusters. A single AlphaFold training run costs millions of dollars in compute. This creates access inequality between well-funded labs and the broader scientific community.
Scientists may skip experimental validation, treating AI predictions as ground truth. AlphaFold confidence scores (pLDDT) are sometimes ignored, leading to reliance on low-confidence predictions. Human expertise and wet-lab confirmation remain essential.
| Limitation | Description |
|---|---|
| Distribution Shift | Models trained on known molecules / materials / conditions may fail when predicting outside their training distribution |
| Data Scarcity | Experimental data is expensive and scarce in many scientific domains; models must learn from limited examples |
| Uncertainty Underestimation | Models may be confidently wrong — predicting with high certainty in regions where they have no training data |
| Physical Consistency | Data-driven models can violate conservation laws, symmetries, or thermodynamic constraints if not properly constrained |
| Simulator Fidelity | Neural surrogates are only as good as the simulations they were trained on; garbage-in simulation yields garbage-out surrogates |
| Compute Requirements | Training large scientific AI models (AlphaFold, GraphCast, foundation models) requires massive GPU/TPU resources |
| Reproducibility | Complex training pipelines with many hyperparameters can be difficult to reproduce exactly |
| Long-Range Interactions | GNNs with limited message-passing depth may miss long-range molecular or spatial interactions |
| Multi-Scale Modelling | Bridging atomic-scale phenomena to macroscopic behaviour (e.g., molecular to material property) remains extremely challenging |
| Real-Time Constraint | Some applications (digital twins, control) require millisecond-level inference — challenging for complex models |
| Risk | Description | Mitigation |
|---|---|---|
| False Discoveries | AI may predict a stable material or active drug that fails in experiment | Experimental validation loops; active learning |
| Overfitting to Benchmarks | Models optimised for benchmark datasets may not generalise to real-world scientific problems | Evaluate on diverse, out-of-distribution datasets |
| Lack of Interpretability | Black-box models produce predictions without scientific explanation | Use equivariant / physics-informed architectures; SHAP; attention analysis |
| Publication Bias | Only successful AI predictions are published; failure cases are hidden | Open science; negative results reporting |
| Hallucinated Science | LLM-based scientific assistants may generate plausible but incorrect scientific claims | Ground in databases; citation verification; domain expert review |
| Benchmark Saturation | Popular benchmarks become easy; performance no longer predicts real-world utility | Develop new, harder, more realistic benchmarks |
| Risk | Description | Mitigation |
|---|---|---|
| Dual-Use in Chemistry | Molecular generation models could be prompted to design toxic or hazardous compounds | Output filtering; restricted access; ethical review boards |
| Biological Weapons Risk | Protein design and pathogen modelling tools could theoretically be misused | Biosecurity review; access controls; international norms |
| Environmental Modelling Misuse | Climate models could be manipulated to support misleading policy narratives | Open science; transparent methodology; peer review |
| IP & Patent Conflicts | AI-generated molecules may infringe existing patents or create ownership disputes | Freedom-to-operate analysis; IP landscape mapping |
| Automation of Dangerous Experiments | AI-directed lab automation could execute hazardous experiments without adequate safety review | Human-in-the-loop for novel experiment approval; safety constraints |
| Consideration | Description |
|---|---|
| Access Equity | Advanced scientific AI tools are concentrated in well-funded labs; resource-limited institutions are left behind |
| Scientific Job Displacement | AI automation of computational chemistry, simulation, and analysis may reduce demand for certain scientific roles |
| Authorship & Credit | Who receives credit for an AI-assisted scientific discovery — the algorithm, the developers, or the domain scientist? |
| Open Science vs. Commercial IP | Tension between open-source scientific AI (AlphaFold DB) and proprietary commercial models (Isomorphic Labs) |
| Bias in Scientific Data | Historical scientific datasets may underrepresent certain conditions, populations, or chemical spaces |
| Compute Carbon Footprint | Training large scientific AI models consumes significant energy; environmental cost must be weighed against scientific benefit |
Explore how this system type connects to others in the AI landscape:
Bayesian / Probabilistic AI · Physical / Embodied AI · Optimisation / OR AI · Generative AI · Evolutionary / Genetic AI

| Term | Definition |
|---|---|
| Ab Initio | "From first principles" — computational methods that solve fundamental equations without empirical parameters |
| Active Learning | A training strategy where the model selects the most informative data points for labelling, minimising experiments needed |
| ADMET | Absorption, Distribution, Metabolism, Excretion, Toxicity — key pharmacokinetic properties for drug candidates |
| AlphaFold | DeepMind's AI system for predicting protein 3D structures from amino acid sequences; solved the protein folding problem |
| Binding Affinity | The strength with which a drug molecule binds to its protein target; a critical metric in drug discovery |
| Boltzmann Distribution | The probability distribution of molecular states at thermal equilibrium; target distribution for molecular sampling |
| CFD (Computational Fluid Dynamics) | Numerical simulation of fluid flow governed by the Navier-Stokes equations |
| Collocation Points | Points sampled in the domain where PDE residuals are evaluated in PINN training |
| Conformer | A specific 3D arrangement of atoms in a molecule achievable by rotation around single bonds |
| Conservation Law | A physical principle stating that a quantity (energy, momentum, mass) remains constant in an isolated system |
| Crystal Structure | The ordered, repeating arrangement of atoms in a crystalline solid material |
| De Novo Design | Designing entirely new molecules or materials from scratch, rather than modifying existing ones |
| DFT (Density Functional Theory) | A quantum mechanical method for calculating the electronic structure and properties of molecules and materials |
| Differentiable Programming | Programming where all operations are differentiable, enabling gradient-based optimisation through entire programs |
| Differentiable Simulation | Physics simulation implemented in a differentiable framework, enabling end-to-end gradient-based optimisation |
| Digital Twin | A virtual replica of a physical system continuously updated with real-time sensor data |
| Docking (Molecular) | Predicting the preferred orientation and binding pose of a drug molecule in a protein's binding pocket |
| E(3)-Equivariance | Invariance to translation and equivariance to rotation and reflection in 3D Euclidean space |
| Equivariance | A property where transforming the input (e.g., rotating a molecule) produces a correspondingly transformed output |
| Evoformer | AlphaFold's core attention-based architecture processing MSA and pair representations simultaneously |
| FEA (Finite Element Analysis) | A numerical method for solving structural mechanics, heat transfer, and other PDE-governed problems |
| FEP (Free Energy Perturbation) | A physics-based method for calculating binding free energy differences between molecules |
| Flow Matching | A generative modelling technique for learning continuous transformations between distributions |
| FNO (Fourier Neural Operator) | A neural operator that learns in Fourier space; efficient for PDE solving on regular domains |
| Force Field | A mathematical model describing the potential energy of a system of atoms as a function of their positions |
| Formation Energy | The energy change when a compound is formed from its constituent elements; key predictor of material stability |
| GDT-TS (Global Distance Test — Total Score) | A standard metric for measuring protein structure prediction accuracy; measures the fraction of residues within distance thresholds |
| GNN (Graph Neural Network) | A neural network that operates on graph-structured data via message passing between connected nodes |
| GNoME | Google DeepMind's Graph Networks for Materials Exploration; discovered 2.2M new stable crystal structures |
| GraphCast | DeepMind's GNN-based weather forecasting model; 10-day global forecast in <1 minute |
| Hamiltonian | A function representing the total energy of a physical system; governs time evolution via Hamilton's equations |
| Inductive Bias | Assumptions built into a model's architecture to guide learning — e.g., translation invariance in CNNs, equivariance in scientific GNNs |
| Interatomic Potential | A function that calculates the potential energy of a system from atomic positions; used in molecular dynamics |
| Invariance | A property where the output remains unchanged under a transformation of the input (e.g., total energy unchanged by rotation) |
| Lagrangian | A function encoding the dynamics of a system as the difference between kinetic and potential energy |
| lDDT (Local Distance Difference Test) | A metric for evaluating the local accuracy of predicted protein structures |
| MD (Molecular Dynamics) | Simulating the physical movement of atoms over time by solving Newton's equations of motion |
| Message Passing | The core operation in GNNs where each node updates its representation based on information received from its neighbours |
| ML Potential / MLIP | A machine learning interatomic potential; replaces expensive quantum chemistry with fast, learned energy and force predictions |
| MSA (Multiple Sequence Alignment) | Alignment of homologous protein or DNA sequences; provides evolutionary information used by AlphaFold |
| Neural ODE | A neural network that parameterises the right-hand side of an ordinary differential equation; enables continuous-depth models |
| Neural Operator | A neural network that learns mappings between function spaces — solving families of PDEs, not individual instances |
| Neural Surrogate | An AI model trained to approximate the input-output behaviour of an expensive simulator |
| NWP (Numerical Weather Prediction) | Traditional physics-based weather forecasting by numerically solving atmospheric fluid dynamics equations |
| PDE (Partial Differential Equation) | An equation involving partial derivatives of a function; governs most physical phenomena (fluid flow, heat transfer, electromagnetics) |
| PINN (Physics-Informed Neural Network) | A neural network trained with PDE residuals as loss terms, embedding physical laws into learning |
| Protein Folding | The physical process by which a linear protein chain folds into a specific 3D structure |
| Retrosynthesis | Planning the chemical reaction steps needed to synthesise a target molecule from available precursors |
| RLHF for Science | Using reinforcement learning from human/expert feedback to align scientific AI outputs with domain knowledge |
| RMSE (Root Mean Square Error) | A standard metric measuring the average magnitude of prediction errors |
| Rotational Equivariance | The property that rotating the input produces a correspondingly rotated output — essential for 3D molecular models |
| SE(3) | The Special Euclidean group in 3D — the group of rotations and translations; the symmetry group of 3D rigid-body motion |
| SELFIES | Self-Referencing Embedded Strings — a molecular string representation guaranteeing syntactic validity for generative models |
| SMILES | Simplified Molecular Input Line Entry System — a text-based notation for molecular structure |
| Surrogate Model | A computationally cheap approximation of an expensive simulation or function; used for fast evaluation and optimisation |
| TM-score | Template Modelling score — measures the structural similarity between two protein structures; topology-sensitive |
| Uncertainty Quantification (UQ) | Methods for estimating and communicating the confidence and reliability of model predictions |
| Virtual Screening | Computationally evaluating a large library of compounds for activity against a drug target, before wet-lab testing |
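The message-passing operation defined in the glossary reduces to neighbour aggregation followed by a node update. A bare skeleton with scalar node features (real GNNs use learned message and update functions over feature vectors; the mean-style update here is an illustrative assumption):

```python
def message_passing_step(node_feats, edges):
    """One message-passing round on an undirected graph: each node
    sums its neighbours' features (the 'messages'), then updates its
    own representation by averaging with the aggregate."""
    aggregate = [0.0] * len(node_feats)
    for i, j in edges:              # undirected: message both ways
        aggregate[i] += node_feats[j]
        aggregate[j] += node_feats[i]
    return [0.5 * (h + m) for h, m in zip(node_feats, aggregate)]

# Path graph 0 - 1 - 2: the middle node hears from both ends.
updated = message_passing_step([1.0, 2.0, 3.0], [(0, 1), (1, 2)])
```

Stacking k such rounds lets information travel k hops, which is why limited message-passing depth can miss long-range interactions (see Limitations above).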
Animated infographics: Scientific / Simulation AI overview (2026), and the full technology stack — Hardware → Compute → Data → Frameworks → Orchestration → Serving → Application.
Detailed reference content for regulation.
| Regulation / Body | Jurisdiction | Key Implications for Scientific AI |
|---|---|---|
| FDA (Food & Drug Administration) | United States | AI-generated drug candidates must pass standard clinical trial phases; emerging FDA guidance addresses AI/ML in drug development |
| EMA (European Medicines Agency) | EU / EEA | AI drug discovery subject to same regulatory pathway; transparency in AI-assisted submission data required |
| ICH Guidelines | International | International harmonisation of pharmaceutical development; AI methods must be documented in regulatory submissions |
| Biosecurity Regulations | Global | Dual-use concerns for molecular generation; subject to biosecurity review and export controls |
| Clinical Trial Regulations | Global | AI-optimised trial designs must comply with GCP (Good Clinical Practice) and informed consent requirements |
Climate and environmental frameworks:
| Regulation / Framework | Key Implications |
|---|---|
| Paris Agreement / UNFCCC | Climate models (including AI-based) inform national commitments; model transparency and validation standards matter |
| EU Climate Law | Mandates science-based climate targets; AI climate models must be scientifically rigorous and peer-reviewed |
| IPCC Assessment Process | AI climate models are increasingly cited; must meet IPCC standards for evidence quality and uncertainty communication |
| ESG Disclosure (CSRD, SEC Climate) | Companies using AI climate risk models for ESG disclosure must ensure model validity and auditability |
Chemicals, materials, and energy regulation:
| Regulation | Key Implications |
|---|---|
| REACH (EU) | AI-predicted material or chemical properties must be validated against regulatory safety testing requirements |
| TSCA (US EPA) | New AI-designed chemicals may require EPA review before manufacturing or import |
| GHS (Globally Harmonised System) | AI-predicted hazard classifications must align with GHS standards |
| Nuclear Regulation | AI models used in nuclear energy simulation subject to nuclear safety authority validation (e.g., NRC, IAEA) |
Best practices for responsible scientific AI:
| Practice | Description |
|---|---|
| Experimental Validation | Never deploy AI predictions as scientific fact without experimental or independent computational validation |
| Uncertainty Reporting | Always report uncertainty estimates alongside predictions; communicate confidence levels clearly |
| Open Science & Reproducibility | Publish models, training data, and evaluation details openly to enable independent verification |
| Dual-Use Review | Submit molecular and biological AI tools for biosecurity review before public release |
| Domain Expert Oversight | Ensure scientific AI outputs are reviewed by domain experts before critical decisions |
| Model Documentation | Maintain detailed model cards documenting training data, architecture, limitations, and intended use |
| Benchmark Transparency | Report performance on standardised benchmarks; disclose failure modes and out-of-distribution behaviour |
| Data Provenance | Document the origin, quality, and preprocessing of all scientific training data |
| Ethical Review | Subject high-impact scientific AI applications to institutional ethics review (IRB or equivalent) |
| Carbon Reporting | Track and disclose the computational carbon footprint of training and running scientific AI models |
Detailed reference content for deep dives.
Traditional ML learns entirely from data. Physics-Informed ML incorporates domain knowledge — physical laws, conservation principles, symmetries — as inductive biases, constraints, or architectural priors.
| Paradigm | Data Requirement | Physics Involvement | Example |
|---|---|---|---|
| Pure Data-Driven | High | None — learns patterns only from data | Standard deep learning on scientific datasets |
| Physics-Constrained | Medium | Physics as loss terms or constraints | PINNs; physics loss in training |
| Physics-Encoded | Low–Medium | Physics built into architecture | Equivariant networks; Hamiltonian Neural Networks |
| Physics-Simulated | None (synthetic) | Data generated by physics simulator | Neural surrogates trained on simulation data |
| Hybrid | Medium | Combines data-driven + physics simulation | Corrector models that fix simulator errors |
| Method | How It Works | Best For |
|---|---|---|
| PINNs | PDE residual as loss function; no mesh required | Inverse problems, sparse data, PDE solving |
| Hamiltonian Neural Networks | Learn the Hamiltonian of a system; conserve energy by construction | Conservative dynamical systems |
| Lagrangian Neural Networks | Learn the Lagrangian; derive equations of motion via Euler-Lagrange | Mechanical systems with constraints |
| Neural ODEs | Parameterise the right-hand side of an ODE with a neural network | Continuous-time dynamical systems |
| Conservation Law Networks | Hard-code conservation laws (mass, momentum, energy) into network | Fluid dynamics, thermodynamics |
| Symmetry-Preserving Networks | Architecture respects known symmetries (rotation, translation, gauge) | Molecular, particle physics, materials |
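The PINN row above (PDE residual as loss) can be sketched without any ML machinery. The toy below scores candidate solutions of the harmonic-oscillator equation u'' + u = 0 with a finite-difference residual loss; a real PINN would compute the derivatives by automatic differentiation and minimise this loss over network weights, and all names here are illustrative:

```python
import math

def physics_residual_loss(u, xs, h=1e-3):
    """Mean squared residual of u'' + u = 0, with u'' approximated by
    central finite differences. A real PINN would use autodiff instead."""
    total = 0.0
    for x in xs:
        u_xx = (u(x + h) - 2 * u(x) + u(x - h)) / h**2
        total += (u_xx + u(x)) ** 2
    return total / len(xs)

xs = [i * 0.1 for i in range(1, 60)]   # collocation points on (0, 6)
exact = math.sin                       # satisfies u'' + u = 0
wrong = lambda x: x ** 2               # does not

print(physics_residual_loss(exact, xs))  # near zero
print(physics_residual_loss(wrong, xs))  # large
```

Training a PINN amounts to minimising exactly this kind of loss (plus boundary-condition terms) over the parameters of a neural network standing in for `u`.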
| Dimension | Traditional Numerical Simulation | AI-Accelerated Simulation |
|---|---|---|
| Speed | Hours to days for complex 3D simulations | Seconds to minutes for neural surrogates |
| Accuracy | High — controlled numerical error | Near-numerical accuracy for well-trained surrogates; uncertainty quantification needed |
| Mesh Requirement | Yes — discretisation of domain required | No — many approaches are mesh-free |
| Flexibility | General-purpose within physics; change equations easily | Must retrain for different physics |
| Data Requirement | No training data needed — only governing equations | Requires training data (from simulations or experiments) |
| Parametric Sweeps | Expensive — re-run full simulation for each parameter | Cheap — single forward pass per configuration |
| Inverse Problems | Difficult — requires adjoint methods or sampling | Natural — gradients flow through differentiable models |
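The inverse-problem advantage is easy to make concrete. Assuming a differentiable surrogate has already been trained (the quadratic `surrogate` below is a toy stand-in), an unknown simulator parameter can be recovered by plain gradient descent through it, with no adjoint solver:

```python
# Toy inverse problem: recover an unknown parameter k by gradient
# descent through a differentiable model. The quadratic "surrogate"
# is a stand-in for a trained neural network.

def surrogate(k, x):
    return k * x ** 2            # differentiable in the parameter k

observations = [(1.0, 2.0), (2.0, 8.0), (3.0, 18.0)]  # generated with k = 2

k = 0.5                          # initial guess
for _ in range(200):
    # analytic gradient of the squared loss sum((k*x^2 - y)^2)
    grad = sum(2 * (surrogate(k, x) - y) * x ** 2 for x, y in observations)
    k -= 1e-3 * grad
print(round(k, 3))               # converges toward k = 2
```

With a neural surrogate the gradient would come from autodiff rather than a hand-derived formula, but the loop is the same.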
AI models trained to approximate expensive simulations — replacing minutes-to-hours computation with millisecond inference.
| Application | What It Replaces | Speed-Up |
|---|---|---|
| Aerodynamic Shape Optimisation | CFD simulations (RANS, LES) | 1,000–10,000× |
| Structural Analysis | Finite Element Analysis (FEA) | 100–1,000× |
| Crash Simulation | Explicit dynamics (LS-DYNA) | 1,000× |
| Thermal Management | Conjugate heat transfer simulation | 500–5,000× |
| Electromagnetic Simulation | FDTD / FEM Maxwell solvers | 100–1,000× |
| Weather Prediction | Numerical Weather Prediction (NWP) | 10,000× (GraphCast vs. IFS) |
| Molecular Dynamics | Ab initio / DFT calculations | 1,000–1,000,000× |
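Every row in the table follows the same recipe: run the expensive simulator a handful of times offline, fit a cheap model to those runs, then sweep the cheap model online. A minimal sketch, using piecewise-linear interpolation in place of a neural network (the `expensive_sim` function is purely illustrative):

```python
import bisect
import math
import time

def expensive_sim(x):
    """Stand-in for a costly simulator run (e.g. one CFD evaluation)."""
    time.sleep(0.01)                      # pretend each run takes a while
    return math.sin(3 * x) + 0.5 * x

# Offline: a few expensive runs become the surrogate's training data.
xs = [i / 10 for i in range(11)]          # 11 samples on [0, 1]
ys = [expensive_sim(x) for x in xs]

def surrogate(x):
    """Piecewise-linear surrogate; a production system would train a
    neural network on the same (input, output) pairs instead."""
    i = min(max(bisect.bisect_left(xs, x), 1), len(xs) - 1)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

# Online: a dense parametric sweep at negligible cost.
sweep = [surrogate(i / 1000) for i in range(1001)]
print(max(abs(surrogate(x) - expensive_sim(x)) for x in xs))  # ~0 at samples
```

The speed-ups in the table come from exactly this asymmetry: the offline simulator runs are paid for once, and every online query afterwards is near-free.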
┌──────────────────────────────────────────────────────────────────────┐
│ AI-ACCELERATED DRUG DISCOVERY PIPELINE │
│ │
│ 1. TARGET ID 2. VIRTUAL 3. LEAD │
│ ───────────── SCREENING OPTIMISATION │
│ AI identifies ────────────── ────────────── │
│ druggable Screen millions Optimise for binding │
│ protein targets of compounds in affinity, selectivity, │
│ from genomic & silico; molecular ADMET, and │
│ proteomic data docking; scoring synthesisability │
│ │
│ 4. ADMET 5. RETROSYNTHESIS 6. CLINICAL │
│ PREDICTION ───────────────── CANDIDATE │
│ ────────────── Plan synthesis ────────────── │
│ Predict drug- routes for top AI-predicted │
│ likeness, candidates; candidates enter │
│ toxicity, robot-assisted preclinical and │
│ metabolism, chemistry clinical trials │
│ bioavailability │
│ │
│ ──────── FEEDBACK: EXPERIMENTAL DATA → MODEL REFINEMENT ───── │
└──────────────────────────────────────────────────────────────────────┘
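At their core, steps 2–4 of the pipeline reduce to score, filter, and rank over a compound library. A hypothetical sketch in plain Python, where `binding_score` and `admet_risk` stand in for trained docking and ADMET models, and all compounds and values are invented:

```python
def binding_score(compound):      # placeholder for a docking/affinity model
    return compound["affinity"]   # higher is better

def admet_risk(compound):         # placeholder for an ADMET predictor
    return compound["toxicity"]   # lower is better

def screen(library, max_risk=0.3, top_k=2):
    """Filter out compounds with high predicted ADMET risk,
    then rank the survivors by predicted binding affinity."""
    safe = [c for c in library if admet_risk(c) <= max_risk]
    return sorted(safe, key=binding_score, reverse=True)[:top_k]

library = [
    {"id": "CMP-1", "affinity": 8.1, "toxicity": 0.1},
    {"id": "CMP-2", "affinity": 9.4, "toxicity": 0.6},   # potent but risky
    {"id": "CMP-3", "affinity": 7.2, "toxicity": 0.2},
    {"id": "CMP-4", "affinity": 8.8, "toxicity": 0.25},
]
print([c["id"] for c in screen(library)])  # most potent compound is excluded
```

Real platforms replace the toy scoring functions with GNNs or docking engines and run this loop over millions of compounds, but the control flow is the same.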
| Task | What AI Does | Key Methods |
|---|---|---|
| Molecular Property Prediction | Predict physical, chemical, and biological properties from molecular structure | GNNs (SchNet, DimeNet), molecular fingerprints, transformers |
| Molecular Docking | Predict how a small molecule binds to a protein target | DiffDock, Vina, Glide, AutoDock + ML scoring |
| De Novo Molecule Generation | Generate entirely new molecules with desired properties | Diffusion models, VAEs, autoregressive generators, RL |
| Molecular Conformer Generation | Predict the 3D shape(s) a molecule adopts | GeoMol, torsional diffusion, RDKit + ML |
| ADMET Prediction | Predict Absorption, Distribution, Metabolism, Excretion, Toxicity | ADMET-AI, ADMETlab, Chemprop, GNNs |
| Retrosynthesis Planning | Plan the chemical synthesis route for a target molecule | AiZynthFinder, ASKCOS, Molecule Chef |
| Protein-Ligand Interaction | Predict binding affinity between a drug and its protein target | RF-Score, OnionNet, DeepDTA, equivariant models |
| Reaction Prediction | Predict products of a chemical reaction | Molecular Transformer, RXNMapper |
| Representation | Description | Best For |
|---|---|---|
| SMILES | String-based linear notation for molecules | Sequence models, database storage |
| SELFIES | Self-referencing embedded strings; guaranteed syntactic validity | Generative models (guaranteed valid molecules) |
| Molecular Graphs | Atoms as nodes, bonds as edges | GNNs; property prediction |
| 3D Coordinates | Atom positions in 3D space | Docking, conformer generation, equivariant models |
| Fingerprints (ECFP, MACCS) | Fixed-length binary or count vectors encoding substructure presence | Similarity search, classical ML models |
| Coulomb Matrix | Encodes pairwise atomic distances and charges | Quantum chemistry property prediction |
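For the fingerprint row, the standard similarity metric is the Tanimoto coefficient over fingerprint bits. A minimal sketch with hand-made on-bit sets; a real workflow would derive ECFP bits with a cheminformatics toolkit such as RDKit:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two binary fingerprints represented
    as sets of on-bit indices: |A ∩ B| / |A ∪ B|."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Toy on-bit sets (invented indices, not real ECFP bits).
query    = {3, 17, 42, 101, 256}
analogue = {3, 17, 42, 101, 300}
decoy    = {7, 99, 512}

print(tanimoto(query, analogue))  # 4/6, a close analogue
print(tanimoto(query, decoy))     # 0.0, no shared substructure bits
```

Similarity search over a library is then just this function applied against every stored fingerprint, which is why fixed-length fingerprints remain the workhorse for fast retrieval.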
| Company / Platform | Focus | Stage / Highlights |
|---|---|---|
| Insilico Medicine | End-to-end AI drug discovery | First AI-designed drug to Phase II (idiopathic pulmonary fibrosis) |
| Recursion Pharmaceuticals | AI-driven cellular imaging for drug discovery | Massive biological dataset; phenotypic screening |
| Exscientia | AI-driven drug design with human-AI collaboration | First AI-designed molecule to enter clinical trials (2020) |
| Atomwise | AI virtual screening using deep learning | AtomNet; 750+ projects with pharma and biotech partners |
| Schrödinger | Physics-based + ML molecular simulation | FEP+ and ML-based drug design platform |
| BenevolentAI | AI-first drug discovery with knowledge graph | Baricitinib repurposed for COVID-19 via AI |
| Relay Therapeutics | Motion-based drug design using MD simulation + AI | Targets dynamic protein conformations |
| Isomorphic Labs | DeepMind's drug discovery spinoff | Leveraging AlphaFold for drug design |
| Absci | AI-designed antibody therapeutics | Generative models for de novo antibody design |
| Generate Biomedicines | Generative AI for protein therapeutics | Chroma: generative model for protein design |
A digital twin is a virtual replica of a physical object, process, or system that is continuously updated with real-time data from sensors — enabling monitoring, simulation, prediction, and optimisation.
| Dimension | Detail |
|---|---|
| Core Concept | A living digital model that mirrors a physical asset's state and behaviour in real time |
| Data Flow | Physical sensors → data pipeline → digital twin model → insight / action → physical asset |
| Key Capability | What-if simulation: "If I change this parameter, what happens?" — answered in real time |
| AI Role | Neural surrogates accelerate simulation; ML predicts anomalies; optimisation engines find best operating points |
┌──────────────────────────────────────────────────────────────────────┐
│ DIGITAL TWIN ARCHITECTURE │
│ │
│ PHYSICAL ASSET SENSOR LAYER DATA PIPELINE │
│ ───────────── ────────────── ────────────── │
│ Factory, engine, IoT sensors, Streaming ingest, │
│ wind turbine, SCADA, cameras, edge processing, │
│ building, city LiDAR, ERP data data lake / warehouse │
│ │
│ DIGITAL MODEL AI / ML LAYER ACTION LAYER │
│ ────────────── ────────────── ────────────── │
│ Physics sim + Neural surrogates, Alerts, dashboards, │
│ neural surrogate; anomaly detection, automated control, │
│ real-time state predictive models, optimisation │
│ estimation optimisation │
└──────────────────────────────────────────────────────────────────────┘
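The architecture above shrinks to a few lines of code: a twin object that syncs its state with a sensor stream and answers what-if queries against an internal model. Everything in this sketch (class name, cube-law power model, parameter values) is illustrative rather than real turbine engineering:

```python
class TurbineTwin:
    """Toy digital twin of a wind turbine: mirrors the observed wind
    speed and answers what-if power queries with a simple model."""

    def __init__(self, rated_kw=2000.0, alpha=0.3):
        self.rated_kw = rated_kw
        self.alpha = alpha          # smoothing factor for state updates
        self.wind_ms = 0.0          # current estimated wind speed

    def ingest(self, sensor_wind_ms):
        """Sync twin state with a streaming sensor reading
        (exponential smoothing as a stand-in for a state estimator)."""
        self.wind_ms += self.alpha * (sensor_wind_ms - self.wind_ms)

    def power_kw(self, wind_ms=None):
        """What-if: expected power at a given (or the current) wind speed."""
        v = self.wind_ms if wind_ms is None else wind_ms
        return min(self.rated_kw, 0.6 * v ** 3)   # cube law, capped at rating

twin = TurbineTwin()
for reading in [8.0, 9.0, 10.0]:    # simulated sensor stream
    twin.ingest(reading)

print(round(twin.wind_ms, 2))       # smoothed state estimate
print(round(twin.power_kw(12.0)))   # what-if query at 12 m/s
```

The same shape scales up: swap the cube law for a neural surrogate, the smoothing update for a Kalman filter, and the print statements for the action layer shown in the diagram.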
| Platform | Provider | Highlights |
|---|---|---|
| NVIDIA Omniverse | NVIDIA | Universal platform for 3D simulation; OpenUSD; physics-accurate rendering; industrial digital twins |
| Siemens Xcelerator | Siemens | End-to-end digital twin platform; manufacturing, energy, infrastructure |
| Azure Digital Twins | Microsoft | Cloud-based digital twin platform; IoT Hub integration; spatial intelligence |
| AWS IoT TwinMaker | AWS | Build digital twins from IoT sensors; integrate 3D models and analytics |
| GE Digital Twin (Predix) | GE Vernova | Industrial digital twins for energy, aviation, and manufacturing |
| Ansys Twin Builder | Ansys | Simulation-based digital twins with reduced-order models |
| Dassault 3DEXPERIENCE | Dassault Systèmes | Virtual twin for product lifecycle; aerospace, automotive, healthcare |
| Bentley iTwin | Bentley Systems | Infrastructure digital twins; bridges, roads, utilities, buildings |
| PTC ThingWorx | PTC | IoT-powered digital twins; augmented reality overlay; manufacturing |
| Domain | Use Case | Impact |
|---|---|---|
| Manufacturing | Factory-floor digital twin; monitor equipment, predict maintenance | 20–30% reduction in unplanned downtime |
| Energy | Wind turbine digital twin; optimise blade pitch, predict failures | 5–10% increase in energy yield |
| Automotive | Crash simulation digital twin; virtual crash testing at 1000× speed | 70–90% reduction in physical crash tests |
| Smart Cities | City-scale digital twin; traffic optimisation, urban planning | Real-time traffic management; disaster response simulation |
| Healthcare | Patient digital twin; personalised treatment simulation | Simulate drug responses before administration |
| Aerospace | Aircraft engine digital twin; monitor fatigue, plan maintenance | Predictive maintenance; extended engine life |
| Supply Chain | Warehouse digital twin; optimise layout, staffing, and flow | 15–25% improvement in throughput |
Detailed reference content for overview.
Scientific / Simulation AI is the branch of artificial intelligence focused on solving scientific problems that were previously intractable — predicting protein structures, discovering new materials, forecasting weather at unprecedented speed, simulating physical systems, proving mathematical theorems — and on compressing research timelines from years to hours.
Unlike general-purpose Generative AI or Predictive AI, Scientific AI is purpose-built for formal scientific domains — trained on physical laws, molecular data, simulation outputs, and experimental datasets. Its outputs are scientific predictions, material properties, molecular structures, simulation results, and mathematical proofs — not general text, images, or business forecasts.
Scientific AI represents one of the highest-impact frontiers of artificial intelligence, with breakthroughs like AlphaFold (protein structure prediction), GNoME (materials discovery), and GraphCast (weather forecasting) already transforming entire fields of science.
| Dimension | Detail |
|---|---|
| Core Capability | Accelerates scientific discovery by predicting, simulating, and optimising across formal scientific domains |
| How It Works | Physics-Informed Neural Networks (PINNs), Graph Neural Networks (GNNs), neural operators, differentiable simulation, RL-guided search |
| What It Produces | Molecular structures, material properties, weather forecasts, simulation outputs, mathematical proofs, drug candidates |
| Key Differentiator | Purpose-built for formal scientific domains — not general content generation or business prediction |
| AI Type | What It Does | Example |
|---|---|---|
| Scientific / Simulation AI | Solves scientific problems and models physical systems | AlphaFold predicting protein structure |
| Agentic AI | Pursues goals autonomously with tools, memory, and planning | Research agent, coding agent |
| Analytical AI | Extracts insights from datasets | Revenue dashboards, root-cause analysis |
| Autonomous AI (Non-Agentic) | Operates independently within fixed boundaries without human input | Autopilot, auto-scaling, algorithmic trading |
| Bayesian / Probabilistic AI | Reasons under uncertainty using probability distributions | Clinical trial analysis, A/B testing, risk modelling |
| Cognitive / Neuro-Symbolic AI | Combines neural learning with symbolic reasoning | LLM + knowledge graph, physics-informed neural net |
| Conversational AI | Manages multi-turn dialogue between humans and machines | Customer service chatbot, voice assistant |
| Evolutionary / Genetic AI | Optimises solutions through population-based search inspired by natural selection | Neural architecture search, logistics scheduling |
| Explainable AI (XAI) | Makes AI decisions understandable to humans | SHAP explanations, LIME, Grad-CAM |
| Generative AI | Creates new general-purpose content | Writing text, generating images |
| Multimodal Perception AI | Fuses vision, language, audio, and other modalities | GPT-4o processing image + text, AV sensor fusion |
| Optimisation / Operations Research AI | Finds optimal solutions to constrained mathematical problems | Vehicle routing, supply chain planning, scheduling |
| Physical / Embodied AI | Acts in the physical world through hardware | Autonomous vehicles, surgical robots |
| Predictive / Discriminative AI | Classifies and forecasts from business data | Credit scoring, churn prediction |
| Privacy-Preserving AI | Trains and runs AI without exposing raw data | Federated hospital models, differential privacy |
| Reactive AI | Responds to current input with no memory or learning | Thermostat, ABS braking system |
| Recommendation / Retrieval AI | Surfaces relevant items from large catalogues based on user signals | Netflix suggestions, Google Search, Spotify playlists |
| Reinforcement Learning AI | Learns optimal strategies via reward signals | Game play, robotics control |
| Symbolic / Rule-Based AI | Reasons over explicit rules and knowledge to derive conclusions | Medical expert system, legal reasoning engine |
Key Distinction from Generative AI: Generative AI creates novel content — text, images, code — from learned distributions. Scientific AI generates scientific predictions — molecular structures, simulation outputs, material properties — grounded in physical laws and scientific data. Both "generate," but the domains, training data, evaluation criteria, and output types are fundamentally different.
Key Distinction from Predictive AI: Predictive AI forecasts business outcomes from tabular data (churn, fraud, demand). Scientific AI predicts physical and biological outcomes from scientific data — protein folding, crystal stability, atmospheric dynamics — incorporating domain-specific physical constraints and laws.
Key Distinction from Reinforcement Learning AI: RL is a training methodology used by some Scientific AI systems (e.g., AlphaProof and AlphaTensor). But Scientific AI is defined by its application domain (science), not its training method. Scientific AI also uses supervised learning, self-supervised learning, and physics-informed training — RL is one tool in the toolkit.