Agent Reliability
Detection and verification for autonomous AI. Methods to identify when agents hallucinate, fail, or strategically underperform.
Nine focus areas across detection, verification, trust, and physical AI. Papers and code release through 2026 — every claim backed by an installable method.
Adversarial evaluation methods that resist gaming. Benchmarks that detect hidden capabilities and strategic behavior.
Field-theoretic memory treating stored information as continuous fields governed by PDEs — semantic diffusion, thermodynamic decay, and multi-agent field coupling. arXiv 2602.21220
Verifying AI outputs without ground truth. Methods for code, plans, and decisions from reasoning models like o3 and R1.
Practical interpretability for production. Not "understand the model" but "should I trust this output?"
Trust dynamics when agents coordinate with agents. Propagation, verification, and failure modes in multi-agent systems.
Attack taxonomies, detection, and defenses for agentic AI — beyond prompt injection: tool poisoning, memory corruption, planning attacks, and coordination exploits.
Calibrated confidence for AI decision support. Activation-based uncertainty estimation, propagation through reasoning chains, and calibration without ground truth.
Beyond language models: AI that understands and predicts the physical world. World models, embodied reasoning, simulation, and physics-aware AI.
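To make the field-theoretic memory description above more concrete, here is a minimal sketch of a single update step that combines diffusion and decay on a one-dimensional grid. It is an illustration under stated assumptions, not the method from the cited paper: the function name `step_field`, the parameters `diffusion`, `decay`, and `dt`, the 1-D grid, and the explicit Euler scheme are all hypothetical choices made for this example. Multi-agent field coupling is not modeled.

```python
def step_field(field, diffusion=0.1, decay=0.05, dt=1.0):
    """One explicit Euler step of du/dt = D * d2u/dx2 - lambda * u.

    Hypothetical illustration only: `field` is a list of memory strengths
    on a 1-D grid with unit spacing and zero-flux boundaries.
    """
    n = len(field)
    updated = []
    for i in range(n):
        # Zero-flux boundary: mirror the edge value instead of reading past it.
        left = field[i - 1] if i > 0 else field[i]
        right = field[i + 1] if i < n - 1 else field[i]
        # Discrete Laplacian spreads strength to neighbouring cells.
        laplacian = left - 2.0 * field[i] + right
        # Diffusion term plus a linear decay term that shrinks every cell.
        updated.append(field[i] + dt * (diffusion * laplacian - decay * field[i]))
    return updated


# A single stored item in the middle of an otherwise empty field.
memory = [0.0, 0.0, 1.0, 0.0, 0.0]
after = step_field(memory)
```

In this sketch the stored item leaks into its neighbours while the total strength shrinks by the decay factor each step. The explicit scheme is only stable for small steps (roughly `diffusion * dt <= 0.5` on a unit grid), so a real implementation would likely choose a different integrator.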
Every research area ships installable, reproducible packages — 12 libraries on PyPI and npm, AGPL-3.0.