Agent Reliability
Detection and verification for autonomous AI. Methods to identify when agents hallucinate, fail, or strategically underperform.
Methods for reliable, verifiable AI systems. Papers and code releasing Q1-Q2 2026.
Nine focus areas spanning detection, verification, trust, and physical AI.
Adversarial evaluation methods that resist gaming. Benchmarks that detect hidden capabilities and strategic behavior.
Novel memory architectures for long-horizon agent tasks. Beyond RAG: smooth retrieval and natural temporal dynamics.
Verifying AI outputs without ground truth. Methods for code, plans, and decisions from reasoning models like o3 and R1.
Practical interpretability for production. Not "understand the model" but "should I trust this output?"
Trust dynamics when agents coordinate with agents. Propagation, verification, and failure modes in multi-agent systems.
Attack taxonomies, detection methods, and defenses for agentic AI systems. Beyond prompt injection — tool poisoning, memory corruption, planning attacks, and coordination exploits in multi-agent systems.
Calibrated confidence for AI decision support. Activation-based uncertainty estimation, propagation through reasoning chains, and calibration methods that work without ground truth.
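As a minimal sketch of confidence propagation through a reasoning chain (the function name and the independence assumption here are illustrative, not project code), treating each step's calibrated confidence as independent gives a simple lower bound on end-to-end confidence:

```python
def chain_confidence(step_confidences):
    """Combine per-step confidences under an independence assumption.

    If each reasoning step i is correct with probability p_i and
    errors are independent, the whole chain is correct with
    probability prod(p_i) -- a quantity that shrinks quickly
    with chain length.
    """
    conf = 1.0
    for p in step_confidences:
        conf *= p
    return conf

# Five steps at 95% confidence each: the chain is far less certain.
print(round(chain_confidence([0.95] * 5), 3))  # 0.774
```

Even this toy model shows why long reasoning chains need step-level calibration: high per-step confidence does not imply high end-to-end confidence.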
Beyond language models: AI that understands and predicts the physical world. World models, embodied reasoning, simulation, and physics-aware AI.
Papers release Q1-Q2 2026; technical disclosures and preprints are available now.