Agent Reliability
AI agents fail in unpredictable ways - hallucinating, taking wrong actions, strategically underperforming on evaluations. We develop detection and verification methods for autonomous AI systems.
AI Trust & Reliability Research
Rotalabs develops methods for reliable, verifiable AI systems - detecting when agents fail, deceive, or underperform, and building the evaluation science to keep pace with reasoning models, multi-agent systems, and autonomous AI.
AI agents fail in unpredictable ways - hallucinating, taking wrong actions, strategically underperforming on evaluations. We develop detection and verification methods for autonomous AI systems.
Benchmarks are broken. Models game them. Sandbagging is real. We develop adversarial evaluation methods that resist gaming and detect hidden capabilities.
Reasoning models (o3, R1, GPT-5) are powerful but opaque. We develop methods to verify AI-generated outputs - code, plans, decisions - without exhaustive testing or ground truth.
Beyond language: AI that understands and predicts the physical world. World models, embodied reasoning, simulation, and physics-aware AI for robotics and autonomous systems.
| Area | Focus | Status |
|---|---|---|
| Agent Reliability | Detection and verification for autonomous AI systems | Papers Q1 2026 |
| AI Evaluation | Adversarial methods that resist gaming and detect hidden capabilities | Papers Q1 2026 |
| Memory Systems | Novel architectures for long-horizon agent tasks | Papers Q1 2026 |
| Reasoning Verification | Verifying AI outputs without ground truth | Papers Q1 2026 |
| Interpretability | Practical methods for production AI systems | Active |
| Multi-Agent Trust | Trust dynamics in multi-agent systems | Active |
| World Models | Physical AI, embodied reasoning, and simulation for autonomous systems | Active |
| Adversarial Robustness | Attack taxonomies and defenses for agentic AI | Active |
| Uncertainty Quantification | Calibrated confidence for AI decision support | Active |
Steering vectors for runtime behavior control in LLMs.
Verification framework for AI-generated content.
Detection tools for AI safety and sandbagging.
Adversarial testing for AI systems.
Comprehensive LLM evaluation with statistical rigor.
Trust-based routing and decision cascades.
Type to search across all pages and posts
Press ↑ ↓ to navigate, Enter to select