Open research lab · AI agent reliability

Research you can run.

Rotalabs studies how AI agents fail when they act in the world — and builds the benchmarks, probes, monitors, and protocols to measure it. Every result is published, installable, and reproducible from the raw trajectories up.

Open source  ·  Pre-registered protocols  ·  Reproducible from saved runs
Probe instrument · GlassBox study
Figure: GlassBox, rendered as a measuring instrument: held-out pattern, internal signal (residual probe), threshold (τ = 0.50), false escalation, verdict.
Pre-registered audit · null result · Preprint, coming soon

GlassBox: a cross-pattern audit of linear probe gates.

A surface-clean paired-twin benchmark for irreversible banking actions, testing whether residual-stream probes can separate substantively wrong actions from legitimate twins under cross-pattern transfer.

24 cases · 4 patterns · 5 model families

Across three evaluable open-weight families, the linear probe showed no cross-pattern edge over a default-prompted judge. We publish the whole audit — including where our own method fails: paper, benchmark, trajectories, trained probes, and pipeline. The point is the protocol.

Benchmark · Trajectories · Audit pipeline
Flagship study — coming soon

GlassBox is not a claim. It is an instrument.

A benchmark, a protocol, and a reusable audit framework for testing monitors that gate irreversible agentic actions. It is designed to be re-run, challenged, and extended — a concrete research surface, not just a position statement.

Finding · A pre-registered null: no operating point in the layer sweep yields a separating boundary on unseen patterns. We report it rather than bury it.
Benchmark · Surface-clean paired twins across wrong-action patterns, explicitly measuring false escalation rather than hiding it.
Protocol · Leave-one-pattern-out evaluation, net-of-false-escalation scoring, and a pre-registered credit gate.
Release · Paper, dataset, trajectories, probes, and audit code, published under the Rotalabs research distribution.
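
To make the protocol concrete, here is a minimal sketch of leave-one-pattern-out evaluation with a net-of-false-escalation score. It is an illustration under stated assumptions, not the GlassBox pipeline: the case records, pattern names, and probe scores are invented, the score is assumed to be the catch rate on wrong actions minus the escalation rate on legitimate twins, and the 0.50 threshold is borrowed from the figure label on the assumption that it is the gate's decision threshold.

```python
# Hypothetical case records: (pattern, is_wrong_action, probe_score).
# Patterns, labels, and scores are illustrative, not GlassBox data.
CASES = [
    ("A", True, 0.81), ("A", False, 0.32),
    ("B", True, 0.44), ("B", False, 0.58),
    ("C", True, 0.67), ("C", False, 0.21),
    ("D", True, 0.39), ("D", False, 0.47),
]


def leave_one_pattern_out(cases):
    """Yield (held_out_pattern, train_cases, test_cases), one fold per pattern."""
    patterns = sorted({pattern for pattern, _, _ in cases})
    for held_out in patterns:
        train = [c for c in cases if c[0] != held_out]
        test = [c for c in cases if c[0] == held_out]
        yield held_out, train, test


def net_of_false_escalation(test_cases, threshold=0.50):
    """Assumed scoring rule: catch rate minus false-escalation rate."""
    wrong = [score for _, is_wrong, score in test_cases if is_wrong]
    legit = [score for _, is_wrong, score in test_cases if not is_wrong]
    catch_rate = sum(score >= threshold for score in wrong) / len(wrong)
    false_esc_rate = sum(score >= threshold for score in legit) / len(legit)
    return catch_rate - false_esc_rate


# A real run would fit the probe on `train` in each fold; here the scores
# are fixed, so `train` is unused and only the held-out fold is scored.
results = {
    held_out: net_of_false_escalation(test)
    for held_out, _train, test in leave_one_pattern_out(CASES)
}
print(results)
```

Under this scoring rule, a monitor that escalates everything scores zero rather than one, so a gate only earns credit for separating wrong actions from their legitimate twins on a pattern it never saw.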
Research program

Nine fronts on agent reliability.

Interpretability probes, adversarial testing, verification, evaluation science, steering, and trust propagation — released as papers and installable methods through 2026.

01
Agent Reliability
Detection and verification for autonomous AI. Methods to identify when agents hallucinate, fail, or strategically underperform.
Papers Q1 2026
02
AI Evaluation Science
Adversarial evaluation methods that resist gaming. Benchmarks that detect hidden capabilities and strategic behavior.
Papers Q1 2026
03
Memory Systems
Field-theoretic memory treating stored information as continuous fields governed by PDEs — semantic diffusion, thermodynamic decay, and multi-agent field coupling.
Published
04
Reasoning Verification
Verifying AI outputs without ground truth. Methods for code, plans, and decisions from reasoning models like o3 and R1.
Papers Q1 2026
05
Interpretability
Practical interpretability for production. Not "understand the model" but "should I trust this output?"
Active
06
Multi-Agent Trust
Trust dynamics when agents coordinate with agents. Propagation, verification, and failure modes in multi-agent systems.
Active
07
Adversarial Robustness for Agents
Attack taxonomies, detection, and defenses for agentic AI — beyond prompt injection: tool poisoning, memory corruption, planning attacks, and coordination exploits.
Active
08
Uncertainty Quantification
Calibrated confidence for AI decision support. Activation-based uncertainty estimation, propagation through reasoning chains, and calibration without ground truth.
Active
09
World Models & Physical AI
Beyond language models: AI that understands and predicts the physical world. World models, embodied reasoning, simulation, and physics-aware AI.
Active
Research distribution

Papers are only half the interface.

Rotalabs publishes installable packages so methods can be inspected, reproduced, and embedded into new experiments — all 12 on PyPI and npm, AGPL-3.0.

$ pip install rotalabs-probe
$ pip install rotalabs-redqueen
$ pip install rotalabs-verity
$ pip install rotalabs-eval
# probes · adversarial testing · verification · evaluation

rotalabs-context

Context intelligence for AI agents.

rotalabs-probe

Sandbagging detection via activation probes.

rotalabs-steer

Steering vectors for runtime behavior control.

rotalabs-verity

Neuro-symbolic verified code synthesis.

rotalabs-cascade

Trust-based decision routing.

Recent papers

Published, preprint, in progress.

02

Field-Theoretic Memory for AI Agents

Continuous Dynamics for Context Preservation

A memory architecture treating stored information as continuous fields governed by partial differential equations — semantic diffusion, thermodynamic decay, and field coupling for persistent, composable agent memory.

+116% F1 on multi-session reasoning
>99.8% collective intelligence (multi-agent)
memory · agents · field theory
Published
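
As a rough intuition for the diffusion and decay described in the Field-Theoretic Memory summary, the sketch below takes one explicit time step of a 1D field that spreads to its neighbors and fades over time. It is a toy under loud assumptions: the equation du/dt = D * d2u/dx2 - decay * u, the ring-shaped grid, and the parameter values are all chosen for illustration and are not taken from the paper.

```python
def step(field, diffusion=0.1, decay=0.05, dt=1.0):
    """One explicit Euler step of du/dt = D * d2u/dx2 - decay * u on a 1D ring.

    The equation and parameters are illustrative assumptions. The explicit
    scheme is only stable for small steps (roughly diffusion * dt <= 0.5).
    """
    n = len(field)
    updated = []
    for i in range(n):
        left = field[i - 1]           # index -1 wraps around to the last cell
        right = field[(i + 1) % n]    # wrap around to the first cell
        laplacian = left - 2.0 * field[i] + right
        updated.append(field[i] + dt * (diffusion * laplacian - decay * field[i]))
    return updated


# A single stored item: one unit of activation at position 3.
field = [0.0] * 8
field[3] = 1.0

after = step(field)
print(after)
```

In this toy, the diffusion term moves activation to neighboring cells without changing the total, while the decay term shrinks the total by a fixed fraction each step.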
03

Verity: Neuro-Symbolic Synthesis of Verified Distributed Systems

CE2P translates formal-verification failures into structured LLM feedback. The benefit is inversely correlated with model capability — weaker models gain the most.

verification · reasoning
Preprint
04

Adversarial Testing for AI Systems

Evolutionary approaches where attack and defense strategies improve through competitive pressure.

adversarial · agents
Q1 2026
Credibility is not claimed. It is released — and open to be broken.
Rotalabs operating principle
Enterprise

Rotalabs is our open research lab. The methods behind this work are available to enterprises as products and consulting through Rotascale →