Field-Theoretic Memory for AI Agents
Continuous Dynamics for Context Preservation
Rotalabs studies how AI agents fail when they act in the world, and builds the benchmarks, probes, monitors, and protocols to measure it. Every result is published, installable, and reproducible from the saved runs.
Red-teaming that evolves its own attacks. A quality-diversity (MAP-Elites) search illuminates a diverse archive of interpretable attack strategies - not opaque token strings - across GPT-4o-mini, Claude, Gemini, and open-weight models.
Accepted at the ICLR 2026 Agents in the Wild workshop, and shipped as rotalabs-redqueen - seeded runs are bit-reproducible and cross-language identical across Python and TypeScript. A baseline a safety team can triage, reproduce, and defend against.
This is the work we've pushed hardest on. The method is peer-reviewed, it ships as a package anyone can run, and it's already turned up a real finding about how model safety shifts from one model generation to the next.
Interpretability probes, adversarial testing, verification, evaluation science, steering, and trust propagation - released as papers and installable methods through 2026.
When an agent moves money, exports customer data, or files a report, a wrong action can't be undone after the fact. The question is whether a monitor can catch one before it runs. GlassBox is how we measure that: a surface-clean benchmark that pairs every wrong action with a legitimate twin, so a monitor has to catch the wrongness, not the topic. We pre-registered the test and report the result straight, including that the white-box probe we tried didn't beat the baseline. The benchmark is built so any team can run the same test against any monitor.
A pre-registered, null-result benchmark for monitors that gate irreversible agentic actions.
A surface-clean, paired-twin benchmark for irreversible agent actions, plus a pre-registered audit protocol any team can re-run against any monitor. We release the benchmark, trajectories, trained probes, and pipeline in full.
Rotalabs publishes installable packages so methods can be inspected, reproduced, and embedded into new experiments - all 12 on PyPI and npm, AGPL-3.0.
Context intelligence for AI agents.
Sandbagging detection via activation probes.
Steering vectors for runtime behavior control.
Neuro-symbolic verified code synthesis.
Trust-based decision routing.
Continuous Dynamics for Context Preservation
Semantic MAP-Elites red-teaming across frontier LLMs
A longitudinal QD red-team of the Gemma family across four generations
Red-teaming that evolves its own attacks. rotalabs-redqueen uses quality-diversity search to discover diverse jailbreaks across single-turn, multi-turn, and agentic/MCP surfaces - reproducibly, and as...
Claude Opus 4.8 beats its predecessor on nearly every capability benchmark. It's also somewhat less robust to prompt injection than Opus 4.7 - and...
We treat agent memory as continuous fields governed by partial differential equations instead of discrete database entries. The result: +116% F1 on multi-session reasoning...
Publish the work, and let people try to break it. That's the only credibility worth having.Rotalabs operating principle
Rotalabs is our open research lab. The methods behind this work are available to enterprises as products and consulting through Rotascale →
Type to search across all pages and posts
Press ↑ ↓ to navigate, Enter to select