Papers

Preprints and disclosures.

Research on AI agent reliability, evaluation, verification, and memory. We link a paper only when there's a real artifact behind it — no placeholder IDs, no fabricated venues.

Flagship

GlassBox.

Published

Peer-reachable work.

02

Field-Theoretic Memory for AI Agents

Continuous Dynamics for Context Preservation

A memory architecture treating stored information as continuous fields governed by partial differential equations — semantic diffusion, thermodynamic decay, and field coupling for persistent, composable agent memory.

+116% F1 on multi-session reasoning
>99.8% Collective intelligence (multi-agent)
memory · agents · field theory
Published

03

Quality-Diversity Evolution for Discovering Diverse Vulnerabilities in LLM Safety

Semantic MAP-Elites red-teaming across frontier LLMs

A quality-diversity evolutionary framework that red-teams LLMs at the semantic level — evolving interpretable attack strategies (not token sequences) and using a MAP-Elites archive to illuminate distinct vulnerability profiles across GPT-4o-mini, Claude 3.5 Sonnet, Gemini 2.0 Flash, and an open-weight model. The contribution is an interpretable, reproducible baseline a safety team can triage and defend against.

6 strategies × 6 encodings · 4 LLMs
adversarial · red-teaming · evaluation · safety
ICLR 2026 · AI WILD

2026

Coming this year.

04

Verity: Neuro-Symbolic Synthesis of Verified Distributed Systems

CE2P translates formal-verification failures into structured LLM feedback. The benefit is inversely correlated with model capability — weaker models gain the most.

verification · reasoning
Preprint

05

Trust-Based Decision Routing

A formal framework for ROI-based decision routing in multi-tier verification systems.

trust · decision-support
Q1 2026

In progress

On the bench.

06

Attack Taxonomy for Agentic AI Systems

A comprehensive threat model covering input, state, tool, planning, and coordination attacks, with empirical evaluation across agent architectures.

adversarial · agents · security
Q2 2026

07

Activation-Based Attack Detection for Autonomous Agents

Interpretability methods to detect adversarial attacks on agentic systems — extending metacognitive probing to security.

interpretability · security
Q2 2026

08

Calibrated Uncertainty in LLM Reasoning Chains

Uncertainty quantification that tracks actual accuracy — activation-based estimation and propagation through multi-step reasoning chains.

uncertainty · calibration
Q3 2026

Collaborate

Work with us.

For researchers

Co-author a paper

We welcome collaborators across AI safety, interpretability, formal methods, and adversarial ML.

Get in touch →

For organizations

Pilot our tools

Deploy reliability infrastructure in production. All packages are open source, with enterprise support via Rotascale.

View packages →

Open source

Contribute on GitHub

All research, benchmarks, and tools are open source. Issues, PRs, and discussions welcome.

GitHub →