Papers

Preprints and disclosures.

Research on AI agent reliability, evaluation, verification, and memory. We link a paper only when there's a real artifact behind it - no placeholder IDs, no fabricated venues.

Published

Peer-reachable work.

Field-Theoretic Memory for AI Agents

Continuous Dynamics for Context Preservation

A memory architecture treating stored information as continuous fields governed by partial differential equations - semantic diffusion, thermodynamic decay, and field coupling for persistent, composable agent memory.

+116% F1 on multi-session reasoning

>99.8% Collective intelligence (multi-agent)

memory · agents · field theory

Published

arXiv GitHub PyPI
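
To make the field picture concrete, here is a minimal sketch of one diffusion-plus-decay update on a 1-D grid. It is not the paper's implementation: the equation du/dt = D * d2u/dx2 - lambda * u, the grid size, the coefficients, and the name `step_field` are all illustrative assumptions, and field coupling is left out entirely.

```python
def step_field(u, diffusion=0.1, decay=0.01, dt=1.0):
    """One explicit Euler step of du/dt = D * d2u/dx2 - lambda * u (unit grid spacing).

    Hypothetical helper for illustration; coefficients are assumed, not from the paper.
    """
    n = len(u)
    nxt = []
    for i in range(n):
        # Zero-flux (reflecting) boundaries: the edge cell mirrors itself.
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        laplacian = left - 2.0 * u[i] + right
        nxt.append(u[i] + dt * (diffusion * laplacian - decay * u[i]))
    return nxt


field = [0.0] * 11
field[5] = 1.0  # a single stored item written at position 5
for _ in range(20):
    field = step_field(field)
```

In this toy version, diffusion spreads the stored value to neighbouring positions while decay shrinks the total by a factor of (1 - decay * dt) per step, so old content fades unless it is rewritten.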

Quality-Diversity Evolution for Discovering Diverse Vulnerabilities in LLM Safety

Semantic MAP-Elites red-teaming across frontier LLMs

A quality-diversity evolutionary framework that red-teams LLMs at the semantic level - evolving interpretable attack strategies (not token sequences) and using a MAP-Elites archive to illuminate distinct vulnerability profiles across GPT-4o-mini, Claude 3.5 Sonnet, Gemini 2.0 Flash, and an open-weight model. The contribution is an interpretable, reproducible baseline a safety team can triage and defend against.

6 strategies × 6 encodings · 4 LLMs

adversarial · red-teaming · evaluation · safety

ICLR 2026 · AI WILD

arXiv GitHub
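
For readers new to MAP-Elites, the archive logic can be sketched as below. This is a generic illustration under stated assumptions, not the paper's code: cells are indexed by abstract (strategy, encoding) numbers on the 6 × 6 grid, `toy_score` is a placeholder fitness with no model in the loop, and `mutate` and `map_elites` are hypothetical names.

```python
import random

N_STRATEGIES, N_ENCODINGS = 6, 6  # grid shape taken from the entry above


def toy_score(candidate):
    """Placeholder fitness in [0, 1]; a real run would score a model response instead."""
    strategy, encoding, intensity = candidate
    return round(intensity * (1 + (strategy + encoding) % 3) / 3, 4)


def mutate(candidate, rng):
    """Randomly change one descriptor or perturb the intensity parameter."""
    strategy, encoding, intensity = candidate
    choice = rng.randrange(3)
    if choice == 0:
        strategy = rng.randrange(N_STRATEGIES)
    elif choice == 1:
        encoding = rng.randrange(N_ENCODINGS)
    else:
        intensity = min(1.0, max(0.0, intensity + rng.uniform(-0.2, 0.2)))
    return (strategy, encoding, intensity)


def map_elites(iterations=2000, seed=0):
    rng = random.Random(seed)
    archive = {}  # (strategy, encoding) cell -> (score, candidate)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mostly exploit: mutate an elite drawn uniformly from the archive.
            _, parent = rng.choice(list(archive.values()))
            child = mutate(parent, rng)
        else:
            # Occasionally explore with a fresh random candidate.
            child = (rng.randrange(N_STRATEGIES), rng.randrange(N_ENCODINGS), rng.random())
        cell, score = (child[0], child[1]), toy_score(child)
        # Keep the child only if its cell is empty or it beats the current elite.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, child)
    return archive


archive = map_elites()
```

The point of the per-cell competition is that a strong candidate in one cell never displaces a weaker but different one elsewhere, which is what lets the archive map distinct vulnerability profiles rather than converge on a single best attack.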

Cross-Generational Transfer of Adversarial Attacks Reveals Non-Monotonic Safety Alignment in LLMs

A longitudinal QD red-team of the Gemma family across four generations

Applies quality-diversity evolution (MAP-Elites) as automated red-teaming across four generations of Google's Gemma family (7B–31B). Safety alignment does not improve monotonically: the mid-series generation is markedly more attackable than the ones on either side of it, copyright and cybercrime vulnerabilities persist near-universally across all four, and misinformation susceptibility surges between generations before only partly receding. Attack transfer and longitudinal probing surface failures that static, single-snapshot benchmarks miss.

Gemma family · 4 generations · 7B–31B

adversarial · red-teaming · safety · evaluation

Preprint

arXiv

Closing the Activation-Cone Blind Spot: Response-Time Probing and Unified Defense

Why prompt-time activation defenses miss prefilling attacks, and what closes the gap

Evaluates five jailbreak-defense paradigms across seven instruction-tuned models (7B-31B) and five attack families, showing that prompt-time activation defenses are structurally blind to prefilling attacks. Response-time probing on the first generated tokens closes the gap; composed with null-space steering it drives prefilling attack success to zero with no false positives on benign inputs.

7 models · 5 attack families · 5 defense paradigms

0.97-1.00 AUROC, response-time probing

0% Prefilling attack success after defense

interpretability · security · steering · safety

Preprint

arXiv

Spark-LLM-Eval: A Distributed Framework for Statistically Rigorous LLM Evaluation

Treats LLM evaluation as a data-parallel problem on Apache Spark: bootstrap confidence intervals, paired significance tests (t-test, McNemar's, Wilcoxon signed-rank), and content-addressable response caching backed by Delta Lake, with linear scaling.

evaluation · statistics · distributed

Preprint

arXiv GitHub PyPI
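
The statistics named above are easy to show on one machine. The sketch below is plain Python, not the Spark-LLM-Eval API: `bootstrap_ci`, `mcnemar_exact`, and `cache_key` are hypothetical helpers, the percentile bootstrap and exact McNemar variants are our choice for illustration, and the cache key only demonstrates the content-addressing idea without Delta Lake.

```python
import hashlib
import json
import math
import random


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-example scores."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples))
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired correctness flags."""
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)  # A right, B wrong
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)  # B right, A wrong
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def cache_key(model, prompt, params):
    """Content-addressed key: identical requests hash to the same cache entry."""
    payload = json.dumps({"model": model, "prompt": prompt, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


model_a = [1] * 8 + [0] * 2  # per-example correctness for a hypothetical model A
model_b = [1] * 5 + [0] * 5  # same examples, hypothetical model B
ci_low, ci_high = bootstrap_ci(model_a)
p_value = mcnemar_exact(model_a, model_b)
```

The paired test uses only the examples where the two models disagree, which is why it can separate models whose overall accuracies look close.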

Cross-Platform Fused MoE Dispatch in Triton: Portable Expert Routing Without CUDA

TritonMoE, a Mixture-of-Experts inference kernel written entirely in Triton with no CUDA. A fused gate+up GEMM computes both SwiGLU projections from shared tile loads, and the kernel runs unchanged on NVIDIA A100 and AMD MI300X.

89-131% of Megablocks throughput (batch <= 512)

-35% Global memory traffic from fusion

inference · kernels · efficiency

Preprint

arXiv GitHub PyPI
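
The fusion idea can be seen in a plain Python reference. This is not a Triton kernel and not TritonMoE's code: it only contrasts two separate products with a single pass in which each input element is loaded once and feeds both the gate and up accumulators. The function names and the tiny matrices are assumptions for illustration.

```python
import math


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_unfused(x, w_gate, w_up):
    """Two separate matrix-vector products; x is read once per projection."""
    n_out = len(w_gate[0])
    gate = [sum(x[k] * w_gate[k][j] for k in range(len(x))) for j in range(n_out)]
    up = [sum(x[k] * w_up[k][j] for k in range(len(x))) for j in range(n_out)]
    return [silu(g) * u for g, u in zip(gate, up)]


def swiglu_fused(x, w_gate, w_up):
    """Single pass: each x[k] is loaded once and updates both accumulators."""
    n_out = len(w_gate[0])
    gate_acc = [0.0] * n_out
    up_acc = [0.0] * n_out
    for k, xk in enumerate(x):
        for j in range(n_out):
            gate_acc[j] += xk * w_gate[k][j]
            up_acc[j] += xk * w_up[k][j]
    return [silu(g) * u for g, u in zip(gate_acc, up_acc)]


x = [1.0, -2.0, 0.5]
w_gate = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.7]]
w_up = [[1.0, 0.5], [-0.3, 0.2], [0.8, -0.6]]
fused = swiglu_fused(x, w_gate, w_up)
unfused = swiglu_unfused(x, w_gate, w_up)
```

Both functions compute the same result; the fused form just halves the number of times the input is read, which is the source of the memory-traffic saving in a real kernel where those reads are tile loads.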

The hard problem

Can you catch a wrong agent action before it executes?

When an agent moves money, exports data, or files a report, a wrong action can't be undone after the fact. GlassBox is how we measure whether a monitor can catch one first: a surface-clean benchmark that pairs every wrong action with a legitimate twin, plus a pre-registered protocol any team can re-run against any monitor. We report the result straight, including that the white-box probe we tried didn't beat the baseline.

GlassBox: A Cross-Pattern Audit of Linear Probe Gates

A pre-registered, null-result benchmark for monitors that gate irreversible agentic actions.

24 cases · 4 patterns · 5 model families

agents · interpretability · evaluation · banking

Preprint - coming soon

Code on release · Read →

2026

Coming this year.

Verity: Neuro-Symbolic Synthesis of Verified Distributed Systems

CE2P translates formal-verification failures into structured LLM feedback. The benefit is inversely correlated with model capability - weaker models gain the most.

verification · reasoning

Preprint

GitHub PyPI

Trust-Based Decision Routing

A formal framework for ROI-based decision routing in multi-tier verification systems.

trust · decision-support

Q1 2026

In progress

On the bench.

Attack Taxonomy for Agentic AI Systems

A comprehensive threat model covering input, state, tool, planning, and coordination attacks, with empirical evaluation across agent architectures.

adversarial · agents · security

Q2 2026

Calibrated Uncertainty in LLM Reasoning Chains

Uncertainty quantification that tracks actual accuracy - activation-based estimation and propagation through multi-step reasoning chains.

uncertainty · calibration

Q3 2026
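
One common way to check whether stated confidence "tracks actual accuracy" is expected calibration error (ECE). The sketch below is a generic illustration, not this project's method: the equal-width binning, the function names, and the independence assumption in `chain_confidence` are ours, and the activation-based estimator itself is not shown.

```python
import math


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per confidence bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


def chain_confidence(step_confidences):
    """Naive propagation: product of per-step confidences, assuming independent steps."""
    return math.prod(step_confidences)


well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([1.0] * 4, [True, True, False, False])
three_step = chain_confidence([0.9, 0.8, 0.5])
```

The product rule is only a baseline: errors in real reasoning chains are often correlated, so a multi-step estimate built this way can be too pessimistic or too optimistic.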

Prior work

Defensive publications.

Defensive publications by the founding team during prior work at Google. Published via TD Commons, CC BY 4.0.

Nov 2025

Work with us.

For researchers

Co-author a paper

We welcome collaborators across AI safety, interpretability, formal methods, and adversarial ML.

Get in touch →

For organizations

Pilot our tools

Deploy reliability infrastructure in production. All packages are open source, with enterprise support via Rotascale.

View packages →

Open source

Contribute on GitHub

All research, benchmarks, and tools are open source. Issues, PRs, and discussions welcome.

GitHub →

Defensive publications.

UPIR: Universal Plan Intermediate Representation

ARTEMIS: Multi-Agent Debate Framework

Context System for AI Applications

ETLC: Context-First Data Processing