Papers

Preprints and technical disclosures

Research on AI trust, verification, and agent reliability.

2026

02

Verity: Neuro-Symbolic Synthesis of Verified Distributed Systems

CE2P translates formal verification failures into structured LLM feedback. The benefit is inversely correlated with model capability: weaker models gain 34–39 percentage points.

verification reasoning
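The core idea of turning a verifier's counterexample into structured model feedback can be sketched as follows. This is a hypothetical illustration, not CE2P itself: the `failure` dictionary layout (`property`, `trace` keys) and the `feedback_from_failure` helper are invented for the example, and real verifier output varies by tool.

```python
def feedback_from_failure(failure: dict) -> str:
    """Render a verifier counterexample as structured feedback for an LLM.

    The `failure` layout here is hypothetical; actual formal-methods tools
    (model checkers, SMT solvers) each emit their own counterexample format.
    """
    lines = [
        f"Property violated: {failure['property']}",
        f"Failing trace: {' -> '.join(failure['trace'])}",
        "Suggested focus: revise the step where the trace first diverges.",
    ]
    return "\n".join(lines)
```

The structured sections give the model something concrete to condition on, rather than a raw tool dump.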
03

Adversarial Testing for AI Systems

Evolutionary approaches in which attack and defense strategies improve under competitive pressure.

adversarial agents
04

Trust-Based Decision Routing

Formal framework for ROI-based decision routing in multi-tier verification systems.

trust decision-support

In Progress

05

Attack Taxonomy for Agentic AI Systems

Comprehensive threat model covering input, state, tool, planning, and coordination attacks. Empirical evaluation across agent architectures.

adversarial agents security
06

Activation-Based Attack Detection for Autonomous Agents

Interpretability methods to detect adversarial attacks on agentic systems. Extends metacognitive probing to security.

interpretability security
07

Calibrated Uncertainty in LLM Reasoning Chains

Uncertainty quantification that tracks actual accuracy, with activation-based estimation and propagation through multi-step reasoning chains.

uncertainty calibration
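The two ingredients named above, propagation through a chain and calibration against accuracy, can be sketched in a few lines. This is a simplified illustration under an independence assumption, not the paper's method; both function names are invented for the example.

```python
import math

def chain_confidence(step_confidences: list[float]) -> float:
    """Propagate per-step confidences through a reasoning chain.

    Assumes step errors are independent, so the chain succeeds only if
    every step does: p(chain) = product of p(step). Real dependencies
    between steps make this a lower-bound-style approximation at best.
    """
    return math.prod(step_confidences)

def calibration_error(confidences: list[float], correct: list[bool]) -> float:
    """Mean absolute gap between stated confidence and actual accuracy.

    A crude single-bin stand-in for expected calibration error (ECE),
    which would bin predictions by confidence before comparing.
    """
    accuracy = sum(correct) / len(correct)
    mean_conf = sum(confidences) / len(confidences)
    return abs(mean_conf - accuracy)
```

Note how quickly chain confidence decays: three steps at 0.9 each already put the chain below 0.73, which is why long reasoning chains need per-step calibration rather than a single end-of-chain score.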

Prior Work

Defensive publications by the founding team from their prior work at Google. Published via TD Commons under CC BY 4.0.

Collaborate

Work with us

For Researchers

Co-author a paper

We welcome collaborators across AI safety, interpretability, formal methods, and adversarial ML.

Get in touch
For Organizations

Pilot our tools

Deploy trust infrastructure in production. All packages are open source, with enterprise support via Rotascale.

View packages
Open Source

Contribute on GitHub

All core research, benchmarks, and tools are open source. Issues, PRs, and discussions welcome.

GitHub