Steering Vectors: Runtime Behavior Control for LLMs
How to extract behavioral directions from language models and apply them at inference time. A practical guide to rotalabs-steer.
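The basic steering-vector recipe is: record activations for contrastive prompt pairs, take the mean difference as a behavioral direction, and add that (scaled) direction to a layer's hidden state at inference. A minimal numpy sketch of that idea follows; the random arrays stand in for real model activations, and none of the names here are the rotalabs-steer API.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Stand-ins for layer activations captured from prompts that do / don't
# exhibit the target behavior (in practice: forward hooks on a real model).
acts_pos = rng.normal(loc=1.0, size=(32, d_model))
acts_neg = rng.normal(loc=0.0, size=(32, d_model))

def extract_steering_vector(pos, neg):
    """Mean-difference direction between the two activation sets."""
    return pos.mean(axis=0) - neg.mean(axis=0)

def apply_steering(hidden, vector, strength=1.0):
    """Add the scaled direction to a hidden state at inference time."""
    return hidden + strength * vector

v = extract_steering_vector(acts_pos, acts_neg)
h = rng.normal(size=d_model)
steered = apply_steering(h, v, strength=2.0)
print(steered.shape)  # (16,)
```

In a real setup the extraction step runs once offline, and the apply step is a hook registered on the chosen transformer layer during generation; `strength` trades off behavioral effect against output degradation.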
Technical deep-dives, research updates, and tutorials.
A technical analysis of trust dynamics, emergent behaviors, and security vulnerabilities in Moltbook, the first large-scale agent-to-agent social network...
We're releasing 12 packages for AI trust, evaluation, and reliability. Available on PyPI and npm, all AGPL-3.0 licensed.
The security conversation around AI agents is stuck on identity and permissions. The harder problem is whether an agent should...
The standard architecture for coordinating autonomous systems assumes a command node. That works until the link goes down. Here's how...
Prompting is not a control mechanism. When AI operates in kill chains and beyond reliable comms, 'the model usually follows...
The Model Context Protocol connects AI agents to the world. Everyone's focused on securing the connections. But you can authenticate...
Chain-of-thought monitoring was supposed to let us supervise AI reasoning. But new research shows models only faithfully report their reasoning...
A single compromised agent poisoned 87% of downstream decisions in 4 hours. As AI agents gain persistent memory, attackers are...
A joint paper from OpenAI, Anthropic, and DeepMind bypassed 12 published defenses. NAACL 2025 broke 8 more. Here's where we...
As AI agents collaborate on complex tasks, a critical question emerges: how does Agent A know Agent B isn't compromised,...
First empirical demonstration of activation-level sandbagging detection. Linear probes achieve 90-96% accuracy across Mistral, Gemma, and Qwen models. Sandbagging representations...
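A linear probe of the kind described above is just a linear classifier trained on a model's internal activations. The following numpy sketch shows the shape of that experiment with synthetic data standing in for real activations; the separation direction, dimensions, and training loop are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 200

# Synthetic "activations": sandbagging vs honest runs shifted along one direction.
direction = rng.normal(size=d)
X = np.vstack([rng.normal(size=(n, d)) + direction,
               rng.normal(size=(n, d)) - direction])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Logistic-regression probe trained with plain gradient descent.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
    w -= lr * (X.T @ (p - y)) / len(y)       # gradient of log-loss w.r.t. w
    b -= lr * (p - y).mean()                 # gradient w.r.t. bias

acc = (((X @ w + b) > 0) == y).mean()
print(f"probe accuracy: {acc:.2f}")
```

Because the probe is linear, high accuracy is evidence that the behavior is encoded along a direction in activation space, which is what makes detection cheap at runtime.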