Blog

Notes from the lab.

Technical deep-dives, research updates, and tutorials on AI agent reliability, evaluation, and verification.

May 31, 2026 · Latest

Evolutionary Adversarial Testing for AI Systems with Red Queen

Red-teaming that evolves its own attacks. rotalabs-redqueen uses quality-diversity search to discover diverse jailbreaks across single-turn, multi-turn, and agentic/MCP surfaces — reproducibly, and as audit-ready evidence.

adversarial-testing · red-teaming · rotalabs-redqueen · quality-diversity

May 30, 2026

Opus 4.8 Got Better at Everything Except Resisting You

Claude Opus 4.8 beats its predecessor on nearly every capability benchmark. It's also somewhat less robust to prompt injection than Opus 4.7 - and Anthropic's own first live...

prompt-injection · security · ai-safety · agents

Mar 01, 2026

Field-Theoretic Memory for AI Agents

We treat agent memory as continuous fields governed by partial differential equations instead of discrete database entries. The result: +116% F1 on multi-session reasoning and >99.8% collective intelligence...

memory · agents · field-theory · multi-agent-trust

Feb 18, 2026

Shared Context for AI Agents: Introducing rotalabs-context

AI agents that can't share what they know make the same mistakes independently. We're releasing rotalabs-context - a context intelligence engine for ingesting, searching, and subscribing to shared...

open-source · context-intelligence · multi-agent-trust · agent-reliability

Feb 07, 2026

Statistical Rigor in LLM Evaluation

Why most LLM benchmarks are doing evaluation wrong, and how to fix it with confidence intervals, significance tests, and effect sizes.

evaluation · statistics · rotalabs-eval · benchmarks

Feb 03, 2026

Steering Vectors: Runtime Behavior Control for LLMs

How to extract behavioral directions from language models and apply them at inference time. A practical guide to rotalabs-steer.

steering-vectors · interpretability · rotalabs-steer · behavior-control

Jan 20, 2026

When AI Agents Don't Know What They Don't Know

The security conversation around AI agents is stuck on identity and permissions. The harder problem is whether an agent should take an action - whether its reasoning is...

uncertainty · agents · calibration · reliability

Jan 10, 2026

Multi-Agent Coordination Without Centralized Control

The standard architecture for coordinating autonomous systems assumes a command node. That works until the link goes down. Here's how to build coordination that survives contested environments.

multi-agent · coordination · autonomy · defense

Dec 15, 2025

Runtime AI Control for Contested Environments

Prompting is not a control mechanism. When AI operates in kill chains and beyond reliable comms, 'the model usually follows instructions' isn't acceptable. Here's what I've learned about...

steering · defense · activation-engineering · agents

Nov 23, 2025

MCP is Infrastructure. Trust is the Missing Layer.

The Model Context Protocol connects AI agents to the world. Everyone's focused on securing the connections. But you can authenticate every MCP call and still have a compromised...

mcp · multi-agent · security · trust

Oct 23, 2025

The CoT Blind Spot: Why Watching What Models Say Isn't Enough

Chain-of-thought monitoring was supposed to let us supervise AI reasoning. But new research shows models only faithfully report their reasoning 25-39% of the time. If we can't trust...

chain-of-thought · interpretability · monitoring · ai-safety

Sep 18, 2025

Memory Poisoning: The Attack Vector Nobody's Ready For

A single compromised agent poisoned 87% of downstream decisions in 4 hours. As AI agents gain persistent memory, attackers are finding ways to corrupt it. Here's what the...

memory · security · ai-safety · agents

Jun 20, 2025

Detecting AI Sandbagging with Activation Probes

First empirical demonstration of activation-level sandbagging detection. Linear probes achieve 90-96% accuracy across Mistral, Gemma, and Qwen models. Sandbagging representations are model-specific, and steering can reduce sandbagging by...

sandbagging · activation-probing · ai-safety · interpretability