Blog

Technical deep-dives, research updates, and tutorials

Insights from Rotalabs on AI trust, reliability, and evaluation.

Statistical Rigor in LLM Evaluation

Why most LLM benchmarks are doing evaluation wrong, and how to fix it with confidence intervals, significance tests, and effect...

Adversarial Testing for AI Systems with RedQueen

A systematic approach to red-teaming AI systems. Attack taxonomies, automated campaigns, and finding vulnerabilities before attackers do.

When AI Agents Don't Know What They Don't Know

The security conversation around AI agents is stuck on identity and permissions. The harder problem is whether an agent should...

Runtime AI Control for Contested Environments

Prompting is not a control mechanism. When AI operates in kill chains and beyond reliable comms, "the model usually follows...

Detecting AI Sandbagging with Activation Probes

First empirical demonstration of activation-level sandbagging detection. Linear probes achieve 90-96% accuracy across Mistral, Gemma, and Qwen models. Sandbagging representations...

Stay Updated

Subscribe to our newsletter

Research updates, tool releases, and thoughts on AI trust. No spam.