New Open source packages now available on PyPI & npm Read more →
Latest

Topics

All Posts

When AI Agents Don't Know What They Don't Know

The security conversation around AI agents is stuck on identity and permissions. The harder problem is whether an agent should...

Runtime AI Control for Contested Environments

Prompting is not a control mechanism. When AI operates in kill chains and beyond reliable comms, 'the model usually follows...

Detecting AI Sandbagging with Activation Probes

First empirical demonstration of activation-level sandbagging detection. Linear probes achieve 90-96% accuracy across Mistral, Gemma, and Qwen models. Sandbagging representations...