TL;DR:
rotalabs-redqueen is our framework for adversarial testing of AI systems. Instead of running a fixed list of jailbreaks, it evolves them - using quality-diversity search (MAP-Elites) to discover a diverse archive of effective attacks across single-turn, multi-turn (Crescendo-style), and agentic/MCP surfaces. Seeded runs are bit-reproducible and cross-language identical (Python and TypeScript), and any campaign projects into an audit-ready compliance report. Think of it as evolutionary penetration testing for language models and agents.
Why adversarial testing matters
Every AI system deployed in production will face adversarial inputs. Users trying to bypass content filters. Attackers attempting prompt injection. Agents being steered into misusing their tools.
Most teams do some manual red-teaming before launch. Someone spends a few hours trying to make the model say bad things, writes up the results, and moves on.
This isn’t enough.
Manual testing is inconsistent, incomplete, and doesn’t scale. You need systematic coverage of attack types. You need reproducible campaigns. You need to test after every model update, not just before launch.
But there’s a deeper problem: a fixed list of attacks only finds the vulnerabilities you already know about. Models change, guardrails change, and the jailbreaks that mattered last month get patched while new ones open up. What you actually want is a process that discovers attacks you didn’t write down - and keeps finding new ones.
That’s what Red Queen provides.
Evolve the attacks, don’t enumerate them
Red Queen treats red-teaming as a search problem. An attack is a genome; its phenotype is a stimulus (a single prompt, a multi-turn conversation, or an agentic action plan); a fitness function scores how effectively it breaks the target. Evolution then mutates and recombines the population toward higher-fitness attacks.
The twist is quality-diversity. Rather than converging on a single “best” jailbreak, Red Queen uses a MAP-Elites archive to keep the best attack in each region of a behavior space - for example (strategy × encoding × persona). The output isn’t one attack; it’s a map of the vulnerability surface, with a diverse set of working attacks you can inspect, regression-test, and turn into evidence.
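To make the idea concrete, here is a minimal, self-contained sketch of a MAP-Elites loop in plain Python. It is not Red Queen's implementation: `ToyGenome`, `toy_fitness`, and the `intensity` gene are hypothetical stand-ins, and the fitness is a made-up formula rather than a call to a target and a judge. What it does show is the core rule - keep the best individual per behavior cell instead of one global winner.

```python
import random
from dataclasses import dataclass, replace

N_STRATEGIES, N_ENCODINGS = 6, 6  # sizes of the strategy and encoding axes


@dataclass(frozen=True)
class ToyGenome:
    strategy: int      # behavior axis 1
    encoding: int      # behavior axis 2
    persona: bool      # behavior axis 3
    intensity: float   # hypothetical extra gene, not part of the descriptor

    def cell(self):
        """Behavior descriptor: the archive cell this genome falls into."""
        return (self.strategy, self.encoding, int(self.persona))


def toy_fitness(g):
    """Stand-in score in [0, 1]; a real campaign would query a target and a judge."""
    sweet_spot = ((g.strategy * 7 + g.encoding * 3 + int(g.persona) * 5) % 10) / 10
    return 1.0 - abs(g.intensity - sweet_spot)


def random_genome(rng):
    return ToyGenome(rng.randrange(N_STRATEGIES), rng.randrange(N_ENCODINGS),
                     rng.random() < 0.5, rng.random())


def mutate(g, rng):
    """Perturb one gene, chosen at random."""
    gene = rng.randrange(4)
    if gene == 0:
        return replace(g, strategy=rng.randrange(N_STRATEGIES))
    if gene == 1:
        return replace(g, encoding=rng.randrange(N_ENCODINGS))
    if gene == 2:
        return replace(g, persona=not g.persona)
    return replace(g, intensity=min(1.0, max(0.0, g.intensity + rng.gauss(0, 0.1))))


def map_elites(iterations, seed):
    rng = random.Random(seed)          # one seeded RNG makes the run repeatable
    archive = {}                       # cell -> (fitness, genome)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            parent = archive[rng.choice(list(archive))][1]
            child = mutate(parent, rng)
        else:
            child = random_genome(rng)
        score = toy_fitness(child)
        incumbent = archive.get(child.cell())
        # A child only replaces the elite of its own cell, so diversity is preserved.
        if incumbent is None or score > incumbent[0]:
            archive[child.cell()] = (score, child)
    return archive


archive = map_elites(iterations=2000, seed=1234)
total_cells = N_STRATEGIES * N_ENCODINGS * 2
print(f"filled {len(archive)} of {total_cells} cells")
```

Because a child competes only against the current occupant of its own cell, a weak-but-novel attack is kept even when a stronger attack already exists elsewhere in the map.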
The single-turn space is evolved along axes the engine actually mutates over:
- Strategy - roleplay, hypothetical, authority, encoding, direct, multi_turn
- Encoding - none, base64, rot13, leetspeak, pig_latin, reverse
- Persona, across harm categories (violence, illegal, hate, privacy, misinformation, …)
The same engine extends to multi-turn escalation (neutral_to_specific, research_frame, storytelling) and agentic exploits (tool_misuse, goal_hijack, memory_poisoning, context_poisoning).
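As an illustration of what the encoding axis means, the sketch below applies each listed encoding to a harmless string using only the standard library. The `encode` helper, the leetspeak table, and the pig-latin rules are assumptions made for this example; the library's own transforms may differ in detail.

```python
import base64
import codecs

# Hypothetical leetspeak table: a->4, e->3, i->1, o->0, s->5, t->7
LEET = str.maketrans("aeiost", "431057")


def pig_latin_word(word):
    """Move the leading consonant cluster to the end and add 'ay'."""
    for i, ch in enumerate(word.lower()):
        if ch in "aeiou":
            return word + "way" if i == 0 else word[i:] + word[:i] + "ay"
    return word + "ay"  # no vowel found


def encode(text, encoding):
    """Apply one value of the encoding axis to a piece of text."""
    if encoding == "none":
        return text
    if encoding == "base64":
        return base64.b64encode(text.encode("utf-8")).decode("ascii")
    if encoding == "rot13":
        return codecs.encode(text, "rot_13")
    if encoding == "leetspeak":
        return text.translate(LEET)
    if encoding == "pig_latin":
        return " ".join(pig_latin_word(w) for w in text.split())
    if encoding == "reverse":
        return text[::-1]
    raise ValueError(f"unknown encoding: {encoding}")


for name in ["none", "base64", "rot13", "leetspeak", "pig_latin", "reverse"]:
    print(f"{name:>10}: {encode('hello world', name)}")
```

Each encoding changes only the surface form of the text, which is why it works as an independent axis alongside strategy and persona.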
Quick start
MockTarget is deterministic, so this runs offline with no API key - ideal for trying the engine and for reproducible tests.
import asyncio
from rotalabs_redqueen import (
    LLMAttackGenome, JailbreakFitness, MockTarget, HeuristicJudge, evolve,
)

async def main():
    target = MockTarget()  # swap for OpenAITarget / AnthropicTarget / GeminiTarget / OllamaTarget
    fitness = JailbreakFitness(target, HeuristicJudge())
    result = await evolve(
        genome_class=LLMAttackGenome,
        fitness=fitness,
        generations=50,
        population_size=20,
        seed=1234,  # same seed -> same result, every time
        progress=False,
    )
    if result.best:
        print("fitness:", result.best.fitness.value)
        print("prompt:", result.best.genome.to_prompt())

asyncio.run(main())
Mapping the vulnerability space with MAP-Elites
Pass an archive and you get coverage of the behavior space, not just a single winner. Each filled cell is a distinct working attack:
from rotalabs_redqueen import (
    LLMAttackGenome, JailbreakFitness, MockTarget,
    MapElitesArchive, BehaviorDimension, AttackStrategy, Encoding, evolve,
)

archive = MapElitesArchive(dimensions=[
    BehaviorDimension("strategy", 0.0, 1.0, len(AttackStrategy)),
    BehaviorDimension("encoding", 0.0, 1.0, len(Encoding)),
    BehaviorDimension("has_persona", 0.0, 1.0, 2),
])

# `await` needs an async context: run this inside an async function, as in the quick start
result = await evolve(
    genome_class=LLMAttackGenome,
    fitness=JailbreakFitness(MockTarget()),
    generations=100, archive=archive, seed=1, progress=False,
)
cov = result.archive.coverage()
print(f"coverage: {cov.coverage_percent:.1f}% ({cov.filled_cells} diverse attacks)")
for ind in result.archive.get_all():
    print(ind.genome.to_prompt()[:80])
Coverage tells you how much of the attack space you’ve explored; the archive hands you the working examples in each region.
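The coverage figure is simple arithmetic over the grid. The sketch below assumes that each `BehaviorDimension(name, low, high, bins)` splits its range into equal-width bins and that coverage is filled cells divided by total cells; both are assumptions about the library's internals, and `bin_index` and `coverage_percent` are hypothetical helper names.

```python
from math import prod


def bin_index(value, low, high, bins):
    """Map a descriptor value to an equal-width bin, clamped to the valid range."""
    if bins < 1 or high <= low:
        raise ValueError("need bins >= 1 and high > low")
    fraction = (value - low) / (high - low)
    return min(max(int(fraction * bins), 0), bins - 1)


def coverage_percent(filled_cells, bins_per_dimension):
    """Share of the grid that holds at least one attack, as a percentage."""
    return 100.0 * filled_cells / prod(bins_per_dimension)


# With 6 strategies, 6 encodings and a 2-way persona flag the grid has 72 cells.
grid = [6, 6, 2]
print("total cells:", prod(grid))
print("cell for (0.5, 0.0, 1.0):",
      tuple(bin_index(v, 0.0, 1.0, b) for v, b in zip((0.5, 0.0, 1.0), grid)))
print(f"coverage with 18 filled cells: {coverage_percent(18, grid):.1f}%")
```

Clamping the top edge matters: a value of exactly 1.0 belongs in the last bin rather than one past it.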
Beyond single prompts: multi-turn and agentic/MCP
Because the genome’s phenotype is a Stimulus, the same engine drives every surface. Just swap the genome class:
from rotalabs_redqueen import MultiTurnGenome, AgenticGenome, JailbreakFitness, MockTarget, evolve

# Crescendo-style multi-turn escalation
mt = await evolve(
    genome_class=MultiTurnGenome,
    fitness=JailbreakFitness(MockTarget()),
    generations=50, population_size=20, seed=1, progress=False,
)

# Multi-step tool-use / MCP exploit plans
ag = await evolve(
    genome_class=AgenticGenome,
    fitness=JailbreakFitness(MockTarget()),
    generations=50, population_size=20, seed=1, progress=False,
)
To red-team a real tool-using agent, point MCPTarget at any Model Context Protocol server. It performs the MCP handshake and executes each evolved step as a tools/call, scoring the actual tool output:
from rotalabs_redqueen.llm import MCPTarget
target = MCPTarget(command=["npx", "-y", "@modelcontextprotocol/server-everything"])
# agentic action-plan steps become MCP tool calls; the tool output is what the judge scores
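For readers unfamiliar with the wire format, the sketch below builds the JSON-RPC 2.0 messages involved in an MCP handshake and a tools/call, framed as newline-delimited JSON for the stdio transport. It only constructs the messages and does not launch a server. The helper names, the client name, and the `echo` tool with its `message` argument are illustrative assumptions, and this is not how `MCPTarget` is implemented internally.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def request(method, params):
    """A JSON-RPC request: it carries an id, so the server must answer it."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def notification(method):
    """A JSON-RPC notification: no id, so no response is expected."""
    return {"jsonrpc": "2.0", "method": method}


def handshake():
    """Messages the client sends to open a session.

    A real client waits for the initialize response before sending the
    initialized notification.
    """
    return [
        request("initialize", {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "redteam-sketch", "version": "0.0.0"},  # hypothetical
        }),
        notification("notifications/initialized"),
    ]


def tool_call(name, arguments):
    """One evolved plan step expressed as a tools/call request."""
    return request("tools/call", {"name": name, "arguments": arguments})


def frame(message):
    """Serialize one message for stdio: a single line of JSON plus a newline."""
    return json.dumps(message, separators=(",", ":")) + "\n"


for msg in handshake() + [tool_call("echo", {"message": "hi"})]:  # tool name is illustrative
    print(frame(msg), end="")
```

The tool result returned for a tools/call is the text a judge would score in the agentic setting.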
Testing real models
Swap MockTarget for a provider (install the matching extra, e.g. pip install "rotalabs-redqueen[openai]"; the quotes keep shells such as zsh from expanding the brackets):
from rotalabs_redqueen import OpenAITarget, AnthropicTarget, GeminiTarget, OllamaTarget
target = OpenAITarget(model="gpt-4o-mini") # reads OPENAI_API_KEY
# target = AnthropicTarget(model="claude-haiku-4-5-20251001")
# target = GeminiTarget(model="gemini-2.0-flash")
# target = OllamaTarget(model="llama3.1:8b") # local, no key
Everything else stays the same - the target is the only thing that changes.
Co-evolution: attacker vs defender
Evolve an attacker population against a defender population. Defenders evolve guardrails that reduce attack success; attackers adapt to bypass them. You get both the attacks and the mitigations that blunt them:
from rotalabs_redqueen import (
    coevolve, LLMAttackGenome, SystemPromptDefense,
    JailbreakFitness, DefenderBlockFitness, MockTarget, HeuristicJudge,
)

base, judge = MockTarget(), HeuristicJudge()
result = await coevolve(
    attacker_class=LLMAttackGenome,
    defender_class=SystemPromptDefense,
    attacker_fitness_vs=lambda d: JailbreakFitness(d.as_defense(base), judge),
    defender_fitness_vs=lambda a: DefenderBlockFitness(a, base, judge),
    generations=20, population_size=24, seed=1,
)
print("attacker fitness:", result.attacker_fitness)
print("defender fitness:", result.defender_fitness)
print("evolved guardrail:", result.best_defender.to_dict())
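The dynamics are easier to see in a toy model. In the sketch below, which is an illustrative assumption and not the `coevolve` implementation, an attacker is a set of tricks it uses, a defender is a set of tricks it blocks, and each population is scored against the other's current generation. The per-rule cost on defenders is a made-up stand-in for the usability price of over-blocking.

```python
import random

N_TRICKS = 6  # hypothetical number of distinct attack tricks


def bypasses(attacker, defender):
    """An attack gets through if it uses at least one trick the defender leaves open."""
    return any(used and not blocked for used, blocked in zip(attacker, defender))


def attacker_fitness(attacker, defenders):
    return sum(bypasses(attacker, d) for d in defenders) / len(defenders)


def defender_fitness(defender, attackers):
    blocked = sum(not bypasses(a, defender) for a in attackers) / len(attackers)
    return blocked - 0.05 * sum(defender)  # each blocking rule has a small cost


def mutate(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + (not bits[i],) + bits[i + 1:]


def next_generation(population, score, rng):
    """Keep the better half, refill with mutated copies of survivors."""
    ranked = sorted(population, key=score, reverse=True)
    survivors = ranked[: max(1, len(ranked) // 2)]
    offspring = [mutate(rng.choice(survivors), rng)
                 for _ in range(len(population) - len(survivors))]
    return survivors + offspring


def coevolve_toy(generations, size, seed):
    rng = random.Random(seed)

    def rand_bits():
        return tuple(rng.random() < 0.5 for _ in range(N_TRICKS))

    attackers = [rand_bits() for _ in range(size)]
    defenders = [rand_bits() for _ in range(size)]
    for _ in range(generations):
        # Both sides are scored against the opponent's current generation.
        new_attackers = next_generation(
            attackers, lambda a: attacker_fitness(a, defenders), rng)
        new_defenders = next_generation(
            defenders, lambda d: defender_fitness(d, attackers), rng)
        attackers, defenders = new_attackers, new_defenders
    return attackers, defenders


attackers, defenders = coevolve_toy(generations=20, size=24, seed=1)
print("best attacker success rate:",
      max(attacker_fitness(a, defenders) for a in attackers))
print("best defender score:",
      max(defender_fitness(d, attackers) for d in defenders))
```

Neither fitness is fixed: a defender that scores well this generation can score poorly in the next one once attackers shift, which is the arms-race effect co-evolution is meant to exploit.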
Audit-ready compliance reports
Any archive projects over the attack taxonomy into standards-aligned evidence:
from rotalabs_redqueen import ReportExporter
exporter = ReportExporter()
report = exporter.export(
    result.archive.get_all(),
    campaign_id="run-1",
    coverage=result.archive.coverage(),
)
print(exporter.render(report, "markdown").decode()) # or "json"
The report groups successful attacks by harm category and crosswalks them to OWASP LLM/Agentic Top-10, MITRE ATLAS, EU AI Act Article 55, and NIST AI RMF - turning a red-team run into documented evidence.
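The grouping step can be sketched in a few lines. Everything here is hypothetical: the finding dictionaries, the 0.5 success threshold, and the `references` mapping, whose placeholder label stands in for real framework identifiers. It illustrates the shape of the projection, not the output of `ReportExporter`.

```python
from collections import defaultdict


def group_by_category(findings, threshold=0.5):
    """Keep findings at or above the threshold, grouped by harm category."""
    groups = defaultdict(list)
    for finding in findings:
        if finding["fitness"] >= threshold:
            groups[finding["category"]].append(finding)
    # Sort categories by name and findings by descending fitness for stable output.
    return {
        category: sorted(items, key=lambda f: f["fitness"], reverse=True)
        for category, items in sorted(groups.items())
    }


def render_markdown(campaign_id, groups, references=None):
    """Render grouped findings as a small markdown report."""
    lines = [f"# Campaign {campaign_id}", ""]
    for category, items in groups.items():
        lines.append(f"## {category} ({len(items)} successful)")
        refs = (references or {}).get(category)
        if refs:
            lines.append("Mapped to: " + ", ".join(refs))
        for f in items:
            lines.append(f"- {f['strategy']} / {f['encoding']}: fitness {f['fitness']:.2f}")
        lines.append("")
    return "\n".join(lines)


findings = [  # made-up example data
    {"category": "privacy", "strategy": "roleplay", "encoding": "none", "fitness": 0.8},
    {"category": "privacy", "strategy": "direct", "encoding": "rot13", "fitness": 0.9},
    {"category": "hate", "strategy": "authority", "encoding": "base64", "fitness": 0.2},
]
groups = group_by_category(findings)
print(render_markdown("run-1", groups, {"privacy": ["EXAMPLE-FRAMEWORK-ID"]}))
```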
Continuous, accumulating red-teaming
Archives serialize, so discovered attacks accumulate across runs - a CI gate that gets stronger over time. Warm-start from yesterday’s archive, evolve against today’s model, and fail the build if anything breaks through:
from rotalabs_redqueen import (
    LLMAttackGenome, JailbreakFitness, HeuristicJudge,
    MapElitesArchive, BehaviorDimension, AttackStrategy, Encoding, evolve,
)

async def test_no_jailbreak_regression(target):  # target = your model under test
    archive = MapElitesArchive([
        BehaviorDimension("strategy", 0.0, 1.0, len(AttackStrategy)),
        BehaviorDimension("encoding", 0.0, 1.0, len(Encoding)),
        BehaviorDimension("has_persona", 0.0, 1.0, 2),
    ])
    result = await evolve(
        genome_class=LLMAttackGenome,
        fitness=JailbreakFitness(target, HeuristicJudge()),
        generations=50, population_size=20, seed=1, archive=archive, progress=False,
    )
    # save before asserting, so attacks found in a failing run still seed the next one
    result.archive.save("file://redqueen-archive.json")  # next run starts from here
    # fail the build if any evolved attack succeeds
    assert result.best.fitness.value < 0.5, \
        f"security regression: evolved attack reached fitness {result.best.fitness.value:.2f}"
Because seeded runs are bit-reproducible, the same seed gives the same campaign on every machine - no flaky security tests.
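Two ideas from this section, accumulation and reproducibility, can be sketched independently of the library. The example assumes an archive can be represented as a JSON-friendly mapping from a cell key to an entry with a `fitness` field; `merge_archives` and `fingerprint` are hypothetical helpers, not part of the package.

```python
import hashlib
import json


def merge_archives(old, new):
    """Accumulate across runs: per cell, keep whichever entry has the higher fitness."""
    merged = dict(old)
    for cell, entry in new.items():
        if cell not in merged or entry["fitness"] > merged[cell]["fitness"]:
            merged[cell] = entry
    return merged


def fingerprint(archive):
    """SHA-256 of a canonical JSON form, so equal archives hash equally."""
    canonical = json.dumps(archive, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


yesterday = {  # made-up example data
    "roleplay|base64|1": {"fitness": 0.4, "prompt": "a"},
    "direct|none|0": {"fitness": 0.9, "prompt": "b"},
}
today = {
    "roleplay|base64|1": {"fitness": 0.7, "prompt": "c"},
    "authority|rot13|0": {"fitness": 0.2, "prompt": "d"},
}
merged = merge_archives(yesterday, today)
print(len(merged), "cells after merge")
print("fingerprint:", fingerprint(merged))
```

Comparing fingerprints from two machines, or from two runs with the same seed, is a cheap way to confirm in CI that a campaign really was reproduced rather than merely similar.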
The same engine, in TypeScript
The TypeScript package @rotalabs/redqueen is at feature parity and cross-language identical: the same seed produces byte-for-byte identical archives and reports, gated by a shared L1–L5 conformance corpus.
import {
  LLMAttackGenome, JailbreakFitness, MockTarget, HeuristicJudge,
  MapElitesArchive, BehaviorDimension, evolve,
} from "@rotalabs/redqueen";

const archive = new MapElitesArchive([
  new BehaviorDimension("strategy", 0, 1, 6),
  new BehaviorDimension("encoding", 0, 1, 6),
  new BehaviorDimension("has_persona", 0, 1, 2),
]);

const result = await evolve(
  LLMAttackGenome,
  new JailbreakFitness(new MockTarget(), new HeuristicJudge()),
  { generations: 50, populationSize: 20, seed: 1234, archive },
);

const cov = result.archive!.coverage();
console.log(`coverage: ${cov.coveragePercent.toFixed(1)}% best: ${result.best!.fitness.value}`);
Responsible use
Red Queen is a defensive tool - for finding and fixing vulnerabilities before attackers exploit them. Use it on systems you own or are authorized to test, not to attack systems without permission or to circumvent the safety of production systems you don’t control. The goal is making AI systems safer, not helping bad actors.
Installation
# Python (core + deterministic mock target)
pip install rotalabs-redqueen
# with a provider - or [llm] for all providers
pip install "rotalabs-redqueen[openai]"
# TypeScript / Node
npm install @rotalabs/redqueen
Resources
Need help with security testing? Contact us at [email protected].
Cite this post
@misc{rotalabs2026evolutionary-adversa,
title = "Evolutionary Adversarial Testing for AI Systems with Red Queen",
author = "Rotalabs",
year = "2026",
url = "https://rotalabs.ai/blog/adversarial-testing-redqueen/"
}