Evolutionary Adversarial Testing for AI Systems with Red Queen

TL;DR: rotalabs-redqueen is our framework for adversarial testing of AI systems. Instead of running a fixed list of jailbreaks, it evolves them - using quality-diversity search (MAP-Elites) to discover a diverse archive of effective attacks across single-turn, multi-turn (Crescendo-style), and agentic/MCP surfaces. Seeded runs are bit-reproducible and cross-language identical (Python and TypeScript), and any campaign can be exported as an audit-ready compliance report. Think of it as evolutionary penetration testing for language models and agents.

Why adversarial testing matters

Every AI system deployed in production will face adversarial inputs. Users trying to bypass content filters. Attackers attempting prompt injection. Agents being steered into misusing their tools.

Most teams do some manual red-teaming before launch. Someone spends a few hours trying to make the model say bad things, writes up the results, and moves on.

This isn’t enough.

Manual testing is inconsistent, incomplete, and doesn’t scale. You need systematic coverage of attack types. You need reproducible campaigns. You need to test after every model update, not just before launch.

But there’s a deeper problem: a fixed list of attacks only finds the vulnerabilities you already know about. Models change, guardrails change, and the jailbreaks that mattered last month get patched while new ones open up. What you actually want is a process that discovers attacks you didn’t write down - and keeps finding new ones.

That’s what Red Queen provides.

Evolve the attacks, don’t enumerate them

Red Queen treats red-teaming as a search problem. An attack is a genome; its phenotype is a stimulus (a single prompt, a multi-turn conversation, or an agentic action plan); a fitness function scores how effectively it breaks the target. Evolution then mutates and recombines the population toward higher-fitness attacks.
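To make the genome / fitness / evolution vocabulary concrete, here is a minimal, self-contained sketch of the generic loop on a toy problem. It is not Red Queen's implementation: the bitstring genome, the count-the-ones fitness (standing in for a judge's score), and the operator choices (bit-flip mutation, one-point crossover, truncation selection) are all assumptions chosen for illustration.

```python
import random

GENOME_LEN = 20

def fitness(genome: list[int]) -> float:
    """Toy stand-in for a judge score: the fraction of 1-bits (OneMax)."""
    return sum(genome) / len(genome)

def mutate(genome: list[int], rng: random.Random, rate: float = 0.05) -> list[int]:
    """Bit-flip mutation: flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]

def crossover(a: list[int], b: list[int], rng: random.Random) -> list[int]:
    """One-point crossover: a prefix of `a` joined to the matching suffix of `b`."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def toy_evolve(generations: int = 50, population_size: int = 20, seed: int = 0) -> list[int]:
    rng = random.Random(seed)  # one seeded RNG drives every random choice, so runs repeat exactly
    population = [[rng.randint(0, 1) for _ in range(GENOME_LEN)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        # Refill the population with mutated offspring of random parent pairs.
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = toy_evolve(seed=1234)
print(f"best fitness: {fitness(best):.2f}")
```

Keeping the parents unchanged means the best score can never drop between generations, which is the property a red-team campaign wants: a discovered attack is never lost.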

The twist is quality-diversity. Rather than converging on a single “best” jailbreak, Red Queen uses a MAP-Elites archive to keep the best attack in each region of a behavior space - for example (strategy × encoding × persona). The output isn’t one attack; it’s a map of the vulnerability surface, with a diverse set of working attacks you can inspect, regression-test, and turn into evidence.
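The core MAP-Elites rule is small: discretize each candidate's behavior into a cell, and keep the candidate only if that cell is empty or it beats the current occupant. The sketch below shows that rule on a toy two-dimensional problem. The `(x, y)` genome, the descriptor (the genome itself, binned into a 5x5 grid), the fitness function, and the mutation scheme are illustrative assumptions, not Red Queen's internals.

```python
import random

BINS = 5  # cells per behavior dimension -> 5 x 5 = 25 cells

def fitness(x: float, y: float) -> float:
    """Toy objective: higher is better, peaking at (0.5, 0.5)."""
    return 1.0 - ((x - 0.5) ** 2 + (y - 0.5) ** 2)

def cell_of(x: float, y: float) -> tuple[int, int]:
    """Behavior descriptor: the grid cell the genome falls into (top edge clamped)."""
    return (min(int(x * BINS), BINS - 1), min(int(y * BINS), BINS - 1))

def toy_map_elites(iterations: int = 2000, seed: int = 0) -> dict[tuple[int, int], tuple[float, float, float]]:
    rng = random.Random(seed)
    archive: dict[tuple[int, int], tuple[float, float, float]] = {}  # cell -> (fitness, x, y)
    for i in range(iterations):
        if i < 50:
            # Bootstrap phase: random genomes.
            x, y = rng.random(), rng.random()
        else:
            # Mutate a randomly chosen elite, clamped to [0, 1].
            _, px, py = rng.choice(list(archive.values()))
            x = min(max(px + rng.gauss(0, 0.1), 0.0), 1.0)
            y = min(max(py + rng.gauss(0, 0.1), 0.0), 1.0)
        f, cell = fitness(x, y), cell_of(x, y)
        # The MAP-Elites rule: fill an empty cell, or replace a weaker elite.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x, y)
    return archive

archive = toy_map_elites()
print(f"filled {len(archive)} of {BINS * BINS} cells")
```

Unlike the plain loop, nothing here pushes every candidate toward the global optimum: a mediocre attack survives as long as it is the best one in its own region of the behavior space.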

The engine evolves single-turn attacks along three axes that it actually mutates:

Strategy - roleplay, hypothetical, authority, encoding, direct, multi_turn
Encoding - none, base64, rot13, leetspeak, pig_latin, reverse
Persona - present or absent, with attacks spanning harm categories (violence, illegal, hate, privacy, misinformation, …)

and the same engine extends to multi-turn escalation (neutral_to_specific, research_frame, storytelling) and agentic exploits (tool_misuse, goal_hijack, memory_poisoning, context_poisoning).
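Each of those axes can be read as a categorical gene. The following sketch shows one way a genome over strategy, encoding, and persona might render its phenotype, report a behavior descriptor for the archive, and mutate. The class `ToyAttackGenome`, its fields, the benign placeholder payload, the persona string, and the descriptor scaling are hypothetical, not the `LLMAttackGenome` internals; only the encoding transforms themselves (base64, ROT13, reversal, and a naive Pig Latin and leetspeak) are standard.

```python
import base64
import codecs
import random
from dataclasses import dataclass, replace
from typing import Optional

STRATEGIES = ["roleplay", "hypothetical", "authority", "encoding", "direct", "multi_turn"]
ENCODINGS = ["none", "base64", "rot13", "leetspeak", "pig_latin", "reverse"]
LEET = str.maketrans("aeiostAEIOST", "431057431057")

def pig_latin(text: str) -> str:
    """Naive Pig Latin: move each word's first letter to the end and append 'ay'."""
    return " ".join(w[1:] + w[0] + "ay" if w else w for w in text.split(" "))

ENCODERS = {
    "none": lambda s: s,
    "base64": lambda s: base64.b64encode(s.encode()).decode(),
    "rot13": lambda s: codecs.encode(s, "rot13"),
    "leetspeak": lambda s: s.translate(LEET),
    "pig_latin": pig_latin,
    "reverse": lambda s: s[::-1],
}

@dataclass(frozen=True)
class ToyAttackGenome:  # hypothetical illustration, not LLMAttackGenome
    strategy: str
    encoding: str
    persona: Optional[str]
    payload: str = "PLACEHOLDER REQUEST"  # benign stand-in text

    def to_prompt(self) -> str:
        """Phenotype: render the genes into a single prompt string."""
        body = ENCODERS[self.encoding](self.payload)
        framing = f"As {self.persona}, " if self.persona else ""
        return f"[{self.strategy}] {framing}{body}"

    def descriptor(self) -> tuple[float, float, float]:
        """Behavior descriptor in [0, 1]^3: each gene's bin center, ready for archive binning."""
        return (
            (STRATEGIES.index(self.strategy) + 0.5) / len(STRATEGIES),
            (ENCODINGS.index(self.encoding) + 0.5) / len(ENCODINGS),
            0.75 if self.persona else 0.25,  # centers of the two persona bins
        )

    def mutate(self, rng: random.Random) -> "ToyAttackGenome":
        """Resample one randomly chosen gene; the others are inherited unchanged."""
        gene = rng.choice(["strategy", "encoding", "persona"])
        if gene == "strategy":
            return replace(self, strategy=rng.choice(STRATEGIES))
        if gene == "encoding":
            return replace(self, encoding=rng.choice(ENCODINGS))
        return replace(self, persona=None if self.persona else "a novelist")

print(ToyAttackGenome("roleplay", "rot13", None).to_prompt())  # [roleplay] CYNPRUBYQRE ERDHRFG
```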

Quick start

MockTarget is deterministic, so this runs offline with no API key - ideal for trying the engine and for reproducible tests.

import asyncio
from rotalabs_redqueen import (
    LLMAttackGenome, JailbreakFitness, MockTarget, HeuristicJudge, evolve,
)

async def main():
    target = MockTarget()  # swap for OpenAITarget / AnthropicTarget / GeminiTarget / OllamaTarget
    fitness = JailbreakFitness(target, HeuristicJudge())

    result = await evolve(
        genome_class=LLMAttackGenome,
        fitness=fitness,
        generations=50,
        population_size=20,
        seed=1234,            # same seed -> same result, every time
        progress=False,
    )

    if result.best:
        print("fitness:", result.best.fitness.value)
        print("prompt:", result.best.genome.to_prompt())

asyncio.run(main())

Mapping the vulnerability space with MAP-Elites

Pass an archive and you get coverage of the behavior space, not just a single winner. Each filled cell is a distinct working attack:

import asyncio
from rotalabs_redqueen import (
    LLMAttackGenome, JailbreakFitness, MockTarget,
    MapElitesArchive, BehaviorDimension, AttackStrategy, Encoding, evolve,
)

# one behavior dimension per axis: 6 strategies x 6 encodings x persona on/off = 72 cells
archive = MapElitesArchive(dimensions=[
    BehaviorDimension("strategy", 0.0, 1.0, len(AttackStrategy)),
    BehaviorDimension("encoding", 0.0, 1.0, len(Encoding)),
    BehaviorDimension("has_persona", 0.0, 1.0, 2),
])

# evolve() is a coroutine; asyncio.run drives it from a plain script
result = asyncio.run(evolve(
    genome_class=LLMAttackGenome,
    fitness=JailbreakFitness(MockTarget()),
    generations=100, archive=archive, seed=1, progress=False,
))

cov = result.archive.coverage()
print(f"coverage: {cov.coverage_percent:.1f}% ({cov.filled_cells} diverse attacks)")

for ind in result.archive.get_all():
    print(ind.genome.to_prompt()[:80])

Coverage tells you how much of the attack space you’ve explored; the archive hands you the working examples in each region.
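The coverage number is simple arithmetic over the grid. With the three dimensions above (6 strategies, 6 encodings, 2 persona states) the archive has 6 × 6 × 2 = 72 cells, so coverage is filled cells divided by 72. The sketch below assumes `BehaviorDimension(name, low, high, bins)` splits `[low, high]` into equal-width bins and clamps the top edge into the last bin; that binning rule, and the `Dim` stand-in class, are our assumptions rather than the library's documented semantics.

```python
from dataclasses import dataclass
from math import prod

@dataclass(frozen=True)
class Dim:  # hypothetical stand-in for BehaviorDimension
    name: str
    low: float
    high: float
    bins: int

    def bin_of(self, value: float) -> int:
        """Equal-width binning; values at or above `high` land in the last bin."""
        frac = (value - self.low) / (self.high - self.low)
        return min(max(int(frac * self.bins), 0), self.bins - 1)

dims = [Dim("strategy", 0.0, 1.0, 6), Dim("encoding", 0.0, 1.0, 6), Dim("has_persona", 0.0, 1.0, 2)]
total_cells = prod(d.bins for d in dims)  # 6 * 6 * 2 = 72

def coverage_percent(filled_cells: int) -> float:
    return 100.0 * filled_cells / total_cells

print(total_cells, f"{coverage_percent(18):.1f}%")  # 72 25.0%
```

Adding a dimension multiplies the cell count, so each new axis makes full coverage harder to reach; that trade-off is worth keeping in mind when designing the behavior space.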

Beyond single prompts: multi-turn and agentic/MCP

Because the genome’s phenotype is a Stimulus, the same engine drives every surface. Just swap the genome class:

import asyncio
from rotalabs_redqueen import MultiTurnGenome, AgenticGenome, JailbreakFitness, MockTarget, evolve

# Crescendo-style multi-turn escalation
mt = asyncio.run(evolve(genome_class=MultiTurnGenome,
                        fitness=JailbreakFitness(MockTarget()),
                        generations=50, population_size=20, seed=1, progress=False))

# Multi-step tool-use / MCP exploit plans
ag = asyncio.run(evolve(genome_class=AgenticGenome,
                        fitness=JailbreakFitness(MockTarget()),
                        generations=50, population_size=20, seed=1, progress=False))
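One way to picture "the phenotype is a Stimulus" is as a small tagged union that every target knows how to consume. The dataclasses below are a hypothetical sketch of that idea (the names `Prompt`, `Conversation`, `ToolStep`, `ActionPlan` and the `describe` helper are ours, not the library's). The point is that the evolutionary loop only ever hands a target some stimulus, so adding a new surface means adding a phenotype type rather than a new engine.

```python
from dataclasses import dataclass, field
from typing import Union

@dataclass
class Prompt:
    text: str

@dataclass
class Conversation:
    turns: list[str]  # user turns that escalate gradually (Crescendo-style)

@dataclass
class ToolStep:
    tool: str
    arguments: dict = field(default_factory=dict)

@dataclass
class ActionPlan:
    steps: list[ToolStep]  # an ordered agentic plan, e.g. a sequence of tool calls

Stimulus = Union[Prompt, Conversation, ActionPlan]

def describe(stimulus: Stimulus) -> str:
    """Dispatch on the phenotype type, as a target adapter might."""
    if isinstance(stimulus, Prompt):
        return f"single-turn prompt ({len(stimulus.text)} chars)"
    if isinstance(stimulus, Conversation):
        return f"{len(stimulus.turns)}-turn conversation"
    return f"{len(stimulus.steps)}-step action plan: " + " -> ".join(s.tool for s in stimulus.steps)

print(describe(Conversation(["Tell me about chemistry.", "Which reactions are exothermic?"])))
print(describe(ActionPlan([ToolStep("read_file", {"path": "notes.txt"}), ToolStep("send_email")])))
```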

To red-team a real tool-using agent, point MCPTarget at any Model Context Protocol server. It performs the MCP handshake and executes each evolved step as a tools/call, scoring the actual tool output:

from rotalabs_redqueen.llm import MCPTarget

target = MCPTarget(command=["npx", "-y", "@modelcontextprotocol/server-everything"])
# agentic action-plan steps become MCP tool calls; the tool output is what the judge scores
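Under the hood, MCP is JSON-RPC 2.0, and over the stdio transport each message is one newline-terminated line of JSON. The sketch below builds the messages a client sends for the handshake (an `initialize` request, then the `notifications/initialized` notification) and for one `tools/call`. The message shapes follow the MCP specification; the protocol version string, the client name, and the `echo` tool with a `message` argument are example values we chose, and this is not `MCPTarget`'s code.

```python
import json
from itertools import count

_ids = count(1)  # a request id must not be reused within a session

def request(method: str, params: dict) -> str:
    """A JSON-RPC 2.0 request: the server answers with a response carrying the same id."""
    return json.dumps({"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params})

def notification(method: str) -> str:
    """A JSON-RPC 2.0 notification: no id, so no response is expected."""
    return json.dumps({"jsonrpc": "2.0", "method": method})

handshake = [
    request("initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-red-team-client", "version": "0.1"},
    }),
    notification("notifications/initialized"),
]

def tool_call(name: str, arguments: dict) -> str:
    """One evolved plan step becomes one tools/call request."""
    return request("tools/call", {"name": name, "arguments": arguments})

# Over stdio, each message is written to the server's stdin as a single line.
wire = "\n".join(handshake + [tool_call("echo", {"message": "hello"})]) + "\n"
print(wire)
```

A real client also has to read each response from the server's stdout and match it by id before sending the next step; the sketch stops at message construction.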

Testing real models

Swap MockTarget for a provider (install the matching extra, e.g. pip install "rotalabs-redqueen[openai]"; the quotes stop shells such as zsh from expanding the brackets):

from rotalabs_redqueen import OpenAITarget, AnthropicTarget, GeminiTarget, OllamaTarget

target = OpenAITarget(model="gpt-4o-mini")                  # reads OPENAI_API_KEY
# target = AnthropicTarget(model="claude-haiku-4-5-20251001")
# target = GeminiTarget(model="gemini-2.0-flash")
# target = OllamaTarget(model="llama3.1:8b")                 # local, no key

Everything else stays the same - the target is the only thing that changes.

Co-evolution: attacker vs defender

Evolve an attacker population against a defender population. Defenders evolve guardrails that reduce attack success; attackers adapt to bypass them. You get both the attacks and the mitigations that blunt them:

import asyncio
from rotalabs_redqueen import (
    coevolve, LLMAttackGenome, SystemPromptDefense,
    JailbreakFitness, DefenderBlockFitness, MockTarget, HeuristicJudge,
)

base, judge = MockTarget(), HeuristicJudge()
result = asyncio.run(coevolve(
    attacker_class=LLMAttackGenome,
    defender_class=SystemPromptDefense,
    # attackers are scored against the base target wrapped in defender d's guardrail
    attacker_fitness_vs=lambda d: JailbreakFitness(d.as_defense(base), judge),
    # defenders are scored by how well they block attacker a
    defender_fitness_vs=lambda a: DefenderBlockFitness(a, base, judge),
    generations=20, population_size=24, seed=1,
))
print("attacker fitness:", result.attacker_fitness)
print("defender fitness:", result.defender_fitness)
print("evolved guardrail:", result.best_defender.to_dict())
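The essential mechanic of competitive co-evolution is that each population's fitness is measured against the other population, so the target keeps moving. Here is a deliberately tiny sketch in which every detail is assumed for illustration: an attacker is a single "strength" number, a defender is a "threshold", an attack succeeds when strength exceeds threshold, defenders pay a small cost for strictness (a stand-in for over-refusal), and both sides use truncation selection plus Gaussian mutation. Red Queen's genomes and fitness functions are far richer; this only shows the feedback loop.

```python
import random
from typing import Callable

def attack_succeeds(strength: float, threshold: float) -> bool:
    return strength > threshold

def attacker_fitness(strength: float, defenders: list[float]) -> float:
    """Fraction of the current defender population this attacker gets past."""
    return sum(attack_succeeds(strength, d) for d in defenders) / len(defenders)

def defender_fitness(threshold: float, attackers: list[float]) -> float:
    """Fraction of current attackers blocked, minus an assumed cost for strictness,
    so that blocking everything is not free."""
    blocked = sum(not attack_succeeds(a, threshold) for a in attackers) / len(attackers)
    return blocked - 0.1 * threshold

def next_generation(pop: list[float], score: Callable[[float], float],
                    rng: random.Random) -> list[float]:
    """Keep the better half, refill with Gaussian-mutated copies clamped to [0, 1]."""
    survivors = sorted(pop, key=score, reverse=True)[: len(pop) // 2]
    children = [min(max(rng.choice(survivors) + rng.gauss(0, 0.05), 0.0), 1.0)
                for _ in range(len(pop) - len(survivors))]
    return survivors + children

def toy_coevolve(generations: int = 30, size: int = 20, seed: int = 1):
    rng = random.Random(seed)
    attackers = [rng.uniform(0.0, 0.3) for _ in range(size)]
    defenders = [rng.uniform(0.0, 0.3) for _ in range(size)]
    for _ in range(generations):
        # Both sides are scored against the other side's population from the same generation.
        prev_attackers, prev_defenders = attackers, defenders
        attackers = next_generation(prev_attackers, lambda a: attacker_fitness(a, prev_defenders), rng)
        defenders = next_generation(prev_defenders, lambda d: defender_fitness(d, prev_attackers), rng)
    return attackers, defenders

attackers, defenders = toy_coevolve()
print(f"mean attacker strength {sum(attackers) / len(attackers):.2f}, "
      f"mean defender threshold {sum(defenders) / len(defenders):.2f}")
```

Because an attacker's score depends on the current defenders (and vice versa), a fitness value is only meaningful relative to the opposing population it was measured against.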

Audit-ready compliance reports

Any archive can be projected onto the attack taxonomy and exported as standards-aligned evidence:

from rotalabs_redqueen import ReportExporter

# `result` is an archive-producing run, such as the MAP-Elites example above
exporter = ReportExporter()
report = exporter.export(
    result.archive.get_all(),
    campaign_id="run-1",
    coverage=result.archive.coverage(),
)
print(exporter.render(report, "markdown").decode())   # or "json"

The report groups successful attacks by harm category and crosswalks them to OWASP LLM/Agentic Top-10, MITRE ATLAS, EU AI Act Article 55, and NIST AI RMF - turning a red-team run into documented evidence.
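The grouping step of such a report is easy to picture. This sketch assumes each archive entry carries a harm category and a fitness, and treats fitness of at least 0.5 as a success (an illustrative cutoff). The `Finding` class and the `CROSSWALK` table are hypothetical: OWASP's 2025 LLM Top 10 does use identifiers such as LLM01:2025 (Prompt Injection), LLM02:2025 (Sensitive Information Disclosure) and LLM09:2025 (Misinformation), and MITRE ATLAS lists AML.T0054 (LLM Jailbreak), but which identifiers Red Queen attaches to each category is not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Finding:  # hypothetical shape of one archive entry
    harm_category: str
    fitness: float
    prompt: str

# Illustrative crosswalk only; not Red Queen's actual mapping.
CROSSWALK = {
    "misinformation": ["OWASP LLM09:2025", "MITRE ATLAS AML.T0054"],
    "privacy": ["OWASP LLM02:2025", "MITRE ATLAS AML.T0054"],
}
DEFAULT_TAGS = ["OWASP LLM01:2025", "MITRE ATLAS AML.T0054"]

def group_findings(findings: list[Finding], threshold: float = 0.5) -> dict[str, dict]:
    """Group successful attacks by harm category and attach framework tags."""
    groups: dict[str, list[Finding]] = defaultdict(list)
    for f in findings:
        if f.fitness >= threshold:
            groups[f.harm_category].append(f)
    return {
        category: {
            "count": len(items),
            "max_fitness": max(i.fitness for i in items),
            "frameworks": CROSSWALK.get(category, DEFAULT_TAGS),
        }
        for category, items in sorted(groups.items())
    }

findings = [Finding("privacy", 0.8, "..."), Finding("privacy", 0.6, "..."),
            Finding("violence", 0.3, "..."), Finding("misinformation", 0.9, "...")]
for category, summary in group_findings(findings).items():
    print(category, summary)
```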

Continuous, accumulating red-teaming

Archives serialize, so discovered attacks accumulate across runs - a CI gate that gets stronger over time. Warm-start from yesterday’s archive, evolve against today’s model, and fail the build if anything breaks through:

from rotalabs_redqueen import (
    LLMAttackGenome, JailbreakFitness, HeuristicJudge,
    MapElitesArchive, BehaviorDimension, AttackStrategy, Encoding, evolve,
)

# async test: run it with an async-capable runner such as pytest-asyncio
async def test_no_jailbreak_regression(target):   # target = your model under test (a fixture)
    # fresh archive shown for brevity; to warm-start, seed it from the previous run's saved archive
    archive = MapElitesArchive([
        BehaviorDimension("strategy", 0.0, 1.0, len(AttackStrategy)),
        BehaviorDimension("encoding", 0.0, 1.0, len(Encoding)),
        BehaviorDimension("has_persona", 0.0, 1.0, 2),
    ])
    result = await evolve(
        genome_class=LLMAttackGenome,
        fitness=JailbreakFitness(target, HeuristicJudge()),
        generations=50, population_size=20, seed=1, archive=archive, progress=False,
    )

    # persist before asserting, so a successful attack is kept for the next run
    result.archive.save("file://redqueen-archive.json")

    # fail the build if any evolved attack succeeds (best is the highest-fitness attack)
    assert result.best is not None, "evolution produced no scored attack"
    assert result.best.fitness.value < 0.5, \
        f"security regression: evolved attack reached fitness {result.best.fitness.value:.2f}"

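Because seeded runs are bit-reproducible, the same seed gives the same search on every machine. With a deterministic target such as MockTarget that means the same campaign every time and no flaky security tests; with a live model, repeatability also depends on the provider returning the same responses.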
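Accumulating attacks across runs reduces to a merge rule: per cell, keep whichever elite scored higher. The sketch below shows that rule together with a JSON round-trip, one way an archive might persist between CI runs. The file layout (cell keys stored as strings like "2,0,1"), the dict-based archive, and the `merge` / `save` / `load` helpers are hypothetical; the real on-disk format is whatever `MapElitesArchive.save` writes.

```python
import json
import tempfile
from pathlib import Path

Cell = tuple[int, ...]
Archive = dict[Cell, dict]  # cell -> {"fitness": float, "prompt": str}

def merge(old: Archive, new: Archive) -> Archive:
    """Union of two archives that keeps the fitter elite wherever both fill a cell."""
    merged = dict(old)
    for cell, elite in new.items():
        if cell not in merged or elite["fitness"] > merged[cell]["fitness"]:
            merged[cell] = elite
    return merged

def save(archive: Archive, path: Path) -> None:
    # JSON object keys must be strings, so each cell tuple is stored as "i,j,k".
    data = {",".join(map(str, cell)): elite for cell, elite in archive.items()}
    path.write_text(json.dumps(data, sort_keys=True))

def load(path: Path) -> Archive:
    if not path.exists():
        return {}  # first run: nothing to warm-start from
    raw = json.loads(path.read_text())
    return {tuple(int(i) for i in key.split(",")): elite for key, elite in raw.items()}

yesterday = {(0, 1, 0): {"fitness": 0.4, "prompt": "a"}, (2, 0, 1): {"fitness": 0.7, "prompt": "b"}}
today = {(0, 1, 0): {"fitness": 0.6, "prompt": "c"}, (3, 3, 0): {"fitness": 0.2, "prompt": "d"}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "archive.json"
    save(yesterday, path)                    # end of yesterday's CI run
    accumulated = merge(load(path), today)   # today's run starts from yesterday's elites
print(sorted(accumulated))  # [(0, 1, 0), (2, 0, 1), (3, 3, 0)]
```

Since merging never discards a better elite, the archive can only grow or improve, which is what makes the CI gate stronger over time.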

The same engine, in TypeScript

The TypeScript package @rotalabs/redqueen is at feature parity and cross-language identical: the same seed produces byte-for-byte identical archives and reports, gated by a shared L1–L5 conformance corpus.

import {
  LLMAttackGenome, JailbreakFitness, MockTarget, HeuristicJudge,
  MapElitesArchive, BehaviorDimension, evolve,
} from "@rotalabs/redqueen";

// 6 strategies x 6 encodings x persona on/off, matching the Python example
const archive = new MapElitesArchive([
  new BehaviorDimension("strategy", 0, 1, 6),
  new BehaviorDimension("encoding", 0, 1, 6),
  new BehaviorDimension("has_persona", 0, 1, 2),
]);

const result = await evolve(
  LLMAttackGenome,
  new JailbreakFitness(new MockTarget(), new HeuristicJudge()),
  { generations: 50, populationSize: 20, seed: 1234, archive },
);

const cov = result.archive!.coverage();
console.log(`coverage: ${cov.coveragePercent.toFixed(1)}%  best: ${result.best!.fitness.value}`);
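Byte-identical results across Python and TypeScript take more than sharing a seed: Python's built-in `random` (a Mersenne Twister) and JavaScript's unseedable `Math.random` differ, so a cross-language engine has to carry its own PRNG with explicitly specified integer arithmetic. The sketch below is Mulberry32, a small 32-bit generator, written in Python with explicit `& 0xFFFFFFFF` masking so each step matches what `Math.imul` and `>>> 0` produce in TypeScript. We are not claiming Red Queen uses this particular generator; it illustrates the technique.

```python
M32 = 0xFFFFFFFF  # reduce every intermediate mod 2**32, as JS int32 operations do implicitly

class Mulberry32:
    """Mulberry32 PRNG with explicit 32-bit arithmetic, so a TypeScript port
    using Math.imul and `>>> 0` yields the same sequence for the same seed."""

    def __init__(self, seed: int) -> None:
        self.state = seed & M32

    def next_u32(self) -> int:
        # TS: a = (a + 0x6D2B79F5) | 0
        self.state = (self.state + 0x6D2B79F5) & M32
        t = self.state
        # TS: t = Math.imul(a ^ (a >>> 15), a | 1)
        t = ((t ^ (t >> 15)) * (t | 1)) & M32
        # TS: t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t
        t = ((t + (((t ^ (t >> 7)) * (t | 61)) & M32)) & M32) ^ t
        # TS: (t ^ (t >>> 14)) >>> 0
        return (t ^ (t >> 14)) & M32

    def random(self) -> float:
        """Uniform float in [0, 1) via the usual 2**-32 scaling."""
        return self.next_u32() / 4294967296

rng = Mulberry32(1234)
print([rng.next_u32() for _ in range(3)])
```

Every operation derived from the generator (picking an index, shuffling, sampling a Gaussian) has to be spelled out identically on both sides too, rather than delegated to each language's standard library, or the sequences diverge after the first call.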

Responsible use

Red Queen is a defensive tool - for finding and fixing vulnerabilities before attackers exploit them. Use it on systems you own or are authorized to test, not to attack systems without permission or to circumvent the safety of production systems you don’t control. The goal is making AI systems safer, not helping bad actors.

Installation

# Python (core + deterministic mock target)
pip install rotalabs-redqueen

# with a provider - or [llm] for all providers (quote the extra so zsh doesn't expand the brackets)
pip install "rotalabs-redqueen[openai]"

# TypeScript / Node
npm install @rotalabs/redqueen

Resources

Need help with security testing? Contact us at [email protected].

Cite this post

@misc{rotalabs2026evolutionary-adversa,
  title  = "Evolutionary Adversarial Testing for AI Systems with Red Queen",
  author = "Rotalabs",
  year   = "2026",
  url    = "https://rotalabs.ai/blog/adversarial-testing-redqueen/"
}