TL;DR: We’ve open-sourced 10 packages covering AI trust infrastructure: steering vectors, adversarial testing, evaluation with statistical rigor, verification, sandbagging detection, and more. Everything is AGPL-3.0 licensed and available on both PyPI and npm right now.
Why We’re Doing This
Over the past year, we’ve been building tools internally to support our research on AI reliability and trust: sandbagging detection, steering interventions, adversarial red-teaming, rigorous evaluation methods. The usual stuff you need when you’re trying to figure out whether AI systems actually work as advertised.
These tools started as research code. Messy scripts, one-off experiments, notebooks that only made sense to the person who wrote them. But as we kept reusing and refining them, they turned into something more useful.
We decided to clean them up and release them properly.
The Packages
We’re releasing 10 packages organized into two groups: core libraries for specific capabilities, and infrastructure packages for building production systems. All packages are available in both Python (PyPI) and Node.js (npm).
Core Libraries
| Package | What It Does | Python | Node.js |
|---|---|---|---|
| steer | Steering vectors for runtime behavior control | pip install rotalabs-steer | npm i @rotalabs/steer |
| verify | Verification framework for AI outputs | pip install rotalabs-verify | npm i @rotalabs/verify |
| probe | Sandbagging and safety detection | pip install rotalabs-probe | npm i @rotalabs/probe |
| redqueen | Adversarial testing and red-teaming | pip install rotalabs-redqueen | npm i @rotalabs/redqueen |
| eval | LLM evaluation with statistical rigor | pip install rotalabs-eval | npm i @rotalabs/eval |
| cascade | Trust-based routing and decision cascades | pip install rotalabs-cascade | npm i @rotalabs/cascade |
Infrastructure
| Package | What It Does | Python | Node.js |
|---|---|---|---|
| accel | Inference acceleration with speculative decoding | pip install rotalabs-accel | npm i @rotalabs/accel |
| comply | Policy compliance monitoring | pip install rotalabs-comply | npm i @rotalabs/comply |
| audit | Audit logging and traceability | pip install rotalabs-audit | npm i @rotalabs/audit |
| graph | GNN-based trust propagation | pip install rotalabs-graph | npm i @rotalabs/graph |
All packages are AGPL-3.0 licensed. The Python packages require Python 3.9+; the Node.js packages require Node.js 18+.
Quick Start
Here’s a taste of what you can do.
Steering vectors to modify model behavior
```python
from rotalabs_steer import SteeringExtractor, SteeringController

# Extract a "helpfulness" direction from contrast pairs
extractor = SteeringExtractor(model)
vector = extractor.extract(
    positive=["Be maximally helpful and thorough"],
    negative=["Be brief and withhold information"]
)

# Apply at inference time
controller = SteeringController(model)
output = controller.generate(prompt, steering_vector=vector, strength=1.5)
```
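If you’re wondering what extracting a direction from contrast pairs means, the core idea is a contrastive activation difference: run the positive and negative prompts through the model, take hidden states at a chosen layer, and subtract the mean negative activation from the mean positive one. Here’s a minimal, self-contained sketch of that idea in NumPy; the activation arrays and the layer choice are stand-ins, not the rotalabs-steer API.

```python
import numpy as np

def extract_steering_vector(pos_acts, neg_acts):
    """Contrastive mean difference: average positive activations
    minus average negative activations, normalized to unit length."""
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def apply_steering(hidden_state, vector, strength=1.5):
    """Add the scaled steering direction to a hidden state at inference time."""
    return hidden_state + strength * vector

# Toy example: random arrays stand in for real layer activations
rng = np.random.default_rng(0)
pos_acts = rng.normal(size=(4, 768))  # activations for positive prompts
neg_acts = rng.normal(size=(4, 768))  # activations for negative prompts
vector = extract_steering_vector(pos_acts, neg_acts)
steered = apply_steering(rng.normal(size=768), vector)
```

The real package handles tokenization, layer selection, and batching; the sketch is just the arithmetic at the center of the technique.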
Adversarial testing
```python
from rotalabs_redqueen import RedQueen, AttackConfig

rq = RedQueen(target_model="gpt-4")
results = rq.run_campaign(
    attack_types=["jailbreak", "prompt_injection", "goal_hijacking"],
    num_attempts=100
)

print(f"Attack success rate: {results.success_rate:.1%}")
print(f"Vulnerable categories: {results.vulnerable_categories}")
```
Evaluation with confidence intervals
```python
from rotalabs_eval import Evaluator, MetricConfig

evaluator = Evaluator()
results = evaluator.compare(
    model_a_outputs=predictions_a,
    model_b_outputs=predictions_b,
    references=ground_truth,
    metrics=["bleu", "rouge", "bertscore"]
)

# Statistical comparison with effect sizes
print(f"BLEU: {results.bleu.mean:.3f} (95% CI: {results.bleu.ci})")
print(f"Significant difference: {results.bleu.significant}")
```
Design Principles
A few things we tried to get right:
Minimal dependencies. Heavy dependencies like PyTorch, transformers, and LangChain are optional extras. The core packages install fast and don’t bloat your environment.
Statistical rigor. Evaluation isn’t just about computing metrics. It’s about knowing whether differences are real. Our eval package includes confidence intervals, significance tests, effect sizes, and power analysis by default.
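To make “knowing whether differences are real” concrete, here is a minimal sketch of the kind of statistics involved: a paired bootstrap over per-example scores that yields a 95% confidence interval on the score difference between two models, plus a rough paired effect size. It uses plain NumPy and made-up per-example scores; it illustrates the approach, not the rotalabs-eval implementation.

```python
import numpy as np

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """95% CI on the mean score difference (A minus B) via paired bootstrap."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a) - np.asarray(scores_b)
    n = len(diffs)
    boot_means = np.array([
        diffs[rng.integers(0, n, n)].mean() for _ in range(n_resamples)
    ])
    low, high = np.percentile(boot_means, [2.5, 97.5])
    effect_size = diffs.mean() / (diffs.std(ddof=1) + 1e-12)  # Cohen's d for paired data
    return diffs.mean(), (low, high), effect_size

# Hypothetical per-example metric scores for two models on the same test set
scores_a = np.array([0.62, 0.71, 0.58, 0.66, 0.74, 0.69])
scores_b = np.array([0.60, 0.65, 0.59, 0.61, 0.70, 0.64])
mean_diff, ci, d = paired_bootstrap(scores_a, scores_b)
print(f"Mean diff: {mean_diff:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f}), d: {d:.2f}")
# If the CI excludes zero, the difference is unlikely to be sampling noise.
```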
Production ready. These aren’t just research toys. They include caching, rate limiting, async support, and integration with MLflow/Weights & Biases.
Consistent APIs. Once you learn one package, the others feel familiar. Similar patterns for configuration, similar conventions for inputs and outputs.
What’s Not Included
We have one package that’s not in this release: our field-theoretic memory system. It uses a novel PDE-based approach to agent memory, and we’re planning to publish the research paper first. That release will come later this year.
Getting Involved
Everything is on GitHub under the rotalabs organization. Issues, PRs, and feedback are welcome.
If you’re using these tools in production or research, we’d love to hear about it. Drop us a line at [email protected].
What’s Next
We’ll be publishing deep-dive posts on individual packages over the coming weeks:
- Steering vectors and how to extract behavioral directions from any model
- Adversarial testing and our taxonomy of AI attacks
- Statistical evaluation and why most LLM benchmarks are doing it wrong
Follow along on our blog or subscribe to the newsletter.