TL;DR: We’ve open-sourced 10 packages covering AI trust infrastructure: steering vectors, adversarial testing, evaluation with statistical rigor, verification, sandbagging detection, and more. Everything is AGPL-3.0 licensed and available on both PyPI and npm right now.
Why We’re Doing This
Over the past year, we’ve been building tools internally to support our research on AI reliability and trust: sandbagging detection, steering interventions, adversarial red-teaming, rigorous evaluation methods. The usual stuff you need when you’re trying to figure out whether AI systems actually work as advertised.
These tools started as research code. Messy scripts, one-off experiments, notebooks that only made sense to the person who wrote them. But as we kept reusing and refining them, they turned into something more useful.
We decided to clean them up and release them properly.
The Packages
We’re releasing 10 packages organized into two groups: core libraries for specific capabilities, and infrastructure packages for building production systems. All packages are available in both Python (PyPI) and Node.js (npm).
Core Libraries
| Package | What It Does | Python | Node.js |
|---|---|---|---|
| steer | Steering vectors for runtime behavior control | pip install rotalabs-steer | npm i @rotalabs/steer |
| verify | Verification framework for AI outputs | pip install rotalabs-verify | npm i @rotalabs/verify |
| probe | Sandbagging and safety detection | pip install rotalabs-probe | npm i @rotalabs/probe |
| redqueen | Adversarial testing and red-teaming | pip install rotalabs-redqueen | npm i @rotalabs/redqueen |
| eval | LLM evaluation with statistical rigor | pip install rotalabs-eval | npm i @rotalabs/eval |
| cascade | Trust-based routing and decision cascades | pip install rotalabs-cascade | npm i @rotalabs/cascade |
Infrastructure
| Package | What It Does | Python | Node.js |
|---|---|---|---|
| accel | Inference acceleration with speculative decoding | pip install rotalabs-accel | npm i @rotalabs/accel |
| comply | Policy compliance monitoring | pip install rotalabs-comply | npm i @rotalabs/comply |
| audit | Audit logging and traceability | pip install rotalabs-audit | npm i @rotalabs/audit |
| graph | GNN-based trust propagation | pip install rotalabs-graph | npm i @rotalabs/graph |
All packages are AGPL-3.0 licensed. The Python packages require Python 3.9+; the Node.js packages require Node.js 18+.
Quick Start
Here’s a taste of what you can do.
Steering vectors to modify model behavior
```python
from rotalabs_steer import SteeringExtractor, SteeringController

# Extract a "helpfulness" direction from contrast pairs
extractor = SteeringExtractor(model)
vector = extractor.extract(
    positive=["Be maximally helpful and thorough"],
    negative=["Be brief and withhold information"]
)

# Apply at inference time
controller = SteeringController(model)
output = controller.generate(prompt, steering_vector=vector, strength=1.5)
```
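If you’re wondering what extracting a direction from contrast pairs means, the core idea is a contrastive activation difference: run the positive and negative prompts through the model, take hidden states at a chosen layer, and subtract the mean negative activation from the mean positive one. Here’s a minimal, self-contained sketch of that idea in NumPy; the activation arrays and the layer choice are stand-ins, not the rotalabs-steer API.

```python
import numpy as np

def extract_steering_vector(pos_acts, neg_acts):
    """Contrastive mean difference: average positive activations
    minus average negative activations, normalized to unit length."""
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def apply_steering(hidden_state, vector, strength=1.5):
    """Add the scaled steering direction to a hidden state at inference time."""
    return hidden_state + strength * vector

# Toy example: random arrays stand in for real layer activations
rng = np.random.default_rng(0)
pos_acts = rng.normal(size=(4, 768))  # activations for positive prompts
neg_acts = rng.normal(size=(4, 768))  # activations for negative prompts
vector = extract_steering_vector(pos_acts, neg_acts)
steered = apply_steering(rng.normal(size=768), vector)
```

The real package handles tokenization, layer selection, and batching; the sketch is just the arithmetic at the center of the technique.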
Adversarial testing
```python
from rotalabs_redqueen import RedQueen, AttackConfig

rq = RedQueen(target_model="gpt-4")
results = rq.run_campaign(
    attack_types=["jailbreak", "prompt_injection", "goal_hijacking"],
    num_attempts=100
)

print(f"Attack success rate: {results.success_rate:.1%}")
print(f"Vulnerable categories: {results.vulnerable_categories}")
```
Evaluation with confidence intervals
```python
from rotalabs_eval import Evaluator, MetricConfig

evaluator = Evaluator()
results = evaluator.compare(
    model_a_outputs=predictions_a,
    model_b_outputs=predictions_b,
    references=ground_truth,
    metrics=["bleu", "rouge", "bertscore"]
)

# Statistical comparison with effect sizes
print(f"BLEU: {results.bleu.mean:.3f} (95% CI: {results.bleu.ci})")
print(f"Significant difference: {results.bleu.significant}")
```
Design Principles
A few things we tried to get right:
Minimal dependencies. Heavy dependencies like PyTorch, transformers, and LangChain are optional extras. The core packages install fast and don’t bloat your environment.
Statistical rigor. Evaluation isn’t just about computing metrics. It’s about knowing whether differences are real. Our eval package includes confidence intervals, significance tests, effect sizes, and power analysis by default.
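To make “knowing whether differences are real” concrete, here is a minimal sketch of the kind of statistics involved: a paired bootstrap over per-example scores that yields a 95% confidence interval on the score difference between two models, plus a rough paired effect size. It uses plain NumPy and made-up per-example scores; it illustrates the approach, not the rotalabs-eval implementation.

```python
import numpy as np

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """95% CI on the mean score difference (A minus B) via paired bootstrap."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a) - np.asarray(scores_b)
    n = len(diffs)
    boot_means = np.array([
        diffs[rng.integers(0, n, n)].mean() for _ in range(n_resamples)
    ])
    low, high = np.percentile(boot_means, [2.5, 97.5])
    effect_size = diffs.mean() / (diffs.std(ddof=1) + 1e-12)  # Cohen's d for paired data
    return diffs.mean(), (low, high), effect_size

# Hypothetical per-example metric scores for two models on the same test set
scores_a = np.array([0.62, 0.71, 0.58, 0.66, 0.74, 0.69])
scores_b = np.array([0.60, 0.65, 0.59, 0.61, 0.70, 0.64])
mean_diff, ci, d = paired_bootstrap(scores_a, scores_b)
print(f"Mean diff: {mean_diff:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f}), d: {d:.2f}")
# If the CI excludes zero, the difference is unlikely to be sampling noise.
```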
Production ready. These aren’t just research toys. They include caching, rate limiting, async support, and integration with MLflow/Weights & Biases.
Consistent APIs. Once you learn one package, the others feel familiar. Similar patterns for configuration, similar conventions for inputs and outputs.
What’s Not Included
We have one package that’s not in this release: our field-theoretic memory system. It uses a novel PDE-based approach to agent memory, and we’re planning to publish the research paper first. That release will come later this year.
Getting Involved
Everything is on GitHub under the rotalabs organization. Issues, PRs, and feedback are welcome.
If you’re using these tools in production or research, we’d love to hear about it. Drop us a line at [email protected].
What’s Next
We’ll be publishing deep-dive posts on individual packages over the coming weeks:
- Steering vectors and how to extract behavioral directions from any model
- Adversarial testing and our taxonomy of AI attacks
- Statistical evaluation and why most LLM benchmarks are doing it wrong
Follow along on our blog or subscribe to the newsletter.