In December 2025, Galileo AI ran a simulation. They compromised a single agent in a multi-agent system and watched what happened. Within four hours, 87% of downstream decision-making was poisoned.
One agent. Four hours. Nearly complete corruption of the system.
This is memory poisoning, and it’s about to become the defining security challenge for agentic AI.
Why Memory Changes Everything
Traditional LLM attacks are ephemeral. You inject a malicious prompt, something bad happens, the conversation ends, and the model resets. No lasting damage.
Agents with memory don’t work that way. They remember. And that memory persists across sessions, influences future decisions, and compounds over time.
Palo Alto’s Unit 42 demonstrated this in October 2025. They showed how indirect prompt injection could silently insert malicious instructions into an AI agent’s long-term memory. Once planted, those instructions survived session boundaries. The agent would exfiltrate conversation history on command - not because of anything in the current prompt, but because of something written to memory days earlier.
This is a fundamentally different threat model. You’re not attacking the current interaction. You’re attacking every future interaction.
The Attack Surface
Let’s be concrete about how memory poisoning works.
Direct memory manipulation. If an attacker gains access to the memory store (database, vector store, file system), they can directly insert or modify memories. This is the obvious case. Secure your storage, use access controls, encrypt at rest. Standard stuff.
Indirect injection via content. This is sneakier. The agent processes some content - a webpage, a document, an email - and that content contains instructions designed to be written to memory. The agent doesn’t realize it’s being manipulated. It just thinks it learned something.
The Unit 42 attack worked this way. A victim visits a malicious webpage. The page contains hidden instructions. The agent processes the page, writes the instructions to memory, and now carries that payload forward indefinitely.
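To make the failure mode concrete, here is a minimal sketch of the vulnerable write path. The `agent.tools`, `agent.llm`, and `agent.memory` interfaces are hypothetical stand-ins for whatever framework you use, not Unit 42's actual code:

```python
# Hypothetical sketch of the vulnerable pattern. The agent fetches untrusted
# content, asks the LLM to "extract useful facts", and stores the result
# verbatim. Nothing distinguishes a learned fact from an injected instruction,
# so the payload survives into every future session.

def ingest_webpage(agent, url: str) -> None:
    page_text = agent.tools.fetch(url)                      # untrusted content
    summary = agent.llm.complete(
        f"Extract facts worth remembering:\n\n{page_text}"  # hidden instructions ride along
    )
    agent.memory.write(summary, source=url)                 # persisted without sanitization
```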
Gradual drift. Not all poisoning is dramatic. An attacker might subtly bias memories over time. Slightly wrong information. Skewed perspectives. The agent’s worldview drifts without any single obvious attack. By the time anyone notices, the corruption is deep and hard to untangle.
Cross-agent propagation. In multi-agent systems, one agent’s outputs become another agent’s inputs. If Agent A’s memory is poisoned, its outputs might poison Agent B’s memory. This is how you get the 87% cascade in Galileo’s simulation.
Real-World Implications
A January 2026 paper on arXiv examines memory poisoning in healthcare. Picture an Electronic Health Record agent - it helps doctors access patient information, summarize histories, suggest treatments.
Now imagine an attacker poisons its memory to redirect patient identifiers. The agent returns records for the wrong patient. The doctor makes decisions based on the wrong chart. The patient gets the wrong treatment.
This isn’t speculative. This is the logical endpoint of a demonstrated attack technique applied to a system that exists today.
The ICLR 2025 Agent Security Bench showed another vector: poisoning RAG databases through black-box embedders. You don’t need direct database access. You just need to get your content embedded and indexed. Once it’s in the retrieval corpus, it’s effectively in the agent’s memory.
Why Current Defenses Don’t Work
The same defenses that fail against prompt injection fail here too, and memory adds new challenges of its own.
Input filtering at write time. You could try to filter what gets written to memory. But how do you distinguish malicious information from legitimate information? The whole point of memory is to store things the agent learned. Filtering aggressively means the agent can’t learn. Filtering loosely means poison gets through.
Memory validation. Check if memories are “correct” before using them. But correct according to what? You’d need a ground-truth source to check against - and the absence of one is often why the agent has memory in the first place.
Memory isolation. Keep memories from different sources separate. This helps with attribution but doesn’t prevent poisoning within a source. And many useful agent behaviors require synthesizing across memory sources.
Regular memory audits. Have humans review what agents remember. Doesn’t scale. An agent might have thousands of memory entries. Humans can’t meaningfully audit that.
The OWASP Agentic Security Initiative Top 10 for 2026 lists memory poisoning in the top 3 concerns, alongside tool misuse and privilege compromise. Their recommendation: “stacked defenses with input filters, output filters, and context filters to sanitize retrieved or remembered data before reuse.”
That’s defense in depth again. No single solution.
What We Think Might Work
At Rotalabs, we’ve been researching memory systems for AI agents. Our field-theoretic memory work focuses on how to represent and manage agent memory in principled ways. Memory security is a natural extension.
Here are approaches we think have potential:
Memory provenance tracking. Every memory entry should carry metadata about where it came from. When did the agent learn this? From what source? Under what context? This doesn’t prevent poisoning, but it enables detection and selective forgetting.
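As a rough sketch of what provenance-tracked memory could look like (the field names and store are illustrative, not a prescription):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryEntry:
    content: str
    source: str                      # e.g. URL, tool name, or user ID
    trigger: str                     # what caused the write (tool call, user turn, ...)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class MemoryStore:
    def __init__(self):
        self._entries: list[MemoryEntry] = []

    def write(self, content: str, source: str, trigger: str) -> None:
        self._entries.append(MemoryEntry(content, source, trigger))

    def forget_source(self, source: str) -> int:
        """Selective forgetting: purge everything learned from a suspect source."""
        before = len(self._entries)
        self._entries = [e for e in self._entries if e.source != source]
        return before - len(self._entries)
```

The payoff is the last method: once every entry knows where it came from, “delete everything we learned from that webpage” becomes a one-line query instead of a forensic investigation.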
Cryptographic attestation. For critical memories, attach cryptographic proofs of origin. If a memory claims to come from a trusted source, verify that cryptographically. This prevents spoofing but requires infrastructure for signing and verification.
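A minimal sketch using an HMAC over the entry (the shared key and entry fields are placeholders; a real deployment might prefer asymmetric signatures so readers can verify without holding the signing key):

```python
import hashlib
import hmac
import json

# Placeholder for whatever key infrastructure you actually run.
# With HMAC, writer and verifier share the key.
SECRET_KEY = b"replace-with-managed-key"

def attest(entry: dict) -> str:
    payload = json.dumps(entry, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def verify(entry: dict, tag: str) -> bool:
    return hmac.compare_digest(attest(entry), tag)

entry = {"content": "Patient portal URL is https://portal.example.org",
         "source": "it-admin", "created_at": "2026-01-05T10:00:00Z"}
tag = attest(entry)            # stored alongside the memory
assert verify(entry, tag)      # checked before the memory is trusted at read time
```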
Anomaly detection on memory access patterns. Monitor how memories are used. If an agent suddenly starts accessing certain memories heavily, or if memory usage patterns change, that might indicate poisoning. Similar to how you’d detect compromised credentials via unusual access patterns.
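A toy version of that idea, with arbitrary thresholds and windowing, might look like this:

```python
from collections import defaultdict

class MemoryAccessMonitor:
    """Toy anomaly check: flag entries read far more often than usual.
    Real deployments would window by time and tune thresholds per workload."""

    def __init__(self, spike_factor: float = 5.0, min_reads: int = 20):
        self.reads = defaultdict(int)              # reads in the current window
        self.baseline = defaultdict(lambda: 1.0)   # smoothed reads per past window
        self.spike_factor = spike_factor
        self.min_reads = min_reads

    def record_read(self, entry_id: str) -> bool:
        self.reads[entry_id] += 1
        count = self.reads[entry_id]
        suspicious = (count >= self.min_reads and
                      count > self.spike_factor * self.baseline[entry_id])
        return suspicious                          # caller decides whether to alert

    def roll_window(self) -> None:
        for entry_id, count in self.reads.items():
            # exponential smoothing of the per-window baseline
            self.baseline[entry_id] = 0.8 * self.baseline[entry_id] + 0.2 * count
        self.reads.clear()
```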
Memory compartmentalization. Different trust levels for different memory types. Core memories (system configuration, identity) are immutable. Learned memories are mutable but sandboxed. Transient memories are discarded after use. Poisoning low-trust memory can’t escalate to high-trust operations.
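One way to express the tiers in code - the tier names and write policy below are made up for illustration:

```python
from enum import IntEnum

class Trust(IntEnum):
    TRANSIENT = 0   # discarded after the task completes
    LEARNED = 1     # mutable, sandboxed, subject to decay and audits
    CORE = 2        # system config / identity: loaded at startup, never written at runtime

# Illustrative policy: the highest tier each source may write to.
WRITE_CEILING = {
    "web_content": Trust.TRANSIENT,
    "tool_output": Trust.LEARNED,
    "user_turn": Trust.LEARNED,
}

def checked_write(store, content: str, source: str, tier: Trust) -> None:
    if tier is Trust.CORE:
        raise PermissionError("CORE memory is immutable at runtime")
    if tier > WRITE_CEILING.get(source, Trust.TRANSIENT):
        raise PermissionError(f"{source!r} may not write {tier.name} memory")
    store.write(content, source=source, tier=tier)
```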
Consistency checking. Cross-reference memories against each other and against external sources. If a memory contradicts established facts or other memories, flag it. This catches some poisoning but not subtle corruption.
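A toy sketch, checking new memories against a small table of pinned facts (the facts themselves are placeholders); as noted above, this catches direct contradictions but not subtle corruption:

```python
# Pinned facts: subject -> trusted value, maintained outside agent memory.
PINNED_FACTS = {
    "billing_api_host": "billing.internal.example.com",
    "escalation_contact": "[email protected]",
}

def consistency_flags(new_memory: dict) -> list[str]:
    """Return human-readable flags for any field that contradicts a pinned fact."""
    flags = []
    for subject, trusted_value in PINNED_FACTS.items():
        claimed = new_memory.get(subject)
        if claimed is not None and claimed != trusted_value:
            flags.append(f"{subject}: memory says {claimed!r}, pinned fact says {trusted_value!r}")
    return flags
```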
Forgetting as a security mechanism. Intentional memory decay. Old memories fade unless reinforced. This limits the persistence window for poison. An attacker has to continuously re-inject rather than poison once and walk away.
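A sketch of decay with reinforcement - the half-life, scores, and thresholds below are arbitrary:

```python
import math
import time

HALF_LIFE_DAYS = 14       # unreinforced memories halve in weight every two weeks
PURGE_THRESHOLD = 0.1

def retention_score(last_reinforced: float, strength: float, now: float | None = None) -> float:
    now = now or time.time()
    age_days = (now - last_reinforced) / 86_400
    return strength * math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)

def sweep(entries: list[dict]) -> list[dict]:
    """Keep only memories whose decayed score is still above threshold."""
    return [e for e in entries
            if retention_score(e["last_reinforced"], e["strength"]) >= PURGE_THRESHOLD]

def reinforce(entry: dict) -> None:
    # Bump strength and reset the clock on read, so genuinely useful memories persist
    # while a one-shot injected payload fades unless the attacker keeps re-injecting it.
    entry["strength"] = min(entry["strength"] + 0.5, 5.0)
    entry["last_reinforced"] = time.time()
```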
None of these are complete solutions. Memory security is hard because the fundamental operation - storing information for later use - is inherently trusting. You’re trusting that the information is accurate and safe to act on.
The Multi-Agent Amplification Problem
Single-agent memory poisoning is bad. Multi-agent memory poisoning is catastrophic.
When agents share information, they share corruption. Agent A tells Agent B something it learned. Agent B incorporates that into its own memory. Agent B tells Agent C. The poison spreads through the network like a virus.
The Galileo simulation showed 87% corruption from one compromised agent. And that was a simulation with a limited number of agents. Real production systems might have dozens or hundreds of agents passing information around.
This is why we wrote about multi-agent trust in a previous post. Agents need ways to verify what they receive from other agents. Without that, one breach becomes total compromise.
What You Should Do
If you’re building agents with persistent memory:
Minimize what you store. Every memory is a potential liability. Don’t remember things you don’t need. Aggressive pruning reduces attack surface.
Treat all memory as potentially poisoned. Don’t fully trust retrieved memories. Validate against external sources when possible. Be especially skeptical of memories that trigger unusual actions.
Implement memory provenance. Track where every memory came from. At minimum, timestamp and source. Better: full lineage including what triggered the memory creation.
Monitor memory health. Set up alerting for anomalous memory patterns. Sudden changes in memory size, access patterns, or content characteristics are red flags.
Plan for memory purges. Have a process to identify and remove corrupted memories. Think about how you’d recover if you discovered poisoning. Practice it.
Isolate high-stakes decisions. Critical operations shouldn’t depend solely on remembered information. Require fresh verification for anything important.
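One pattern for this, sketched with a hypothetical authoritative `registry`: the remembered identifier is treated as a hint, and the action only proceeds on a fresh lookup.

```python
# Hypothetical gate: high-stakes actions must re-verify key parameters against
# an authoritative system of record at call time, never act on memory alone.

def transfer_funds(account_id: str, amount: float, registry) -> None:
    verified = registry.lookup_account(account_id)     # fresh, authoritative read
    if verified is None or verified.frozen:
        raise RuntimeError("verification failed; refusing to act on remembered data")
    execute_transfer(verified.id, amount)              # act only on the verified value
```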
Assume multi-agent spread. If one agent might be poisoned, assume its outputs are poisoning others. Design inter-agent communication with this in mind.
The Bigger Picture
Memory makes agents useful. It lets them learn, personalize, maintain context across interactions. Without memory, agents are just stateless functions.
But memory also makes agents vulnerable in ways that stateless systems aren’t. A poisoned memory is a sleeper agent - dormant until activated, then potentially devastating.
The industry is only beginning to grapple with this. Most agent frameworks treat memory as a feature to add, not a security surface to defend. That needs to change.
We’re still early in understanding how to build secure agent memory. But we’re pretty sure of one thing: security can’t be an afterthought. It has to be designed in from the beginning.
Memory architecture and memory security need to be developed together. That’s part of what we’re trying to do at Rotalabs.
Further Reading:
- Memory Poisoning Attack and Defense on Memory Based LLM-Agents (arXiv Jan 2026)
- Unit 42: Persistent Behaviors in Agents’ Memory (Oct 2025)
- Lakera: Agentic AI Threats - Memory Poisoning (Dec 2025)
- ICLR 2025: Agent Security Bench
- OWASP Agentic Security Initiative Top 10 (2026)
Working on agent memory security? Let’s talk: [email protected]