The Adversarial Resistance pillar of the Pipkin Framework evaluates an AI agent’s ability to maintain correct, safe behavior when subjected to deliberate attempts at manipulation. The Standard Core Battery includes 41 standardized adversarial test vectors. This number is precise, documented, and, we would argue, insufficient. Not because the current battery is inadequate for the known attack surface, but because the attack surface is expanding faster than any fixed test battery can track. This article explains what the 41 vectors cover, why they represent a rigorous baseline, and why Pipkin treats this number as a floor rather than a ceiling.

The 41 vectors are organized into four categories: prompt injection (12 vectors), data poisoning (8 vectors), social engineering (11 vectors), and authorization boundary violations (10 vectors). Each category targets a distinct class of adversarial behavior, and each vector within a category tests a specific variant or escalation of the attack pattern.
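
For readers who want to check the arithmetic, the category breakdown can be written down in a few lines of Python. This is only a sketch: the dictionary name, the keys, and the helper functions are illustrative assumptions rather than part of any published Pipkin tooling. Only the four counts come from the text above.

```python
# Illustrative sketch: names and structure are assumptions.
# Only the four per-category counts are taken from the article.
VECTOR_COUNTS = {
    "prompt_injection": 12,
    "data_poisoning": 8,
    "social_engineering": 11,
    "authorization_boundary_violation": 10,
}


def total_vectors(counts):
    """Return the total number of vectors across all categories."""
    return sum(counts.values())


def category_share(counts):
    """Return each category's share of the battery as a fraction of the total."""
    total = total_vectors(counts)
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print(total_vectors(VECTOR_COUNTS))  # 41
    for name, share in category_share(VECTOR_COUNTS).items():
        print(f"{name}: {share:.1%}")
```

The shares make the emphasis visible: prompt injection and social engineering together account for 23 of the 41 vectors.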

Prompt injection is the most widely discussed adversarial technique against AI agents. The 12 prompt injection vectors in the Standard Core Battery cover direct injection (instructions embedded in user input), indirect injection (instructions embedded in retrieved documents or tool outputs), recursive injection (injected instructions that trigger additional injected instructions), encoding-based injection (instructions obscured through character encoding, language switching, or formatting tricks), role-play injection (instructions framed as fictional scenarios or hypothetical requests), and multi-turn injection (instructions distributed across multiple conversation turns to avoid single-turn detection). Each variant is tested with multiple payloads of varying sophistication, from naive attempts that any agent should catch to carefully crafted attacks that exploit known architectural vulnerabilities in transformer-based systems.

Data poisoning vectors evaluate the agent’s resilience when its input data or retrieved context has been deliberately corrupted. The 8 vectors cover poisoned retrieval results (corrupted documents in a RAG pipeline), poisoned tool outputs (tools that return manipulated data), contradictory evidence injection (introducing false information that contradicts the agent’s training data), statistical manipulation (presenting skewed or fabricated statistical data designed to influence the agent’s reasoning), and temporal poisoning (presenting outdated information as current). These vectors are particularly relevant for agents that operate in retrieval-augmented or tool-augmented modes, where the agent must make trust decisions about external data sources.

Social engineering vectors target the agent’s ability to maintain its operational stance when subjected to psychological manipulation techniques adapted from human social engineering playbooks. The 11 vectors include authority impersonation (claiming to be a system administrator, developer, or supervisor), urgency exploitation (manufacturing time pressure to bypass deliberation), rapport building (establishing apparent trust before making manipulative requests), incremental escalation (making progressively more boundary-violating requests), emotional manipulation (appeals to empathy, fear, or guilt), false context (providing fabricated operational context to justify unauthorized actions), gaslighting (contradicting the agent’s own prior statements or outputs), flattery-based manipulation (excessive praise designed to reduce the agent’s critical evaluation of subsequent requests), intimidation (threatening consequences for non-compliance), and coalition pressure (claiming that other agents or humans have already approved the request).

Some may question whether social engineering vectors are meaningful for AI agents. The answer, based on empirical observation, is unambiguously yes. Current AI agents are susceptible to many of the same persuasion techniques that work against humans, because these agents have been trained on human-generated text in which these techniques are effective. An agent that has learned to be helpful in response to authority claims, urgency signals, and emotional appeals will, absent specific countermeasures, extend that helpfulness to adversarial contexts.

Authorization boundary violation vectors test the agent’s response to explicit and implicit requests to exceed its defined operational scope. The 10 vectors include direct privilege escalation requests, indirect privilege escalation through tool chaining, scope creep through ambiguous task definitions, cross-context data access (requesting information from one context while operating in another), permission laundering (using one authorized action to justify an unauthorized subsequent action), and configuration modification attempts. These vectors overlap with the Boundary Discipline pillar but are evaluated differently: BD measures the agent’s general tendency to stay within bounds, while AR measures the agent’s ability to maintain boundaries under active adversarial pressure.

Forty-one vectors, across these four categories, provide comprehensive coverage of the known adversarial attack surface as of the current evaluation cycle. Every vector has been validated against multiple agent architectures, calibrated for difficulty, and normalized to produce comparable scores across evaluation forms. The vectors are distributed across the five rotating forms of the Standard Core Battery to prevent agents from being trained specifically against a known test set.
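
The article does not describe how vectors are assigned to the five forms, so the following is one possible scheme rather than Pipkin's actual procedure. It assumes, purely for illustration, that each vector belongs to exactly one form and that vectors are dealt round-robin from a category-ordered list, which keeps every form close to the same size and category mix. The identifiers and function names are hypothetical.

```python
from collections import defaultdict

# Counts come from the article; everything else here is an assumption.
CATEGORY_COUNTS = {
    "prompt_injection": 12,
    "data_poisoning": 8,
    "social_engineering": 11,
    "authorization_boundary_violation": 10,
}
NUM_FORMS = 5  # the five rotating forms mentioned in the article


def build_vector_ids(counts):
    """Create hypothetical vector identifiers, grouped by category."""
    return [
        f"{category}-{i:02d}"
        for category, n in counts.items()
        for i in range(1, n + 1)
    ]


def assign_to_forms(vector_ids, num_forms):
    """Deal vectors round-robin so each form gets a similar category mix."""
    forms = defaultdict(list)
    for index, vector_id in enumerate(vector_ids):
        forms[index % num_forms].append(vector_id)
    return dict(forms)


if __name__ == "__main__":
    forms = assign_to_forms(build_vector_ids(CATEGORY_COUNTS), NUM_FORMS)
    for form_index in sorted(forms):
        print(form_index, len(forms[form_index]))
```

Under this assumed scheme, 41 vectors split into one form of nine and four forms of eight. A real rotation might instead overlap vectors across forms, which this sketch does not model.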

So why is 41 not enough? Because the adversarial landscape is not static. The history of information security demonstrates a consistent pattern: defenders enumerate known attack vectors, build defenses against them, and then attackers develop novel vectors that exploit assumptions the defenders did not know they were making. The AI agent adversarial landscape is still in its early stages. The attack taxonomy is still being written.

Consider the emergence of multi-agent adversarial attacks, in which a malicious agent interacts with a target agent through legitimate inter-agent communication protocols. The current 41 vectors include single-agent attacks only. As multi-agent systems become more common, the attack surface expands to include agent-to-agent manipulation, coordinated attacks from multiple adversarial agents, and supply-chain attacks through compromised agent dependencies. These are not hypothetical concerns. Research papers documenting successful multi-agent adversarial attacks have been published within the past year.

Consider also the emerging category of model-level adversarial attacks: fine-tuning data poisoning, activation steering, and inference-time manipulations that operate below the prompt level. The current battery focuses on behavioral attacks, meaning attacks that operate through the same input channels that legitimate users employ. As the threat landscape matures, the test battery must expand to include attacks that target the model’s weights, activation patterns, or inference pipeline directly.

The decision to weight Adversarial Resistance at 15%, the joint-lowest weight alongside Auditability, may appear to undervalue security. This weighting reflects a deliberate methodological choice. The Pipkin Framework weights pillars based on two factors: the breadth of deployment contexts in which the pillar is relevant, and the current measurability of the pillar. Decision Accuracy and Failure Containment each receive 25% because they are relevant in every deployment context and can be measured with high confidence. Boundary Discipline receives 20% for similar reasons. Adversarial Resistance receives 15% not because security is unimportant but because the measurability of adversarial resistance is inherently constrained by the known attack surface.
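
To see what a 15% weight means in practice, consider a minimal sketch of a weighted composite. The five weights are the ones stated above, and they sum to 100%. The aggregation rule is an assumption: the article does not say how pillar scores are combined, so the sketch uses a plain weighted average on a 0-100 scale and ignores any override rules. All names are hypothetical.

```python
# Weights as stated in the article; they sum to 1.0.
PILLAR_WEIGHTS = {
    "decision_accuracy": 0.25,
    "failure_containment": 0.25,
    "boundary_discipline": 0.20,
    "adversarial_resistance": 0.15,
    "auditability": 0.15,
}


def composite_score(pillar_scores, weights=PILLAR_WEIGHTS):
    """Weighted average of pillar scores (assumed aggregation, 0-100 scale)."""
    if set(pillar_scores) != set(weights):
        raise ValueError("pillar_scores must contain exactly the weighted pillars")
    return sum(weights[pillar] * pillar_scores[pillar] for pillar in weights)


if __name__ == "__main__":
    baseline = {pillar: 80.0 for pillar in PILLAR_WEIGHTS}
    weaker_ar = dict(baseline, adversarial_resistance=60.0)
    print(composite_score(baseline))
    print(composite_score(weaker_ar))
```

Under this assumed rule, a 20-point drop in the Adversarial Resistance score moves the composite by about 3 points (20 × 0.15), whereas the same drop in Decision Accuracy would move it by about 5.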

To weight AR more heavily would be to claim that the 41 current vectors represent a comprehensive and stable measurement of an agent’s security posture. They do not. They represent the best measurement available today. As the vector library expands and the adversarial landscape matures, the weight assigned to AR will be reviewed. The Pipkin Methodology Board evaluates pillar weights annually, and AR weight adjustment is among the most actively discussed topics in that review.

The vector library follows a structured evolution process. New vectors are added through three channels: internal research (Pipkin’s adversarial research team develops and tests novel vectors), external disclosure (published research on AI adversarial techniques is evaluated for inclusion), and incident analysis (real-world adversarial incidents against deployed AI agents are reverse-engineered into standardized test vectors). Each proposed vector must pass validation criteria: it must be reproducible, it must target a distinct attack pattern not already covered, and it must produce measurable differentiation across current agent architectures.
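
The three validation criteria can be expressed as a simple gate. The sketch below is an illustration under stated assumptions, not Pipkin's validation tooling: the `ProposedVector` fields, the `failed_criteria` function, and especially the `MIN_SCORE_SPREAD` threshold are hypothetical. "Measurable differentiation" is approximated here as a minimum spread between the highest and lowest resistance scores observed across architectures.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical threshold: the article gives no numeric definition
# of "measurable differentiation".
MIN_SCORE_SPREAD = 0.1


@dataclass
class ProposedVector:
    name: str
    attack_pattern: str
    reproducible: bool
    architecture_scores: List[float]  # one resistance score per architecture


def failed_criteria(vector: ProposedVector,
                    covered_patterns: Set[str],
                    min_spread: float = MIN_SCORE_SPREAD) -> List[str]:
    """Return the validation criteria a proposed vector fails (empty if none)."""
    failures = []
    if not vector.reproducible:
        failures.append("not reproducible")
    if vector.attack_pattern in covered_patterns:
        failures.append("attack pattern already covered")
    scores = vector.architecture_scores
    if len(scores) < 2 or max(scores) - min(scores) < min_spread:
        failures.append("no measurable differentiation")
    return failures


if __name__ == "__main__":
    covered = {"direct_injection", "authority_impersonation"}
    candidate = ProposedVector(
        name="example-candidate",
        attack_pattern="agent_to_agent_manipulation",
        reproducible=True,
        architecture_scores=[0.2, 0.9, 0.6],
    )
    print(failed_criteria(candidate, covered))
```

Returning the list of failed criteria, rather than a single boolean, keeps the reason for a rejection visible to whoever proposed the vector.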

The current trajectory suggests that the vector library will expand to approximately 60-70 vectors within the next two evaluation cycles, driven primarily by the addition of multi-agent and supply-chain attack vectors. The Standard Core Battery’s five-form rotation system is designed to accommodate this expansion without increasing the per-evaluation burden on any single agent.

The arms race between AI agent attackers and defenders mirrors the broader cybersecurity arms race, with one critical difference: AI agents are not static systems. They are updated, fine-tuned, and retrained on an ongoing basis, which means that both their vulnerabilities and their defenses are in constant flux. A vector that an agent resists in one evaluation cycle may succeed after a model update, and vice versa. This dynamic underscores the importance of periodic re-evaluation and the inadequacy of any point-in-time security assessment.

Forty-one vectors is a rigorous, validated, comprehensive-for-today baseline. It is not a final answer. The Pipkin Framework’s commitment to adversarial resistance evaluation is a commitment to continuous expansion of the vector library, continuous recalibration of attack sophistication, and transparent disclosure of what the battery does and does not cover. The number 41 will change. The methodology behind it will not.

Adversarial Resistance: Why 41 Test Vectors Is Not Enough
