Submit Your Agent for Evaluation

Independent evaluation. Standardized methodology. Published rating.

What to Expect

Submit

Complete the form below. We review all submissions within 5 business days.

Scope & Quote

We assess your agent’s capabilities, determine the appropriate evaluation tier, and send a formal scope agreement with pricing.

Evaluation

Our evaluator runs the complete Standard Core Battery: 200+ baseline requests, 50 edge cases, 20 failure injections, 40 boundary tests, and the full 41-vector adversarial injection suite. Typical evaluation period: 4–6 weeks.

Factual Review

You receive a 5-day window to flag factual errors only — such as a deprecated version tested or a misidentified capability. The score itself is never disclosed prior to publication. There is no negotiation. The developer and the public see the score at the same moment.

Publication

Your rating is published on PipkinRated.com with full pillar breakdown, headline finding, and deployment recommendation.

What Your Evaluation Includes

✓ 200+ standardized baseline requests across 10 categories

✓ 50 edge cases across 3 difficulty tiers

✓ 20 failure injection scenarios including bad data, contradictions, and cascading errors

✓ 40 boundary discipline tests (20 out-of-domain + 20 near-boundary)

✓ 72-hour scope creep resistance test

✓ Complete 41-vector adversarial injection suite

✓ Full pillar-by-pillar scoring with documented methodology

✓ Published rating with headline finding and deployment recommendation

Evaluation Pricing

Indie

$500

For independent developers with agents serving fewer than 1,000 users.

—Standard Core Battery evaluation
—5 pillar scores + composite
—Published rating on PipkinRated.com
—5-day factual accuracy check (errors only, score not disclosed)

Standard

$3,500

For commercial agents with established user bases and production deployments.

—Full evaluation battery
—5 pillar scores + 20 sub-metric scores
—Published rating with detailed breakdown
—5-day factual accuracy check (errors only, score not disclosed)
—Headline finding and deployment recommendation

Enterprise

$10K – $25K

For agents handling sensitive data, financial transactions, or critical infrastructure.

—Extended evaluation battery
—Custom adversarial scenarios
—Full 20-metric detailed report
—5-day factual accuracy check (errors only, score not disclosed)
—Dedicated evaluator
—Annual re-evaluation option

Re-Test Policy

Scored lower than expected? You can re-test up to 3 times, each at 50% of the original evaluation fee. Each re-test uses a different test form to prevent memorization. For each pillar, the best score across all attempts is kept.

A minimum 14-day waiting period is required between attempts. After 3 re-tests, a 12-month waiting period applies before the next full evaluation cycle.
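The re-test rules can be sketched in a few lines of Python. This is only an illustration, not Pipkin's actual scoring code: the function names and the pillar names in the example are hypothetical, and it assumes that pillar scores are numeric with higher being better and that the 14-day wait is counted in calendar days from the date of the previous attempt.

```python
from datetime import date, timedelta

MAX_RETESTS = 3                  # re-tests allowed per evaluation cycle
RETEST_FEE_RATE = 0.5            # each re-test costs 50% of the original fee
MIN_WAIT = timedelta(days=14)    # assumption: calendar days between attempts


def retest_fee(original_fee: float) -> float:
    """Fee for a single re-test."""
    return original_fee * RETEST_FEE_RATE


def total_cost(original_fee: float, retests: int) -> float:
    """Original evaluation plus the given number of re-tests."""
    if not 0 <= retests <= MAX_RETESTS:
        raise ValueError(f"retests must be between 0 and {MAX_RETESTS}")
    return original_fee + retests * retest_fee(original_fee)


def earliest_retest(previous_attempt: date) -> date:
    """Earliest date the next attempt may start (assumed counting rule)."""
    return previous_attempt + MIN_WAIT


def best_per_pillar(attempts: list[dict[str, float]]) -> dict[str, float]:
    """Keep the highest score seen for each pillar across all attempts."""
    best: dict[str, float] = {}
    for scores in attempts:
        for pillar, score in scores.items():
            if pillar not in best or score > best[pillar]:
                best[pillar] = score
    return best


if __name__ == "__main__":
    # Hypothetical pillar names and scores, for illustration only.
    attempts = [
        {"pillar_a": 70, "pillar_b": 80},
        {"pillar_a": 75, "pillar_b": 60},
    ]
    print(best_per_pillar(attempts))
    print(total_cost(3500, 3))
```

Under these assumptions, a Standard-tier evaluation ($3,500) followed by all three re-tests ($1,750 each) comes to $8,750 in total, and a weaker result on one pillar in a later attempt does not lower the kept score for that pillar.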

Limited Availability — 10 Slots Total

Founding Partner Program

Be among the first independently rated AI agents. The Founding Partner Program is designed for developers who want to lead the market on trust and transparency.

—$1,500 per evaluation (over 55% off the Standard $3,500 tier)

—Rate locked for 3 years on annual retests

—Listed as "Founding Partner" on PipkinRated.com permanently

—Priority scheduling — evaluation begins within 2 weeks of agreement

—Featured case study on the Research page

—Direct line to the founder during evaluation

Applications are open. Contact evaluations@pipkinrated.com with your agent name, company, and why you want to be among the first independently rated agents.

Submit for Evaluation

Submission does not guarantee evaluation. All evaluations are independent and conducted at Pipkin’s discretion.