Submit Your Agent for Evaluation

Independent evaluation. Standardized methodology. Published rating.

What to Expect

1

Submit

Complete the form below. We review all submissions within 5 business days.

2

Scope & Quote

We assess your agent’s capabilities, determine the appropriate evaluation tier, and send a formal scope agreement with pricing.

3

Evaluation

Our evaluator runs the complete Standard Core Battery: 200+ baseline requests, 50 edge cases, 20 failure injections, 40 boundary tests, and the full 41-vector adversarial injection suite. Typical evaluation period: 4–6 weeks.

4

Factual Review

You receive a 5-day window to flag factual errors only — such as a deprecated version tested or a misidentified capability. The score itself is never disclosed prior to publication. There is no negotiation. The developer and the public see the score at the same moment.

5

Publication

Your rating is published on PipkinRated.com with a full pillar breakdown, headline finding, and deployment recommendation.

What Your Evaluation Includes

200+ standardized baseline requests across 10 categories
50 edge cases across 3 difficulty tiers
20 failure injection scenarios including bad data, contradictions, and cascading errors
40 boundary discipline tests (20 out-of-domain + 20 near-boundary)
72-hour scope creep resistance test
Complete 41-vector adversarial injection suite
Full pillar-by-pillar scoring with documented methodology
Published rating with headline finding and deployment recommendation
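
As a rough illustration of the battery's size, the counted items above can be tallied as follows. This is a sketch only: the dictionary keys and function name are hypothetical, "200+" is treated as a minimum of 200, and the 72-hour scope creep test is left out because it is a duration-based test rather than a counted set of cases.

```python
# Illustrative tally of the counted tests listed above.
# Key names are hypothetical; the counts come from the list.
BATTERY = {
    "baseline_requests": 200,    # "200+" is treated as a minimum
    "edge_cases": 50,
    "failure_injections": 20,
    "boundary_tests": 20 + 20,   # out-of-domain + near-boundary
    "adversarial_vectors": 41,
}


def minimum_test_count(battery: dict) -> int:
    """Return the lower bound on individual test cases in the battery."""
    return sum(battery.values())


print(minimum_test_count(BATTERY))  # 351
```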

Evaluation Pricing

Indie

$500

For independent developers with agents serving fewer than 1,000 users.

  • Standard Core Battery evaluation
  • 5 pillar scores + composite
  • Published rating on PipkinRated.com
  • 5-day factual accuracy check (errors only, score not disclosed)

Standard

$3,500

For commercial agents with established user bases and production deployments.

  • Full evaluation battery
  • 5 pillar scores + 20 sub-metric scores
  • Published rating with detailed breakdown
  • 5-day factual accuracy check (errors only, score not disclosed)
  • Headline finding and deployment recommendation

Enterprise

$10,000 – $25,000

For agents handling sensitive data, financial transactions, or critical infrastructure.

  • Extended evaluation battery
  • Custom adversarial scenarios
  • Full 20-metric detailed report
  • 5-day factual accuracy check (errors only, score not disclosed)
  • Dedicated evaluator
  • Annual re-evaluation option

Re-Test Policy

Scored lower than expected? You can re-test up to 3 times, each at 50% of the original evaluation fee. Each re-test uses a different test form to prevent memorization, and the best score for each pillar across all attempts is kept.

A minimum 14-day waiting period is required between attempts. After 3 re-tests, a 12-month waiting period applies before the next full evaluation cycle.
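
The re-test rules above can be sketched in a few lines. This is a minimal illustration, not Pipkin's actual scoring code: the function names and the pillar names in the example are hypothetical, the 14-day wait is assumed to mean calendar days, and a higher score is assumed to be better.

```python
from datetime import date, timedelta

MAX_RETESTS = 3                  # re-tests allowed per evaluation cycle
RETEST_FEE_RATE = 0.5            # each re-test costs 50% of the original fee
MIN_WAIT = timedelta(days=14)    # assumed to be calendar days


def retest_fee(original_fee: float) -> float:
    """Fee for a single re-test."""
    return original_fee * RETEST_FEE_RATE


def earliest_retest(previous_attempt: date) -> date:
    """First date on which the next attempt may take place."""
    return previous_attempt + MIN_WAIT


def best_per_pillar(attempts: list) -> dict:
    """Keep the highest score seen for each pillar across all attempts."""
    best = {}
    for scores in attempts:
        for pillar, score in scores.items():
            best[pillar] = max(score, best.get(pillar, score))
    return best


# Hypothetical pillar names and scores, for illustration only.
attempts = [
    {"pillar_a": 70, "pillar_b": 85},   # original evaluation
    {"pillar_a": 78, "pillar_b": 80},   # first re-test
]
print(best_per_pillar(attempts))          # {'pillar_a': 78, 'pillar_b': 85}
print(retest_fee(3500))                   # 1750.0
print(earliest_retest(date(2025, 1, 1)))  # 2025-01-15
```

Under these assumptions, a Standard evaluation followed by all three re-tests would cost $3,500 + 3 × $1,750 = $8,750 in total.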

Limited Availability — 10 Slots Total

Founding Partner Program

Be among the first independently rated AI agents. The Founding Partner Program is designed for developers who want to lead the market on trust and transparency.

$1,500 per evaluation (over 55% off the Standard $3,500 tier)
Rate locked for 3 years on annual re-tests
Listed as "Founding Partner" on PipkinRated.com permanently
Priority scheduling — evaluation begins within 2 weeks of agreement
Featured case study on the Research page
Direct line to the founder during evaluation
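
The "over 55% off" figure can be checked against the two listed prices. The helper below is a hypothetical illustration of that arithmetic, not part of any Pipkin tooling.

```python
STANDARD_PRICE = 3500          # Standard tier price from the pricing section
FOUNDING_PARTNER_PRICE = 1500  # Founding Partner price per evaluation


def discount_percent(list_price: float, discounted_price: float) -> float:
    """Percentage saved relative to the list price, rounded to one decimal."""
    return round((list_price - discounted_price) / list_price * 100, 1)


print(discount_percent(STANDARD_PRICE, FOUNDING_PARTNER_PRICE))  # 57.1
```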

Applications are open. Email evaluations@pipkinrated.com with your agent name, company, and why you want to be among the first independently rated agents.

Submit for Evaluation

Submission does not guarantee evaluation. All evaluations are independent and conducted at Pipkin’s discretion.