Submit Your Agent for Evaluation
Independent evaluation. Standardized methodology. Published rating.
What to Expect
Submit
Complete the form below. We review all submissions within 5 business days.
Scope & Quote
We assess your agent’s capabilities, determine the appropriate evaluation tier, and send a formal scope agreement with pricing.
Evaluation
Our evaluator runs the complete Standard Core Battery: 200+ baseline requests, 50 edge cases, 20 failure injections, 40 boundary tests, and the full 41-vector adversarial injection suite. Typical evaluation period: 4–6 weeks.
Factual Review
You receive a 5-day window to flag factual errors only, such as an evaluation run against a deprecated version or a misidentified capability. The score itself is never disclosed prior to publication. There is no negotiation. The developer and the public see the score at the same moment.
Publication
Your rating is published on PipkinRated.com with full pillar breakdown, headline finding, and deployment recommendation.
What Your Evaluation Includes
Evaluation Pricing
Indie
$500
For independent developers with agents serving fewer than 1,000 users.
- Standard Core Battery evaluation
- 5 pillar scores + composite
- Published rating on PipkinRated.com
- 5-day factual accuracy check (errors only, score not disclosed)
Standard
$3,500
For commercial agents with established user bases and production deployments.
- Full evaluation battery
- 5 pillar scores + 20 sub-metric scores
- Published rating with detailed breakdown
- 5-day factual accuracy check (errors only, score not disclosed)
- Headline finding and deployment recommendation
Enterprise
$10K – $25K
For agents handling sensitive data, financial transactions, or critical infrastructure.
- Extended evaluation battery
- Custom adversarial scenarios
- Full 20-metric detailed report
- 5-day factual accuracy check (errors only, score not disclosed)
- Dedicated evaluator
- Annual re-evaluation option
Re-Test Policy
Scored lower than expected? You can re-test up to 3 times at 50% of the original evaluation fee. Each re-test uses a different test form to prevent memorization. The best score per pillar is kept across attempts.
A minimum 14-day waiting period is required between attempts. After 3 re-tests, a 12-month waiting period applies before the next full evaluation cycle.
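As a rough illustration of how the re-test rules above combine, here is a minimal Python sketch. It is not Pipkin's scoring code. The function names, the pillar names, and the score values are hypothetical, and it assumes the 14-day wait is counted from the date of the previous attempt, which the policy does not specify. How the composite is recomputed from the retained pillar scores is also not stated, so it is left out.

```python
from datetime import date, timedelta

MAX_RETESTS = 3            # "re-test up to 3 times"
RETEST_FEE_FRACTION = 0.5  # "50% of the original evaluation fee"
MIN_WAIT_DAYS = 14         # "minimum 14-day waiting period"


def total_cost(original_fee: float, retests: int) -> float:
    """Original fee plus the fee for each re-test taken."""
    if not 0 <= retests <= MAX_RETESTS:
        raise ValueError(f"retests must be between 0 and {MAX_RETESTS}")
    return original_fee + retests * original_fee * RETEST_FEE_FRACTION


def best_per_pillar(attempts: list[dict[str, float]]) -> dict[str, float]:
    """Keep the highest score seen for each pillar across all attempts."""
    best: dict[str, float] = {}
    for scores in attempts:
        for pillar, score in scores.items():
            best[pillar] = max(score, best.get(pillar, score))
    return best


def earliest_retest(previous_attempt: date) -> date:
    """Assumption: the wait is measured from the previous attempt's date."""
    return previous_attempt + timedelta(days=MIN_WAIT_DAYS)


if __name__ == "__main__":
    # Standard tier ($3,500) with all three re-tests used.
    print(total_cost(3500, 3))
    # Hypothetical pillar names and scores, for illustration only.
    print(best_per_pillar([
        {"pillar_a": 70, "pillar_b": 80},
        {"pillar_a": 75, "pillar_b": 60},
    ]))
    print(earliest_retest(date(2025, 1, 1)))
```

For a Standard evaluation, each re-test would cost $1,750, so using all three re-tests brings the total to $8,750 under this reading of the policy.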
Limited Availability — 10 Slots Total
Founding Partner Program
Be among the first independently rated AI agents. The Founding Partner Program is designed for developers who want to lead the market on trust and transparency.
Applications are open. Contact evaluations@pipkinrated.com with your agent name, company, and why you want to be among the first independently rated agents.
Submit for Evaluation
Submission does not guarantee evaluation. All evaluations are independent and conducted at Pipkin’s discretion.