EndpointEvaluator — Regression Testing for LLM Outputs

What is EndpointEvaluator?

EndpointEvaluator is an API that measures the consistency between two texts — a reference (what you expect) and an output (what your system produced). Use it to detect when LLM outputs drift from expected behavior.

How is this different from unit testing?

Unit tests check exact outputs. LLMs don't produce exact outputs. EndpointEvaluator measures consistency across a spectrum — from literal word matching (lexical) to logical reasoning (inferential) — so you can catch drift even when the wording changes.

Do I need to install anything?

No. EndpointEvaluator is a pure API. No SDK, no packages, no dependencies. If you can make an HTTP request, you can use it. With nothing to install, no new package enters your dependency tree.
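As a rough illustration of "just an HTTP request", here is a minimal sketch that builds one evaluation request using only the Python standard library. The field names output_text and reference_text come from this FAQ. Everything else is an assumption made for the example and is not documented on this page: the URL, the "method" parameter name, and the bearer-token header. Check the Quick Start Guide for the real values.

```python
import json
import urllib.request

# Placeholder URL: the real endpoint is not given on this page.
API_URL = "https://api.example.com/evaluate"


def build_request(output_text: str, reference_text: str,
                  method: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one evaluation."""
    payload = {
        "output_text": output_text,        # field name taken from this FAQ
        "reference_text": reference_text,  # field name taken from this FAQ
        "method": method,                  # assumed parameter name
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_request(
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    "lexical",
    "YOUR_API_KEY",
)
# urllib.request.urlopen(req) would send the request; it is left out here
# because the URL above is only a placeholder.
```

Because the request is built with the standard library alone, the same pattern carries over to curl or any other HTTP client.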

What happens to my data?

By default, your text is not stored. It is processed in memory and never written to our database; only the verdict and metadata are retained for your evaluation history. You can turn on a 24-hour Debug Mode from your dashboard if you want to retain text temporarily for troubleshooting. You can end Debug Mode at any time, and ending it deletes the stored text immediately.

How do credits work?

Buy a credit pack (or use the 500 free credits provided daily). Each evaluation consumes credits based on the scoring method: Lexical (20), Semantic (40), Inferential (80), Combined (100). Credits are valid for 36 months from the purchase date. No subscriptions, no recurring charges.
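To estimate how far a credit balance goes, the per-method costs above can be turned into simple arithmetic. The sketch below uses only the numbers listed in this answer; the function names and the lowercase method keys are made up for the example.

```python
# Credits consumed per evaluation, as listed in this FAQ.
CREDIT_COST = {"lexical": 20, "semantic": 40, "inferential": 80, "combined": 100}
FREE_DAILY_CREDITS = 500


def credits_needed(counts: dict[str, int]) -> int:
    """Total credits for a run, given the number of evaluations per method."""
    return sum(CREDIT_COST[method] * n for method, n in counts.items())


def free_evaluations_per_day(method: str) -> int:
    """How many evaluations of one method fit in the daily free credits."""
    return FREE_DAILY_CREDITS // CREDIT_COST[method]


suite = {"lexical": 10, "semantic": 5}
print(credits_needed(suite))                    # 10*20 + 5*40 = 400
print(free_evaluations_per_day("lexical"))      # 500 // 20 = 25
print(free_evaluations_per_day("inferential"))  # 500 // 80 = 6
```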

Which scoring method should I use?

Start with Lexical for fast, cheap checks in CI. Use Semantic when you expect paraphrased outputs. Use Inferential when factual accuracy matters most. Use Combined when you want all three perspectives in one call. See our Scoring Details page for a full walkthrough with examples.

What are Bonus Features?

Bonus Features include raw numerical scores (for custom thresholds), a batch evaluation endpoint (up to 10 evaluations per request), and 30-day evaluation history (instead of 7 days). Bonus Features will be enabled automatically when you purchase a Large credit pack once our Paddle checkout launches. Contact us if you'd like early access.
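If you plan to use the batch endpoint, your evaluations need to be grouped into sets of at most 10 before sending. A minimal sketch of that grouping step is shown below. It only covers the client-side chunking; the batch request format itself is not described on this page, so it is not shown.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Batch size limit stated in this FAQ.
MAX_BATCH_SIZE = 10


def batches(items: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 23 hypothetical (output, reference) pairs split into batch-sized groups.
pairs = [(f"output {i}", f"reference {i}") for i in range(23)]
sizes = [len(batch) for batch in batches(pairs)]
print(sizes)  # [10, 10, 3]
```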

Can I use this in my CI/CD pipeline?

Yes. It works with any CI tool that can make HTTP requests — GitHub Actions, GitLab CI, CircleCI, Jenkins, anything. See our Quick Start Guide for copy-paste examples.

Are there rate limits?

Yes. Free accounts are limited to 1 request per minute. Paid accounts (Small, Medium, Large) get 10 requests per minute. Rate limit headers are included in all API responses.
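One simple way to stay under these limits is to space requests evenly on the client side: one call every 60 seconds on a free account, one every 6 seconds on a paid account. The sketch below does that. It is a conservative assumption, since this page does not say whether the limit is enforced per fixed window or as a rolling average, and the class name is invented for the example. The rate limit headers mentioned above would be a more precise signal, but their names are not listed here.

```python
import time

# Requests per minute by account type, as stated in this FAQ.
REQUESTS_PER_MINUTE = {"free": 1, "paid": 10}


class Throttle:
    """Block so that consecutive calls are at least 60 / rpm seconds apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(REQUESTS_PER_MINUTE["paid"])
# Call throttle.wait() immediately before each API request.
```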

Are there input size limits?

Yes. Each text field (output_text, reference_text) is limited to 1,000 characters. For longer texts, split them into smaller pieces (for example, one response or passage per request) and evaluate each piece separately.
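For texts over the limit, the splitting step can be sketched with Python's standard textwrap module, which breaks on whitespace and keeps each piece within the given width. This is only one possible approach. Note that it normalizes whitespace (newlines become spaces), and it handles length only: deciding which piece of the output should be compared with which piece of the reference depends on your content.

```python
import textwrap

# Per-field character limit stated in this FAQ.
MAX_FIELD_CHARS = 1000


def split_for_evaluation(text: str, limit: int = MAX_FIELD_CHARS) -> list[str]:
    """Split text on whitespace into pieces of at most `limit` characters."""
    return textwrap.wrap(text, width=limit)


long_text = "word " * 500  # 2,500 characters
chunks = split_for_evaluation(long_text)
print(len(chunks), max(len(c) for c in chunks))  # 3 999
```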

What if my question wasn't answered on this page?

Contact us directly with any other questions.
