Implements evaluation-driven development principles for Claude Code sessions with automated testing frameworks.
/plugin install evaluation-harness-framework@affaan-mRequires Claude Code CLI.
AI engineers and developers use this to define pass/fail criteria before implementation, measure agent reliability via pass@k metrics, and catch regressions across model versions.
No reviews yet. Be the first to review this skill.
Affaan M
@affaan-m