Evaluate and rank agent results for AgentHub sessions using objective metrics, LLM-judge comparison, or a hybrid of the two.
Install: /plugin install evaluate-agent-results-by-metric@alirezarezvani
Requires the Claude Code CLI.
AI development teams use this to automatically benchmark and rank competing agent solutions by performance metrics or qualitative assessment.
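The hybrid approach described above can be sketched as a weighted blend of a normalized objective metric and a normalized LLM-judge rating. This is a minimal illustration, not the skill's actual implementation; the `AgentResult` fields, the 0-1 score normalization, and the default 60/40 weighting are all assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class AgentResult:
    name: str
    metric_score: float  # assumed normalized 0-1 objective metric (e.g. test pass rate)
    judge_score: float   # assumed normalized 0-1 LLM-judge rating

def hybrid_rank(results, metric_weight=0.6):
    """Rank agent results by a weighted blend of metric and judge scores.

    metric_weight is a hypothetical knob: 1.0 ranks purely by the objective
    metric, 0.0 purely by the LLM judge.
    """
    judge_weight = 1.0 - metric_weight
    scored = [
        (r.name, metric_weight * r.metric_score + judge_weight * r.judge_score)
        for r in results
    ]
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

results = [
    AgentResult("agent-a", metric_score=0.90, judge_score=0.70),
    AgentResult("agent-b", metric_score=0.75, judge_score=0.95),
]
ranking = hybrid_rank(results, metric_weight=0.6)
# agent-a scores 0.82, agent-b scores 0.83, so agent-b ranks first here
```

Note how the weighting changes the outcome: agent-a wins on the objective metric alone, but the judge's preference for agent-b tips the combined ranking.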