Evaluation Loop

An evaluation loop is the feedback cycle in which the output of an Agent or pipeline is scored against predefined criteria and the result is fed back into the next iteration. In practice it has two layers: offline Eval suites that act as pre-release quality gates, and online evaluation that catches regressions in production. Modern agent stacks frequently use LLM-as-Judge or Self-Critique as the evaluator. Anthropic's "Building Effective Agents" post insists you don't ship without one: the real behavior of a non-trivial agent system is only understood by measuring it.
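
The score-and-feed-back cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names evaluation_loop, Verdict, toy_generate, and toy_evaluate, the 0-to-1 score scale, and the threshold and max_iters parameters are all assumptions made for the example. In a real stack the generate callable would call the agent and the evaluate callable might be an LLM-as-Judge or a Self-Critique pass; here both are deterministic stubs so the loop can be run as-is.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    """Result of one evaluation (hypothetical shape, for illustration only)."""
    score: float    # assumed scale: 0.0 to 1.0, higher is better
    feedback: str   # critique handed to the next generation attempt


def evaluation_loop(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str, str], Verdict],
    threshold: float = 0.8,
    max_iters: int = 3,
) -> Tuple[str, List[Verdict]]:
    """Generate, score, and retry with the evaluator's feedback.

    Stops as soon as a score reaches `threshold`, or after `max_iters`
    attempts. Returns the last output and the full verdict history.
    """
    history: List[Verdict] = []
    feedback: Optional[str] = None
    output = ""
    for _ in range(max_iters):
        output = generate(task, feedback)
        verdict = evaluate(task, output)
        history.append(verdict)
        if verdict.score >= threshold:
            break
        # Feed the critique back into the next iteration.
        feedback = verdict.feedback
    return output, history


# Deterministic stand-ins for an agent and a judge (assumptions, not real APIs).
def toy_generate(task: str, feedback: Optional[str]) -> str:
    answer = f"Answer to: {task}"
    if feedback:
        # The stub "improves" by adding a citation once it has been critiqued.
        answer += " Sources: [1]"
    return answer


def toy_evaluate(task: str, output: str) -> Verdict:
    if "Sources:" in output:
        return Verdict(score=1.0, feedback="ok")
    return Verdict(score=0.4, feedback="cite at least one source")


final, history = evaluation_loop("summarize the report", toy_generate, toy_evaluate)
print(final)
print([v.score for v in history])
```

With these stubs the first attempt fails the threshold, the critique is passed back, and the second attempt passes. The same function shape could serve both layers: run over a fixed dataset it behaves like an offline Eval suite, and wrapped around live traffic with logging of the verdict history it approximates online evaluation.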