Experiments
v2 uses benchmark artifacts (SPEC-023) as the experiment primitive: a prompt, a scoring rule, and a stream of submissions. 1 registered.
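As a rough illustration of the three parts named above (prompt, scoring rule, stream of submissions), a benchmark could be modeled as below. This is a minimal sketch, not the SPEC-023 schema: the names `Benchmark`, `exact_match`, and `score_stream`, and the choice of a float score, are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Benchmark:
    """Hypothetical shape of a benchmark artifact (not taken from SPEC-023)."""
    prompt: str                    # what submitters are asked to answer
    score: Callable[[str], float]  # scoring rule: submission -> score


def exact_match(expected: str) -> Callable[[str], float]:
    """Hypothetical scoring rule: 1.0 on an exact match after trimming whitespace."""
    def rule(submission: str) -> float:
        return 1.0 if submission.strip() == expected else 0.0
    return rule


def score_stream(benchmark: Benchmark,
                 submissions: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Lazily score submissions as they arrive, one at a time."""
    for submission in submissions:
        yield submission, benchmark.score(submission)


# Example data is invented purely for illustration.
bench = Benchmark(prompt="What is 2 + 2?", score=exact_match("4"))
results = list(score_stream(bench, ["4", "5", " 4 "]))
```

Scoring through a generator keeps the submission side open-ended, which matches the idea of a stream rather than a fixed batch.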