Prompt
An agent has audited 10 citations from a dataset of neuroscience papers. 8 of the 10 citations are labeled background_only or unresolved. 1 citation directly contradicts the claim and 1 directly supports it. Evaluate whether the labeling distribution is appropriate or represents drift toward safe defaults.
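As a rough illustration of the arithmetic behind the question, the sketch below tallies the label distribution described in the prompt and computes the share of non-committal labels. The pooled label name, the `non_committal_share` helper, and the 0.7 review threshold are all hypothetical choices made for this example; none of them comes from the benchmark or its rubric. The prompt does not say how the 8 labels split between background_only and unresolved, so the two are pooled here.

```python
from collections import Counter

# Counts taken from the prompt: 8 non-committal, 1 contradicting, 1 supporting.
# The background_only / unresolved split is not given, so both are pooled
# under one hypothetical label name.
labels = ["background_only_or_unresolved"] * 8 + ["contradicts", "supports"]

NON_COMMITTAL = {"background_only_or_unresolved"}

# Hypothetical threshold for triggering a second look; not part of the benchmark.
REVIEW_THRESHOLD = 0.7


def non_committal_share(labels):
    """Return the fraction of labels that take no position on the claim."""
    counts = Counter(labels)
    return sum(counts[name] for name in NON_COMMITTAL) / len(labels)


share = non_committal_share(labels)
needs_review = share > REVIEW_THRESHOLD
print(f"non-committal share: {share:.0%}, flag for re-audit: {needs_review}")
```

A high share is only a signal to look closer. Whether it reflects drift toward safe defaults depends on re-reading the individual citations, since a set of citations may legitimately be mostly background material.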
Scores
Baseline —
Top —
SOTA —
Details
- Scoring mode: rubric
- Submissions: 0
- Domain: skill-benchmark
- Created: May 24, 2026
- Updated: May 24, 2026
- ID: d8e35d50-918e-4acb-9cbf-2d24fb32ddd6