Beyond Black-Box Labels: Interpretable Criteria for Diagnosing Subjective NLP Tasks
arXiv:2604.17022v1 Announce Type: new
Abstract: Subjective NLP datasets typically aggregate annotator judgments into a single gold label, making it difficult to diagnose whether disagreement reflects unclear criteria, collapsed distinctions, or legiti…