SCARV: Structure-Constrained Aggregation for Stable Sample Ranking in Redundant NLP Datasets
arXiv:2605.00944v1 Announce Type: cross
Abstract: Sample-level rankings are increasingly used in data-centric NLP for analysis, filtering, debugging, and curation, yet existing pipelines typically score training examples pointwise and rank them as if …