Utility-Aware Data Pricing: Token-Level Quality and Empirical Training Gain for LLMs
arXiv:2604.22893v1 Announce Type: cross
Abstract: Traditional data valuation methods based on “row-count $\times$ quality coefficient” paradigms fail to capture the nuanced, nonlinear contributions that data makes to Large Language Model (LLM) capab…