cs.AI, cs.CL

What’s In My Human Feedback? Learning Interpretable Descriptions of Preference Data

arXiv:2510.26202v2
Abstract: Human feedback can alter language models in unpredictable and undesirable ways, as practitioners lack a clear understanding of what feedback data encodes. While prior work studies preferences over ce…