cs.AI, cs.CL, cs.HC, cs.IR, cs.LG

Personalized Benchmarking: Evaluating LLMs by Individual Preferences

arXiv:2604.18943v1 Announce Type: cross
Abstract: With the rise in capabilities of large language models (LLMs) and their deployment in real-world tasks, evaluating LLM alignment with human preferences has become an important challenge. Current benchm…