cs.AI, cs.CL, cs.LG

Maximizing mutual information between prompts and responses improves LLM personalization with no additional data or human oversight

arXiv:2603.19294v2 Announce Type: replace-cross
Abstract: While post-training has successfully improved large language models (LLMs) across a variety of domains, these gains heavily rely on human-labeled data or external verifiers. Existing data has a…