cs.AI, cs.LG

Distributionally Robust Token Optimization in RLHF

arXiv:2604.08577v1 Announce Type: new
Abstract: Large Language Models (LLMs) tend to respond correctly to prompts that align with the data they were trained and fine-tuned on. Yet small shifts in wording, format, or language can trigger surprisingly la…