Reinforcement Learning for LLM Post-Training: A Survey
arXiv:2407.16216v3
Abstract: Large language models (LLMs) trained via pretraining and supervised fine-tuning (SFT) can still produce harmful and misaligned outputs, or struggle in domains like math and coding. Reinforcement lear…