cs.CL

Reinforcement Learning for LLM Post-Training: A Survey

arXiv:2407.16216v3 Announce Type: replace
Abstract: Large language models (LLMs) trained via pretraining and supervised fine-tuning (SFT) can still produce harmful and misaligned outputs, or struggle in domains like math and coding. Reinforcement lear…

cs.CL

Reward Modeling from Natural Language Human Feedback

arXiv:2601.07349v3 Announce Type: replace
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) on preference data has become the mainstream approach for training Generative Reward Models (GRMs). Typically in pairwise rewarding tasks, GRMs ge…

cs.CL

SCOPE: Planning for Hybrid Querying over Clinical Trial Data

arXiv:2604.25120v2 Announce Type: replace
Abstract: We study clinical trial table reasoning, where answers are not directly stored in visible cells but must be reasoned from semantic understanding through normalization, classification, extraction, or …
