Haocheng Lu, Minjun Zhu, Henry Yu

Hard Negative Sample-Augmented DPO Post-Training for Small Language Models

Haocheng Lu, Minjun Zhu, Henry Yu / April 15, 2026

arXiv:2512.19728v2 Announce Type: replace
Abstract: Large language models (LLMs) continue to struggle with mathematical reasoning, and common post-training pipelines often reduce each generated solution to a binary outcome: correct or incorrect. This …
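The abstract is cut off before it describes the method. For background only, here is a minimal sketch of the standard DPO objective for a single preference pair. It is not the authors' hard-negative-augmented variant; how that variant selects or weights negatives is not stated in this excerpt and is not modeled here. The function name `dpo_loss` and the default `beta = 0.1` are illustrative assumptions. Each input is the summed token log-probability of a full response, scored by either the trainable policy or a frozen reference model.

```python
import math


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair (illustrative sketch).

    Each argument is the summed log-probability of a whole response under
    the trainable policy or the frozen reference model. `beta` is an
    assumed example value, not one taken from the paper.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, measured relative to the reference model.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # Loss is -log(sigmoid(margin)) = softplus(-margin), written in the
    # numerically stable form max(x, 0) + log1p(exp(-|x|)) with x = -margin.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # A correct solution (chosen) paired with an incorrect one (rejected).
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

Unlike a binary correct/incorrect label attached to each solution on its own, this objective only ever compares two solutions to the same prompt. When the policy and reference assign identical scores, the loss is log 2. It falls below log 2 as the policy shifts probability toward the chosen response relative to the reference.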
