When Correct Isn’t Usable: Improving Structured Output Reliability in Small Language Models
arXiv:2605.02363v1 Announce Type: cross
Abstract: Deployed language models must produce outputs that are both correct and format-compliant. We study this structured-output reliability gap using two mathematical benchmarks — GSM8K and MATH — as a con…