LogicDiff: Logic-Guided Denoising Improves Zero-Shot Reasoning in Masked Diffusion Language Models

arXiv:2603.26771v2 (replacement)

Abstract: Masked diffusion language models (MDLMs) generate text by iteratively unmasking tokens from a fully masked sequence. Their standard confidence-based unmasking strategy systematically defers high-entropy logical connective tokens, degrading reasoning performance. We introduce LogicDiff, an inference-time method that replaces confidence-based unmasking with logic-role-guided unmasking. A lightweight classification head (4.2M parameters, 0.05% of the base model) predicts the logical role of each masked position (premise, connective, derived step, conclusion, or filler) from the base model's hidden states with 98.4% accuracy, and a dependency-ordered scheduler unmasks tokens in logical order. In zero-shot settings, LogicDiff improves LLaDA-8B-Instruct accuracy from 22.0% to 60.7% on GSM8K (+38.7 percentage points) and from 23.6% to 29.2% on MATH-500 (+5.6 pp), with less than 6% speed overhead. However, with 8-shot chain-of-thought prompting, the baseline reaches approximately 70% and LogicDiff provides no additional improvement. Analysis reveals that few-shot prompting implicitly resolves the same ordering problem that LogicDiff explicitly addresses, and that fixed role-based ordering can cause premature commitment to numerical values before sufficient context is available. Our results characterize the Flexibility Trap as primarily a zero-shot phenomenon and identify context-adaptive ordering as a key direction for future work.
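The scheduling idea in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the five role names come from the abstract, but the priority ordering, the function signature, and the confidence tie-break are assumptions.

```python
# Illustrative sketch of a dependency-ordered unmasking scheduler.
# Role names are from the abstract; the priority ordering and the
# confidence tie-break within a role are assumptions for this sketch.

ROLE_ORDER = {"premise": 0, "connective": 1, "derived": 2,
              "conclusion": 3, "filler": 4}

def schedule_unmask(masked_positions, predicted_roles, confidences, k):
    """Pick the next k positions to unmask: earlier logical roles first
    (premises before connectives, etc.); higher model confidence breaks
    ties within the same role."""
    ranked = sorted(
        masked_positions,
        key=lambda i: (ROLE_ORDER[predicted_roles[i]], -confidences[i]),
    )
    return ranked[:k]

# Example with hypothetical role predictions and confidences:
roles = {0: "filler", 1: "premise", 2: "connective",
         3: "premise", 4: "conclusion"}
conf = {0: 0.9, 1: 0.6, 2: 0.8, 3: 0.7, 4: 0.5}
print(schedule_unmask([0, 1, 2, 3, 4], roles, conf, 3))  # → [3, 1, 2]
```

Note how the high-confidence filler token at position 0 is deferred: under plain confidence-based unmasking it would be revealed first, which is exactly the behavior the abstract says degrades reasoning.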
