CARO: Chain-of-Analogy Reasoning Optimization for Robust Content Moderation
arXiv:2604.10504v1 Announce Type: new
Abstract: Current large language models (LLMs), even those explicitly trained for reasoning, often struggle with ambiguous content moderation cases due to misleading “decision shortcuts” embedded in context. Inspi…