cs.CV

MUSE: Resolving Manifold Misalignment in Visual Tokenization via Topological Orthogonality

arXiv:2605.05646v1 Announce Type: new
Abstract: Unified visual tokenization faces a fundamental trade-off between high-fidelity pixel reconstruction (spatial equivariance) and semantic abstraction (conceptual invariance). We attribute this conflict to…