MUSE: Resolving Manifold Misalignment in Visual Tokenization via Topological Orthogonality
arXiv:2605.05646v1
Abstract: Unified visual tokenization faces a fundamental trade-off between high-fidelity pixel reconstruction (spatial equivariance) and semantic abstraction (conceptual invariance). We attribute this conflict to…