cs.CV

RePack then Refine: Efficient Diffusion Transformer with Vision Foundation Model

arXiv:2512.12083v3 Announce Type: replace
Abstract: Semantically rich features from Vision Foundation Models (VFMs) have been leveraged to enhance Latent Diffusion Models (LDMs). However, raw VFM features are typically high-dimensional and redundant, incr…