VFM-VAE: Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models
arXiv:2510.18457v3 Announce Type: replace
Abstract: The performance of Latent Diffusion Models (LDMs) depends critically on the quality of their visual tokenizers. While recent works have explored incorporating Vision Foundation Models (VFMs) int…