GVCC: Zero-Shot Video Compression via Codebook-Driven Stochastic Rectified Flow

arXiv:2603.26571v3 Announce Type: replace

Abstract: At ultra-low bitrates, high-fidelity reconstruction requires sampling plausible videos from the posterior rather than regressing to oversmoothed conditional means. We propose Generative Video Codebook Codec (GVCC), a zero-shot framework in which a pretrained video generative model serves directly as the decoder and the transmitted bitstream specifies its generation trajectory. Modern rectified-flow video models are typically sampled with deterministic ODE solvers, which leave no per-step stochastic channel for transmitting compressed information. GVCC addresses this by converting the deterministic flow sampler into an equivalent marginal-preserving stochastic process, so that information can be transmitted by encoding the per-step stochastic innovations. Unlike images, videos introduce longer temporal dependencies and more diverse conditioning modes. We instantiate GVCC in three practical modes: Text-to-Video (T2V) without a reference frame, autoregressive Image-to-Video (I2V) with tail latent correction, and First-Last-Frame-to-Video (FLF2V) with boundary-sharing Group of Pictures (GOP) chaining. On UVG, GVCC achieves the lowest LPIPS among evaluated baselines across three representative bitrate regimes (down to ~0.003 bpp), with a 65% LPIPS reduction over DCVC-RT at matched bitrate.
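The abstract's core move, replacing a deterministic flow ODE with a marginal-preserving SDE whose per-step noise carries the bitstream, can be illustrated with the standard drift correction dx = [v + (g²/2)·∇log p_t] dt + g dW, which shares marginals with the ODE dx = v dt. Below is a minimal 1-D sketch on a toy Gaussian rectified flow where the velocity and score are known in closed form; the function names, the choice of g, and the toy target N(2, 0.5²) are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D rectified flow: x_t = (1-t)*eps + t*x1, eps ~ N(0,1), x1 ~ N(mu, sigma^2).
mu, sigma = 2.0, 0.5

def velocity(x, t):
    # Closed-form E[x1 - eps | x_t = x] for this Gaussian toy problem.
    a, b = 1.0 - t, t
    var = a * a + b * b * sigma * sigma
    return mu + (b * sigma * sigma - a) * (x - b * mu) / var

def score(x, t):
    # grad log p_t(x); here p_t is N(t*mu, (1-t)^2 + t^2 * sigma^2).
    a, b = 1.0 - t, t
    var = a * a + b * b * sigma * sigma
    return -(x - b * mu) / var

def stochastic_sampler(n, steps=200, g=0.5):
    # Euler-Maruyama on dx = [v + (g^2/2)*score] dt + g dW, which has the
    # same time marginals as the deterministic ODE dx = v dt. The Gaussian
    # innovations z are the per-step stochastic channel a codec could encode.
    dt = 1.0 / steps
    x = rng.standard_normal(n)
    for k in range(steps):
        t = k * dt
        z = rng.standard_normal(n)  # innovation: this is what gets transmitted
        drift = velocity(x, t) + 0.5 * g * g * score(x, t)
        x = x + drift * dt + g * np.sqrt(dt) * z
    return x

samples = stochastic_sampler(100_000)
print(samples.mean(), samples.std())  # close to mu=2.0 and sigma=0.5
```

Despite the injected noise, the terminal samples match the target distribution, which is what makes the stochastic channel "free" in the marginal-preserving sense the abstract describes.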
