Niccolò Cavagnero, Narges Norouzi, Gijs Dubbelman, Daan de Geus

PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders

Niccolò Cavagnero, Narges Norouzi, Gijs Dubbelman, Daan de Geus / March 27, 2026

arXiv:2603.25398v1 Announce Type: new
Abstract: Vision Foundation Models (VFMs) pre-trained at scale enable a single frozen encoder to serve multiple downstream tasks simultaneously. Recent VFM-based encoder-only models for image and video segmentatio…
