cs.CV, cs.LG

Featurising Pixels from Dynamic 3D Scenes with Linear In-Context Learners

arXiv:2604.26488v1 Announce Type: new
Abstract: One of the most exciting applications of vision models involves pixel-level reasoning. Despite the abundance of vision foundation models, we still lack representations that effectively embed spatio-tempor…