What-Where Transformer: A Slot-Centric Visual Backbone for Concurrent Representation and Localization
arXiv:2605.12021v1 Announce Type: new
Abstract: Many image understanding tasks involve identifying what is present and where it appears. However, tasks that address the "where", such as object discovery, detection, and segmentation, are often considerably m…