Pixelis: Reasoning in Pixels, from Seeing to Acting
arXiv:2603.25091v1 Announce Type: new
Abstract: Most vision-language systems are static observers: they describe pixels, do not act, and cannot safely improve under distribution shift. This passivity limits generalizable, physically grounded visual intelligence. L…