Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens
arXiv:2604.19954v1 Announce Type: new
Abstract: Current text-to-image models struggle to provide precise camera control using natural language alone. In this work, we present a framework for precise camera control with global scene understanding in te…