Seer: Language Instructed Video Prediction with Latent Diffusion Models
arXiv:2303.14897v4 Announce Type: replace
Abstract: Imagining future trajectories is key for robots to plan soundly and reach their goals. Text-conditioned video prediction (TVP) is therefore an essential task to facilitate …