cs.CV

InHabit: Leveraging Image Foundation Models for Scalable 3D Human Placement

arXiv:2604.19673v1 Announce Type: new
Abstract: Training embodied agents to understand 3D scenes as humans do requires large-scale data of people meaningfully interacting with diverse environments, yet such data is scarce. Real-world motion capture is…