cs.CV

THFM: A Unified Video Foundation Model for 4D Human Perception and Beyond

arXiv:2603.25892v1 Announce Type: new
Abstract: We present THFM, a unified video foundation model for human-centric perception that jointly addresses dense tasks (depth, normals, segmentation, dense pose) and sparse tasks (2D/3D keypoint estimation) w…