Gesture-Aware Pretraining and Token Fusion for 3D Hand Pose Estimation
arXiv:2603.17396v3 Announce Type: replace
Abstract: Estimating 3D hand pose from monocular RGB images is fundamental to applications in AR/VR, human-computer interaction, and sign language understanding. In this work, we focus on a scenario where a di…