Understanding Cognitive States from Head & Hand Motion Data

arXiv:2509.24255v2

Abstract: As virtual reality (VR) becomes widespread, head and hand motion data captured by consumer systems has become substantially more common. However, the extent of what can be inferred from such motion remains unclear. This paper investigates whether transient cognitive states, specifically confusion, hesitation, and readiness during different stages of decision-making, can be inferred from VR telemetry alone. We introduce a novel dataset of head and hand motion collected during structured decision-making tasks, with frame-level annotations of these states. We evaluate classical machine learning models, temporal neural networks, and motion foundation models under two protocols: (1) future-in-time prediction for the same users, and (2) cross-user generalization to unseen users. We further propose a VR-native motion adapter that maps sparse VR telemetry to representations compatible with motion foundation models pretrained on large-scale full-body motion data, enabling transfer without explicit full-body reconstruction. To our knowledge, this is the first work to adapt a motion foundation model to VR motion for a classification task. Results show that motion-only sensing captures meaningful signals of cognitive states, and that pretrained motion foundation models generalize more effectively than classical and temporal models even with a small dataset of 24 participants. Our approach achieves 82% accuracy, comparable to and sometimes surpassing human observers. These findings suggest that VR motion encodes richer behavioral information than previously assumed and highlight the potential of large-scale motion pretraining for XR applications. We will release the dataset and modeling framework to support future research.
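To make the adapter idea concrete, here is a minimal sketch in PyTorch of how sparse VR telemetry (head plus two hand controllers, roughly 18 per-frame values) could be projected into an embedding space consumable by a pretrained motion encoder. Every module name, layer choice, and dimension below is a hypothetical illustration; the abstract does not specify the paper's actual architecture.

```python
# Hypothetical sketch of a "VR-native motion adapter": sparse telemetry in,
# foundation-model-sized embeddings out. Dimensions are assumptions.
import torch
import torch.nn as nn

class VRMotionAdapter(nn.Module):
    """Map sparse VR telemetry (head + two hands) to the embedding space
    expected by a motion model pretrained on full-body data."""

    def __init__(self, in_dim: int = 18, embed_dim: int = 256):
        # in_dim: per-frame telemetry, e.g. 3 devices x (3D position + 3D rotation)
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # Lightweight temporal context over the telemetry sequence.
        self.temporal = nn.GRU(embed_dim, embed_dim, batch_first=True)

    def forward(self, telemetry: torch.Tensor) -> torch.Tensor:
        # telemetry: (batch, frames, in_dim) -> (batch, frames, embed_dim)
        x = self.proj(telemetry)
        x, _ = self.temporal(x)
        return x  # would feed a frozen pretrained encoder + classifier head

# Usage with stand-in data: 8 clips, 90 frames, 18 features per frame.
adapter = VRMotionAdapter()
tokens = adapter(torch.randn(8, 90, 18))
print(tokens.shape)  # torch.Size([8, 90, 256])
```

The two evaluation protocols can be sketched similarly. The snippet below illustrates protocol (2), cross-user generalization, with a grouped split so no participant appears in both train and test; protocol (1), future-in-time prediction, would instead split each user's session chronologically (train on earlier frames, test on later ones). The arrays and sizes are stand-ins, not the released dataset.

```python
# Hypothetical cross-user split with scikit-learn's GroupKFold.
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_frames, n_users = 2400, 24
X = rng.normal(size=(n_frames, 18))        # per-frame telemetry features
y = rng.integers(0, 3, size=n_frames)      # confusion / hesitation / readiness
users = rng.integers(0, n_users, size=n_frames)

# Protocol (2): held-out users never seen during training.
for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups=users):
    assert set(users[train_idx]).isdisjoint(users[test_idx])
```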
