Face-to-Face: A Video Dataset for Multi-Person Interaction Modeling
arXiv:2603.14794v2 Announce Type: replace
Abstract: Modeling the reactive tempo of human conversation remains difficult because most audio-visual datasets portray isolated speakers delivering short monologues. We introduce Face-to-Face with Ji…