A Multimodal Pre-trained Network for Integrated EEG-Video Seizure Detection
arXiv:2604.26379v1 Announce Type: new
Abstract: Reliable seizure detection in mouse models is essential for preclinical epilepsy research, yet manual review of synchronized video-EEG recordings is labor-intensive and single-modality systems fail for complementary reasons: video-based methods are easily confounded by benign behaviors, whereas EEG-based methods are vulnerable to ictal motion artifacts. We present EEGVFusion, a multimodal framework that combines self-supervised EEG representation learning, spatio-temporal video encoding, optimal-transport alignment, and bidirectional cross-attention to integrate neural and behavioral evidence. We also curate an expert-annotated dataset of synchronized EEG and video recordings comprising 93 sessions from 15 mice for training and evaluation. In the random-session split, EEGVFusion achieved a Balanced Accuracy of 0.9957 with perfect event sensitivity and an Event FAR of 0.6250 FP/h, indicating strong seizure detection performance with a low false-alarm burden. In a single held-out-subject evaluation with Subject 110 reserved for testing, EEGVFusion achieved a Balanced Accuracy of 0.9718 and reduced Event FAR from 2.7250 FP/h for the EEG-only counterpart to 0.4833 FP/h while preserving perfect event sensitivity. Targeted ablations further showed that EEG pre-training and OT alignment help reduce false alarms while preserving event sensitivity.