Can LLM-Generated Text Empower Surgical Vision-Language Pre-training?
arXiv:2604.18134v1 Announce Type: new
Abstract: Recent advances in self-supervised learning have produced powerful surgical vision encoders capable of spatiotemporal understanding. However, extending these visual foundations to multi-modal reasoning…