cs.AI, cs.CV, cs.LG, stat.ML

Vision Hopfield Memory Networks

arXiv:2603.25157v1 Announce Type: cross
Abstract: Recent vision and multimodal foundation backbones, such as Transformer families and state-space models like Mamba, have achieved remarkable progress, enabling unified modeling across images, text, and …

cs.CL

Can GRPO Boost Complex Multimodal Table Understanding?

arXiv:2509.16889v3 Announce Type: replace
Abstract: Existing table understanding methods face challenges due to complex table structures and intricate logical reasoning. While supervised finetuning (SFT) dominates existing research, reinforcement lear…
