cs.AI, cs.CV

Perceptual Flow Network for Visually Grounded Reasoning

arXiv:2605.02730v1 Announce Type: cross
Abstract: Despite the success of Large Vision-Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories, leading to language bias and hallucination. To m…

cs.IR, cs.LG

Multimodal Data Curation Through Ranked Retrieval

arXiv:2605.01163v1 Announce Type: cross
Abstract: Shared embedding spaces are widely used for multimodal search and data curation. In practice, two problems often limit how well this works. First, embeddings can reflect modality more than meaning, so …

cs.DC, cs.LG

STAR: Decode-Phase Rescheduling for LLM Inference

arXiv:2510.13668v2 Announce Type: replace-cross
Abstract: Large Language Model (LLM) inference has emerged as a fundamental paradigm; however, variations in output length cause severe workload imbalance in the decode phase, particularly for long-outpu…
