cs.AI, cs.CL, cs.CV

Mitigating Coordinate Prediction Bias from Positional Encoding Failures

arXiv:2510.22102v2 Announce Type: replace-cross
Abstract: While Multimodal Large Language Models (MLLMs) excel at general vision-language tasks, precise coordinate prediction remains a significant challenge, particularly as high-resolution inputs caus…