ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models
arXiv:2509.15695v5 Announce Type: replace
Abstract: Large Vision-Language Models (LVLMs) excel at captioning, visual question answering, and robotics by combining vision and language, yet they often miss obvious objects or hallucinate nonexistent ones…