Provide.ai

LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning

April 16, 2026

arXiv:2604.14140v1 Announce Type: cross
Abstract: As language models are increasingly deployed for complex autonomous tasks, their ability to reason accurately over longer horizons becomes critical. An essential component of this ability is planning a…

cs.CL

Guaranteeing Knowledge Integration with Joint Decoding for Retrieval-Augmented Generation

April 16, 2026

arXiv:2604.08046v2 Announce Type: replace
Abstract: Retrieval-Augmented Generation (RAG) significantly enhances Large Language Models (LLMs) by providing access to external knowledge. However, current research primarily focuses on retrieval quality, o…

cs.LG

Data-Efficient RLVR via Off-Policy Influence Guidance

April 16, 2026

arXiv:2510.26491v2 Announce Type: replace
Abstract: Data selection is a critical aspect of Reinforcement Learning with Verifiable Rewards (RLVR) for enhancing the reasoning capabilities of large language models (LLMs). Current data selection methods a…

cs.AI, cs.CL

From Prediction to Justification: Aligning Sentiment Reasoning with Human Rationale via Reinforcement Learning

April 16, 2026

arXiv:2604.13398v1 Announce Type: cross
Abstract: While Aspect-based Sentiment Analysis (ABSA) systems have achieved high accuracy in identifying sentiment polarities, they often operate as “black boxes,” lacking the explicit reasoning capabilities ch…

cs.CL, cs.CV

When ‘YES’ Meets ‘BUT’: Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?

April 16, 2026

arXiv:2503.23137v2 Announce Type: replace-cross
Abstract: Understanding humor, particularly when it involves complex, contradictory narratives that require comparative reasoning, remains a significant challenge for large vision-language models (VLMs). T…

cs.CV

Right Regions, Wrong Labels: Semantic Label Flips in Segmentation under Correlation Shift

April 16, 2026

arXiv:2604.13326v1 Announce Type: new
Abstract: The robustness of machine learning models can be compromised by spurious correlations between non-causal features in the input data and target labels. A common way to test for such correlations is to tra…

cs.LG

Guided Transfer Learning for Discrete Diffusion Models

April 16, 2026

arXiv:2512.10877v4 Announce Type: replace
Abstract: Discrete diffusion models (DMs) have achieved strong performance in language and other discrete domains, offering a compelling alternative to autoregressive modeling. Yet this performance typically d…

cs.AI, cs.CL, cs.CV

MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments

April 16, 2026

arXiv:2604.13418v1 Announce Type: cross
Abstract: Motivated by the underspecified, multi-hop nature of search queries and the multimodal, heterogeneous, and often conflicting nature of real-world web results, we introduce MERRIN (Multimodal Evidence R…

cs.AI, cs.CV

FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks

April 16, 2026

arXiv:2505.19662v3 Announce Type: replace
Abstract: This paper introduces FieldWorkArena, a benchmark for agentic AI targeting real-world field work. With the recent increase in demand for agentic AI, such systems are built to detect and document safety hazar…

cs.CL

CANVAS: Continuity-Aware Narratives via Visual Agentic Storyboarding

April 16, 2026

arXiv:2604.13452v1 Announce Type: new
Abstract: Long-form visual storytelling requires maintaining continuity across shots, including consistent characters, stable environments, and smooth scene transitions. While existing generative models can produc…