Data-Efficient RLVR via Off-Policy Influence Guidance

/ April 16, 2026

arXiv:2510.26491v2 Announce Type: replace
Abstract: Data selection is a critical aspect of Reinforcement Learning with Verifiable Rewards (RLVR) for enhancing the reasoning capabilities of large language models (LLMs). Current data selection methods a…

cs.AI, cs.LG

RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management

/ April 16, 2026

arXiv:2604.13531v1 Announce Type: new
Abstract: Graphical User Interface (GUI) agents show strong capabilities for automating web tasks, but existing interactive benchmarks primarily target benign, predictable consumer environments. Their effectivenes…

cs.AI, cs.CL

From Prediction to Justification: Aligning Sentiment Reasoning with Human Rationale via Reinforcement Learning

/ April 16, 2026

arXiv:2604.13398v1 Announce Type: cross
Abstract: While Aspect-based Sentiment Analysis (ABSA) systems have achieved high accuracy in identifying sentiment polarities, they often operate as “black boxes,” lacking the explicit reasoning capabilities ch…

cs.CL, cs.CV

When ‘YES’ Meets ‘BUT’: Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?

/ April 16, 2026

arXiv:2503.23137v2 Announce Type: replace-cross
Abstract: Understanding humor, particularly when it involves complex, contradictory narratives that require comparative reasoning, remains a significant challenge for large vision-language models (VLMs). T…

cs.CV

Right Regions, Wrong Labels: Semantic Label Flips in Segmentation under Correlation Shift

/ April 16, 2026

arXiv:2604.13326v1 Announce Type: new
Abstract: The robustness of machine learning models can be compromised by spurious correlations between non-causal features in the input data and target labels. A common way to test for such correlations is to tra…

cs.LG

Guided Transfer Learning for Discrete Diffusion Models

/ April 16, 2026

arXiv:2512.10877v4 Announce Type: replace
Abstract: Discrete diffusion models (DMs) have achieved strong performance in language and other discrete domains, offering a compelling alternative to autoregressive modeling. Yet this performance typically d…

cs.AI, cs.CL, cs.CV

MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments

/ April 16, 2026

arXiv:2604.13418v1 Announce Type: cross
Abstract: Motivated by the underspecified, multi-hop nature of search queries and the multimodal, heterogeneous, and often conflicting nature of real-world web results, we introduce MERRIN (Multimodal Evidence R…

cs.AI, cs.CV

FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks

/ April 16, 2026

arXiv:2505.19662v3 Announce Type: replace
Abstract: This paper introduces FieldWorkArena, a benchmark for agentic AI targeting real-world field work. With the recent increase in demand for agentic AI, such systems are built to detect and document safety hazar…

cs.CL

CANVAS: Continuity-Aware Narratives via Visual Agentic Storyboarding

/ April 16, 2026

arXiv:2604.13452v1 Announce Type: new
Abstract: Long-form visual storytelling requires maintaining continuity across shots, including consistent characters, stable environments, and smooth scene transitions. While existing generative models can produc…

cs.CV

Geometric Context Transformer for Streaming 3D Reconstruction

/ April 16, 2026

arXiv:2604.14141v1 Announce Type: new
Abstract: Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which necessitates geometric accuracy, temporal consistency, and computational ef…