Exploring Reasoning Reward Model for Agents

/ April 29, 2026

arXiv:2601.22154v2 Announce Type: replace-cross
Abstract: Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods still rely on sparse outcome-based…

cs.AI, cs.CL, cs.GR

Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation

/ April 29, 2026

arXiv:2604.25318v1 Announce Type: cross
Abstract: Cutscenes are carefully choreographed cinematic sequences embedded in video games and interactive media, serving as the primary vehicle for narrative delivery, character development, and emotional enga…

cs.LG, cs.RO

Egocentric Tactile and Proximity Sensors as Observation Priors for Humanoid Collision Avoidance

/ April 29, 2026

arXiv:2604.25554v1 Announce Type: cross
Abstract: Collision-free motion is often aided by tactile and proximity sensors distributed over the robot's body because, unlike external cameras, they are resistant to occlusion. However, how to shape the…

cs.LG, cs.NA, math.NA

A Hybridizable Neural Time Integrator for Stable Autoregressive Forecasting

/ April 29, 2026

arXiv:2604.21101v2 Announce Type: replace
Abstract: For autoregressive modeling of chaotic dynamical systems over long time horizons, the stability of both training and inference is a major challenge in building scientific foundation models. We presen…

cs.CV

OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning

/ April 29, 2026

arXiv:2604.00270v3 Announce Type: replace
Abstract: Recent large multimodal models (LMMs) have made rapid progress in visual grounding, document understanding, and diagram reasoning tasks. However, their ability to convert Printed Circuit Board (PCB) …

cs.DC, cs.LG

Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models

/ April 29, 2026

arXiv:2603.12118v2 Announce Type: replace
Abstract: Any-to-Any models are an emerging class of multimodal models that accept combinations of multimodal data (e.g., text, image, video, audio) as input and generate them as output. Serving these models a…

cs.AI, cs.CV

Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models

/ April 29, 2026

arXiv:2604.07802v3 Announce Type: replace
Abstract: Large-scale vision-language models (VLMs) exhibit remarkable zero-shot capabilities, yet the internal mechanisms driving their anomaly detection (AD) performance remain poorly understood. Current met…

cs.CV

NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge Report

/ April 29, 2026

arXiv:2604.17070v2 Announce Type: replace
Abstract: This report presents the NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge, which targets automatic rip current understanding in images. Rip currents are hazardous nearshore flo…

cs.CL, cs.CV

Toward Multimodal Conversational AI for Age-Related Macular Degeneration

/ April 29, 2026

arXiv:2604.25720v1 Announce Type: cross
Abstract: Despite strong performance of deep learning models in retinal disease detection, most systems produce static predictions without clinical reasoning or interactive explanation. Recent advances in multim…

cs.CV, cs.SD

Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation

/ April 29, 2026

arXiv:2604.25819v1 Announce Type: new
Abstract: In this work, we propose Mutual Forcing, a framework for fast autoregressive audio-video generation with long-horizon audio-video synchronization. Our approach addresses two key challenges: joint audio-v…