Provide.ai - Page 92

Mitigating Misalignment Contagion by Steering with Implicit Traits

May 12, 2026

arXiv:2605.02751v2 Announce Type: replace
Abstract: Language models (LMs) are increasingly used in high-stakes, multi-agent settings, where following instructions and maintaining value alignment are critical. Most alignment research focuses on interac…

cs.CV

Thinking with Novel Views: A Systematic Analysis of Generative-Augmented Spatial Intelligence

May 12, 2026

arXiv:2605.10588v1 Announce Type: new
Abstract: Current Large Multimodal Models (LMMs) struggle with spatial reasoning tasks requiring viewpoint-dependent understanding, largely because they are confined to a single, static observation. We propose Thi…

cs.AI, cs.LG

Physics-Enhanced Deep Learning for Proactive Thermal Runaway Forecasting in Li-Ion Batteries

May 12, 2026

arXiv:2604.20175v2 Announce Type: replace-cross
Abstract: Accurate prediction of thermal runaway in lithium-ion batteries is essential for ensuring the safety, efficiency, and reliability of modern energy storage systems. Conventional data-driven appr…

cs.AI

Token Economics for LLM Agents: A Dual-View Study from Computing and Economics

May 12, 2026

arXiv:2605.09104v1 Announce Type: new
Abstract: As LLM agents evolve, tokens have emerged as the core economic primitives of Agentic AI. However, their exponential consumption introduces severe computational, collaborative, and security bottlenecks. C…

cs.CV

From Articulated Kinematics to Routed Visual Control for Action-Conditioned Surgical Video Generation

May 12, 2026

arXiv:2605.08712v1 Announce Type: new
Abstract: Action-conditioned surgical video generation is a critical yet highly challenging problem for robotic surgery. The core difficulty is that low-dimensional control vectors must precisely govern complex im…

cs.CL, cs.CV

When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise

May 12, 2026

arXiv:2605.05045v2 Announce Type: replace-cross
Abstract: Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, which requires accurate reasoning over inter-object interactions. We study the im…

cs.AI, cs.CV, cs.LG

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

May 12, 2026

arXiv:2604.24954v2 Announce Type: replace-cross
Abstract: We introduce Nemotron 3 Nano Omni, the latest model in the Nemotron multimodal series and the first to natively support audio inputs alongside text, images, and video. Nemotron 3 Nano Omni deli…

cs.CV

Qwen-Image-2.0 Technical Report

May 12, 2026

arXiv:2605.10730v1 Announce Type: new
Abstract: We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing m…

cs.AI

Towards Autonomous Railway Operations: A Semi-Hierarchical Deep Reinforcement Learning Approach to the Vehicle Rescheduling Problem

May 12, 2026

arXiv:2605.10257v1 Announce Type: new
Abstract: Managing disruptions in railway traffic management is a major challenge. Rising traffic density and infrastructure limits increase complexity, making the Vehicle Routing and Scheduling Problem (VRSP) dif…

cs.CV, cs.GR, cs.MM, cs.SD

Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation

May 12, 2026

arXiv:2605.08729v1 Announce Type: new
Abstract: Motion, speech, and sound effects are fundamental elements of human-centric videos, yet their heterogeneous temporal characteristics make joint generation highly challenging. Existing audio-video generat…