Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection
arXiv:2605.10235v1
Abstract: Recent advances in large language models (LLMs) have expanded the context window to beyond 128K tokens, enabling long-document understanding and multi-source reasoning. A key challenge, however, lies in …