E-TCAV: Formalizing Penultimate Proxies for Efficient Concept Based Interpretability

/ May 12, 2026

arXiv:2605.10261v1 Announce Type: new
Abstract: TCAV (Testing with Concept Activation Vectors) is an interpretability method that assesses the alignment between the internal representations of a trained neural network and human-understandable, high-le…

cs.CV

AUHead: Realistic Emotional Talking Head Generation via Action Units Control

/ May 12, 2026

arXiv:2602.09534v2 Announce Type: replace
Abstract: Realistic talking-head video generation is critical for virtual avatars, film production, and interactive systems. Current methods struggle with nuanced emotional expressions due to the lack of fine-…

cs.CE, cs.CL, cs.CV, cs.LG

SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?

/ May 12, 2026

arXiv:2602.03916v3 Announce Type: replace-cross
Abstract: Spatial reasoning is a fundamental aspect of human cognition, yet it remains a major challenge for contemporary vision-language models (VLMs). Prior work largely relied on synthetic or LLM-gene…

cs.AI, cs.LG, cs.SD, eess.SP

PHALAR: Phasors for Learned Musical Audio Representations

/ May 12, 2026

arXiv:2605.03929v3 Announce Type: replace-cross
Abstract: Stem retrieval, the task of matching missing stems to a given audio submix, is a key challenge currently limited by models that discard temporal information. We introduce PHALAR, a contrastive …

cs.CV, cs.GR, cs.MM, cs.SD

Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation

/ May 12, 2026

arXiv:2605.08729v1 Announce Type: new
Abstract: Motion, speech, and sound effects are fundamental elements of human-centric videos, yet their heterogeneous temporal characteristics make joint generation highly challenging. Existing audio-video generat…

cs.CV

TrajTok: Learning Trajectory Tokens enables better Video Understanding

/ May 12, 2026

arXiv:2602.22779v2 Announce Type: replace
Abstract: Tokenization in video models, typically through patchification, generates an excessive and redundant number of tokens. This severely limits video efficiency and scalability. While recent trajectory-b…

cs.AI

IndustryBench: Probing the Industrial Knowledge Boundaries of LLMs

/ May 12, 2026

arXiv:2605.10267v1 Announce Type: new
Abstract: In industrial procurement, an LLM answer is useful only if it survives a standards check: recommended material must match operating condition, every parameter must respect a regulated threshold, and no p…

cs.AI, cs.CV, cs.LG

CERSA: Cumulative Energy-Retaining Subspace Adaptation for Memory-Efficient Fine-Tuning

/ May 12, 2026

arXiv:2605.08174v1 Announce Type: cross
Abstract: To mitigate the memory constraints associated with fine-tuning large pre-trained models, existing parameter-efficient fine-tuning (PEFT) methods, such as LoRA, rely on low-rank updates. However, such u…

cs.LG

What Time Is It? How Data Geometry Makes Time Conditioning Optional for Flow Matching

/ May 12, 2026

arXiv:2605.08344v1 Announce Type: new
Abstract: Recent work has shown that flow matching models can be trained without explicit time conditioning, challenging the standard view that the interpolation time is needed to disambiguate velocity targ…

cs.CL, cs.CV, cs.RO

Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

/ May 12, 2026

arXiv:2604.18486v3 Announce Type: replace-cross
Abstract: Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is prohibitive …