Provide.ai

Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations

April 20, 2026

arXiv:2602.05523v2 Announce Type: replace-cross
Abstract: Agentic large language models (LLMs) are increasingly evaluated on cybersecurity tasks using capture-the-flag (CTF) benchmarks, yet existing pointwise benchmarks offer limited insight into agen…

cs.AI, cs.CR

Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP

April 20, 2026

arXiv:2602.11327v2 Announce Type: replace-cross
Abstract: The rapid development of the AI agent communication protocols, including the Model Context Protocol (MCP), Agent2Agent (A2A), Agora, and Agent Network Protocol (ANP), is reshaping how AI agents…

cs.CV

Polyglot: Multilingual Style Preserving Speech-Driven Facial Animation

April 20, 2026

arXiv:2604.16108v1 Announce Type: new
Abstract: Speech-Driven Facial Animation (SDFA) has gained significant attention due to its applications in movies, video games, and virtual reality. However, most existing models are trained on single-language da…

cs.AI, cs.CR

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

April 20, 2026

arXiv:2604.10577v2 Announce Type: replace-cross
Abstract: Computer-use agents (CUAs) can now autonomously complete complex tasks in real digital environments, but when misled, they can also be used to automate harmful actions programmatically. Existin…

cs.CV

From Articles to Canopies: Knowledge-Driven Pseudo-Labelling for Tree Species Classification using LLM Experts

April 20, 2026

arXiv:2604.16115v1 Announce Type: new
Abstract: Hyperspectral tree species classification is challenging due to limited and imbalanced class labels, spectral mixing (overlapping light signatures from multiple species), and ecological heterogeneity (va…

cs.AI, cs.CV

VIB-Probe: Detecting and Mitigating Hallucinations in Vision-Language Models via Variational Information Bottleneck

April 20, 2026

arXiv:2601.05547v2 Announce Type: replace-cross
Abstract: Vision-Language Models (VLMs) have demonstrated remarkable progress in multimodal tasks, but remain susceptible to hallucinations, where generated text deviates from the underlying visual conte…

cs.AI

DeepER-Med: Advancing Deep Evidence-Based Research in Medicine Through Agentic AI

April 20, 2026

arXiv:2604.15456v1 Announce Type: new
Abstract: Trustworthiness and transparency are essential for the clinical adoption of artificial intelligence (AI) in healthcare and biomedical research. Recent deep research systems aim to accelerate evidence-gro…

cs.CV

Winner of CVPR2026 NTIRE Challenge on Image Shadow Removal: Semantic and Geometric Guidance for Shadow Removal via Cascaded Refinement

April 20, 2026

arXiv:2604.16177v1 Announce Type: new
Abstract: We present a three-stage progressive shadow-removal pipeline for the CVPR2026 NTIRE WSRD+ challenge. Built on OmniSR, our method treats deshadowing as iterative direct refinement, where later stages corr…

cs.CV

Cross-modal learning for plankton recognition

April 20, 2026

arXiv:2603.16427v2 Announce Type: replace
Abstract: This paper considers self-supervised cross-modal coordination as a strategy enabling utilization of multiple modalities and large volumes of unlabeled plankton data to build models for plankton recog…

cs.CV

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

April 20, 2026

arXiv:2604.11804v2 Announce Type: replace
Abstract: In this work, we study Human-Object Interaction Video Generation (HOIVG), which aims to synthesize high-quality human-object interaction videos conditioned on text, reference images, audio, and pose…