Provide.ai - Page 304

True to Tone? Quantifying Skin Tone Fidelity and Bias in Photographic-to-Virtual Human Pipelines

April 3, 2026

arXiv:2604.02055v1 Announce Type: new
Abstract: Accurate reproduction of facial skin tone is essential for realism, identity preservation, and fairness in Virtual Human (VH) rendering. However, most accessible avatar creation pipelines rely on photogr…

cs.CV

Scaling Video Pretraining for Surgical Foundation Models

April 3, 2026

arXiv:2603.29966v2 Announce Type: replace
Abstract: Surgical video understanding is essential for computer-assisted interventions, yet existing surgical foundation models remain constrained by limited data scale, procedural diversity, and inconsistent…

cs.AI, cs.CV, cs.IR, cs.LG

MOON3.0: Reasoning-aware Multimodal Representation Learning for E-commerce Product Understanding

April 3, 2026

arXiv:2604.00513v2 Announce Type: replace-cross
Abstract: With the rapid growth of e-commerce, exploring general representations rather than task-specific ones has attracted increasing attention. Although recent multimodal large language models (MLLMs…

cs.AI, cs.CL, cs.CV

KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding

April 3, 2026

arXiv:2503.12797v3 Announce Type: replace
Abstract: Knowledge-Intensive Visual Grounding (KVG) requires models to localize objects using fine-grained, domain-specific entity names rather than generic referring expressions. Although Multimodal Large La…

cs.AI, cs.LG, cs.SD

Woosh: A Sound Effects Foundation Model

April 3, 2026

arXiv:2604.01929v1 Announce Type: cross
Abstract: The audio research community depends on open generative models as foundational tools for building novel approaches and establishing baselines. In this report, we present Woosh, Sony AI’s publicly relea…

cs.CV

PLUME: Latent Reasoning Based Universal Multimodal Embedding

April 3, 2026

arXiv:2604.02073v1 Announce Type: new
Abstract: Universal multimodal embedding (UME) maps heterogeneous inputs into a shared retrieval space with a single model. Recent approaches improve UME by generating explicit chain-of-thought (CoT) rationales be…

cs.AI, cs.CV, cs.SE

GPA: Learning GUI Process Automation from Demonstrations

April 3, 2026

arXiv:2604.01676v1 Announce Type: new
Abstract: GUI Process Automation (GPA) is a lightweight but general vision-based Robotic Process Automation (RPA), which enables fast and stable process replay with only a single demo. Addressing the fragility of …

cs.CV, eess.IV

FluoCLIP: Stain-Aware Focus Quality Assessment in Fluorescence Microscopy

April 3, 2026

arXiv:2602.23791v2 Announce Type: replace-cross
Abstract: Accurate focus quality assessment (FQA) in fluorescence microscopy is challenging due to stain-dependent optical variations that induce heterogeneous focus behavior across images. Existing meth…

cs.CV, cs.LG

Robust Adaptation of Foundation Models with Black-Box Visual Prompting

April 3, 2026

arXiv:2407.17491v3 Announce Type: replace
Abstract: With a surge of large-scale pre-trained models, parameter-efficient transfer learning (PETL) of large models has garnered significant attention. While promising, they commonly rely on two optimistic …

cs.AI, cs.CL, cs.CR, cs.LG, cs.SE

RuleForge: Automated Generation and Validation for Web Vulnerability Detection at Scale

April 3, 2026

arXiv:2604.01977v1 Announce Type: cross
Abstract: Security teams face a challenge: the volume of newly disclosed Common Vulnerabilities and Exposures (CVEs) far exceeds the capacity to manually develop detection mechanisms. In 2025, the National Vulne…