Provide.ai - Page 82

VoxAfford: Multi-Scale Voxel-Token Fusion for Open-Vocabulary 3D Affordance Detection

/ May 5, 2026

arXiv:2605.01365v1 Announce Type: cross
Abstract: Open-vocabulary 3D affordance detection requires localizing interaction regions on point clouds given novel affordance descriptions. Recent methods extend multimodal large language models (MLLMs) with …

cs.AI, cs.CV

SRGAN-CKAN: Expressive Super-Resolution with Nonlinear Functional Operators under Minimal Resources

/ May 5, 2026

arXiv:2605.01459v1 Announce Type: cross
Abstract: Single-Image Super-Resolution (SISR) aims to reconstruct a High-Resolution (HR) image from a Low-Resolution (LR) observation, a fundamentally ill-posed problem where high-frequency details are severely…

cs.LG, cs.SY, eess.SY, quant-ph

From Characterization To Construction: Generative Quantum Circuit Synthesis from Gate Set Tomography Data

/ May 5, 2026

arXiv:2605.01367v1 Announce Type: cross
Abstract: High-fidelity circuit execution on noisy intermediate-scale quantum devices is bottlenecked by compilation pipelines that disregard complex, correlated noise. To address this, this methodology article …

cs.AI, cs.CL, cs.CV

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

/ May 5, 2026

arXiv:2604.28123v2 Announce Type: replace-cross
Abstract: The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforcement learning with verifiable rewards (R…

cs.AI, cs.LG

Federated Semi-Supervised Graph Neural Networks with Prototype-Guided Pseudo-Labeling for Privacy-Preserving Gestational Diabetes Mellitus Prediction

/ May 5, 2026

arXiv:2605.01810v1 Announce Type: cross
Abstract: Gestational Diabetes Mellitus (GDM) is a high-prevalence pregnancy complication that requires accurate early risk stratification to reduce maternal and fetal morbidity. However, real-world clinical dep…

cs.CV

SpecEdit: Training-Free Acceleration for Diffusion based Image Editing via Semantic Locking

/ May 5, 2026

arXiv:2605.02152v1 Announce Type: new
Abstract: Diffusion-based image editing offers strong semantic controllability, but remains computationally expensive due to iterative high-resolution denoising over all spatial tokens. Dynamic-resolution sampling…

cs.CV, cs.HC, cs.LG

Multimodal Ambivalence/Hesitancy Recognition in Videos for Personalized Digital Health Interventions

/ May 5, 2026

arXiv:2604.11730v3 Announce Type: replace-cross
Abstract: Using behavioural science, health interventions focus on behaviour change by providing a framework to help patients acquire and maintain healthy habits that improve medical outcomes. In-person …

cs.AI, cs.CV, cs.LG

AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models

/ May 5, 2026

arXiv:2506.09082v5 Announce Type: replace-cross
Abstract: The rise of vision foundation models (VFMs) calls for systematic evaluation. A common approach pairs VFMs with large language models (LLMs) as general-purpose heads, followed by evaluation on b…

cs.CV

Adapting Vision-Language Foundation Model for Next Generation Medical Ultrasound Image Analysis

/ May 5, 2026

arXiv:2506.08849v4 Announce Type: replace
Abstract: Vision-Language Foundation Models (VLFMs) exhibit remarkable generalization, yet their direct application to medical ultrasound is severely hindered by a profound modality gap. The unique acoustic ph…

cs.LG

Skipping the Zeros in Diffusion Models for Sparse Data Generation

/ May 5, 2026

arXiv:2605.01817v1 Announce Type: new
Abstract: Diffusion models (DMs) excel on dense continuous data, but are not designed for sparse continuous data. They do not model exact zeros that represent the deliberate absence of a signal. As a result, they …