VEBench: Benchmarking Large Multimodal Models for Real-World Video Editing

/ May 6, 2026

arXiv:2605.03276v1 Announce Type: new
Abstract: Real-world video editing demands not only expert knowledge of cinematic techniques but also multimodal reasoning to select, align, and combine footage into coherent narratives. While recent Large Multimo…

cs.AI, cs.CR

On the Privacy of LLMs: An Ablation Study

/ May 6, 2026

arXiv:2605.02255v1 Announce Type: cross
Abstract: Large language models (LLMs) are increasingly deployed in interactive and retrieval-augmented settings, raising significant privacy concerns. While attacks such as Membership Inference (MIA), Attribute…

cs.CV, cs.SE

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

/ May 6, 2026

arXiv:2604.23781v2 Announce Type: replace
Abstract: Language-model agents are increasingly used as persistent coworkers that assist users across multiple working days. During such workflows, the surrounding environment may change independently of the …

cs.AI

Valley3: Scaling Omni Foundation Models for E-commerce

/ May 6, 2026

arXiv:2605.01278v1 Announce Type: new
Abstract: In this work, we present Valley3, an omni multimodal large language model (MLLM) developed for diverse global e-commerce tasks, with unified understanding and reasoning capabilities across text, images, …

cs.AI

EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving

/ May 6, 2026

arXiv:2509.17677v2 Announce Type: replace
Abstract: Large language models (LLMs) have shown strong performance on mathematical reasoning under well-defined conditions. However, real-world engineering problems involve uncertainty, context, and open-end…

cs.AI

Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks

/ May 6, 2026

arXiv:2605.01293v1 Announce Type: new
Abstract: Foundation model-driven agents often struggle with long-horizon planning due to the transient nature of purely prompting-based reasoning. While existing skill induction methods mitigate this by distillin…

cs.AI

MedGemma 1.5 Technical Report

/ May 6, 2026

arXiv:2604.05081v2 Announce Type: replace
Abstract: We introduce MedGemma 1.5 4B, the latest model in the MedGemma collection. MedGemma 1.5 expands on MedGemma 1 by integrating additional capabilities: high-dimensional medical imaging (CT/MRI volumes …

cs.AI

DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams

/ May 6, 2026

arXiv:2605.01338v1 Announce Type: new
Abstract: System-level diagrams encode the architectural blueprint of chip design, specifying module functions, dataflows, and interface protocols. However, non-standardized symbols and the scarcity of structured …

cs.AI, cs.CV, cs.MM

Enhancing Self-Supervised Talking Head Forgery Detection via a Training-Free Dual-System Framework

/ May 6, 2026

arXiv:2605.03390v1 Announce Type: cross
Abstract: Supervised talking head forgery detection faces severe generalization challenges due to the continuous evolution of generators. By reducing reliance on generator-specific forgery patterns, self-supervi…

cs.CV

Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding

/ May 6, 2026

arXiv:2512.17817v3 Announce Type: replace
Abstract: While 3DGS has emerged as a high-fidelity scene representation, encoding rich, general-purpose features directly from its primitives remains under-explored. We address this gap by introducing Chorus,…