Provide.ai

Gaussian Mixture Model with unknown diagonal covariances via continuous sparse regularization

May 14, 2026

arXiv:2509.12889v4 Announce Type: replace-cross
Abstract: This paper addresses the statistical estimation of Gaussian Mixture Models (GMMs) with unknown diagonal covariances from independent and identically distributed samples. We employ the Beurling-…

cs.LG, stat.ML

Sample-Efficient Optimisation over the Outputs of Generative Models

May 14, 2026

arXiv:2509.23800v3 Announce Type: replace
Abstract: Modern generative AI models, such as diffusion and flow matching models, can sample from rich data distributions. However, many applications, especially in science and engineering, require more than …

cs.AI, cs.CL

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

May 14, 2026

arXiv:2605.13301v1 Announce Type: new
Abstract: Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Ma…

cs.AI, cs.CL, cs.LG

Interactive Benchmarks

May 14, 2026

arXiv:2603.04737v3 Announce Type: replace
Abstract: Existing reasoning evaluation paradigms suffer from different limitations: fixed benchmarks are increasingly saturated and vulnerable to contamination, while preference-based evaluations rely on subj…

cs.AI

IndustryBench: Probing the Industrial Knowledge Boundaries of LLMs

May 14, 2026

arXiv:2605.10267v3 Announce Type: replace
Abstract: In industrial procurement, an LLM answer is useful only if it survives a standards check: recommended material must match operating condition, every parameter must respect a regulated threshold, and …

cs.AI

RS-Claw: Progressive Active Tool Exploration via Hierarchical Skill Trees for Remote Sensing Agents

May 14, 2026

arXiv:2605.13391v1 Announce Type: new
Abstract: The rise of multi-modal large language models (MLLMs) is shifting remote sensing (RS) intelligence from “see” to “action”, as OpenClaw-style frameworks enable agents to autonomously operate massive RS im…

cs.AI

MMSkills: Towards Multimodal Skills for General Visual Agents

May 14, 2026

arXiv:2605.13527v1 Announce Type: new
Abstract: Reusable skills have become a core substrate for improving agent capabilities, yet most existing skill packages encode reusable behavior primarily as textual prompts, executable code, or learned routines…

cs.AI, cs.LO

interwhen: A Generalizable Framework for Steering Reasoning Models with Test-time Verification

May 14, 2026

arXiv:2602.11202v3 Announce Type: replace-cross
Abstract: Reasoning models produce long traces of intermediate decisions and tool calls, making test-time verification important for ensuring correctness. Existing approaches either verify only the final…

cs.AI, cs.CL, cs.LG, cs.MA

RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation

May 14, 2026

arXiv:2605.13542v1 Announce Type: new
Abstract: Intensive care units (ICU) generate long, dense and evolving streams of clinical information, where physicians must repeatedly reassess patient states under time pressure, underscoring a clear need for r…

cs.AI

How to Interpret Agent Behavior

May 14, 2026

arXiv:2605.13625v1 Announce Type: new
Abstract: Autonomous agents such as Claude Code and Codex now operate for hours or even days. Understanding their runtime behavior has become critical for downstream tasks such as diagnosing inefficiencies, fixing…