Certain Head, Uncertain Tail: Expert-Sample for Test-Time Scaling in Fine-Grained MoE

/ May 4, 2026

arXiv:2602.02443v2 Announce Type: replace
Abstract: Test-time scaling improves LLM performance by generating multiple candidate solutions, yet token-level sampling requires temperature tuning that trades off diversity against stability. Fine-grained M…

cs.AI, cs.LG

D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery

/ May 4, 2026

arXiv:2604.27977v2 Announce Type: replace
Abstract: Despite recent progress in language models and agents for scientific data-driven discovery, further advancing their capabilities is held back by the absence of verifiable environments representing re…

cs.CL

Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs

/ May 4, 2026

arXiv:2605.00674v1 Announce Type: new
Abstract: Large language models (LLMs) are becoming increasingly capable mathematical collaborators, but static benchmarks are no longer sufficient for evaluating progress: they are often narrow in scope, quickly …

cs.LG

The Power of Order: Fooling LLMs with Adversarial Table Permutations

/ May 4, 2026

arXiv:2605.00445v1 Announce Type: new
Abstract: Large Language Models have achieved remarkable success and are increasingly deployed in critical applications involving tabular data, such as Table Question Answering. However, their robustness to the st…

cs.LG, cs.NI

TURBOTEST: Learning When Less is Enough through Early Termination of Internet Speed Tests

/ May 4, 2026

arXiv:2510.21141v2 Announce Type: replace-cross
Abstract: Internet speed tests are indispensable for users, ISPs, and policymakers, but their static flooding-based design imposes growing costs: a single high-speed test can transfer hundreds of MB, and…

cond-mat.mtrl-sci, cs.CE, cs.LG

Probabilistic Predictions of Process-Induced Deformation in Carbon/Epoxy Composites Using a Deep Operator Network

/ May 4, 2026

arXiv:2512.13746v5 Announce Type: replace-cross
Abstract: Fiber reinforcement and polymer matrix respond differently to manufacturing conditions due to mismatch in coefficient of thermal expansion and matrix shrinkage during curing of thermosets. Thes…

cs.AI, cs.LG

Scalable Context-Aware Graph Attention for Unsupervised Anomaly Detection in Large-Scale Mobile Networks

/ May 4, 2026

arXiv:2605.00482v1 Announce Type: cross
Abstract: Mobile network operators must monitor thousands of heterogeneous network elements across the radio access network and the packet core, each exposing high-dimensional KPI time series. The scale and cost…

cs.CL

Reward Modeling from Natural Language Human Feedback

/ May 4, 2026

arXiv:2601.07349v3 Announce Type: replace
Abstract: Reinforcement Learning with Verifiable reward (RLVR) on preference data has become the mainstream approach for training Generative Reward Models (GRMs). Typically in pairwise rewarding tasks, GRMs ge…

cs.CL

BanglaSocialBench: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction

/ May 4, 2026

arXiv:2603.15949v3 Announce Type: replace
Abstract: Large Language Models have demonstrated strong multilingual fluency, yet fluency alone does not guarantee socially appropriate language use. In high-context languages, communicative competence requir…

cs.CL

SCOPE: Planning for Hybrid Querying over Clinical Trial Data

/ May 4, 2026

arXiv:2604.25120v2 Announce Type: replace
Abstract: We study clinical trial table reasoning, where answers are not directly stored in visible cells but must be reasoned from semantic understanding through normalization, classification, extraction, or …