Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs
arXiv:2605.00674v1
Abstract: Large language models (LLMs) are becoming increasingly capable mathematical collaborators, but static benchmarks are no longer sufficient for evaluating progress: they are often narrow in scope, quickly …