- Provide.ai - Page 42

HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

May 1, 2026

arXiv:2604.09408v3 Announce Type: replace
Abstract: Frontier coding agents solve complex tasks when given complete context but collapse when specifications are incomplete or ambiguous. The bottleneck is not raw capability, but judgment: knowing when t…

cs.AI, cs.MA

AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories

May 1, 2026

arXiv:2604.19606v2 Announce Type: replace
Abstract: Systematic ablations are essential to attribute performance gains in AI Virtual Cells, yet they are rarely performed because biological repositories are under-standardized and tightly coupled to doma…

cs.AI, cs.CL

From Test-taking to Cognitive Scaffolding: A Pedagogical Diagnostic Benchmark for LLMs on English Standardized Tests

May 1, 2026

arXiv:2505.17056v2 Announce Type: replace-cross
Abstract: As large language models (LLMs) are increasingly integrated into educational tools, current evaluations on standardized tests predominantly focus on binary outcome accuracy. Instead, an effecti…

cs.AI, cs.IR

A Gated Hybrid Contrastive Collaborative Filtering Recommendation

May 1, 2026

arXiv:2604.27117v1 Announce Type: cross
Abstract: Recommender systems increasingly incorporate textual reviews to enrich user and item representations. However, most review-aware models remain optimized for rating prediction rather than ranking qualit…

cs.AI, cs.CL

PiCSAR: Probabilistic Confidence Selection And Ranking for Reasoning Chains

May 1, 2026

arXiv:2508.21787v2 Announce Type: replace-cross
Abstract: Best-of-n sampling improves the accuracy of large language models (LLMs) and large reasoning models (LRMs) by generating multiple candidate solutions and selecting the one with the highest rewa…
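The selection step named in this abstract can be sketched in a few lines. This is a minimal, generic best-of-n sketch in Python, not PiCSAR's confidence-based method. The `generate` and `reward` callables, the fixed candidate list, and the "closeness to 14" scoring rule are all hypothetical stand-ins for an LLM sampler and a reward model.

```python
from typing import Callable, Tuple


def best_of_n(
    generate: Callable[[], str],
    reward: Callable[[str], float],
    n: int,
) -> Tuple[str, float]:
    """Draw n candidates from `generate` and return (candidate, reward) for the best one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Sample all candidates, then score each one independently.
    scored = [(c, reward(c)) for c in (generate() for _ in range(n))]
    # max() keeps the first candidate among ties for the top reward.
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy sampler and reward: candidates are fixed strings,
# and the reward is closeness to an assumed reference answer of 14.
answers = iter(["12", "15", "14", "15"])


def toy_generate() -> str:
    return next(answers)


def toy_reward(ans: str) -> float:
    return -abs(int(ans) - 14)


best, score = best_of_n(toy_generate, toy_reward, n=4)
print(best, score)  # → 14 0
```

Because the returned answer is simply whichever candidate scores highest, the result is only as good as the scoring function used to rank the candidates.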

cs.AI, cs.CL

When Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection

May 1, 2026

arXiv:2510.12476v3 Announce Type: replace-cross
Abstract: Large language models (LLMs) have grown more powerful in language generation, producing fluent text and even imitating personal style. Yet, this ability also heightens the risk of identity impe…

cs.AI, cs.CV

Focal Modulation and Bidirectional Feature Fusion Network for Medical Image Segmentation

May 1, 2026

arXiv:2510.20933v2 Announce Type: replace-cross
Abstract: Medical image segmentation is essential for clinical applications such as disease diagnosis, treatment planning, and disease development monitoring because it provides precise morphological and…

cs.CL

Exploring the Limits of Pruning: Task-Specific Neurons, Model Collapse, and Recovery in Task-Specific Large Language Models

May 1, 2026

arXiv:2604.27115v1 Announce Type: new
Abstract: Neuron pruning is widely used to reduce the computational cost and parameter footprint of large language models, yet it remains unclear whether neurons in task-specific models contribute uniformly to tas…

cs.CL

CL-bench Life: Can Language Models Learn from Real-Life Context?

May 1, 2026

arXiv:2604.27043v1 Announce Type: new
Abstract: Today’s AI assistants such as OpenClaw are designed to handle context effectively, making context learning an increasingly important capability for models. As these systems move beyond professional setti…

cs.CL

Targeted Linguistic Analysis of Sign Language Models with Minimal Translation Pairs

May 1, 2026

arXiv:2604.27232v1 Announce Type: new
Abstract: Models of sign language have historically lagged behind those for spoken language (text and speech). Recent work has greatly improved their performance on tasks like sign language translation and isolate…