Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models

/ April 24, 2026

arXiv:2604.10079v3 Announce Type: replace
Abstract: Supervised Fine-Tuning (SFT) is the standard approach for adapting large language models (LLMs) to downstream tasks. However, we observe a persistent failure mode: even after convergence, models ofte…

cs.CL

XtraGPT: Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration

/ April 24, 2026

arXiv:2505.11336v4 Announce Type: replace
Abstract: Despite the growing adoption of large language models (LLMs) in academic workflows, their capabilities remain limited in supporting high-quality scientific writing. Most existing systems are designed…

cs.CL, cs.LG

SODA: Semi On-Policy Black-Box Distillation for Large Language Models

/ April 24, 2026

arXiv:2604.03873v3 Announce Type: replace-cross
Abstract: Black-box knowledge distillation for large language models presents a strict trade-off. Simple off-policy methods (e.g., sequence-level knowledge distillation) struggle to correct the student’s…

cs.CL, cs.IR

From Past To Path: Masked History Learning for Next-Item Prediction in Generative Recommendation

/ April 24, 2026

arXiv:2509.23649v2 Announce Type: replace-cross
Abstract: Generative recommendation, which directly generates item identifiers, has emerged as a promising paradigm for recommendation systems. However, its potential is fundamentally constrained by the …

cs.CL

Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty

/ April 24, 2026

arXiv:2604.10072v3 Announce Type: replace
Abstract: Recent advancements in the Generative Reward Model (GRM) have demonstrated its potential to enhance the reasoning abilities of LLMs through Chain-of-Thought (CoT) prompting. Despite these gains, exis…

cs.CL, cs.LG

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

/ April 24, 2026

arXiv:2512.05591v2 Announce Type: replace-cross
Abstract: Large language model post-training relies on reinforcement learning to improve model capability and alignment quality. However, the off-policy training paradigm introduces distribution shift, w…

cs.CL, cs.CR

CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents

/ April 24, 2026

arXiv:2604.21308v1 Announce Type: cross
Abstract: Enterprise LLM agents can dramatically improve workplace productivity, but their core capability, retrieving and using internal context to act on a user’s behalf, also creates new risks for sensitive i…

cs.CL

Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech

/ April 24, 2026

arXiv:2604.21045v1 Announce Type: new
Abstract: Simultaneous speech translation (SST) generates translations while receiving partial speech input. Recent advances show that large language models (LLMs) can substantially improve SST quality, but at the…

cs.AI, cs.CL, cs.LG

Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning

/ April 24, 2026

arXiv:2604.21327v1 Announce Type: cross
Abstract: Test-time reinforcement learning (TTRL) adapts models at inference time via pseudo-labeling, leaving it vulnerable to spurious optimization signals from label noise. Through an empirical study, …

cs.CL, cs.IR

UsefulBench: Towards Decision-Useful Information as a Target for Information Retrieval

/ April 24, 2026

arXiv:2604.15827v2 Announce Type: replace-cross
Abstract: Conventional information retrieval is concerned with identifying the relevance of texts for a given query. Yet, the conventional definition of relevance is dominated by aspects of similarity in…