Provide.ai - Page 293

SODA: Semi On-Policy Black-Box Distillation for Large Language Models

April 24, 2026

arXiv:2604.03873v3 Announce Type: replace-cross
Abstract: Black-box knowledge distillation for large language models presents a strict trade-off. Simple off-policy methods (e.g., sequence-level knowledge distillation) struggle to correct the student’s…

cs.CL, cs.IR

From Past To Path: Masked History Learning for Next-Item Prediction in Generative Recommendation

April 24, 2026

arXiv:2509.23649v2 Announce Type: replace-cross
Abstract: Generative recommendation, which directly generates item identifiers, has emerged as a promising paradigm for recommendation systems. However, its potential is fundamentally constrained by the …

cs.CL

Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty

April 24, 2026

arXiv:2604.10072v3 Announce Type: replace
Abstract: Recent advancements in the Generative Reward Model (GRM) have demonstrated its potential to enhance the reasoning abilities of LLMs through Chain-of-Thought (CoT) prompting. Despite these gains, exis…

cs.AI, cs.LG

Retrofit: Continual Learning with Controlled Forgetting for Binary Security Detection and Analysis

April 24, 2026

arXiv:2511.11439v2 Announce Type: replace-cross
Abstract: Binary security has increasingly relied on deep learning to reason about malware behavior and program semantics. However, the performance often degrades as threat landscapes evolve and code rep…

cs.CL, cs.LG

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

April 24, 2026

arXiv:2512.05591v2 Announce Type: replace-cross
Abstract: Large language model post-training relies on reinforcement learning to improve model capability and alignment quality. However, the off-policy training paradigm introduces distribution shift, w…

cs.CL, cs.CR

CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents

April 24, 2026

arXiv:2604.21308v1 Announce Type: cross
Abstract: Enterprise LLM agents can dramatically improve workplace productivity, but their core capability, retrieving and using internal context to act on a user’s behalf, also creates new risks for sensitive i…

cs.AI, cs.CL, cs.LG

Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning

April 24, 2026

arXiv:2604.21327v1 Announce Type: cross
Abstract: Test-time reinforcement learning (TTRL) always adapts models at inference time via pseudo-labeling, leaving it vulnerable to spurious optimization signals from label noise. Through an empirical study, …

cs.AI, cs.LG

CAP: Controllable Alignment Prompting for Unlearning in LLMs

April 24, 2026

arXiv:2604.21251v1 Announce Type: cross
Abstract: Large language models (LLMs) trained on unfiltered corpora inherently risk retaining sensitive information, necessitating selective knowledge unlearning for regulatory compliance and ethical safety. Ho…

cs.CL, cs.IR

UsefulBench: Towards Decision-Useful Information as a Target for Information Retrieval

April 24, 2026

arXiv:2604.15827v2 Announce Type: replace-cross
Abstract: Conventional information retrieval is concerned with identifying the relevance of texts for a given query. Yet, the conventional definition of relevance is dominated by aspects of similarity in…

cs.AI, cs.AR, cs.DC

NPU Design for Diffusion Language Model Inference

April 24, 2026

arXiv:2601.20706v2 Announce Type: replace-cross
Abstract: Diffusion-based LLMs (dLLMs) fundamentally depart from traditional autoregressive (AR) LLM inference: they leverage bidirectional attention, block-wise KV cache refreshing, cross-step reuse, an…