Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training

April 21, 2026

arXiv:2506.01732v2 Announce Type: replace
Abstract: Large Language Models (LLMs) are pre-trained on large data from different sources and domains. These datasets often contain trillions of tokens, including large portions of copyrighted or proprietary…

cs.CL

BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks

April 21, 2026

arXiv:2602.06221v2 Announce Type: replace
Abstract: Multiple-choice question answering (MCQA) is standard in NLP, but benchmarks lack rigorous quality control. We present BenchMarker, an education-inspired toolkit using LLM judges to flag three common…

cs.CV, cs.LG

Vision Language Models are Biased

April 21, 2026

arXiv:2505.23941v4 Announce Type: replace
Abstract: Large language models (LLMs) memorize a vast amount of prior knowledge from the Internet that helps them on downstream tasks but also may notoriously sway their outputs towards wrong or biased answer…

cs.AI, cs.RO

Sensorimotor Self-Recognition in Multimodal Large Language Model-Driven Robots

April 21, 2026

arXiv:2505.19237v2 Announce Type: replace-cross
Abstract: Self-recognition — the ability to maintain an internal representation of one’s own body within the environment — underpins intelligent, autonomous behavior. As a foundational component of the…

cs.CL

CBRS: Cognitive Blood Request System with Bilingual Dataset and Dual-Layer Filtering for Multi-Platform Social Streams

April 21, 2026

arXiv:2604.16665v1 Announce Type: new
Abstract: Urgent blood donation seeking posts and messages on social media often go unnoticed due to the overwhelming volume of daily communications. Traditional app-based systems, reliant on manual input, struggl…

cs.AI, cs.CL, cs.DB

PersonalHomeBench: Evaluating Agents in Personalized Smart Homes

April 21, 2026

arXiv:2604.16813v1 Announce Type: cross
Abstract: Agentic AI systems are rapidly advancing toward real-world applications, yet their readiness in complex and personalized environments remains insufficiently characterized. To address this gap, we intro…

cs.CV

NTIRE 2026 Challenge on Single Image Reflection Removal in the Wild: Datasets, Results, and Methods

April 21, 2026

arXiv:2604.10321v2 Announce Type: replace
Abstract: In this paper, we review the NTIRE 2026 challenge on single-image reflection removal (SIRR) in the wild. SIRR is a fundamental task in image restoration. Despite progress in academic research, most m…

cs.AI, cs.CL

CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credit

April 21, 2026

arXiv:2510.06133v2 Announce Type: replace
Abstract: Diffusion large language models (dLLMs) generate text through iterative denoising. In commonly adopted parallel decoding schemes, each step confirms only high-confidence positions while remasking the…

cs.AI, cs.CL

WeatherArchive-Bench: Benchmarking Retrieval-Augmented Reasoning for Historical Weather Archives

April 21, 2026

arXiv:2510.05336v2 Announce Type: replace
Abstract: Historical archives on weather events are collections of enduring primary source records that offer rich, untapped narratives of how societies have experienced and responded to extreme weather events…

cs.CL, cs.HC

OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation

April 21, 2026

arXiv:2506.05606v5 Announce Type: replace
Abstract: Can large language models (LLMs) accurately simulate the next web action of a specific user? While LLMs have shown promising capabilities in generating “believable” human behaviors, evaluating thei…