
GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis

April 17, 2026

arXiv:2604.13888v1 Announce Type: new
Abstract: The integration of Large Language Models (LLMs) into Geographic Information Systems (GIS) marks a paradigm shift toward autonomous spatial analysis. However, evaluating these LLM-based agents remains cha…

cs.AI, cs.CR

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

April 17, 2026

arXiv:2604.13630v1 Announce Type: cross
Abstract: The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context management, and state persistence. Yet this same a…

cs.AI

AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot

April 17, 2026

arXiv:2604.13940v1 Announce Type: new
Abstract: Scientific peer review faces mounting strain as submission volumes surge, making it increasingly difficult to sustain review quality, consistency, and timeliness. Recent advances in AI have led the commu…

cs.AI, cs.CL

ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents

April 17, 2026

arXiv:2604.14261v1 Announce Type: new
Abstract: The rapid rise in AI conference submissions has driven increasing exploration of large language models (LLMs) for peer review support. However, LLM-based reviewers often generate superficial, formulaic c…

cs.CV

Integrating Object Detection, LiDAR-Enhanced Depth Estimation, and Segmentation Models for Railway Environments

April 17, 2026

arXiv:2604.14781v1 Announce Type: new
Abstract: Obstacle detection in railway environments is crucial for ensuring safety. However, very few studies address the problem using a complete, modular, and flexible system that can both detect objects in the…

cs.AI, cs.CL

EuropeMedQA Study Protocol: A Multilingual, Multimodal Medical Examination Dataset for Language Model Evaluation

April 17, 2026

arXiv:2604.14306v1 Announce Type: new
Abstract: While Large Language Models (LLMs) have demonstrated high proficiency on English-centric medical examinations, their performance often declines when faced with non-English languages and multimodal diagno…

cs.CV

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

April 17, 2026

arXiv:2604.14268v1 Announce Type: new
Abstract: We introduce HY-World 2.0, a multi-modal world model framework that advances our prior project HY-World 1.0. HY-World 2.0 accommodates diverse input modalities, including text prompts, single-view images…

cs.CV

From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation

April 17, 2026

arXiv:2604.14805v1 Announce Type: new
Abstract: Grain-edge segmentation (GES) and lithology semantic segmentation (LSS) are two pivotal tasks for quantifying rock fabric and composition. However, these two tasks are often treated separately, and the s…

cs.CV, cs.HC, cs.MM

NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results

April 17, 2026

arXiv:2604.14816v1 Announce Type: new
Abstract: This paper presents an overview of the NTIRE 2026 Challenge on Video Saliency Prediction. The goal of the challenge participants was to develop automatic saliency map prediction methods for the provided …

cs.AI

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

April 17, 2026

arXiv:2509.06477v2 Announce Type: replace
Abstract: Shortcuts such as APIs and deep-links have emerged as efficient complements to flexible GUI operations, fostering a promising hybrid paradigm for MLLM-based mobile automation. However, systematic eva…