Multitasking Embedding for Embryo Blastocyst Grading Prediction (MEmEBG)

/ April 16, 2026

arXiv:2604.13217v1 Announce Type: cross
Abstract: Reliable evaluation of blastocyst quality is critical for the success of in vitro fertilization (IVF) treatments. Current embryo grading practices primarily rely on visual assessment of morphological f…

cs.CV

The Second Challenge on Real-World Face Restoration at NTIRE 2026: Methods and Results

/ April 16, 2026

arXiv:2604.10532v2 Announce Type: replace
Abstract: This paper provides a review of the NTIRE 2026 challenge on real-world face restoration, highlighting the proposed solutions and the resulting outcomes. The challenge focuses on generating natural an…

cs.AI, cs.CV, cs.GR

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

/ April 16, 2026

arXiv:2604.14025v1 Announce Type: cross
Abstract: Reconstructing 3D representations from 2D inputs is a fundamental task in computer vision and graphics, serving as a cornerstone for understanding and interacting with the physical world. While traditi…

cs.CV

CoD-Lite: Real-Time Diffusion-Based Generative Image Compression

/ April 16, 2026

arXiv:2604.12525v2 Announce Type: replace
Abstract: Recent advanced diffusion methods typically derive strong generative priors by scaling diffusion transformers. However, scaling fails to generalize when adapted for real-time compression scenarios th…

cs.CV

Seek-and-Solve: Benchmarking MLLMs for Visual Clue-Driven Reasoning in Daily Scenarios

/ April 16, 2026

arXiv:2604.14041v1 Announce Type: new
Abstract: Daily scenarios are characterized by visual richness, requiring Multimodal Large Language Models (MLLMs) to filter noise and identify decisive visual clues for accurate reasoning. Yet, current benchmarks…

cs.CV

What to Say and When to Say it: Live Fitness Coaching as a Testbed for Situated Interaction

/ April 16, 2026

arXiv:2407.08101v5 Announce Type: replace
Abstract: Vision-language models have shown impressive progress in recent years. However, existing models are largely limited to turn-based interactions, where each turn must be stepped (i.e., prompted) by the…

cs.AI, cs.CV

Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision

/ April 16, 2026

arXiv:2604.13304v1 Announce Type: cross
Abstract: Understanding the internal activations of Vision Transformers (ViTs) is critical for building interpretable and trustworthy models. While Sparse Autoencoders (SAEs) have been used to extract human-inte…

cs.CV

Bias at the End of the Score

/ April 16, 2026

arXiv:2604.13305v1 Announce Type: new
Abstract: Reward models (RMs) are inherently non-neutral value functions designed and trained to encode specific objectives, such as human preferences or text-image alignment. RMs have become crucial components of…

cs.CV

Right Regions, Wrong Labels: Semantic Label Flips in Segmentation under Correlation Shift

/ April 16, 2026

arXiv:2604.13326v1 Announce Type: new
Abstract: The robustness of machine learning models can be compromised by spurious correlations between non-causal features in the input data and target labels. A common way to test for such correlations is to tra…

cs.AI, cs.CV

FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks

/ April 16, 2026

arXiv:2505.19662v3 Announce Type: replace
Abstract: This paper introduces FieldWorkArena, a benchmark for agentic AI targeting real-world field work. With the recent increase in demand for agentic AI, they are built to detect and document safety hazar…