Provide.ai - We Provide AI To Companies

Object Affordance Recognition and Grounding via Multi-scale Cross-modal Representation Learning

Xinhang Wan, Dongqiang Gou, Xinwang Liu, En Zhu, Xuming He / April 2, 2026

arXiv:2508.01184v2 Announce Type: replace
Abstract: A core problem of Embodied AI is to learn object manipulation from observation, as humans do. To achieve this, it is important to localize 3D object affordance areas through observation such as image…

cs.AI, cs.CL, cs.SE

Revision or Re-Solving? Decomposing Second-Pass Gains in Multi-LLM Pipelines

Jingjie Ning, Xueqi Li, Chengyu Yu / April 2, 2026

arXiv:2604.01029v1 Announce Type: cross
Abstract: Multi-LLM revision pipelines, in which a second model reviews and improves a draft produced by a first, are widely assumed to derive their gains from genuine error correction. We question this assumpti…

cs.AI, cs.CL

Dynin-Omni: Omnimodal Unified Large Diffusion Language Model

April 2, 2026

arXiv:2604.00007v1 Announce Type: new
Abstract: We present Dynin-Omni, the first masked-diffusion-based omnimodal foundation model that unifies text, image, and speech understanding and generation, together with video understanding, within a single ar…

cs.CV

FreqPhys: Repurposing Implicit Physiological Frequency Prior for Robust Remote Photoplethysmography

Wei Qian, Dan Guo, Jinxing Zhou, Bochao Zou, Zitong Yu, Meng Wang / April 2, 2026

arXiv:2604.00534v1 Announce Type: new
Abstract: Remote photoplethysmography (rPPG) enables contactless physiological monitoring by capturing subtle skin-color variations from facial videos. However, most existing methods predominantly rely on time-dom…

cs.CV

HUMOF: Human Motion Forecasting in Interactive Social Scenes

Caiyi Sun, Yujing Sun, Xiao Han, Zemin Yang, Jiawei Liu, Xinge Zhu, Siu Ming Yiu, Yuexin Ma / April 2, 2026

arXiv:2506.03753v3 Announce Type: replace
Abstract: Complex scenes present significant challenges for predicting human behaviour due to the abundance of interaction information, such as human-human and human-environment interactions. These factors comp…

cs.AI, cs.CV, cs.LG

Query-Conditioned Evidential Keyframe Sampling for MLLM-Based Long-Form Video Understanding

Yiheng Wang, Lichen Zhu, Yueqian Lin, Yudong Liu, Jingyang Zhang, Hai "Helen" Li, Yiran Chen / April 2, 2026

arXiv:2604.01002v1 Announce Type: cross
Abstract: Multimodal Large Language Models (MLLMs) have shown strong performance on video question answering, but their application to long-form videos is constrained by limited context length and computational …

cs.CV

AceTone: Bridging Words and Colors for Conditional Image Grading

Tianren Ma, Mingxiang Liao, Xijin Zhang, Qixiang Ye / April 2, 2026

arXiv:2604.00530v1 Announce Type: new
Abstract: Color affects how we interpret image style and emotion. Previous color grading methods rely on patch-wise recoloring or fixed filter banks, struggling to generalize across creative intents or align with …

cs.AI, cs.CL

LinearARD: Linear-Memory Attention Distillation for RoPE Restoration

Ning Yang, Hengyu Zhong, Wentao Wang, Baoliang Tian, Haijun Zhang, Jun Wang / April 2, 2026

arXiv:2604.00004v1 Announce Type: new
Abstract: The extension of context windows in Large Language Models is typically facilitated by scaling positional encodings followed by lightweight Continual Pre-Training (CPT). While effective for processing lon…

cs.CV

Towards Online Multi-Modal Social Interaction Understanding

Xinpeng Li, Shijian Deng, Bolin Lai, Weiguo Pian, James M. Rehg, Yapeng Tian / April 2, 2026

arXiv:2503.19851v2 Announce Type: replace
Abstract: In this paper, we introduce a new problem, Online-MMSI, where the model must perform multimodal social interaction understanding (MMSI) using only historical information. Given a recorded video and a…

cs.AI, cs.CL, cs.CR, cs.LG

Do Phone-Use Agents Respect Your Privacy?

April 2, 2026

arXiv:2604.00986v1 Announce Type: cross
Abstract: We study whether phone-use agents respect privacy while completing benign mobile tasks. This question has remained hard to answer because privacy-compliant behavior is not operationalized for phone-use…