cs.AI, cs.LG

Robust Multimodal Safety via Conditional Decoding

arXiv:2604.00310v1 Announce Type: cross
Abstract: Multimodal large language models (MLLMs) often exhibit degraded safety alignment when harmful queries exploit cross-modal interactions. Models aligned on text alone show a higher rate of successful …
