Lightweight Prompt-Guided CLIP Adaptation for Monocular Depth Estimation
arXiv:2604.01118v1 Announce Type: cross
Abstract: Leveraging the rich semantic features of vision-language models (VLMs) such as CLIP for monocular depth estimation is a promising direction, yet it often requires extensive fine-tuning or lacks geometr…