Precise Shield: Explaining and Aligning VLLM Safety via Neuron-Level Guidance
arXiv:2604.08881v1 Announce Type: new
Abstract: In real-world deployments, Vision-Language Large Models (VLLMs) face critical challenges from multilingual and multimodal composite attacks: harmful images paired with low-resource language texts can eas…