Controlling Multimodal Conversational Agents with Coverage-Enhanced Latent Actions
arXiv:2601.07516v2 Announce Type: replace
Abstract: Vision-language models are increasingly employed as multimodal conversational agents (MCAs) for diverse conversational tasks. Recently, reinforcement learning (RL) has been widely explored for adapti…