You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass
arXiv:2604.10966v2 Announce Type: replace
Abstract: We present a discriminative multimodal reward model that scores all candidate responses in a single forward pass. Conventional discriminative reward models evaluate each response independently, requi…
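The cost contrast the abstract draws, between scoring each response independently and scoring all candidates in one forward pass, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `CountingModel`, `score_independently`, and `score_jointly` are hypothetical names, and the length-based score is a placeholder, not the paper's model or its scoring rule. The sketch only counts forward passes.

```python
from typing import List


class CountingModel:
    """Hypothetical stand-in for a reward model that counts forward passes."""

    def __init__(self) -> None:
        self.forward_passes = 0

    def forward(self, prompt: str, responses: List[str]) -> List[float]:
        # One call is treated as one forward pass, however many responses it holds.
        self.forward_passes += 1
        # Placeholder score (response length); not a real reward signal.
        return [float(len(r)) for r in responses]


def score_independently(model: CountingModel, prompt: str, responses: List[str]) -> List[float]:
    # Conventional setup: one forward pass per candidate response.
    return [model.forward(prompt, [r])[0] for r in responses]


def score_jointly(model: CountingModel, prompt: str, responses: List[str]) -> List[float]:
    # Single-pass setup: all candidates are scored in one call.
    return model.forward(prompt, responses)


if __name__ == "__main__":
    candidates = ["a", "bb", "ccc"]

    independent = CountingModel()
    score_independently(independent, "question", candidates)

    joint = CountingModel()
    score_jointly(joint, "question", candidates)

    print("independent passes:", independent.forward_passes)
    print("joint passes:", joint.forward_passes)
```

With N candidates, the independent path makes N calls and the joint path makes one. The two paths return identical scores in this toy only because the placeholder ignores context; a real single-pass model may score candidates differently when it sees them together.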