Efficient Process Reward Modeling via Contrastive Mutual Information
arXiv:2604.10660v1 Announce Type: new
Abstract: Recent research has devoted considerable effort to verifying the intermediate reasoning steps of chain-of-thought (CoT) trajectories using process reward models (PRMs) and other verifier models. However,…
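To make "verifying the intermediate reasoning steps" concrete, here is a minimal sketch of how a generic PRM is often applied to a CoT trajectory. It scores every step prefix, aggregates the step scores (taking the minimum or the product are two common choices), and selects the best of N sampled trajectories. This is an assumed, generic illustration, not the contrastive mutual information method this paper proposes. The names `score_trajectory`, `best_of_n`, and `toy_prm` are hypothetical, and the toy scorer stands in for a trained PRM.

```python
from math import prod
from typing import Callable, Sequence

# Hypothetical PRM interface: (question, step prefix) -> estimated probability
# that the latest step in the prefix is correct. A real PRM would be a trained model.
StepScorer = Callable[[str, Sequence[str]], float]


def score_trajectory(question: str, steps: Sequence[str],
                     prm: StepScorer, agg: str = "min") -> float:
    """Score each prefix of a CoT trajectory and aggregate the step scores."""
    step_scores = [prm(question, steps[: i + 1]) for i in range(len(steps))]
    if not step_scores:
        raise ValueError("trajectory has no steps")
    if agg == "min":
        # A trajectory is treated as only as reliable as its weakest step.
        return min(step_scores)
    if agg == "prod":
        # Treats per-step correctness as independent events.
        return prod(step_scores)
    raise ValueError(f"unknown aggregation: {agg}")


def best_of_n(question: str, trajectories: Sequence[Sequence[str]],
              prm: StepScorer, agg: str = "min") -> Sequence[str]:
    """Return the trajectory with the highest aggregated PRM score."""
    return max(trajectories, key=lambda t: score_trajectory(question, t, prm, agg))


# Toy stand-in for a trained PRM: looks up a fixed score for the latest step.
toy_scores = {"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.7}


def toy_prm(question: str, prefix: Sequence[str]) -> float:
    return toy_scores[prefix[-1]]


trajectories = [["a", "b"], ["c", "d"]]
print(best_of_n("q", trajectories, toy_prm))  # ['c', 'd']
```

Trajectory `["a", "b"]` has the strongest first step, but its weak second step (0.2) pulls its aggregate below that of `["c", "d"]` (0.7 under `min`). Because the PRM must be called once per step, cost grows with trajectory length, which is one reason the efficiency of process reward modeling is worth studying.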