For example: using a custom dataset where the labeled response to unanswerable questions is simply “I don’t know”, then training with RL — penalizing the model for hallucinating, and rewarding it for either giving the correct answer or admitting it’s not sure.
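To make the idea concrete, here's a minimal sketch of the kind of reward function I'm imagining (everything here is made up for illustration — the abstention string, the reward values, and the exact-match check are all placeholder assumptions, not a real implementation):

```python
def reward(answerable: bool, model_answer: str, gold_answer: str) -> float:
    """Toy reward: favor correct answers and honest abstention,
    penalize confident answers to unanswerable questions."""
    # Placeholder abstention check; a real system would need something more robust.
    abstained = model_answer.strip().lower() == "i don't know"

    if not answerable:
        # Unanswerable question: "I don't know" is the labeled correct response.
        return 1.0 if abstained else -1.0  # anything else counts as hallucination
    if model_answer == gold_answer:
        return 1.0   # correct answer on an answerable question
    if abstained:
        return 0.2   # honest uncertainty: small positive reward
    return -1.0      # confident wrong answer, treated as hallucination
```

The reward values are arbitrary; the point is just the shape of the incentive — hallucinating is strictly worse than abstaining, and abstaining is strictly worse than being right.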
I’m not an AI pro, so take my example with a grain of salt. What I’m really wondering is whether the concept in the title is theoretically feasible.