Is it possible to train LLMs, through targeted training, to learn to admit “I don’t know”?

Like, fine-tuning with RL on a custom dataset where the labeled response to unanswerable questions is simply "I don't know": penalizing the model for hallucinating, and rewarding it for either giving the correct answer or admitting it's not sure.
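To make the idea concrete, here's a minimal sketch of the kind of reward function the paragraph above describes. All names (`reward`, `IDK`, the exact reward values) are illustrative assumptions, not from any real RL library; a real setup would also need fuzzy answer matching rather than exact string comparison.

```python
# Hypothetical reward scheme: reward correct answers and honest abstention,
# penalize confident answers to unanswerable questions (hallucinations).
IDK = "I don't know"

def reward(answerable: bool, gold_answer: str, model_output: str) -> float:
    abstained = model_output.strip().lower() == IDK.lower()
    if not answerable:
        # Unanswerable question: the labeled response is "I don't know",
        # so any concrete answer is treated as a hallucination.
        return 1.0 if abstained else -1.0
    if abstained:
        # Honest uncertainty on an answerable question: neutral, so the
        # model isn't pushed to abstain on everything.
        return 0.0
    return 1.0 if model_output.strip() == gold_answer else -1.0
```

The key design choice is making abstention on answerable questions neutral rather than negative, so the model isn't rewarded for blanket refusals but also isn't punished as harshly as for a wrong answer.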

I’m not an AI pro, so take my example with a grain of salt. What I’m really wondering is whether the concept in the title is theoretically feasible.

submitted by /u/KonstancjaCarla
