For example: using a custom dataset where the labeled response to unanswerable questions is simply “I don’t know”, then training with RL — penalizing the model for hallucinating, and rewarding it for either giving the correct answer or admitting it’s not sure.
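To make the idea concrete, here's a minimal sketch of the kind of reward function I'm imagining (everything here is made up for illustration — the abstention string, the reward values, and the exact-match check are all placeholder assumptions, not a real implementation):

```python
def reward(answerable: bool, model_answer: str, gold_answer: str) -> float:
    """Toy reward: favor correct answers and honest abstention,
    penalize confident answers to unanswerable questions."""
    # Placeholder abstention check; a real system would need something more robust.
    abstained = model_answer.strip().lower() == "i don't know"

    if not answerable:
        # Unanswerable question: "I don't know" is the labeled correct response.
        return 1.0 if abstained else -1.0  # anything else counts as hallucination
    if model_answer == gold_answer:
        return 1.0   # correct answer on an answerable question
    if abstained:
        return 0.2   # honest uncertainty: small positive reward
    return -1.0      # confident wrong answer, treated as hallucination
```

The reward values are arbitrary; the point is just the shape of the incentive — hallucinating is strictly worse than abstaining, and abstaining is strictly worse than being right.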
I’m not an AI pro, so take my example with a grain of salt. What I’m really wondering is whether the concept in the title is theoretically feasible.