ChatGPT

Is it possible to train LLMs, through targeted training, to learn to admit “I don’t know”?

Like, using a custom dataset where the labeled response to unanswerable questions is simply "I don't know", and then applying RL: penalizing the model for hallucinating, and rewarding it for either giving the correct answer or admitting it's not sure. I'm not an AI p…
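The reward scheme described above can be sketched as a toy reward function. This is a minimal illustration under assumed conventions (the `Example` dataclass, the `IDK_PHRASES` list, and the specific reward values are all hypothetical choices, not any established recipe): unanswerable questions carry no gold answer, honest abstention on them is rewarded, and a confident answer to them is penalized as a hallucination.

```python
# Hypothetical reward shaping for RL fine-tuning against hallucination.
# All names and reward magnitudes here are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

# Crude abstention detector; a real pipeline would need something far more robust.
IDK_PHRASES = ("i don't know", "i'm not sure", "i am not sure")


@dataclass
class Example:
    question: str
    gold_answer: Optional[str]  # None marks the question as unanswerable


def is_abstention(response: str) -> bool:
    text = response.lower()
    return any(phrase in text for phrase in IDK_PHRASES)


def reward(example: Example, response: str) -> float:
    """+1 for a correct answer, +0.5 for honest abstention on an
    unanswerable question, -1 for answering an unanswerable question
    (hallucination), -0.5 for a wrong answer, 0 for abstaining when
    an answer was actually available."""
    if example.gold_answer is None:
        # Unanswerable: abstaining is the desired behavior.
        return 0.5 if is_abstention(response) else -1.0
    if is_abstention(response):
        return 0.0  # abstained on an answerable question: neutral
    # Naive substring match stands in for a real answer-equivalence check.
    return 1.0 if example.gold_answer.lower() in response.lower() else -0.5
```

In practice the hard parts are hidden inside the two naive checks: deciding whether a response is really an abstention, and whether an answer is really correct, which is why real setups tend to use a learned reward model rather than string matching.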