Knowing When to Defer: Selective Prediction for Responsible Knowledge Tracing

arXiv:2509.21514v3

Abstract: Research on Knowledge Tracing (KT) models traditionally focuses on improving predictive accuracy. However, responsible real-world deployment requires models to know when to defer uncertain predictions to a human teacher. We introduce an intrinsic selective prediction layer for existing KT models using Monte Carlo Dropout (MC-Dropout) to quantify uncertainty. We evaluate this approach across three architectures (DKT, SAKT, and AKT) using the Eedi mathematics dataset. Abstaining on the 20% most uncertain predictions lifts accuracy by 2.3 to 3.0 percentage points, AUC by 1.9 to 2.4 percentage points, and F1 by 1.4 to 4.3 percentage points without any retraining. This abstention strategy is highly targeted: the deferred set exhibits 1.45 to 1.60 times the error rate of the kept set. Furthermore, this targeting holds within every question-difficulty quartile and remains fair across student-ability levels. Importantly, MC-Dropout variance gives roughly five times the AUC lift of a calibrated two-parameter logistic (2PL) Item Response Theory (IRT) baseline as a selective-prediction signal. A variance decomposition of the model's epistemic uncertainty (BALD) reveals that the entire classical psychometric stack, comprising question difficulty, student ability, IRT-style outcome ambiguity, and historical curriculum coverage, explains less than 4% of the signal under linear modeling and at most 23% even with a non-linear regressor. This leaves 77% to 90% as architecture-specific epistemic content that MC-Dropout surfaces and simpler proxies cannot recover. Selective prediction with model-native epistemic uncertainty is therefore a necessary component of responsible KT deployment, complementary to subgroup-fairness audits and downstream classroom evaluation rather than a substitute for them.
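The core mechanism described above, running several stochastic forward passes with dropout active and abstaining on the highest-variance predictions, can be sketched as follows. This is a minimal illustration, not the paper's actual DKT/SAKT/AKT pipeline: the "model" is a stand-in noisy logistic function, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_predict(logits, n_passes=30):
    """Simulate n_passes stochastic forward passes of a dropout-enabled model.

    In a real KT model this would call the network with dropout kept active
    at inference time; here, added Gaussian noise on the logits stands in
    for dropout-induced stochasticity (an assumption for illustration).
    """
    noisy = logits + rng.normal(0.0, 0.5, size=(n_passes, len(logits)))
    return 1.0 / (1.0 + np.exp(-noisy))  # shape (n_passes, N): P(correct)

# Toy batch of per-interaction logits.
logits = rng.normal(0.0, 2.0, size=1000)
probs = mc_dropout_predict(logits)   # (n_passes, N)
mean_p = probs.mean(axis=0)          # predictive mean over passes
var_p = probs.var(axis=0)            # MC-Dropout variance = uncertainty signal

# Defer the 20% most uncertain predictions to a human; keep the rest.
threshold = np.quantile(var_p, 0.80)
keep = var_p <= threshold
print(f"kept {keep.mean():.0%}, deferred {(~keep).mean():.0%}")
```

On the kept set one would then report accuracy/AUC/F1 as usual; the paper's finding is that the deferred 20% concentrates errors, so metrics on the kept set improve without retraining.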
