The hidden risks of temporal resampling in clinical reinforcement learning

arXiv:2602.06603v3 Announce Type: replace

Abstract: Reinforcement learning (RL) is a branch of artificial intelligence for making optimal sequential decisions. In healthcare, researchers generally use offline RL (ORL), where models are trained and evaluated on retrospective observational data. To accommodate inherently irregular clinical records, researchers often resample the data into uniform time intervals before training, a process known as binning. However, discretised data presents the model with a fictional representation of clinical scenarios, especially where unpredictable decision timings are common. Because these models lack robust trial evidence, we explored the effects of binning by conducting an in silico clinical trial using 30 virtual patients with type 1 diabetes from the FDA-approved UVA/Padova simulator. The simulator was modified to include stochastic intervals between decisions and used to generate a training dataset for offline RL. We trained three ORL algorithms on both the unprocessed dataset and equivalent datasets resampled at 10-minute, 2-hour, and 4-hour intervals. When deployed back into the simulated environment, temporal resampling reduced model performance by up to 60% relative to unprocessed data, with 4-hour binning causing all agents to perform worse than the dataset's baseline. Retrospective evaluation on resampled data actively obscured this effect, predicting 1.5-3x better returns than the agents achieved in practice. We recommend that future research in this area prioritise datasets with natural clinical timings between decisions, which may be a necessary step before these models can be safely deployed into patient care.
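To make the binning step concrete, here is a minimal sketch of resampling irregularly timed clinical records onto a uniform grid. The column names, timestamps, and aggregation rules are illustrative assumptions, not the paper's actual pipeline:

```python
# Minimal sketch of temporal resampling ("binning") of irregular clinical
# records into uniform intervals. Column names, values, and aggregation
# rules are hypothetical, chosen only to illustrate the technique.
import pandas as pd

# Irregularly timed records: glucose readings and insulin dosing decisions.
records = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2024-01-01 08:03", "2024-01-01 08:17", "2024-01-01 09:41",
             "2024-01-01 12:02", "2024-01-01 12:09"]
        ),
        "glucose_mgdl": [142.0, 155.0, 130.0, 180.0, 175.0],
        "insulin_units": [0.0, 2.5, 0.0, 4.0, 0.0],
    }
).set_index("time")

# Resample onto a uniform 2-hour grid: average the state (glucose) and
# sum the actions (insulin) falling inside each bin. Empty bins are
# forward-filled, inventing observations that never happened -- the
# "fictional representation" the abstract warns about.
binned = records.resample("2h").agg(
    {"glucose_mgdl": "mean", "insulin_units": "sum"}
)
binned["glucose_mgdl"] = binned["glucose_mgdl"].ffill()

print(binned)
```

Note how the two dosing records at 12:02 and 12:09 collapse into a single bin, erasing the true spacing between decisions; under stochastic decision timings this is exactly the information loss the abstract argues an offline RL agent is then trained and evaluated on.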
