Reinforcement Learning for Intensity Control: An Application to Choice-Based Network Revenue Management
arXiv:2406.05358v3 Announce Type: replace
Abstract: Intensity control is a class of continuous-time dynamic optimization problems with many important applications in Operations Research, including queueing and revenue management. In this study, we propose a practical continuous-time reinforcement learning framework for intensity control, using as a case study choice-based network revenue management, a classical revenue management problem that features a large state space, a large action space, and a continuous time horizon. We show that by leveraging the event-driven structure of the problem and the inherent discretization of sample paths created by the state-jump times, a defining feature of intensity control, one need not discretize the time horizon in advance. We adapt discrete-time Monte Carlo and temporal-difference learning algorithms for policy evaluation to continuous time, and we develop policy-gradient-based actor-critic algorithms for event-driven intensity control. Through a comprehensive numerical study, we evaluate the proposed approach against various state-of-the-art benchmarks, demonstrating overall superior performance and effective scalability to large-scale problems. Notably, compared to discretization-based reinforcement learning methods, our continuous-time approach delivers significantly better performance at comparable computational cost, an advantage that is particularly pronounced in highly non-stationary environments.
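To make the event-driven idea concrete, the following is a minimal illustrative sketch, not code from the paper: a toy single-resource pricing problem where purchase epochs form a controlled Poisson process. The simulation clock advances only at state-jump times (sampled from the current intensity), so no time grid is fixed in advance, and Monte Carlo policy evaluation averages episode revenues. The capacity `C`, horizon `T`, demand parameters `a`, `b`, and the pricing policy are all invented for illustration.

```python
import math
import random

# Hypothetical toy setup (not from the paper): one resource with capacity C
# sold over horizon [0, T]. Offering price p induces purchase intensity
# lam(p) = a * exp(-b * p). A fixed policy maps remaining inventory to a price.
T, C = 1.0, 5
a, b = 10.0, 0.5

def intensity(p):
    # exponential demand model, assumed for illustration
    return a * math.exp(-b * p)

def policy(x):
    # fixed pricing policy: charge more as inventory grows scarce
    return 1.0 + 1.0 / max(x, 1)

def simulate_episode(rng):
    """Event-driven sample path: the clock jumps directly to the next
    purchase epoch, so the horizon is never discretized in advance."""
    t, x, revenue = 0.0, C, 0.0
    while t < T and x > 0:
        p = policy(x)
        # time to next purchase under the current intensity
        dt = rng.expovariate(intensity(p))
        if t + dt > T:
            break  # horizon ends before the next jump
        t += dt
        x -= 1
        revenue += p
    return revenue

def mc_value_estimate(n_episodes=2000, seed=0):
    """Monte Carlo policy evaluation: average revenue over sample paths."""
    rng = random.Random(seed)
    return sum(simulate_episode(rng) for _ in range(n_episodes)) / n_episodes

print(round(mc_value_estimate(), 2))
```

The same event-driven sample paths could feed a temporal-difference or actor-critic update at each jump time; this sketch only shows the Monte Carlo evaluation step.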