Dissecting Discrete Soft Actor-Critic: Limitations and Principled Alternatives
arXiv:2509.09838v2 Announce Type: replace
Abstract: While Soft Actor-Critic (SAC) is highly effective in continuous control, its discrete counterpart (DSAC) performs poorly on challenging discrete-action domains such as Atari. Consequently, starting from DSAC, we revisit the design of actor-critic methods in this setting. First, we determine that the coupling between the actor's and critic's entropy terms is the primary reason behind DSAC's poor performance. We demonstrate that merely decoupling these components significantly improves DSAC's performance. Motivated by this insight, we introduce a flexible off-policy actor-critic framework that subsumes DSAC as a special case and yields novel objectives. Our framework allows using an m-step Bellman operator for the critic update, and instantiates the actor objective by combining standard policy optimization methods with entropy regularization. Theoretically, we prove that the proposed methods converge to the optimal regularized value function in the tabular setting, generalizing results from prior work. Empirically, we evaluate the proposed objectives on standard Atari games. Our ablations indicate that, unlike DSAC, these objectives, including novel ones, perform robustly even without entropy regularization or explicit exploration mechanisms.
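To make the entropy-coupling point concrete, here is a minimal NumPy sketch of the distinction for a single transition in the discrete setting. The coupled (DSAC-style) critic target backs up the soft value, which includes the entropy bonus; the decoupled variant backs up the plain expected Q-value, leaving entropy regularization to the actor loss alone. All quantities (the toy Q-values, policy, reward, and the names `target_coupled` / `target_decoupled`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 4
gamma, alpha = 0.99, 0.2  # discount factor and entropy temperature

# Hypothetical toy quantities for one transition (s, a, r, s'):
# q_next[a'] stands in for Q(s', a'); probs[a'] for pi(a' | s').
q_next = rng.normal(size=n_actions)
logits = rng.normal(size=n_actions)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
r = 1.0

# Coupled target: the entropy bonus sits inside the critic backup,
# so the critic learns the *soft* value function.
v_soft = np.sum(probs * (q_next - alpha * np.log(probs)))
target_coupled = r + gamma * v_soft

# Decoupled target: plain expected Q-value; entropy regularization
# would enter only through the actor's objective.
v_plain = np.sum(probs * q_next)
target_decoupled = r + gamma * v_plain

# The two targets differ by exactly gamma * alpha * H(pi(. | s')).
entropy = -np.sum(probs * np.log(probs))
print(target_coupled - target_decoupled, gamma * alpha * entropy)
```

Since entropy is non-negative, the coupled target is systematically inflated by `gamma * alpha * H`, which is one plausible reading of why the two entropy terms interact poorly when tied together.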