DeepStage: Learning Autonomous Defense Policies Against Multi-Stage APT Campaigns

arXiv:2603.16969v2 Announce Type: replace-cross

Abstract: This paper presents DeepStage, a deep reinforcement learning (DRL) framework for adaptive and stage-aware defense against Advanced Persistent Threats (APTs). The enterprise environment is formulated as a partially observable Markov decision process (POMDP), in which host provenance and network telemetry are fused into unified provenance graphs. Building on our prior work (StageFinder), DeepStage employs a graph neural network encoder and an LSTM-based stage estimator to infer probabilistic attacker stages aligned with the MITRE ATT&CK framework. The resulting stage beliefs, together with graph embeddings, are used to guide a hierarchical Proximal Policy Optimization (PPO) agent that selects defense actions across monitoring, access control, containment, and remediation. Experiments in a realistic enterprise testbed with CALDERA-driven APT playbooks show that DeepStage achieves an average F1-score of 0.887 and a mitigation success rate of 84.7%, outperforming a risk-aware DRL baseline by 21.8% in F1-score and 16.2% in mitigation success. The results demonstrate effective, stage-aware, and cost-efficient autonomous cyber defense.
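The decision loop outlined in the abstract (maintain a belief over attacker stages, then let a hierarchical policy choose an action category and a concrete action within it) can be illustrated with a minimal Python sketch. It is not DeepStage's implementation. As an explicit assumption, a classical discrete Bayes filter stands in for the paper's LSTM-based stage estimator, a fixed vector stands in for the GNN graph embedding, and hand-set linear scores with a greedy argmax stand in for a trained hierarchical PPO agent. The four stage labels, the per-category actions, and all probabilities and weights are hypothetical; only the four action categories come from the abstract.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stage labels; the paper aligns stages with MITRE ATT&CK,
# but this short list is illustrative only.
STAGES = ["initial_access", "lateral_movement", "collection", "exfiltration"]

# Action categories follow the abstract; the concrete actions are hypothetical.
ACTIONS: Dict[str, List[str]] = {
    "monitoring": ["raise_log_level", "deploy_sensor"],
    "access_control": ["revoke_credentials", "block_port"],
    "containment": ["isolate_host", "quarantine_process"],
    "remediation": ["reimage_host", "patch_service"],
}


def bayes_stage_update(
    prior: Sequence[float],
    transition: Sequence[Sequence[float]],
    likelihood: Sequence[float],
) -> List[float]:
    """One step of a discrete Bayes filter over attacker stages (stand-in for an LSTM estimator).

    prior[i]         = P(stage i at t-1)
    transition[i][j] = P(stage j at t | stage i at t-1)
    likelihood[j]    = P(current observation | stage j)
    """
    n = len(prior)
    # Predict: propagate the previous belief through the stage-transition model.
    predicted = [sum(prior[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Correct: weight each stage by how well it explains the new observation.
    unnorm = [predicted[j] * likelihood[j] for j in range(n)]
    z = sum(unnorm)
    if z <= 0:
        raise ValueError("observation has zero probability under every stage")
    return [u / z for u in unnorm]


def hierarchical_policy(
    belief: Sequence[float],
    embedding: Sequence[float],
    w_high: Dict[str, Sequence[float]],
    w_low: Dict[str, Sequence[float]],
) -> Tuple[str, str]:
    """Greedy two-level choice: first an action category, then an action within it.

    Features are the stage belief concatenated with a graph-embedding stand-in.
    Each category/action gets a linear score from hypothetical, untrained weights;
    a softmax over these scores would give policy probabilities without changing
    the greedy (argmax) choice made here.
    """
    features = list(belief) + list(embedding)

    def score(w: Sequence[float]) -> float:
        return sum(wi * fi for wi, fi in zip(w, features))

    categories = list(ACTIONS)
    category = max(categories, key=lambda c: score(w_high[c]))
    action = max(ACTIONS[category], key=lambda a: score(w_low[a]))
    return category, action


if __name__ == "__main__":
    prior = [0.7, 0.2, 0.1, 0.0]
    # Left-to-right stage chain: stay with 0.6, advance with 0.4; last stage absorbs.
    transition = [
        [0.6, 0.4, 0.0, 0.0],
        [0.0, 0.6, 0.4, 0.0],
        [0.0, 0.0, 0.6, 0.4],
        [0.0, 0.0, 0.0, 1.0],
    ]
    likelihood = [0.1, 0.8, 0.3, 0.1]  # observation most consistent with lateral movement
    belief = bayes_stage_update(prior, transition, likelihood)

    embedding = [0.0, 0.0]  # placeholder for a GNN graph embedding
    w_high = {
        "monitoring": [1, 0, 0, 0, 0, 0],
        "access_control": [0, 0.5, 0, 0, 0, 0],
        "containment": [0, 1, 1, 1, 0, 0],
        "remediation": [0, 0, 0, 1, 0, 0],
    }
    w_low = {a: [0.0] * 6 for acts in ACTIONS.values() for a in acts}
    w_low["isolate_host"] = [0, 1, 0, 0, 0, 0]
    w_low["quarantine_process"] = [0, 0, 1, 0, 0, 0]

    likeliest = STAGES[max(range(len(belief)), key=belief.__getitem__)]
    print(likeliest, [round(b, 3) for b in belief])
    print(hierarchical_policy(belief, embedding, w_high, w_low))
```

In this toy run, the prior is concentrated on initial access, but the observation is most likely under lateral movement, so the filter moves most of the belief mass to lateral movement; the hand-set weights then pick the containment category and, within it, isolate_host. The two-level split mirrors the hierarchical structure the abstract describes, where a coarse category decision narrows the set of concrete actions the lower level has to rank.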
