Towards Agentic Runtime Healing
arXiv:2408.01055v2 Announce Type: replace-cross
Abstract: Self-healing systems have long been a focus of research, aiming to enable software to recover from unexpected runtime errors without human intervention. Traditional approaches rely on predefined heuristic rules, such as reusing error handlers or rolling back to checkpoints, but these methods struggle to adapt to the diverse range of runtime errors. The emergence of Large Language Models offers a new opportunity to address this challenge. Leveraging their ability to understand and generate code and natural language, we propose using LLMs to dynamically generate error-handling strategies in real time, tailored to specific runtime contexts such as error messages and program states. We demonstrate the feasibility of this approach by designing such a framework, Healer, and empirically showing that it can handle runtime errors with a high success rate. When an unanticipated runtime error occurs, Healer leverages its internal LLM to generate bespoke error-handling code. The generated healing code is then executed to produce a corrected program state, allowing the program to continue execution with minimal disruption. We evaluate Healer across four code datasets and three state-of-the-art LLMs (GPT-3.5, GPT-4, and CodeQwen-7B), finding that GPT-4 successfully recovers from 72.8% of runtime errors, underscoring the promise of LLMs in this domain. Despite these promising results, challenges remain, particularly regarding the trustworthiness of LLM-generated code and its integration into existing systems. We discuss potential solutions, such as safety checks and Healer-aware programming, to mitigate risks and ensure reliable operation. This work represents the first step toward agentic runtime healing, paving the way for more adaptive, resilient, and self-healing software systems.
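The healing loop the abstract describes (catch an unanticipated error, hand the error message and program state to an LLM, execute the generated healing code to repair the state, then resume) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the function names (`heal_and_resume`, `generate_healing_code`) and the use of a plain `exec` over a state dictionary are assumptions for the sake of a self-contained example, and the LLM is replaced by a stub.

```python
# Minimal sketch of a Healer-style runtime-healing loop (hypothetical API;
# the abstract does not specify the framework's actual interfaces).
import traceback

def heal_and_resume(step, state, generate_healing_code):
    """Run `step(state)`; on an unhandled error, ask an LLM-backed
    generator for healing code, apply it to the program state, retry."""
    try:
        return step(state)
    except Exception as exc:
        # Runtime context given to the model: error message, traceback,
        # and a snapshot of the program state.
        context = {
            "error": repr(exc),
            "traceback": traceback.format_exc(),
            "state": dict(state),
        }
        # In Healer this would be an LLM call; here it is pluggable.
        healing_code = generate_healing_code(context)
        # Execute the healing code to produce a corrected program state.
        # (A real system would add safety checks / sandboxing first.)
        exec(healing_code, {"state": state})
        return step(state)  # continue execution after healing

# Stand-in for the LLM: returns a fixed patch for a division-by-zero error.
def fake_llm(context):
    return "if state['divisor'] == 0: state['divisor'] = 1"

state = {"divisor": 0}
result = heal_and_resume(lambda s: 10 / s["divisor"], state, fake_llm)
# After healing, the divisor is 1 and the computation yields 10.0.
```

Executing untrusted generated code via `exec` is exactly the trustworthiness risk the abstract raises; the safety checks and Healer-aware programming it mentions would sit between the generation step and the `exec` call.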