The Hive Mind is a Single Reinforcement Learning Agent
arXiv:2410.17517v5 Announce Type: replace-cross
Abstract: Decision-making is an essential attribute of any intelligent agent or group. Natural systems are known to converge to effective strategies through at least two distinct mechanisms: collective decision-making via imitation of others, and trial-and-error by a single agent. This paper establishes an equivalence between these two paradigms by drawing from the well-studied collective decision-making problem of nest-hunting in swarms of honey bees. We show that the emergent distributed cognition (sometimes referred to as the $\textit{hive mind}$) arising from individuals following simple, local imitation-based rules is that of a single online reinforcement learning (RL) agent interacting with many parallel environments. More specifically, in the purely imitative $\textit{weighted voter}$ model of bees' waggle dance, the update rule through which this macro-agent learns is a multi-armed bandit algorithm that we coin $\textit{Maynard-Cross Learning}$. Our analysis implies that a group of purely imitative organisms can be equivalent to a more complex, reinforcement-enabled entity, substantiating the idea that group-level intelligence may explain how seemingly simple and blind individual behaviors are selected in nature. Beyond biology, the framework offers new tools for analyzing economic and social systems where individuals imitate successful strategies, effectively participating in a collective learning process. Our findings may further inform the design of scalable RL-inspired collective systems in artificial domains.
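
To illustrate the kind of imitation rule the abstract refers to, here is a minimal sketch of a weighted-voter-style swarm. It is not the paper's derivation or its exact Maynard-Cross Learning update. It assumes a simplified, synchronous variant in which each bee copies the option of another bee sampled with probability proportional to the quality of that bee's current option. Under that assumption, the expected share of each option follows a replicator-style update, x_a <- x_a * q_a / sum_b (x_b * q_b). The function names, the synchronous schedule, and the quality values are all hypothetical choices made for illustration.

```python
import random


def expected_share_update(shares, qualities):
    """Expected one-step change in option shares under the assumed rule.

    Each option's share is reweighted by its quality and renormalized:
    x_a <- x_a * q_a / sum_b (x_b * q_b). Qualities must be positive.
    """
    weighted = [x * q for x, q in zip(shares, qualities)]
    total = sum(weighted)
    return [w / total for w in weighted]


def simulate_weighted_voter(opinions, qualities, steps, rng):
    """Finite-swarm simulation (assumed synchronous variant).

    At every step each bee adopts the opinion of a bee drawn with
    probability proportional to the quality of that bee's current option,
    mimicking longer waggle dances for better nest sites.
    """
    opinions = list(opinions)
    for _ in range(steps):
        weights = [qualities[o] for o in opinions]
        opinions = rng.choices(opinions, weights=weights, k=len(opinions))
    return opinions


if __name__ == "__main__":
    qualities = [0.2, 0.5, 0.9]  # hypothetical nest-site qualities

    # Deterministic, population-level view (the "macro-agent").
    shares = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(50):
        shares = expected_share_update(shares, qualities)
    print("expected shares:", [round(s, 4) for s in shares])

    # Stochastic, individual-level view (the bees).
    rng = random.Random(0)
    swarm = simulate_weighted_voter([0, 1, 2] * 100, qualities, 200, rng)
    print("final counts:", [swarm.count(a) for a in range(3)])
```

In this sketch no individual bee evaluates more than one option or stores any estimate of reward. Individuals only copy, yet the population shares behave like the policy of a single learner that shifts probability toward higher-quality arms.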