RAG-GNN: Integrating Retrieved Knowledge with Graph Neural Networks for Precision Medicine
arXiv:2602.00586v2
Abstract: Network topology excels at structural prediction but does not capture the functional semantics encoded in the biomedical literature. We present RAG-GNN, an end-to-end trainable retrieval-augmented graph neural network framework that integrates GNN representations with dynamically retrieved, literature-derived knowledge through a jointly optimized retrieval projection, a gated fusion mechanism, and contrastive alignment. In a cancer signaling case study (379 proteins, 3,498 interactions, 14 functional categories), RAG-GNN improves functional clustering from a silhouette score of $-0.237 \pm 0.065$ (GNN-only) to $-0.144 \pm 0.066$, a consistent improvement of $+0.093 \pm 0.022$ across 10 random seeds, while the learned retrieval achieves a mean precision@10 of $0.242$, a 152\% relative improvement over the random baseline ($0.096$). A heuristic information decomposition with bootstrap confidence intervals shows that topology and retrieval encode largely shared information (95.6\%), and that retrieval improves both cluster cohesion and separation (silhouette) and agreement with the reference categories (ARI $+0.021 \pm 0.015$). Counterfactual experiments confirm that adversarial, absent, and random retrieval all degrade performance, indicating that the gated fusion mechanism depends on document content. Benchmarking against eight established embedding methods reveals task-specific complementarity: topology-focused methods achieve strong link prediction, while retrieval augmentation consistently improves functional clustering within the controlled GNN-only ablation. A DDR1 subnetwork analysis provides confirmatory evidence consistent with established synthetic lethality relationships. These results establish that topology-only and retrieval-augmented approaches serve complementary purposes in precision medicine applications.
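To make the retrieval-plus-gated-fusion idea and the precision@10 metric concrete, the following is a minimal, dependency-free sketch. It is not the authors' implementation: it assumes that document embeddings share the node-embedding dimension, that retrieval scores documents by a dot product with a linearly projected node embedding, that the top-k documents are pooled with softmax weights, and that fusion uses a per-dimension sigmoid gate over the concatenated embeddings. The names `W_proj`, `W_gate`, `b_gate`, `gated_fusion`, and the toy dimensions are all hypothetical. The contrastive alignment term and end-to-end training are omitted. The final two lines re-derive the abstract's headline arithmetic.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [dot(row, v) for row in W]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_top_k(h_node, docs, W_proj, k):
    """Project the node embedding (hypothetical W_proj), score every document
    embedding by dot product, and return the k best indices plus all scores."""
    q = matvec(W_proj, h_node)
    scores = [dot(q, d) for d in docs]
    top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
    return top, scores


def gated_fusion(h_node, docs, W_proj, W_gate, b_gate, k):
    """Blend a GNN node embedding with a softmax-weighted pool of its top-k
    retrieved documents using a per-dimension sigmoid gate (assumed design)."""
    top, scores = retrieve_top_k(h_node, docs, W_proj, k)
    weights = softmax([scores[i] for i in top])
    h_ret = [sum(w * docs[i][j] for w, i in zip(weights, top))
             for j in range(len(h_node))]
    # g = sigmoid(W_gate [h_node; h_ret] + b_gate); g -> 1 keeps topology only.
    z = matvec(W_gate, h_node + h_ret)  # list concatenation = [h_node; h_ret]
    g = [sigmoid(zi + bi) for zi, bi in zip(z, b_gate)]
    fused = [gi * a + (1.0 - gi) * r for gi, a, r in zip(g, h_node, h_ret)]
    return fused, top


def precision_at_k(retrieved, relevant, k):
    # Fraction of the first k retrieved items that are relevant.
    return sum(1 for i in retrieved[:k] if i in relevant) / k


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_docs, k = 4, 20, 10
    h = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    docs = random_matrix(n_docs, dim, rng)
    fused, top = gated_fusion(h, docs, random_matrix(dim, dim, rng),
                              random_matrix(dim, 2 * dim, rng), [0.0] * dim, k)
    print("fused:", [round(x, 3) for x in fused])
    print("precision@10 vs toy labels:", precision_at_k(top, set(range(5)), k))
    # Arithmetic behind the abstract's headline numbers.
    print(round(-0.144 - (-0.237), 3))           # 0.093 silhouette gain
    print(round((0.242 - 0.096) / 0.096 * 100))  # 152 (% over random)
```

Because the gate interpolates elementwise between the topology embedding and the retrieved summary, a gate saturated at 1 reproduces the GNN-only embedding. This is one simple way to let a model fall back to topology when retrieved documents are uninformative, which is the behaviour the counterfactual experiments probe.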