I've been using the Fiedler value (the second-smallest eigenvalue of the weight-graph Laplacian), combined with Scheffer's critical-slowing-down indicators, to monitor neural network topology during training.
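For anyone who wants a concrete picture of the Fiedler value on a weight graph, here is a minimal sketch. It rests on assumptions that may not match the repo: neurons are nodes, each weight becomes an undirected edge with strength |w|, and biases are ignored. The function name `fiedler_value` is made up for illustration.

```python
import numpy as np

def fiedler_value(weights):
    """Second-smallest Laplacian eigenvalue of an MLP's weight graph.

    weights: list of 2-D arrays, weights[k] has shape (n_out, n_in).
    Assumed construction (not necessarily the paper's): one node per
    neuron, undirected edge weight = |w_ij| between adjacent layers.
    """
    # Layer sizes: input width, then each layer's output width.
    sizes = [weights[0].shape[1]] + [w.shape[0] for w in weights]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    n = int(offsets[-1])

    # Fill the block of the adjacency matrix linking layer k to layer k+1.
    A = np.zeros((n, n))
    for k, w in enumerate(weights):
        rows = slice(offsets[k + 1], offsets[k + 2])
        cols = slice(offsets[k], offsets[k + 1])
        A[rows, cols] = np.abs(w)
    A = A + A.T  # symmetrise so the graph is undirected

    # Combinatorial Laplacian L = D - A; eigvalsh returns ascending eigenvalues.
    L = np.diag(A.sum(axis=1)) - A
    return float(np.linalg.eigvalsh(L)[1])
```

A value near zero means the graph is close to splitting into disconnected parts; logging this scalar every few hundred steps gives the time series that the early-warning indicators are computed on.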
Five experiments, all reproducible on CPU in under 24 hours:
- Detection: lambda-2 (the Fiedler value) signals approaching grokking 21,000 steps before test accuracy moves
- Classification: grokking and catastrophic forgetting have distinct structural fingerprints (slope 0.00128 vs 0.00471/step)
- Steering: structurally-guided intervention preserves 91.7% of knowledge vs 2.6% unsteered
- Compounding: three sequential tasks, 100%/100%/97.5% retention, 48x grokking acceleration across tasks
- Preemptive curriculum: compatibility scoring ranks task disruption risk correctly, bridging preserves 100% vs 0% direct
Tested on 2-layer MLPs (modular arithmetic) and a 1-layer transformer (sequence prediction). The paper has an honest limitations section: these are toy tasks, and scaling to production architectures is unvalidated.
The approach comes from complex systems science (Scheffer's early warning indicators for critical transitions) applied to weight graphs rather than ecosystems or financial markets.
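The two classic critical-slowing-down indicators are rising variance and rising lag-1 autocorrelation in a sliding window. Below is a rough sketch of how they could be computed on a logged lambda-2 series. The function name `rolling_indicators` and the window handling are my own assumptions, not taken from the repo, and real pipelines usually detrend the series first.

```python
import numpy as np

def rolling_indicators(series, window):
    """Rolling variance and lag-1 autocorrelation over a sliding window.

    Returns two arrays of length len(series) - window + 1.
    Hypothetical helper for illustration; no detrending is applied.
    """
    x = np.asarray(series, dtype=float)
    variances, ac1 = [], []
    for end in range(window, len(x) + 1):
        seg = x[end - window:end]
        seg = seg - seg.mean()          # centre the window
        variances.append(seg.var())     # population variance
        denom = np.dot(seg, seg)
        # Lag-1 autocorrelation; defined as 0 for a flat window.
        ac1.append(np.dot(seg[:-1], seg[1:]) / denom if denom > 0 else 0.0)
    return np.array(variances), np.array(ac1)
```

Both indicators trending upward together over training steps is the usual sign that the system is recovering more slowly from perturbations, which is what a critical transition is expected to look like from the outside.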
Code and paper: https://github.com/EssexRich/neural_si_validation
Happy to discuss the maths, the experimental design, or the limitations.