Graph spectral analysis (Fiedler value + Scheffer CSD indicators) predicts grokking 21k steps before the loss curve shows it: five reproducible experiments [R]

I've been applying the Fiedler value (the second-smallest eigenvalue of the weight-graph Laplacian), combined with Scheffer's critical-slowing-down (CSD) indicators, to monitor neural network topology during training.
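For anyone who wants to poke at the idea before opening the repo, here's a minimal sketch of how a Fiedler value could be computed for an MLP. The post doesn't say how the weight graph is built, so this assumes every neuron is a node, every weight W[i, j] is an undirected edge with weight |W[i, j]|, biases are ignored, and the unnormalised Laplacian L = D - A is used. The function names are hypothetical, not the repo's API.

```python
import numpy as np


def weight_graph_laplacian(weights):
    """Unnormalised Laplacian L = D - A of an MLP's weight graph.

    Assumption: neurons are nodes; each entry W[i, j] of a layer's weight
    matrix (shape: n_out x n_in) is an undirected edge between input unit j
    and output unit i with weight |W[i, j]|. Biases are ignored.
    """
    sizes = [weights[0].shape[1]] + [w.shape[0] for w in weights]
    n = sum(sizes)
    adj = np.zeros((n, n))
    offset = 0
    for w, n_in in zip(weights, sizes[:-1]):
        n_out = w.shape[0]
        rows = slice(offset + n_in, offset + n_in + n_out)  # this layer's outputs
        cols = slice(offset, offset + n_in)                 # this layer's inputs
        adj[rows, cols] = np.abs(w)
        adj[cols, rows] = np.abs(w).T                       # keep A symmetric
        offset += n_in
    degree = np.diag(adj.sum(axis=1))
    return degree - adj


def fiedler_value(weights):
    """Second-smallest eigenvalue (lambda-2) of the weight-graph Laplacian."""
    eigvals = np.linalg.eigvalsh(weight_graph_laplacian(weights))  # ascending
    return eigvals[1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(16, 8))   # hidden x input
    W2 = rng.normal(size=(4, 16))   # output x hidden
    print("lambda-2:", fiedler_value([W1, W2]))
```

lambda-2 is zero exactly when the graph is disconnected and grows as the graph becomes better connected, which is why it's a natural scalar to track over training. A normalised Laplacian or signed weights would give different numbers, so treat this as one plausible construction rather than the one in the paper.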

Five experiments, all reproducible on CPU in under 24 hours:

Detection: lambda-2 (the Fiedler value) signals approaching grokking 21,000 steps before test accuracy moves
Classification: grokking and catastrophic forgetting have distinct structural fingerprints (slopes of 0.00128 vs 0.00471 per step)
Steering: structurally guided intervention preserves 91.7% of knowledge vs 2.6% without steering
Compounding: three sequential tasks with 100%/100%/97.5% retention and a 48x grokking acceleration across tasks
Preemptive curriculum: compatibility scoring correctly ranks how disruptive each task will be; bridging preserves 100% vs 0% when going direct
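To make the classification result concrete, here's a toy sketch of turning a slope into a label. It assumes the two reported slopes (0.00128 and 0.00471 per step) are least-squares slopes of lambda-2 against training step, which the post doesn't state outright. The nearest-reference rule and the names `lambda2_slope` and `nearest_fingerprint` are hypothetical illustrations, not the author's classifier.

```python
import numpy as np

# Slopes reported in the post, used here as hypothetical reference fingerprints.
# Assumption: they are slopes of lambda-2 per training step.
REFERENCE_SLOPES = {"grokking": 0.00128, "catastrophic_forgetting": 0.00471}


def lambda2_slope(steps, lambda2):
    """Least-squares slope of a lambda-2 trace against training step."""
    slope, _intercept = np.polyfit(steps, lambda2, deg=1)
    return slope


def nearest_fingerprint(slope, references=REFERENCE_SLOPES):
    """Label whose reference slope is closest to the observed slope."""
    return min(references, key=lambda label: abs(references[label] - slope))


if __name__ == "__main__":
    steps = np.arange(0, 1000, 10)
    trace = 0.5 + 0.0013 * steps  # synthetic, noise-free lambda-2 trace
    s = lambda2_slope(steps, trace)
    print(f"slope={s:.5f} -> {nearest_fingerprint(s)}")
```

A real classifier would need noise handling and a window choice; this only shows that a per-step slope is enough to separate the two reported regimes.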

Tested on 2-layer MLPs (modular arithmetic) and a 1-layer transformer (sequence prediction). There's an honest limitations section in the paper: these are toy tasks, and scaling to production architectures is unvalidated.
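For context on the MLP setup, here's a sketch of the kind of modular-arithmetic dataset grokking experiments typically use. The post only says "modular arithmetic", so addition mod a prime p, one-hot operands and a 30% train split are assumptions chosen for illustration, not the repo's configuration; `modular_addition_dataset` is a hypothetical name.

```python
import numpy as np


def modular_addition_dataset(p=97, train_fraction=0.3, seed=0):
    """All pairs (a, b) labelled (a + b) mod p, randomly split into train/test.

    Assumptions: addition (not another operation), prime modulus p,
    one-hot encoded operands concatenated into a 2p-wide input vector.
    """
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    a, b = a.ravel(), b.ravel()
    labels = (a + b) % p
    x = np.concatenate([np.eye(p)[a], np.eye(p)[b]], axis=1)  # shape (p*p, 2p)
    rng = np.random.default_rng(seed)
    order = rng.permutation(p * p)
    n_train = int(train_fraction * p * p)
    train, test = order[:n_train], order[n_train:]
    return (x[train], labels[train]), (x[test], labels[test])


if __name__ == "__main__":
    (x_tr, y_tr), (x_te, y_te) = modular_addition_dataset()
    print(x_tr.shape, x_te.shape)
```

A small train fraction is the usual ingredient that makes a network memorise first and generalise (grok) much later, which is the delayed transition the Fiedler value is meant to anticipate.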

The approach comes from complex systems science: Scheffer's early warning indicators for critical transitions, applied to weight graphs rather than to ecosystems or financial markets.
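As a rough illustration of those early warning indicators, here's a sketch of the two classic critical-slowing-down signals (rolling variance and rolling lag-1 autocorrelation) plus a Kendall tau trend statistic, which is how they are commonly summarised in that literature. The post doesn't say which indicators, window size or detrending the repo uses, so none of that is implied here; the function names are hypothetical. In this setting the input series would be lambda-2 logged over training.

```python
import numpy as np


def rolling_indicators(series, window):
    """Rolling variance and lag-1 autocorrelation over a trailing window.

    No detrending is applied (an assumption; many CSD analyses detrend first).
    A constant window gives NaN autocorrelation.
    """
    series = np.asarray(series, dtype=float)
    variances, autocorrs = [], []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        variances.append(chunk.var())
        # Lag-1 autocorrelation: Pearson correlation of x_t with x_{t+1}.
        autocorrs.append(np.corrcoef(chunk[:-1], chunk[1:])[0, 1])
    return np.array(variances), np.array(autocorrs)


def kendall_tau(values):
    """Kendall tau-a between an indicator and time (ties count as zero)."""
    values = np.asarray(values)
    n = len(values)
    i, j = np.triu_indices(n, k=1)          # all pairs with i < j
    signs = np.sign(values[j] - values[i])  # +1 concordant with time, -1 discordant
    return signs.sum() / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Toy system approaching a transition: AR(1) whose coefficient creeps toward 1.
    rng = np.random.default_rng(0)
    n = 3000
    phi = np.linspace(0.0, 0.95, n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi[t] * x[t - 1] + rng.normal()
    var, ac1 = rolling_indicators(x, window=300)
    print("tau(variance):", kendall_tau(var))
    print("tau(lag-1 AC):", kendall_tau(ac1))
```

Both indicators rising (positive tau) is the usual critical-slowing-down signature. For long traces, `scipy.stats.kendalltau` is a faster drop-in for the quadratic-memory helper above.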

Code and paper: https://github.com/EssexRich/neural_si_validation

Happy to discuss the maths, the experimental design, or the limitations.

submitted by /u/RichBenf