Hey everyone. I’m an 18yo indie dev, and I’ve been experimenting with Spiking Neural Networks (SNNs) for language modeling. A lot of papers (like SpikeBERT) mention that training 1B+ parameter SNNs directly from random initialization fails due to vanishing gradients, so people usually do ANN-to-SNN conversion or distillation. I wanted to see if I could force it to converge purely in the spike domain. I had to stop at 27k steps because my wallet is literally empty lol, but the loss had dropped to 4.4 by then.
Here are the most interesting things that happened:
- Massive sparsity: It maintains ~93% activation sparsity, i.e. only about 7% of neurons fire per token. That makes inference far cheaper in compute and energy than a comparable dense model (the weights themselves are still dense, so it's the activations, not storage, that get cheap).
- Cross-lingual emergence: Around step 25K, it spontaneously started generating structurally correct Russian text, even though Russian wasn't explicitly targeted or upweighted in the dataset mix.
- Memory routing shift: As I scaled the architecture from 600M to 1B parameters, the model spontaneously shifted 39% of its activation routing into the persistent memory module. It basically learned on its own that memory becomes more valuable at larger scale.
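For anyone who hasn't worked with SNNs: the sparsity number above comes from the fact that a spiking neuron only emits a 1 when its membrane potential crosses a threshold. Here's a minimal NumPy sketch of a leaky integrate-and-fire (LIF) layer to show where a firing rate like ~7% comes from. This is NOT my actual architecture; the decay, threshold, and input scale are made-up illustration values.

```python
import numpy as np

def lif_step(v, inp, decay=0.9, threshold=1.0):
    """One timestep of a leaky integrate-and-fire layer (sketch).

    v: membrane potentials, inp: input current, same shape.
    Returns (spikes, updated potentials). Neurons crossing the
    threshold emit a 1 and are hard-reset to 0.
    """
    v = decay * v + inp
    spikes = (v >= threshold).astype(np.float32)
    v = v * (1.0 - spikes)  # reset the neurons that fired
    return spikes, v

rng = np.random.default_rng(0)
v = np.zeros(1024, dtype=np.float32)
fired, total = 0.0, 0
for _ in range(100):
    inp = rng.normal(0.0, 0.3, size=1024).astype(np.float32)
    s, v = lif_step(v, inp)
    fired += s.sum()
    total += s.size
print(f"firing rate: {fired / total:.1%}")  # most neurons stay silent per step
```

The sparsity falls out of the threshold/leak dynamics rather than any explicit regularizer, which is why it holds up at inference time too.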
Limitations (Being honest):
The text generation is still janky and nowhere near GPT-2 fluency yet. The loss (4.4) is high, mostly because I couldn't train it longer. But showing that a 1B-parameter pure SNN can converge at all from random init feels like a solid milestone.
I'm sharing this because I'd love some harsh technical feedback.
- Does anyone here have experience with neuromorphic hardware? Would an architecture like this map well to Loihi?
- If anyone has tips on pushing SNN loss lower or stabilizing surrogate gradients further, I'm all ears.
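For context on the surrogate-gradient question: since the spike function's true derivative is zero almost everywhere (and infinite at threshold), the backward pass substitutes a smooth surrogate. A common choice is the fast-sigmoid / SuperSpike-style form, and turning its sharpness β down is one of the standard stabilization knobs people suggest. Sketch below; β=10 and threshold=1.0 are just illustration values, not what I trained with.

```python
import numpy as np

def surrogate_grad(v, threshold=1.0, beta=10.0):
    """Fast-sigmoid (SuperSpike-style) surrogate derivative of the
    Heaviside spike nonlinearity: 1 / (1 + beta*|v - threshold|)^2.
    Large beta approximates the true spiky gradient (unstable);
    smaller beta spreads gradient to near-threshold neurons,
    which usually trains more stably.
    """
    return 1.0 / (1.0 + beta * np.abs(v - threshold)) ** 2

v = np.linspace(-1.0, 3.0, 5)    # membrane potentials around threshold=1
print(surrogate_grad(v))         # peaks at v == threshold, symmetric falloff
```

The trade-off is bias vs. gradient flow: too small a β and the backward pass barely resembles the spiking forward pass; too large and you're back to vanishing gradients in deep stacks.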
The code, architecture details, and the 12GB full training checkpoint (weights + optimizer states) are on my GitHub: https://github.com/gtausa197-svg/-Project-Nord-Spiking-Neural-Network-Language-Model.git