I trained a 90M-parameter embedding model from scratch
I trained a 90M-parameter encoder-only (embedding) model from scratch. I trained it mostly on Google Colab with a Colab Pro+ subscription. This was roughly the fifth run, because earlier runs had problems with exploding gradients. It was a fun project but not ye…
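The post doesn't say how the exploding gradients were eventually fixed. A common mitigation is clipping gradients by their global L2 norm before each optimizer step. Below is a minimal pure-Python sketch of that idea. The function name `clip_by_global_norm` and the list-of-lists gradient representation are illustrative assumptions, not the author's code. In PyTorch the equivalent built-in is `torch.nn.utils.clip_grad_norm_(parameters, max_norm)`.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> List[List[float]]:
    """Hypothetical helper: rescale all gradient tensors together so that
    their combined L2 norm is at most max_norm."""
    # Global norm = sqrt of the sum of squares over every element of every tensor.
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # Leave gradients unchanged if they are already within the limit (or all zero).
    if total_norm == 0.0 or total_norm <= max_norm:
        return [list(tensor) for tensor in grads]
    # Scale every element by the same factor, which preserves the gradient direction.
    scale = max_norm / total_norm
    return [[g * scale for g in tensor] for tensor in grads]


# Usage: two "tensors" whose global norm is 5.0, clipped to a norm of 1.0.
clipped = clip_by_global_norm([[3.0, 4.0], [0.0]], max_norm=1.0)
print(clipped)
```

Clipping by the global norm, instead of clipping each value separately, keeps the update pointing in the same direction and only shrinks its size. That is why it is the usual default for transformer-style training.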