/u/Academic_Sleep1118

Training a number-aware embedding model + Text JEPA doesn’t work too well + Text auto-encoders have a strange frequency bias [R][P]

/u/Academic_Sleep1118 / May 13, 2026

Hi guys! I've spent a year trying to predict company growth from the full text of their 10-K filings. It completely failed. But I've had a lot of fun playing with encoder transformers and making them good at numbers (bypassing the tokenizer/predict…
