cs.LG

Predicting Large Model Test Losses with a Noisy Quadratic System

arXiv:2605.09154v1 Announce Type: new
Abstract: We introduce a predictive model that estimates the pre-training loss of large models from model size (N), batch size (B), and number of weight updates (K). This is the first loss-prediction model that can…