cs.LG, stat.ML

Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks

arXiv:2605.10395v1 Announce Type: new
Abstract: We study the information-theoretic limits of learning a one-hidden-layer teacher network with hierarchical features from noisy queries, in the context of knowledge transfer to a smaller student model. We…