h/t Eric Michaud for sharing his paper with me.
There’s a tradition of high-impact ML papers using short, punchy categorical sentences as their titles: Understanding Deep Learning Requires Rethinking Generalization, Attention is All You Need, Language Models Are Few Shot Learners, and so forth.
A new paper by Simon et al. seeks to expand on this tradition, not with a present-tense claim but with a prophetic, future-tense sentence: “There Will Be a Scientific Theory of Deep Learning”.
There’s a lot of pessimism toward deep learning theory basically everywhere: the people building the AIs are pretty pessimistic, academic AI researchers are, as a general rule, pessimistic (even people who used to do theory!), and with the exception of maybe 3-4 research groups, the independent AI safety ecosystem has long since given up hope for a theory that explains deep learning.
The paper is less a neutral assessment of the evidence and more a manifesto arguing for a particular theoretical deep learning research agenda. Given the overall sense of doom and gloom, its form makes sense: any less forceful a presentation might not shine through the general pessimism toward deep learning theory.
So what’s in the paper?
The authors start by introducing what they believe to be the new emerging theory of deep learning: “learning mechanics” (its name is a deliberate nod to physics theories such as statistical mechanics or quantum mechanics). In the authors’ words, learning mechanics is a theory that concerns itself with “the dynamics of the training process”, studies them using “coarse aggregate statistics of learning”, and has the goal of generating “accurate average-case predictions”.
(In this sense, this is less a theory of deep learning as a whole than a theory that describes important aspects of deep learning. I’ll return to this later in this piece.)
The authors lay out why such a theory is important. First, there’s the scientific reason: understanding the dynamics may help us better understand the nature of intelligence and the natural world. Second, there’s the practical, engineering reason: a clear characterization of learning dynamics would provide guidance for LLM training. Third, there’s the AI safety reason: understanding the systems better may help with regulation and AI governance, and it’s possible that learning dynamics may contribute to mech interp.
The authors then present five lines of evidence for why learning mechanics both exists and is likely to become a “theory of deep learning”:
- There exist toy settings that we can solve analytically and that yield insights which may transfer to large models in practice. Most of these results come from either deep linear networks or linearized versions of neural networks, though recently theoretical progress has been made on toy non-linear neural networks (e.g. 2-layer networks or attention-only models).
- We can take the infinite-width or infinite-depth limit of neural networks, which sometimes yields interesting insights that can be applied to models in practice (the classic example is mu-parameterization).
- There are clear regularities between aggregate statistics of neural networks: the classic scaling laws that relate parameter count, dataset size, and loss, or various patterns in the weight dynamics, gradient alignment, or basin width over the course of training. While there aren’t many examples of theory allowing us to produce novel predictions of aggregate statistics, the fact that these clear regularities exist, and some theoretical progress has been made in explaining them, is a reason for hope.
- We’ve made progress in understanding and disentangling hyperparameters. This is perhaps the main concrete application of deep learning theory: generating novel rules of thumb for scaling learning/initialization hyperparameters as you increase the amount of data or model parameters (again, mu-parameterization is the classic example).
- We’ve found universality in inductive biases, data structure, and representations. That is, different deep neural network architectures seem to learn similar representations, in part because many datasets have similar properties. Again, while the theory is still nascent, the fact that these universals exist is reason for hope.
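To make the scaling-law regularity above concrete: the classic form relates loss to model size as a power law, L ≈ c · N^(-a), which becomes a straight line in log-log space. The sketch below fits such an exponent with an ordinary least-squares fit on synthetic, noise-free data (the values 10 and 0.3 are made up for illustration, not taken from any real scaling-law paper):

```python
import math

# Hypothetical synthetic data: model size N vs. final loss L,
# generated from L = 10 * N**-0.3, to show how a scaling-law
# exponent is recovered via a linear fit in log-log space.
Ns = [1e6, 1e7, 1e8, 1e9]
losses = [10 * n ** -0.3 for n in Ns]

xs = [math.log(n) for n in Ns]
ys = [math.log(l) for l in losses]
k = len(xs)
mx = sum(xs) / k
my = sum(ys) / k
# Closed-form least-squares slope of log L vs. log N.
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
exponent = -slope  # the "a" in L ~ c * N**-a
print(round(exponent, 3))  # → 0.3
```

Real scaling-law fits (e.g. joint fits over parameters and tokens) are messier, but the core operation, extracting an exponent from aggregate statistics, is this simple.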
The authors then spend a small number of words outlining the relationship between learning mechanics and each of: classical learning theory, information theory, physics of deep learning, neuroscience, SLT/dev interp, and empirical science of deep learning. They then spend a much larger number of words outlining the connections between learning mechanics and mechanistic interpretability: learning mechanics may be able to help mech interp by formalizing core assumptions or explaining how mechanisms arise during training, while mechanistic interpretability may be able to inspire phenomena to study with learning mechanics (as it has done in the past).
Next, the authors respond to arguments that they anticipate from critics:
- People have tried for decades to develop a theory of deep learning, and they’ve largely failed. The authors correctly point out that the success of deep learning is quite recent (as is the research into learning dynamics), and the total amount of effort invested so far is small relative to other scientific disciplines.
- Theory is very far from explaining LLMs. The authors respond that we might still find “local theories” that explain parts of behavior at different scales, and that basic theory may still be useful by providing conceptual handles for analyzing LLMs.
- Models’ high-level behavior matters, but low-level theories can’t capture this. The authors analogize this to the relationship between physics (learning mechanics), biology (mech interp), and psychology (behavioral evals). What they imply is that, just as understanding physics is useful for biology, which is in turn useful for psychology, so too are learning mechanics and mech interp useful for model evaluations.
- We need a theory of data, not of deep learning. The authors correctly point out that these theories are likely to be complementary.
- The AIs will automate away all human endeavor. The authors note that this is not a unique argument against deep learning theory; all human endeavor is at stake. They argue that theory is already useful, that there will be a transition period with AI-augmented human research, and that understanding learning dynamics may help with oversight of superhuman AIs. (Personally, I find this response the weakest, in large part because I likely disagree with the authors on the usefulness of present work.)
Finally, the authors lay out 10 directions of research in learning dynamics, and provide some tips for research in this area.
The paper is clearly valuable as an overview for anyone getting into interpretability. I think it’s especially useful for people who aren’t familiar with recent academic deep learning theory work. I’d suggest that people who are serious about doing mech interp skim the paper at the very least.
But does the main claim hold up? Does the paper convince me that there will be a scientific theory of deep learning?
I think the authors make a stronger case that there will be some theory than they do for that theory’s usefulness or breadth.
For all the confidence displayed by the paper’s title, I find it ironic that the applications the authors point to are so weak. The main use of learning mechanics research so far has been producing more learning mechanics research that retrodicts known empirical phenomena; learning dynamics as a field has yielded little practical fruit. The notable exception is hyperparameter scaling techniques such as mu-parameterization. But even then, it’s possible to derive these techniques either empirically, or heuristically with simple toy models. From talking to deep learning engineers, these theories (at least those belonging to academic learning mechanics) have not been useful in practice for LLMs.
I also think it’s worth noting what is not included in learning mechanics. Learning mechanics is far less ambitious than even the moderate versions of rigorous model internals/ambitious mech interp agendas: there is no hope to understand the algorithms learned by any particular network, let alone serve as a rigorous tool for auditing.
Learning mechanics, as the authors note, is intended to be the physics to mech interp’s biology and behavior evaluations’ psychology. But I’d go further than this analogy suggests: learning mechanics is not even trying to be a theory of all of deep learning; while it may be a metaphorical physical theory, it does not endeavor to be a theory of everything. So even if learning dynamics lives up to the authors' hopes, I think it'd still fall short of being a scientific theory of deep learning.
Maybe there will be a scientific theory of deep learning. Maybe learning mechanics will become a theory covering some important aspects of deep learning. Maybe it will even prove useful in practice. But I don’t think the paper has convinced me of these claims.
For all my criticism, I still really like the piece, and I’m glad the authors wrote it. Too often, believers in fields do not lay out their arguments to be challenged by others; the learning dynamics people have done so with clear language and concrete examples. Insofar as the authors failed to justify their ambitious claim in the title, it’s the result of the titular claim’s ambition as opposed to a lack of effort or evidence on their part.
At the end of the introduction, the authors lay out some hopes in their piece:
> We hope the veteran scientist of deep learning will find something valuable in our synthesis of useful approaches and results, and feel galvanized by our depiction of an emerging science. We hope to convince the deep learning practitioner that theory is on a path to fulfilling its longstanding promise of practical utility and to encourage them to experiment with their systems with an eye for science. We hope to convince the AI safety or mechanistic interpretability researcher that white-box theory is difficult yet possible … Lastly, we hope to make it easier for young students and newcomers to the field to get involved.
I doubt this piece will convince many practitioners that deep learning theory is on a path to fulfilling its longstanding promise of practical utility. I think some AI safety/mech interp researchers may feel heartened by the theory, though I doubt it will change the minds of mech interp skeptics. But even despite these quibbles, I think the authors have done a great service by clearly laying out their hopes and evidence in a way that will help more junior researchers understand the academic field of deep learning theory.