We have developed some relatively general methods for mechanistic estimation competitive with sampling by studying problems that are expressible as expectations of random products. This includes several different estimation problems, such as random halfspace intersections, random #3-SAT and random permanents. In this post, we will give a high-level introduction to these methods before sharing some more detailed notes. This is intended as an interim technical update and will be relatively light on motivation: for a broader discussion of this line of research, see our prior post.
Random instances of the matching sampling principle
All of the problems discussed in this post can be thought of particular choices of "architecture"
Why study random instances of the matching sampling principle? One simple reason is their difficulty: they are often easy enough to be tractable, but hard enough to pose a challenge that expands our inventory of methods. Even more than this, it is essential for us to be able to handle randomly-initialized networks, since they serve as the "base case" upon which our study of trained networks must be built. Somewhat more speculatively, we also have some hope that we will be able to view learned networks as random instances with a more complex architecture, as discussed here. For these reasons, random instances continue to occupy a significant amount of our attention.
Expectations of random products
The particular choices of architecture
Recall that the job of the mechanistic estimator
- By drawing
and from and taking , we obtain the spherical volume of the intersection of random halfspaces passing through the origin, which can also be thought of as the probability that all output neurons of a 1-layer ReLU MLP are active. - By taking
to be a random 3-CNF clause, drawing from and taking , we obtain random #3-SAT, the number of satisfying assignments to a random 3-CNF formula with clauses.[1] - By drawing
from , drawing from the symmetric group and taking , we obtain a random -permanent in the case .
Deduction–projection estimators
In order to produce mechanistic estimates for these expectations of random products, we use deduction–projection estimators. The general idea of a deduction–projection estimator is to split up an expensive computation into multiple steps, and then to alternate between "deduction" steps, which perform one step of the computation exactly, and "projection" steps, which simplify the computational state, preventing an exponential blowup in complexity. For example, our cumulant propagation algorithm for wide random MLPs has this form: in the "deduction" step, we use the activation cumulants at one layer to deduce the activation cumulants at the next layer exactly, and in the "projection" step, we drop terms that are insignificant relative to their computational expense.[2]
In the case of expectations of random products, given
In the deduction step, we multiply our representation of
Importantly, we can be judicious in how we simplify the representation of
Fortunately, we know a lot about what
Mechanistic sketching
In designing the projection step of our deduction–projection estimator, we are left with the following abstract problem: given a function
to be small relative to the
Once the problem is isolated in this way, the solution ends up being a straightforward application of linear algebra. Viewing
and take
Detailed notes
The basic approach we have described, of using a deduction–projection estimator with mechanistic
We hope to describe more of the details of this approach in a future paper, but we may not get around to this for some time. We are therefore sharing some notes that we have been using internally, so that these details are available to those who are interested:
- Deductive estimation for random halfspace intersections (written up by Jacob Hilton): this is focused on the case of random halfspace intersections, but also includes introductory content on deduction–projection estimators and mechanistic
sketching. Our method is the same as the one that we previously summarized here. - Products of symmetric random functions (written up by Wilson Wu): this provides a general framework based on the theory of Gelfand pairs into which our algorithm for random halfspace intersections fits. It then derives a mechanistic estimation algorithm for #3-SAT as a special case of this framework.
- Deduction–projection for the permanent (written up by Wilson Wu): this discusses the case of random
-permanents, which require a kind of symmetry that does not quite fit into the theory of Gelfand pairs. The focus is on -permanents rather than -permanents because it turns out to be trivial to match sampling mechanistically for -permanents, since all of the terms are pairwise uncorrelated. - Worst-case-up-to-signs mechanistic estimations for products of functions (written up by Mike Winer): this provides a general approach based on random walk simulations for Boolean functions with a small number of non-zero Fourier coefficients. We obtain an alternative algorithm for #3-SAT as a special case. More generally, the approach can be applied to both products of linear functions and
-local functions on the Boolean cube, and even allows the non-zero Fourier coefficients to have arbitrary magnitudes, as long as their signs are chosen randomly.
In each case, we roughly match or outperform random sampling up to logarithmic factors, except for the third case, where we are further behind.
Conclusion
We have been studying random instances of the matching sampling principle in order to discover general methods for mechanistic estimation that we can apply more broadly. Mechanistic
Discuss