If you collect the ReLU on/off decisions into a diagonal matrix D with 0 or 1 entries, then a ReLU layer is ReLU(Wx) = DWx, where W is the weight matrix and x the input.
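A minimal numpy sketch of that equivalence (the variable names here are my own, not from the discussion):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # weight matrix
x = rng.standard_normal(3)        # input vector

pre = W @ x                             # pre-activation Wx
D = np.diag((pre > 0).astype(float))    # ReLU decisions as a 0/1 diagonal matrix

relu_out = np.maximum(pre, 0.0)   # ordinary ReLU layer
gated_out = D @ W @ x             # the same layer written as DWx

assert np.allclose(relu_out, gated_out)
```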
What then is Wₙ₊₁Dₙ, where Wₙ₊₁ is the weight matrix of the next layer?
It can be seen as a (locality sensitive) hash table lookup of a linear mapping: the 0/1 pattern on Dₙ's diagonal selects which effective matrix gets applied to the input. It can also be seen as an associative memory in itself, with Dₙ as the key.
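Sketching the two-layer case under the same assumptions (names again mine): for a fixed input, Wₙ₊₁·ReLU(Wₙx) collapses to a single effective matrix Wₙ₊₁DₙWₙ:

```python
import numpy as np

rng = np.random.default_rng(1)
W_n = rng.standard_normal((5, 3))     # layer n weights
W_np1 = rng.standard_normal((2, 5))   # layer n+1 weights
x = rng.standard_normal(3)

key = W_n @ x > 0                     # the binary "hash key": which ReLUs fire
D_n = np.diag(key.astype(float))

effective = W_np1 @ D_n @ W_n         # effective matrix selected by the key
out = W_np1 @ np.maximum(W_n @ x, 0.0)  # ordinary two-layer forward pass

assert np.allclose(out, effective @ x)
```

Any input producing the same sign pattern is mapped by the same effective matrix, which is where the locality sensitive hashing view comes from.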
There is a discussion here:
https://discourse.numenta.org/t/gated-linear-associative-memory/12300
The viewpoints are not fully integrated yet and there are notation problems.
Nevertheless the concepts are very simple, so hopefully people can follow along without difficulty even though the arguments are in such a preliminary state.