Algorithmic Simplification of Neural Networks with Mosaic-of-Motifs
arXiv:2602.14896v2 Announce Type: replace
Abstract: Large-scale deep learning models are well suited to compression. Across a variety of tasks, methods like pruning, quantization, and knowledge distillation have been used to achieve massive reductions in model parameters with only marginal performance drops. This raises the central question: *Why are deep neural networks suited for compression?* In this work, we take up the perspective of algorithmic complexity to explain this behavior. We hypothesize that the parameters of trained models have more structure and, hence, exhibit lower algorithmic complexity than the weights at (random) initialization, and that model compression methods harness this reduced algorithmic complexity to compress models. Although an unconstrained parameterization of model weights, $\mathbf{w} \in \mathbb{R}^n$, can represent arbitrary weight assignments, the solutions found during training exhibit repeatability and structure, so they can be described by a shorter program than the trivial one that lists every weight. We formalize this notion through the Kolmogorov complexity of $\mathbf{w}$, denoted $\mathcal{K}(\mathbf{w})$. We introduce a constrained parameterization $\widehat{\mathbf{w}}$ that partitions the parameters into blocks of size $s$ and restricts each block to be selected from a set of $k$ reusable motifs, with the assignment of motifs to blocks specified by a reuse pattern (or mosaic). The resulting method, $\mathit{Mosaic\text{-}of\text{-}Motifs}$ (MoMos), provides a theoretically justified parameterization that biases optimization toward algorithmically simpler solutions. Empirical evidence from multiple experiments shows that MoMos consistently lowers the algorithmic complexity of neural networks during training while preserving the performance of unconstrained models. These results suggest that parameter compressibility is not only observed after training, but can also be induced by constraining the optimization domain.
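
The constrained parameterization described in the abstract can be illustrated with a minimal sketch. It shows only how a weight vector $\widehat{\mathbf{w}}$ could be rebuilt from $k$ motifs of size $s$ and a mosaic of per-block indices, together with a crude bit count of the two representations. The function names `expand_mosaic` and `description_bits`, the 32-bit weight width, and the random motifs are assumptions made for illustration; the abstract does not say how MoMos learns motifs or mosaics, and the bit count is only a rough upper-bound proxy, since Kolmogorov complexity itself is uncomputable.

```python
import math
import random


def expand_mosaic(motifs, mosaic):
    """Rebuild the flat weight vector by concatenating the motif chosen for each block."""
    s = len(motifs[0])
    if any(len(m) != s for m in motifs):
        raise ValueError("all motifs must share the same block size s")
    w_hat = []
    for idx in mosaic:
        w_hat.extend(motifs[idx])  # block i is a copy of motif mosaic[i]
    return w_hat


def description_bits(n, s, k, bits_per_weight=32):
    """Crude bit counts (an assumed proxy): dense weights vs. motifs plus mosaic indices."""
    if n % s != 0:
        raise ValueError("n must be divisible by the block size s")
    num_blocks = n // s
    index_bits = max(1, math.ceil(math.log2(k)))  # bits needed to name one of k motifs
    dense = n * bits_per_weight
    constrained = k * s * bits_per_weight + num_blocks * index_bits
    return dense, constrained


# Hypothetical toy setting: 6 blocks of size 4, each drawn from 3 motifs.
rng = random.Random(0)
s, k, num_blocks = 4, 3, 6
motifs = [[rng.gauss(0.0, 1.0) for _ in range(s)] for _ in range(k)]
mosaic = [rng.randrange(k) for _ in range(num_blocks)]

w_hat = expand_mosaic(motifs, mosaic)
dense_bits, momos_bits = description_bits(len(w_hat), s, k)
print(len(w_hat), dense_bits, momos_bits)
```

In this toy setting the 24 weights need 768 bits when stored densely, but only 396 bits as three motifs plus six 2-bit indices. The saving grows as the number of blocks increases relative to $k$, which is the sense in which motif reuse yields a shorter description.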