Exponential Approximation Rates and Parameter Efficiency of Learnable Bernstein Activations

arXiv:2602.04264v2 | Announce Type: replace-cross

Abstract: The choice of activation function fundamentally shapes the representational capacity and parameter efficiency of deep neural networks, yet most widely used activations lack rigorous theoretical guarantees on these properties. We provide a theoretical analysis of DeepBern-Nets (DBNs) -- networks employing learnable Bernstein polynomial activations -- showing that their approximation error decays in the network depth $L$ and the polynomial order $n$ at a rate of $\mathcal{O}(n^{-L})$, exponentially faster than the polynomial rate of ReLU architectures, while the networks remain fully differentiable. We validate these predictions through $1{,}344$ experiments on large scientific datasets (HIGGS and SUSY), comparing DBNs against ReLU, Leaky ReLU, SELU, and GeLU. DBNs achieve over $70\%$ parameter reduction across the majority of architectures -- reaching $99.9\%$ at scale -- converge to ReLU's final loss in as few as $26\%$ of the training epochs, and attain up to $45\%$ lower final loss. These advantages hold across all tested activations, confirming that DBNs' gains stem from the learnable polynomial structure rather than mere smoothness.
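To make the idea concrete, the sketch below shows one way a learnable Bernstein polynomial activation could be written in PyTorch: the input is mapped onto $[0,1]$, expanded in the degree-$n$ Bernstein basis $B_{k,n}(t) = \binom{n}{k} t^k (1-t)^{n-k}$, and combined with learnable coefficients. The class name, the assumed interval $[-1,1]$, and the near-identity initialization are illustrative assumptions for this sketch, not the authors' DeepBern-Nets implementation.

```python
import torch
import torch.nn as nn
from math import comb


class BernsteinActivation(nn.Module):
    """Element-wise activation given by a degree-n Bernstein polynomial with
    learnable coefficients. A minimal sketch of the construction the abstract
    describes; interval mapping and initialization are illustrative choices."""

    def __init__(self, order: int = 4, low: float = -1.0, high: float = 1.0):
        super().__init__()
        self.order = order
        self.low, self.high = low, high
        # One learnable coefficient per Bernstein basis function; linspace
        # initialization makes the activation start out as the identity on
        # [low, high] (linear precision of the Bernstein basis).
        self.coeffs = nn.Parameter(torch.linspace(low, high, order + 1))
        # Fixed binomial coefficients C(n, k).
        self.register_buffer(
            "binom",
            torch.tensor([comb(order, k) for k in range(order + 1)], dtype=torch.float32),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Map inputs to t in [0, 1] over the assumed interval [low, high].
        t = ((x - self.low) / (self.high - self.low)).clamp(0.0, 1.0).unsqueeze(-1)
        k = torch.arange(self.order + 1, device=x.device, dtype=x.dtype)
        # Bernstein basis B_{k,n}(t) = C(n,k) * t^k * (1-t)^(n-k), broadcast over a trailing axis.
        basis = self.binom * t.pow(k) * (1.0 - t).pow(self.order - k)
        # Weighted sum of basis functions with the learnable coefficients.
        return (basis * self.coeffs).sum(dim=-1)


if __name__ == "__main__":
    # Usage: drop the activation in place of ReLU in a small MLP.
    net = nn.Sequential(nn.Linear(8, 32), BernsteinActivation(order=4), nn.Linear(32, 1))
    out = net(torch.randn(16, 8))
    print(out.shape)  # torch.Size([16, 1])
```

Because the coefficients are trained jointly with the weights, increasing the polynomial order $n$ enlarges the per-activation function class without adding layers, which is the mechanism behind the $\mathcal{O}(n^{-L})$ rate discussed in the abstract.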
