Inter-Layer Hessian Analysis of Neural Networks with DAG Architectures
arXiv:2604.11639v1 Announce Type: new
Abstract: Modern automatic differentiation frameworks (JAX, PyTorch) return the Hessian of the loss as a monolithic tensor, exposing none of the internal structure of inter-layer interactions. This paper presents an analytical formalism that explicitly decomposes the full Hessian into blocks indexed by node pairs of an arbitrary architecture's computational DAG. The canonical decomposition $H = H^{GN} + H^{T}$ separates the generalized Gauss--Newton (GGN) component, the convex part, from the tensor component, the residual curvature responsible for saddle points. For piecewise-linear activations (ReLU), the tensor component of the input Hessian vanishes ($H^{T}_{v,w}\!\equiv\!0$ a.e., so $H^f_{v,w}\!=\!H^{GN}_{v,w}\!\succeq\!0$), whereas the full parametric Hessian retains residual terms that do not reduce to the GGN. Building on this decomposition, we introduce diagnostic metrics (inter-layer resonance~$\mathcal{R}$, geometric coupling~$\mathcal{C}$, stable rank~$\mathcal{D}$, GN-Gap) that can be estimated stochastically in $O(P)$ time, where $P$ is the parameter count, and that reveal structural curvature interactions between layers. The theoretical analysis explains the exponential decay of resonance in vanilla networks and its preservation under skip connections; empirical validation spans multilayer perceptrons (MLPs; Exp.\,1--5) and convolutional architectures (ResNet-18, ${\sim}11$M~parameters, Exp.\,6). When the architecture reduces to a single node, all definitions collapse to the standard Hessian $\nabla^2_\theta\mathcal{L}(\theta)\in\mathbb{R}^{P\times P}$.
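The abstract does not spell out how the $O(P)$ stochastic estimation works; a minimal PyTorch sketch follows, under the assumption that the GN-Gap is probed Hutchinson-style as the difference between a full Hessian-vector product and a Gauss--Newton-vector product. The toy model and all names (`f`, `loss`, `hvp`, `gnvp`, `gn_gap_probe`) are illustrative, not the authors' API.

```python
# Minimal sketch (not the authors' code): Hutchinson probes of the
# parametric GN-Gap, i.e. an estimate of tr(H - H^GN) = tr(H^T).
# Each probe costs O(P): one Hessian-vector and one GGN-vector product.
import torch
from torch import nn
from torch.func import functional_call, grad, jvp, vjp

torch.manual_seed(0)
# Tiny ReLU MLP standing in for the architectures in the paper.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x, y = torch.randn(16, 4), torch.randn(16, 2)
params = dict(model.named_parameters())

def f(p):
    # Network output as a function of the parameters.
    return functional_call(model, p, (x,))

def loss(p):
    # 0.5 * sum-of-squares: the output-space Hessian is the identity,
    # so the GGN is exactly J^T J.
    return 0.5 * ((f(p) - y) ** 2).sum()

def hvp(p, v):
    # Full Hessian-vector product H v (forward-over-reverse).
    return jvp(grad(loss), (p,), (v,))[1]

def gnvp(p, v):
    # Gauss--Newton-vector product H^GN v = J^T (J v).
    Jv = jvp(f, (p,), (v,))[1]
    _, pullback = vjp(f, p)
    return pullback(Jv)[0]

def gn_gap_probe(p):
    # One Hutchinson probe: E[v^T M v] = tr(M) for Rademacher v.
    v = {k: torch.randint_like(t, 2) * 2.0 - 1.0 for k, t in p.items()}
    hv, gv = hvp(p, v), gnvp(p, v)
    dot = lambda a, b: sum((a[k] * b[k]).sum() for k in a)
    return (dot(v, hv) - dot(v, gv)).item()

est = sum(gn_gap_probe(params) for _ in range(32)) / 32
print(f"Hutchinson estimate of tr(H^T) (GN-Gap proxy): {est:.4f}")
```

Consistent with the abstract's claim for ReLU networks, this probe targets the *parametric* Hessian, whose tensor part generally does not vanish even though the input-Hessian tensor component is zero almost everywhere.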