Mechanistic Evidence for Spectral Structures in Prior-Data Fitted Networks
arXiv:2601.21731v2 Announce Type: replace
Abstract: Prior-Data Fitted Networks (PFNs) enable amortized Bayesian inference in a single forward pass, yet their internal representations remain opaque. It is unknown whether PFNs encode identifiable Bayesian structure or merely memorize input-output mappings. We provide mechanistic evidence that PFNs learn structured spectral representations and that these can be extracted as explicit kernels. First, probing experiments across three architectures, including the publicly released TabPFN, show that spectral information is linearly decodable from the latent attention scores and organized along a dominant principal axis. Activation patching and targeted subspace interventions establish that this information is causally used for prediction and concentrated in a low-dimensional subspace, with spectral directions an order of magnitude more effective than random ones. Crucially, these properties hold on TabPFN with both synthetic out-of-distribution inputs and real-world time series (Airline Passengers, Milk Production), indicating that they are emergent features of PFN-style amortization over continuous regression tasks rather than artifacts of the training prior. Second, we introduce a Filter Bank Decoder that maps frozen PFN latents to explicit spectral densities, reconstructing stationary kernels via Bochner's theorem. The resulting kernels support GP regression that is competitive with iterative baselines while requiring only a single forward pass, demonstrating that PFN priors are not merely implicit but are explicitly recoverable as portable Bayesian objects.
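
As a hedged illustration of the probing setup the abstract describes (the ridge probe, array shapes, and synthetic stand-in data are assumptions, not the paper's exact protocol), linear decodability can be tested by regressing a ground-truth spectral quantity, such as a task's dominant frequency, from frozen PFN activations and checking held-out R^2:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# H: frozen PFN activations for a batch of tasks, shape (n_tasks, d_model);
# freq: each task's ground-truth dominant frequency. Both are synthetic
# placeholders here -- in practice H would come from a forward hook on the
# model and freq from the task generator.
rng = np.random.default_rng(0)
n_tasks, d_model = 512, 128
freq = rng.uniform(0.5, 5.0, size=n_tasks)
H = np.outer(freq, rng.normal(size=d_model)) + 0.1 * rng.normal(size=(n_tasks, d_model))

H_tr, H_te, f_tr, f_te = train_test_split(H, freq, random_state=0)
probe = Ridge(alpha=1.0).fit(H_tr, f_tr)
print("held-out R^2:", r2_score(f_te, probe.predict(H_te)))
```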
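A minimal activation-patching sketch in the same spirit, using a toy stand-in model rather than a PFN (the paper additionally restricts interventions to spectral subspaces; here the full activation is spliced): an intermediate activation is cached from a "source" run, overwritten during a "target" run, and the shift in the output is measured.

```python
import torch
import torch.nn as nn

# Toy stand-in network, not the paper's model.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
layer = model[1]                        # patch at the hidden activation
source_x, target_x = torch.randn(8, 16), torch.randn(8, 16)

cache = {}
def save_hook(module, inputs, output):
    cache["act"] = output.detach().clone()

def patch_hook(module, inputs, output):
    return cache["act"]                 # overwrite the target activation

h = layer.register_forward_hook(save_hook)
model(source_x)                         # populate the cache
h.remove()

clean = model(target_x)
h = layer.register_forward_hook(patch_hook)
patched = model(target_x)               # same inputs, spliced activation
h.remove()

print("patching effect:", (patched - clean).abs().mean().item())
```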
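The kernel-recovery step rests on Bochner's theorem: a stationary kernel is the Fourier transform of a non-negative spectral measure, $k(\tau) = \int S(\omega)\, e^{i\omega\tau}\, d\omega$, which for a symmetric real density reduces to a cosine transform. A minimal numerical sketch, assuming the decoder's output is a density evaluated on a uniform frequency grid (the toy density and function names are illustrative, not the paper's implementation):

```python
import numpy as np

def kernel_from_spectral_density(omega, S, taus):
    # Bochner: k(tau) = integral of S(omega) * cos(omega * tau) d omega
    # (cosine form for a symmetric real density; trapezoidal quadrature).
    integrand = S[None, :] * np.cos(taus[:, None] * omega[None, :])
    return np.trapz(integrand, omega, axis=1)

# Illustrative density: a Gaussian bump at omega = 2, roughly one
# spectral-mixture component (a cosine-modulated RBF kernel).
omega = np.linspace(0.0, 10.0, 2001)
S = np.exp(-0.5 * ((omega - 2.0) / 0.3) ** 2)
S /= np.trapz(S, omega)                 # normalize so that k(0) = 1

taus = np.linspace(0.0, 5.0, 200)
k = kernel_from_spectral_density(omega, S, taus)

# The recovered kernel then drives ordinary GP machinery: build the Gram
# matrix K[i, j] = k(|x_i - x_j|) and factor it with a small jitter.
X = np.linspace(0.0, 5.0, 40)
lags = np.abs(X[:, None] - X[None, :]).ravel()
K = kernel_from_spectral_density(omega, S, lags).reshape(40, 40)
L = np.linalg.cholesky(K + 1e-6 * np.eye(40))
```

The jittered Cholesky factor is all a standard GP regressor needs, which is the sense in which the recovered prior is a portable Bayesian object.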