KANMixer: a minimal KAN-centered mixer for long-term time series forecasting
arXiv:2508.01575v2 Announce Type: replace
Abstract: Long-term time series forecasting (LTSF) underpins critical applications from energy management to weather prediction, yet achieving reliable multi-step-ahead accuracy remains challenging. Existing LTSF approaches, dominated by MLP- and Transformer-based architectures, either rely on simple linear mappings or introduce increasingly complex hand-crafted inductive biases, raising the question of whether a more expressive and principled nonlinear core could offer a better alternative. We therefore investigate whether Kolmogorov-Arnold Networks (KANs), a recently proposed class of models whose adaptive basis functions enable fine-grained modulation of nonlinearities, can improve LTSF performance, and under which design choices they are most effective. Specifically, we propose KANMixer, a minimal KAN-centered architecture consisting of a multi-scale pooling frontend, a KAN-based temporal mixing backbone, and prediction heads. By avoiding heavy auxiliary modules, KANMixer enables a clear assessment of KAN components in LTSF. Across 28 benchmark-horizon settings against nine baselines, KANMixer achieves the best MSE in 16 settings and the best MAE in 11. Furthermore, extensive ablations on three representative datasets show that KAN effectiveness depends strongly on the choice of edge function: B-spline bases outperform Fourier and Wavelet alternatives; the prediction head contributes most to the gains; moderate depth is preferred over deeper, less stable stacks; and decomposition priors help MLP but harm KAN. Beyond practical guidance for integrating KAN into LTSF, these results reveal an underexplored dependency between structural priors and backbone nonlinearity: design choices that benefit MLP can degrade KAN.
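To make the core idea concrete, the sketch below shows a minimal KAN-style layer in which every input-to-output edge carries its own learnable spline function rather than a scalar weight. This is an illustrative simplification, not the authors' implementation: it uses degree-1 (hat-function) B-splines on a fixed uniform knot grid for brevity, whereas practical KANs typically use cubic B-splines with adaptive grids; all names (`KANLayer`, `num_basis`, `coef`) are hypothetical.

```python
import numpy as np

class KANLayer:
    """Minimal KAN-style layer (illustrative sketch, not the paper's code).

    Each edge (i -> j) applies its own learnable 1-D function
    f_ij(x) = sum_k coef[i, j, k] * B_k(x), where B_k are fixed
    linear (degree-1) B-spline basis functions on a uniform grid.
    """

    def __init__(self, in_dim, out_dim, num_basis=8, x_range=(-1.0, 1.0), seed=0):
        rng = np.random.default_rng(seed)
        self.knots = np.linspace(*x_range, num_basis)   # uniform knot grid
        self.h = self.knots[1] - self.knots[0]          # knot spacing
        # Learnable spline coefficients: one vector per edge.
        self.coef = rng.normal(scale=0.1, size=(in_dim, out_dim, num_basis))

    def _basis(self, x):
        # Hat-function basis: B_k(x) = max(0, 1 - |x - t_k| / h).
        # Output shape: x.shape + (num_basis,).
        d = np.abs(x[..., None] - self.knots) / self.h
        return np.clip(1.0 - d, 0.0, None)

    def __call__(self, x):
        # x: (batch, in_dim) -> (batch, out_dim).
        # Sum each edge's spline output over input dimensions and bases.
        B = self._basis(x)                              # (batch, in_dim, num_basis)
        return np.einsum('bik,ijk->bj', B, self.coef)

# A temporal-mixing backbone in the spirit of KANMixer would stack such
# layers along the time axis of pooled subseries; that wiring is omitted here.
```

The basis choice is the knob the ablations probe: swapping `_basis` for a Fourier or wavelet expansion changes only this method, which is what makes the edge-function comparison in the paper clean.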