Quantum Adaptive Self-Attention for Quantum Transformer Models
arXiv:2504.05336v3 Announce Type: replace-cross
Abstract: Integrating quantum computing into deep learning architectures is a promising but poorly understood endeavor: when does a quantum layer actually help, and how much quantum is enough? We address both questions through Quantum Adaptive Self-Attention (QASA), a hybrid Transformer that replaces the value projection in a \emph{single} encoder layer with a parameterized quantum circuit (PQC), while keeping all other layers classical. This \emph{minimal quantum integration} strategy uses only 36 trainable quantum parameters -- fewer than any competing quantum model -- yet achieves the best MSE on 4 of 9 synthetic benchmarks and a 6.0\% MAE reduction on the real-world ETTh1 dataset. An ablation study reveals that quantum layer \emph{position} matters more than \emph{count}: adding more quantum layers degrades performance, while a single layer at the optimal position consistently outperforms multi-layer quantum configurations. Comparison with two recent quantum time-series baselines -- QLSTM and QnnFormer -- confirms that QASA matches or exceeds models with $2$--$4\times$ more quantum parameters, significantly outperforming QLSTM on the seasonal trend task ($p{=}0.009$, Cohen's $d{>}6$). Crucially, the benefit is \emph{task-conditional}: QASA excels on chaotic, noisy, and trend-dominated signals, while classical Transformers remain superior for clean periodic waveforms -- providing a practical taxonomy for when quantum enhancement is warranted. These findings establish an \emph{architectural parsimony} principle for hybrid quantum-classical design: maximal quantum benefit is achieved not by maximizing quantum resources, but by strategically placing minimal quantum computation where it matters most.
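To make the architectural idea concrete, below is a minimal illustrative sketch of a self-attention block whose value projection is a parameterized quantum circuit, in the spirit of the abstract. It assumes PyTorch and PennyLane; the qubit count, the StronglyEntanglingLayers ansatz, the 3-layer/4-qubit split (3 x 4 x 3 = 36 trainable quantum parameters), and the class and layer names (QuantumValueAttention, to_qubits, from_qubits) are illustrative assumptions, not the paper's exact circuit or code.

```python
import torch
import torch.nn as nn
import pennylane as qml

N_QUBITS = 4
N_QLAYERS = 3  # 3 layers x 4 qubits x 3 rotation angles = 36 quantum parameters (assumed split)

dev = qml.device("default.qubit", wires=N_QUBITS)

@qml.qnode(dev, interface="torch")
def value_circuit(inputs, weights):
    # Encode classical features as rotation angles, then apply a trainable entangling ansatz.
    qml.AngleEmbedding(inputs, wires=range(N_QUBITS))
    qml.StronglyEntanglingLayers(weights, wires=range(N_QUBITS))
    return [qml.expval(qml.PauliZ(w)) for w in range(N_QUBITS)]

class QuantumValueAttention(nn.Module):
    """Single-head self-attention whose value projection is a PQC (illustrative sketch)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)   # classical query projection
        self.k_proj = nn.Linear(d_model, d_model)   # classical key projection
        # Compress each token to the qubit register, run the PQC, expand back.
        self.to_qubits = nn.Linear(d_model, N_QUBITS)
        self.quantum_value = qml.qnn.TorchLayer(
            value_circuit, {"weights": (N_QLAYERS, N_QUBITS, 3)}
        )
        self.from_qubits = nn.Linear(N_QUBITS, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, s, d = x.shape
        q, k = self.q_proj(x), self.k_proj(x)
        # Flatten batch and sequence dims so each token is one circuit evaluation.
        qin = self.to_qubits(x).reshape(b * s, N_QUBITS)
        v = self.quantum_value(qin).to(x.dtype).reshape(b, s, N_QUBITS)
        v = self.from_qubits(v)
        scores = (q @ k.transpose(-2, -1)) / d ** 0.5
        return torch.softmax(scores, dim=-1) @ v
```

Per the abstract's minimal-integration strategy, a block like this would replace the attention of a single encoder layer, with all other layers left classical.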