Transformer Semantic Genetic Programming for d-dimensional Symbolic Regression Problems
arXiv:2511.09416v2 Announce Type: replace
Abstract: Transformer Semantic Genetic Programming (TSGP) is a semantic search approach that uses a pre-trained transformer model as a variation operator to generate offspring programs with high semantic similarity to a given parent. Unlike other semantic GP approaches that rely on fixed syntactic transformations, TSGP aims to learn diverse structural variations that lead to solutions with similar semantics. We find that a single transformer model trained on millions of programs generalizes across symbolic regression problems of varying dimension. Evaluated on 24 real-world and synthetic datasets, TSGP significantly outperforms standard GP, SLIM_GSGP, Deep Symbolic Regression, and Denoising Autoencoder GP, achieving an average rank of 1.58 across all benchmarks. Moreover, TSGP produces more compact solutions than SLIM_GSGP while also achieving higher accuracy. In addition, the target semantic distance effectively adjusts the step size in semantic space: small values enable consistent improvement in fitness but often lead to larger programs, while larger values promote faster convergence and more compact solutions. The target semantic distance thus provides an effective mechanism for balancing exploration and exploitation.
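The role of the target semantic distance can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: the transformer variation operator is replaced by a hypothetical `vary` function that perturbs a toy program (a polynomial over fixed training inputs) and rescales the perturbation so the child's semantic distance to the parent, measured on the output vectors, matches a chosen target. The names `semantics`, `vary`, and `tsgp_like_run` are illustrative assumptions.

```python
import numpy as np

# Toy 1-D regression task (assumption: stand-in for the paper's benchmarks)
X = np.linspace(-1.0, 1.0, 50)
y = X**3 + X  # ground-truth semantics

def semantics(coeffs):
    # A "program" is a polynomial; its semantics is its output vector on X.
    return np.polyval(coeffs, X)

def fitness(coeffs):
    # Mean squared error against the target semantics.
    return float(np.mean((semantics(coeffs) - y) ** 2))

def vary(parent, target_dist, rng):
    # Hypothetical stand-in for the transformer variation operator:
    # propose a child whose semantic distance to the parent equals target_dist.
    step = rng.normal(size=parent.shape)
    d = np.linalg.norm(semantics(parent + step) - semantics(parent))
    if d == 0.0:
        return parent
    return parent + step * (target_dist / d)  # rescale step in semantic space

def tsgp_like_run(target_dist, gens=200, seed=0):
    # Simple (1+1)-style search: keep the child only if it improves fitness,
    # so the trajectory moves through semantic space in steps of target_dist.
    rng = np.random.default_rng(seed)
    parent = rng.normal(size=4)  # degree-3 polynomial coefficients
    for _ in range(gens):
        child = vary(parent, target_dist, rng)
        if fitness(child) < fitness(parent):
            parent = child
    return fitness(parent)
```

In this sketch a small `target_dist` yields many small, reliable fitness improvements, while a large one takes coarser steps, mirroring the exploration/exploitation trade-off described in the abstract.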