Authors: Aurelien Bibaut, Houssam Zenati, Thibaud Rahier, Nathan Kallus

Functional Natural Policy Gradients

Aurelien Bibaut, Houssam Zenati, Thibaud Rahier, Nathan Kallus / April 6, 2026

arXiv:2603.28681v2 Announce Type: replace-cross
Abstract: We propose a cross-fitted debiasing device for policy learning from offline data. A key consequence of the resulting learning principle is $\sqrt N$ regret even for policy classes with complexi…

cs.LG, stat.ML

Functional Natural Policy Gradients

Aurelien Bibaut, Houssam Zenati, Thibaud Rahier, Nathan Kallus / March 31, 2026

arXiv:2603.28681v1 Announce Type: cross
Abstract: We propose a cross-fitted debiasing device for policy learning from offline data. A key consequence of the resulting learning principle is $\sqrt N$ regret even for policy classes with complexity great…