Max Hartman, Vidhata Jayaraman, Moulik Choraria, Lav R. Varshney

Protecting the Trace: A Principled Black-Box Approach Against Distillation Attacks

Max Hartman, Vidhata Jayaraman, Moulik Choraria, Lav R. Varshney / April 28, 2026

arXiv:2604.23238v1 Announce Type: cross
Abstract: Frontier models push the boundaries of what is learnable at extreme computational costs, yet distillation via sampling reasoning traces exposes closed-source frontier models to adversarial third partie…
