Eyon Jang - Provide.ai

Exploration Hacking: Can LLMs Learn to Resist RL Training?

Eyon Jang / May 1, 2026

We empirically investigate exploration hacking (EH) — where models strategically alter their exploration to resist RL training — by creating model organisms that resist capability elicitation, evaluating countermeasures, and auditing frontier models fo…
