prompt caching, but for RL training – 7.5x speedup on long-prompt/short-response workloads
most open-source RL engines pack sequences naively: prompt + response, repeated for every sample in the group. this is fine for short-prompt/long-completion workloads but inefficient for long-prompt/short-completion workloads. with 1000-token p…
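a minimal sketch of the token-count math behind the two packing strategies – naive packing that repeats the prompt per sample vs. a shared-prefix layout that processes the prompt once per group. the group size, prompt length, and response length below are hypothetical, not from the measurements above:

```python
def packed_tokens_naive(group_size: int, prompt_len: int, response_len: int) -> int:
    # naive packing: every sample in the group carries its own copy of the prompt
    return group_size * (prompt_len + response_len)


def packed_tokens_shared_prefix(group_size: int, prompt_len: int, response_len: int) -> int:
    # shared-prefix packing: the prompt is processed once, only the responses repeat
    return prompt_len + group_size * response_len


if __name__ == "__main__":
    G, P, R = 8, 1000, 50  # hypothetical group size, prompt length, response length
    naive = packed_tokens_naive(G, P, R)
    shared = packed_tokens_shared_prefix(G, P, R)
    print(f"naive: {naive} tokens, shared prefix: {shared} tokens, "
          f"ratio: {naive / shared:.1f}x fewer tokens processed")
```

the ratio grows as prompts get longer and responses get shorter, which is exactly the long-prompt/short-completion regime where naive packing hurts the most.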