
HydraLM: 22× faster decoding and 16× smaller state memory in long-context inference experiments [P]

I’ve been experimenting with HydraLM, a model built for long-context inference, and the numbers are getting a bit wild: the repo’s benchmark suite reports 1.00 retrieval accuracy even when the target fact is buried at 90% depth in a 1M-token needle-in-a-haystack test, p@1 = 0.987 …
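
If you want to sanity-check numbers like that yourself, below is a minimal sketch of the standard needle-in-a-haystack construction I'm describing: plant a known fact at a chosen depth fraction of a long filler context, then check whether the model's answer recovers it. To be clear, the `generate` callable, the `NEEDLE`/`FILLER` strings, and the words-as-tokens shortcut are all my placeholders for illustration, not anything from the HydraLM repo.

```python
# Minimal needle-in-a-haystack retrieval check, assuming a generic
# `generate(prompt) -> str` callable for whatever model is under test.
# Word counts stand in for token counts here, which is only approximate.

PAYLOAD = "7421"
NEEDLE = f"The magic number for the audit is {PAYLOAD}."
QUESTION = "What is the magic number for the audit?"
FILLER = "The grass is green. The sky is blue. The sun is warm. "  # placeholder haystack text

def build_haystack(num_words: int, depth: float) -> str:
    """Tile filler text to ~num_words and bury the needle at the given
    depth fraction (0.0 = start of context, 0.9 = 90% of the way in)."""
    reps = num_words // len(FILLER.split()) + 1
    words = (FILLER * reps).split()[:num_words]
    words.insert(int(len(words) * depth), NEEDLE)
    return " ".join(words)

def retrieval_accuracy(generate, num_words=1_000_000, depth=0.9, trials=10) -> float:
    """Fraction of trials where the model's answer contains the buried payload."""
    hits = 0
    for _ in range(trials):
        prompt = build_haystack(num_words, depth) + f"\n\nQ: {QUESTION}\nA:"
        if PAYLOAD in generate(prompt):
            hits += 1
    return hits / trials
```

Depth is the interesting knob: plenty of long-context models do fine when the fact sits near either end of the prompt but degrade when it's buried mid-context, so a perfect score at 90% depth is the part worth scrutinizing.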