CUDA: reduce MMQ stream-k overhead by JohannesGaessler · Pull Request #22298 · ggml-org/llama.cpp

By /u/jacek2023 / April 25, 2026

CUDA prompt processing speedup for MoE models

Check this comment on the PR: https://github.com/ggml-org/llama.cpp/pull/22298#issuecomment-4307164207
