MTP vs. non-MTP VRAM usage difference?

As per the title: assuming you run both with the same context size and quantization in llama.cpp, is there any difference in VRAM usage?

submitted by /u/DeepBlue96