LocalLLaMA

The options I see online seem to make the model slower

These are the options I'm currently using. Setting parallel to 1, using more draft tokens, or adding draft-min-P at 0.75 doesn't seem to improve anything. I have a 5090 and I'm running inside Docker. Right now it runs at 100 tok/s, and when I modify these options it falls…