Is Min P sampling really the preferred modern alternative to Top K/Top P?

According to what I've been reading (and according to every model I've asked about this), the consensus seems to be that Min P is the better, more modern approach to sampling, and that Top P/Top K should be used only when Min P isn't available or for legacy reasons...

Yet, looking at recently published LLMs on Hugging Face and elsewhere, the recommended sampling parameters are still largely Top K and/or Top P. Is this only for legacy reasons, or is there something else going on?
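For context, here's a minimal sketch of how the two filters differ (parameter names and default values are illustrative, not from any particular library). Min P's cutoff scales with the model's confidence, while Top P's nucleus is a fixed cumulative-mass budget:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def min_p_filter(logits, min_p=0.05):
    # Min P: keep tokens whose probability is at least min_p * p_max,
    # so the cutoff tightens when the model is confident and relaxes
    # when the distribution is flat.
    probs = softmax(logits)
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(logits, top_p=0.9):
    # Top P (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches top_p, regardless of the distribution's shape.
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [0.0] * len(probs), 0.0
    for i in order:
        kept[i] = probs[i]
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(kept)
    return [p / total for p in kept]
```

With a confident distribution Min P prunes almost everything; with a flat one it keeps nearly the whole vocabulary, which is the adaptivity people cite as its advantage over a fixed nucleus.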

submitted by /u/bgravato
