I'm using the https://github.com/PrismML-Eng/llama.cpp fork for Bonsai and regular llama.cpp for Gemma. Excluding the embedding parameters: I could've gone with a smaller quant of Gemma 4, but conventional wisdom says not to push small models below Q4_K_M. I might try their ternary model later, but I don't have much hope...