| Implemented Multi-Token Prediction for LLaMA.cpp. Quantized Gemma 4 assistant models into GGUF format. Ran tests on a MacBook Pro M5Max. Gemma 26B with MTP drafts tokens 40% faster. Prompt: Write a Python program to find the nth Fibonacci number using recursion Outputs: Gemma4-assistant GGUF Quantized models: https://huggingface.co/collections/AtomicChat/gemma-4-assistant-gguf Local AI models app: http://atomic.chat Patched llama.cpp: https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant [link] [comments] |