
The LLM tunes its own llama.cpp flags (+54% tok/s on Qwen3.5-27B)

This is V2 of my previous post. What's new: --ai-tune, which has the model tune its own llama.cpp flags in a loop and cache the fastest config it finds (rough sketch of the loop below).

My weird rig: 3090 Ti + 4070 + 3060 + 128 GB RAM.

[Benchmark table truncated in the source; only the column headers survive: Model | llama-server | llm-server v1 tuning | llm-server v2 | …]
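For anyone curious what a loop like that can look like, here's a minimal sketch. To be clear, this is not the post's actual implementation: I'm assuming the tuner benchmarks a small grid of flag values with llama-bench, reads its JSON output, and caches the winner. The model path, cache filename, and search grid are placeholders, and the "avg_ts" field name is my reading of llama-bench's JSON format, so verify against your build.

```python
#!/usr/bin/env python3
"""Sketch of a self-tuning loop in the spirit of --ai-tune (assumptions noted above)."""
import itertools
import json
import subprocess
from pathlib import Path

MODEL = "models/model.gguf"          # placeholder path, not from the post
CACHE = Path("fastest-flags.json")   # hypothetical cache file for the winning config

# Hypothetical search space; the post doesn't show the real grid.
GRID = {
    "-ngl": ["99", "80", "60"],      # GPU layers to offload
    "-t":   ["8", "16"],             # CPU threads
    "-fa":  ["0", "1"],              # flash attention off/on
}

def bench(flags: dict) -> float:
    """Run llama-bench once with the given flags and return the best avg tokens/sec."""
    cmd = ["llama-bench", "-m", MODEL, "-o", "json"]
    for k, v in flags.items():
        cmd += [k, v]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    results = json.loads(out)        # llama-bench -o json emits a list of run records
    return max(r["avg_ts"] for r in results)

best = None
for combo in itertools.product(*GRID.values()):
    flags = dict(zip(GRID.keys(), combo))
    try:
        ts = bench(flags)
    except subprocess.CalledProcessError:
        continue                     # combo failed (e.g. OOM on a small card); skip it
    if best is None or ts > best[0]:
        best = (ts, flags)

if best:
    CACHE.write_text(json.dumps({"tok_s": best[0], "flags": best[1]}, indent=2))
    print(f"fastest: {best[0]:.1f} tok/s with {best[1]}")
```

Exhaustive search like this gets slow fast on a multi-GPU box (every extra flag multiplies the runs), which is presumably why letting the LLM propose the next combo instead of brute-forcing the grid is the interesting part of the idea.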