baidu/ERNIE-Image · Hugging Face
submitted by /u/adefa
Hey! Nathan from Hugging Face here. I maintained the Open LLM Leaderboard, and in that time I've evaluated around 10k models. I think there's a pretty big misconception in how people benchmark LLMs. Most setups I see rely on inference providers like Ope…
If you don't install the SPIR-V headers, it will no longer compile; keep that in mind: https://github.com/ggml-org/llama.cpp/pull/21572/changes#diff-43453f510556d352276e897e137cb103b3bbca24acb6cba33208d4887b2e3c77R497 submitted by /u…
Here's what I tested:
Prompt: Provide a brief summary of the events in 1989, comparing the results in Europe versus Asia.
Response: (a solid overview covering the major events) […] Fall of the Berlin Wall (Nov 9): The defining moment when East Ge…
https://github.com/ggml-org/llama.cpp/pull/21038 Now that cache quantization has better quality, does that mean a Q8 cache is a good choice? For example, for Gemma 4 26B? submitted by /u/Longjumping_Bee_6825
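To see why a Q8 KV cache is attractive, here is a rough size calculation. The model dimensions below (layers, KV heads, head size, context length) are illustrative assumptions, not Gemma's published configuration; Q8_0 stores 32 int8 values plus a 2-byte scale per block, about 8.5 bits per value versus 16 for F16.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bits_per_value):
    # K and V each hold n_kv_heads * head_dim values per layer per token,
    # hence the factor of 2.
    values = 2 * n_layers * n_kv_heads * head_dim * seq_len
    return values * bits_per_value / 8

# Hypothetical Gemma-like dimensions (assumptions for illustration only)
f16 = kv_cache_bytes(48, 8, 256, 32768, 16)
q8  = kv_cache_bytes(48, 8, 256, 32768, 8.5)  # Q8_0 ~ 8.5 bits/value incl. scales
print(f"F16: {f16 / 2**30:.1f} GiB, Q8_0: {q8 / 2**30:.1f} GiB")
# -> F16: 12.0 GiB, Q8_0: 6.4 GiB
```

So at these assumed dimensions a Q8 cache roughly halves KV memory; whether the quality is now good enough is exactly what the linked PR is about.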
This is V2 of my previous post. What's new: --ai-tune, where the model starts tuning its own flags in a loop and caches the fastest config it finds. My weird rig: 3090 Ti + 4070 + 3060 + 128GB RAM. Model | llama-server | llm-server v1 tuning | llm-server v2…
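The tune-and-cache loop described above can be sketched as follows. This is a minimal illustration under assumptions of mine: the flag names (`n_gpu_layers`, `batch_size`) are hypothetical, and `benchmark` is a stand-in that a real version would replace with a timed decode against the running server.

```python
import itertools
import random

def benchmark(config):
    """Stand-in for a real tokens/sec measurement against a running server."""
    # A real version would run a fixed prompt and time token generation;
    # here we return a deterministic-per-config fake throughput.
    random.seed(str(sorted(config.items())))
    return random.uniform(10, 60)  # fake tokens/sec

# Hypothetical flag space; real flag names depend on the server build
search_space = {
    "n_gpu_layers": [20, 40, 60],
    "batch_size": [256, 512],
}

best_cfg, best_tps = None, 0.0
for combo in itertools.product(*search_space.values()):
    cfg = dict(zip(search_space.keys(), combo))
    tps = benchmark(cfg)
    if tps > best_tps:
        best_cfg, best_tps = cfg, tps

# The fastest config would then be cached (e.g. written to a JSON file)
# so later runs can skip the search.
print("fastest config:", best_cfg, f"{best_tps:.1f} tok/s")
```

Exhaustive search is fine for a handful of flags; with more dimensions, a hill-climbing or random-search variant keeps the loop cheap.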
I created a small demo to illustrate how agents work compared to a standard chatbot. Afterwards, I played with the simple loop and added 5 tools: grep, glob, read_file, write_file, edit_file, and gave it a code editing task to see how it fared w…
Gemma quant comparison on an M5 Max MacBook Pro with 128GB (subjective of course, but across a variety of categories): gemma 4 leaderboard. The surprising bit: Gemma 4 31B 4-bit scored higher than 8-bit, 91.3% vs 88.4%. Not sure why: could be the template, could …
If the claims presented in the paper are true, this will be very big for multi-user local inference. submitted by /u/Particular-Look-2640