LocalLLaMA

Qwen will release another 27B with high probability

/u/serige / May 20, 2026

They are waiting for the exact roadmap.

I got Qwen3-VL-Embedding-2B working with rkllm on an Orange Pi 5b

/u/atineiatte / May 20, 2026

This shit is cool: I have a demo script that compares over 1,300 phrases for similarity to a live webcam image, and it can process one image every 10 seconds or so. I've been waiting fruitlessly for someone to get the model working on thi…
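
For readers wondering what a phrase-to-image comparison like this involves, here is a minimal sketch of ranking phrases by cosine similarity against a single image embedding. It is not the poster's script: the three-dimensional vectors are made-up placeholders, and `cosine` and `rank_phrases` are hypothetical helpers. A real setup would get the embeddings from the model itself.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as "no similarity"
    return dot / (norm_a * norm_b)

def rank_phrases(image_vec, phrase_vecs, top_k=3):
    """Return the top_k (phrase, score) pairs, best match first."""
    scored = [(cosine(image_vec, vec), phrase) for phrase, vec in phrase_vecs.items()]
    scored.sort(reverse=True)
    return [(phrase, score) for score, phrase in scored[:top_k]]

# Placeholder embeddings (assumption): real ones would come from the embedding model.
phrases = {
    "a cat on a desk": [0.9, 0.1, 0.0],
    "an empty room": [0.0, 1.0, 0.0],
    "a person waving": [0.1, 0.0, 0.9],
}
image = [0.8, 0.2, 0.1]

top = rank_phrases(image, phrases, top_k=2)
for phrase, score in top:
    print(f"{score:.3f}  {phrase}")
```

With the phrase embeddings computed once and cached, the per-frame cost is one image embedding plus one similarity pass over the phrase list.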

LLC: lightweight OpenWebUI alt – now with chat converter + custom tool calls

/u/PromptInjection_ / May 20, 2026

Posted my project here a while back and got some solid feedback via DMs. The main ask was a converter so people don't lose their existing chats when switching – that's in now. https://preview.redd.it/mfn5i99d6c2h1.png?width=1400&forma…

24GB M4 Mac – is Qwen 9B the only option while the system is running?

/u/sagiroth / May 20, 2026

I have a Mac at work on which I want to run a local model for prototyping and basic prompts that need to stay on device. What sort of model can I run that fits at least 64k context? Any shared setups or guides are welcome. I need to have Firefox open with on…
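
A rough way to reason about whether 64k context fits is to estimate the KV cache on top of the model weights. The sketch below uses the usual formula for a standard transformer in which every layer keeps full keys and values (2 tensors × layers × KV heads × head dimension × context × bytes per element). The architecture numbers are hypothetical, not those of any specific Qwen model, and models with hybrid or linear attention layers will need less.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Estimate KV cache size for a standard attention-only transformer.

    The leading factor 2 accounts for one K and one V tensor per layer.
    bytes_per_elem=2 corresponds to an f16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Hypothetical architecture (assumption): 32 layers, 8 KV heads, head_dim 128.
cache = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=65536)
cache_gib = cache / 2**30
print(f"KV cache at 64k context: {cache_gib:.1f} GiB")
```

Under these assumed numbers the cache alone takes 8 GiB at f16, which is why cache quantization or a smaller model is often part of the answer on a 24GB machine that also has to run a browser.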

At wit's end optimizing settings in llama.cpp for 100k context

/u/scarlettwidow2024 / May 20, 2026

Long story short, I am running Qwen3.5-35B-A3B (GGUF format) and other models on macOS and getting around 1500 tokens/sec for prompt processing and around 35-50 tokens/sec for token generation. I'm using the latest version of llama.cpp on M…
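
To put the quoted rates in perspective, here is a back-of-the-envelope sketch of how long a full 100k-token prompt would take. It assumes the 1500 tokens/sec figure is prompt processing, the 35-50 tokens/sec figure is generation, and both stay constant across the whole context, which is optimistic since throughput typically drops as the context grows. The 1,000-token reply length is an arbitrary assumption.

```python
def seconds_for(tokens, tokens_per_sec):
    """Time to process `tokens` at a constant rate."""
    return tokens / tokens_per_sec

# Rates quoted in the post; constant-rate behaviour is an assumption.
prefill_s = seconds_for(100_000, 1500)   # reading a full 100k-token prompt
gen_slow_s = seconds_for(1_000, 35)      # hypothetical 1,000-token reply, slow end
gen_fast_s = seconds_for(1_000, 50)      # hypothetical 1,000-token reply, fast end

print(f"prefill: {prefill_s:.1f} s")
print(f"generation: {gen_fast_s:.1f} to {gen_slow_s:.1f} s")
```

Even in this best case, a cold 100k-token prompt takes over a minute before the first output token, so prompt caching matters more than raw generation speed at this context length.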

Move to backend sampling for MTP draft path by gaugarg-nv · Pull Request #23287 · ggml-org/llama.cpp

/u/jacek2023 / May 20, 2026

Improved MTP performance.

AI server under 5k?

/u/Last_Bad_2687 / May 20, 2026

I have a Framework Desktop (128GB) and a 3080 12GB running Qwen 7B. I want to move to a proper server rack + switch, but I'm not sure how to go from a desktop PC to a server rack. Any advice on what GPU/server to get under 5k? Or at that price just stick to work…

I guess 4 units wasn’t enough.

/u/Simple_Library_2700 / May 20, 2026

I don’t think this thing is going to work out. If anyone wants a 4U GPU server complete with half a terabyte of RAM, hit me up. (/s)

Waiting on Qwen to drop those 3.7 models be like:

/u/Porespellar / May 20, 2026

Mods please be kind. This was not “low effort”. It took me several minutes to find just the right waiting room gif to capture the sentiment of all us folks patiently waiting for our brothers and sisters in the east to hopefully drop some amazing …

Qwen 3.6 35B GGUF: NTP vs MTP quantization results across GPUs and CPUs

/u/enrique-byteshape / May 20, 2026

Hey r/LocalLLaMA, we’ve released our ByteShape Qwen 3.6 35B GGUF quantizations in two families: standard NTP (Next Token Prediction, or non-MTP) and MTP. Links: Blog / Download NTP Models / Download MTP Models. TL;DR: For NTP, “pick the largest quant that…