I'm brand new to local LLMs and started with GLM-4.7 Flash q4_K_M.
When I run it directly:
ollama run glm-4.7-flash:q4_K_M
it works pretty decently — nothing amazing, but usable and responsive.
The problem starts when I switch to the Claude interface with:
ollama launch claude --model glm-4.7-flash:q4_K_M
Suddenly the model feels way dumber: it has basically zero memory between messages, can't create or save files, and treats every turn like a brand-new chat.
Concrete example:
- I asked it to “build a CLI Snake game in Python”. It gave me clean, working code.
- Then I said “now create the file in the current folder”. It had **no idea** what Snake game I was talking about and started from scratch like it was a brand new chat.
- I used the prompt shown in the screenshots at the very start of a chat to make it create files, but it never actually created any code file, even though it claimed "Files created successfully".
- Also, if I give it a big/complex prompt, it takes a very long time (10+ minutes) to respond, the response often cuts off randomly without a full answer, and sometimes it never gives another response at all.
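One thing I haven't tried yet: I've read that the context window can cause this kind of amnesia, and I don't know what context size the Claude launcher uses. A Modelfile like the one below is how I understand you'd raise `num_ctx` in Ollama — totally untested on my side, and 8192 is just an arbitrary value I picked, not a recommendation:

```
# Hypothetical Modelfile — num_ctx is Ollama's context-length parameter.
# 8192 is an arbitrary test value, not a tuned recommendation.
FROM glm-4.7-flash:q4_K_M
PARAMETER num_ctx 8192
```

(Then, as far as I understand, you'd build and run it with `ollama create glm-4.7-flash-8k -f Modelfile` and use that tag instead. No idea if this helps with the Claude launcher specifically.)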
I also used the model (GLM) with Continue.dev in VS Code: it works fine in chat mode, but agent mode doesn't work at all.
Questions:
- Should I just upgrade to a stronger model? (I have 32 GB RAM + a 6 GB VRAM GPU, running Fedora Linux.)
- Am I using the model wrong? I thought the "Claude" launcher was the way to get tool use / skills / file creation, but maybe that interface just isn't meant for a small model like this?