I'm brand new to local LLMs and started with GLM-4.7 Flash q4_K_M.
When I run it directly:
ollama run glm-4.7-flash:q4_K_M
it works pretty decently — nothing amazing, but usable and responsive.
The problem starts when I switch to the Claude interface with:
ollama launch claude --model glm-4.7-flash:q4_K_M
Suddenly the model feels way dumber: it has basically zero memory between messages, can't create or save files, and treats every turn like a brand-new chat.
Concrete example:
- I asked it to “build a CLI Snake game in Python”. It gave me clean, working code.
- Then I said “now create the file in the current folder”. It had **no idea** what Snake game I was talking about and started from scratch like it was a brand new chat.
- I used the prompt shown in the screenshots at the very start of a chat to make it create files, but it never actually created any code file, even though it claimed "Files created successfully".
- Also, if I give it a big/complex prompt, it takes a very long time (10+ minutes) to respond, the response often cuts off randomly without a full answer, and sometimes it never gives another response at all.
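One thing I haven't tried yet: I've read that the context window can cause this kind of amnesia, and I don't know what context size the Claude launcher uses. A Modelfile like the one below is how I understand you'd raise `num_ctx` in Ollama — totally untested on my side, and 8192 is just an arbitrary value I picked, not a recommendation:

```
# Hypothetical Modelfile — num_ctx is Ollama's context-length parameter.
# 8192 is an arbitrary test value, not a tuned recommendation.
FROM glm-4.7-flash:q4_K_M
PARAMETER num_ctx 8192
```

(Then, as far as I understand, you'd build and run it with `ollama create glm-4.7-flash-8k -f Modelfile` and use that tag instead. No idea if this helps with the Claude launcher specifically.)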
I also used the model (GLM) with Continue.dev in VS Code: it works fine in chat mode, but agent mode doesn't work at all.
Questions:
- Should I just upgrade to a stronger model? (I have 32 GB RAM + a 6 GB VRAM GPU, running Fedora Linux.)
- Am I using the model wrong? I thought the "Claude" launcher was the way to get tool use / skills / file creation, but maybe that interface just isn't meant for a small model like this?