"GLM is the most schizophrenic model" Claude

GLM is the most schizophrenic model so far on plan-mode. It under-clarifies on clearly ambiguous prompts (4 audit prompts), over-clarifies on degenerate inputs (whitespace, a single character), and over-clarifies on multi-turn answers (reclarify_partial_answer: the user had already provided answers, and the model asked again). Failing in both directions in the same run suggests GLM doesn't have a stable internal sense of "is this ambiguous or not"; it behaves more like a "should I ask?" coin flip whose bias varies with the input shape.

Anyway, here are some benchmark results, including tool calling:

| Model | plan_mode | plan_mode_stress | tool_calling | file_generation | Combined |
|---|---|---|---|---|---|
| qwen/qwen3-coder-next (Q8) | 12/13 (92%) | 32/38 (84%) | 18/20 (90%) | 4/6 (67%) | 66/77 (86%) |
| google/gemma-4-26b-a4b | 11/13 (85%) | 30/38 (79%) | 17/20 (85%) | 4/6 (67%) | 62/77 (81%) |
| zai-org/glm-4.7-flash | 9/13 (69%) | 27/38 (71%) | 18/20 (90%) | 4/6 (67%) | 58/77 (75%) |
| qwen/qwen3-next-instruct-80b (Q6) | 12/13 (92%) | 28/38 (74%) | 15/20 (75%) | 2/6 (33%) | 57/77 (74%) |
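
For anyone checking the numbers: the Combined column looks like a pooled score (total passed over total cases across all four suites), not an average of the per-suite percentages. Here is a minimal Python sketch that recomputes it from the counts in the table. The names `RESULTS` and `combined` are made up for illustration and are not part of the benchmark harness.

```python
# Per-suite (passed, total) counts copied from the table, in column order:
# plan_mode, plan_mode_stress, tool_calling, file_generation.
RESULTS = {
    "qwen/qwen3-coder-next (Q8)": [(12, 13), (32, 38), (18, 20), (4, 6)],
    "google/gemma-4-26b-a4b": [(11, 13), (30, 38), (17, 20), (4, 6)],
    "zai-org/glm-4.7-flash": [(9, 13), (27, 38), (18, 20), (4, 6)],
    "qwen/qwen3-next-instruct-80b (Q6)": [(12, 13), (28, 38), (15, 20), (2, 6)],
}


def combined(suites):
    """Pool all suites: sum the passes and totals, then take one percentage."""
    passed = sum(p for p, _ in suites)
    total = sum(t for _, t in suites)
    return passed, total, round(100 * passed / total)


if __name__ == "__main__":
    for model, suites in RESULTS.items():
        passed, total, pct = combined(suites)
        print(f"{model}: {passed}/{total} ({pct}%)")
```

Because the score is pooled, plan_mode_stress (38 cases) carries far more weight than file_generation (6 cases), so a model can drop a lot on file generation without moving its Combined score much.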
submitted by /u/No_Run8812