GLM is the most inconsistent model so far on plan-mode. It both under-clarifies on clear ambiguity (missed 4 audit prompts) AND over-clarifies on degenerate inputs (whitespace, single char) AND over-clarifies on multi-turn answers (reclarify_partial_answer — the user provided answers, the model asked again). Failing in both directions in the same run suggests GLM doesn't have a stable internal sense of "is this ambiguous or not" — it just has a "should I ask?" coin flip whose bias varies with input shape.
Anyway, here are the benchmark results across all suites:
| Model | plan_mode | plan_mode_stress | tool_calling | file_generation | Combined |
|---|---|---|---|---|---|
| qwen/qwen3-coder-next (Q8) | 12/13 (92%) | 32/38 (84%) | 18/20 (90%) | 4/6 (67%) | 66/77 (86%) |
| google/gemma-4-26b-a4b | 11/13 (85%) | 30/38 (79%) | 17/20 (85%) | 4/6 (67%) | 62/77 (81%) |
| zai-org/glm-4.7-flash | 9/13 (69%) | 27/38 (71%) | 18/20 (90%) | 4/6 (67%) | 58/77 (75%) |
| qwen/qwen3-next-instruct-80b (Q6) | 12/13 (92%) | 28/38 (74%) | 15/20 (75%) | 2/6 (33%) | 57/77 (74%) |
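The Combined column above appears to be the sum of per-suite passes over the sum of per-suite totals, rounded to the nearest percent. A minimal sketch checking that reading (scores copied from the table; the aggregation method is my assumption):

```python
# Per-suite (passed, total) pairs in table order:
# plan_mode, plan_mode_stress, tool_calling, file_generation.
suites = {
    "qwen/qwen3-coder-next (Q8)": [(12, 13), (32, 38), (18, 20), (4, 6)],
    "google/gemma-4-26b-a4b": [(11, 13), (30, 38), (17, 20), (4, 6)],
    "zai-org/glm-4.7-flash": [(9, 13), (27, 38), (18, 20), (4, 6)],
    "qwen/qwen3-next-instruct-80b (Q6)": [(12, 13), (28, 38), (15, 20), (2, 6)],
}

for model, scores in suites.items():
    passed = sum(p for p, _ in scores)   # total tests passed across suites
    total = sum(t for _, t in scores)    # total tests run across suites
    print(f"{model}: {passed}/{total} ({round(100 * passed / total)}%)")
```

Running it reproduces the Combined column (66/77 at 86%, 62/77 at 81%, 58/77 at 75%, 57/77 at 74%), so the column is a micro-average over individual tests rather than an average of the four suite percentages.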