I got a bit further with my harness for running the Qwen 3.6 model on Codex. While testing, analyzing, and building the harness, I evolved TBG(O)llama-swap into a full forensic UI bridge and LLM analytics tool in which every harness finding, modification, correction, tool call, reasoning step, and execution flow is fully visible.
This level of transparency was necessary to identify the behavioral differences between native OpenAI models and Qwen 3.6, and to fine-tune the harness accordingly.
The video shows a full Codex run on Qwen 3.6, running on a single NVIDIA GeForce RTX 5090 (Codex in VS Code -> tbg(o)llama-swap -> llama.cpp with Qwen 3.6 27B).
The ongoing work can be checked here: https://github.com/Ltamann/tbg-ollama-swap-prompt-optimizer/tree/qwen3.6 (first post, second post).
Here’s the clearest current status.
Working
- apply_patch
  - create/update/delete flow
  - create_file requires non-empty diff or content
  - update_file requires non-empty diff or content
  - delete_file works without diff
- shell
- web_search (using the TBG(O)llama-swap built-in web search)
- file_search
- view_image
- request_user_input
- update_plan
- spawn_agent / wait_agent / send_input / resume_agent / close_agent
- supports_search_tool catalog inconsistency
- scenario suite: agent_send_input_roundtrip, agent_subagent_same_model, shell_patch_verify_sequence, web_research_then_notes, plan_act_switch_impl, multi_web_patch_verify, skill_create_and_use_local, workspace_summary_then_plan, skill_read_local, direct_plan_no_web, web_research_then_plan, file_search_then_patch, view_image_then_report
- invalid apply_patch retry exhaustion no longer finalizes with fake progress prose
- safer recovery branch after a broken apply_patch
- false patch-intent/path-hint extraction from instructions
- reconnect bug caused by unhealthy or duplicate upstream adoption
- long delayed 502 timeout path shortened and improved
- native-vs-local contrast harness:
  - init compare
  - per-scenario comparison.json
  - top-level comparison_summary.json
  - tool-surface diff
  - item-type diff
  - stream/completion diff
  - final visible text diff
  - grouped UX-summary diff
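For illustration, the create/update/delete rule above can be sketched as a small validator. The op shape and field names here are my assumptions for the sketch, not the harness's actual code:

```python
def validate_apply_patch_op(op: dict) -> None:
    """Enforce the apply_patch file-op rule: create_file and update_file
    must carry a non-empty diff or content; delete_file needs neither."""
    kind = op.get("type")  # "create_file" | "update_file" | "delete_file"
    payload = (op.get("diff") or op.get("content") or "").strip()
    if kind in ("create_file", "update_file") and not payload:
        raise ValueError(f"{kind} requires a non-empty diff or content")
    # delete_file works without a diff, so there is nothing to check
```

Rejecting empty create/update ops before they reach the model-facing surface is what keeps retry exhaustion from ever "succeeding" with an empty patch.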
Implemented in the Bridge Contract
- stricter separation of:
- visible assistant text
- tool call items
- tool outputs
- file/code artifacts
- explicit continuation-state handling for:
- research flow
- write-pending flow
- verification flow
- final-answer handoff
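A minimal sketch of what that separation and continuation-state tracking might look like, assuming a simple item shape (a `type` field) and invented names; the real bridge contract is more involved:

```python
from enum import Enum, auto

class ContinuationState(Enum):
    """Explicit continuation states carried between turns."""
    RESEARCH = auto()        # still gathering web/file context
    WRITE_PENDING = auto()   # a file write was announced but not yet emitted
    VERIFICATION = auto()    # changes applied, verification commands expected
    FINAL_ANSWER = auto()    # hand visible assistant text to the UI

def classify_item(item: dict) -> str:
    """Route a response item into one of the strictly separated buckets."""
    itype = item.get("type", "")
    if itype == "message":
        return "visible_text"
    if itype.endswith("_call_output"):
        return "tool_output"
    if itype.endswith("_call"):
        return "tool_call"
    return "artifact"  # file/code artifacts and anything else
```

The point of the strict buckets is that a tool output can never leak into the visible assistant text, which is one of the behavioral differences between native models and a local bridge.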
Fixed Enough To Work, But Still Not Native-Perfect
- grouped searches
- grouped tool calls
- grouped file changes
- collapsible internal history
These areas are significantly improved in both the UI and the harness, but I would still describe them as partially aligned rather than fully native-identical.
Fixed
- mcp__playwright__browser_navigate
- mcp__playwright__browser_snapshot
- mcp__playwright__browser_click
- mcp__playwright__browser_evaluate
- mcp__playwright__browser_resize
- mcp__playwright__browser_take_screenshot
Important nuance:
- llama-swap now preserves and exposes these much more accurately
- however, the WSL Codex router still rejects Playwright leaf calls as unsupported on this surface
- this is now tracked as a known limitation, not an active llama-swap bridge bug
Still Not Fully Closed / Needs More Work
- full native-style grouped worker UX parity
- some remaining model-quality quirks during long multi-step runs
- continuation/reporting polish around malformed reasoning/text splits