harness-engineering

agent evaluation, agent observability, agent workflows, AI Agents, AI Engineering, AI Infrastructure, Arize AI, developer-tools, harness-engineering, LLM Evals, llm-applications, model drift, model-evaluation, observability

What we learned testing 7 models under the same agent harness

Model swaps look like configuration changes, but they behave more like product migrations. A new model may be cheaper, faster, easier to get capacity for, or stronger on public benchmarks…

The post What we learned testing 7 models under the same agent harness appeared first on Arize AI.

agent observability, AI agent harness, AI Agents, Arize AX, claude-code, Codex, coding-agents, cursor, Evals, gemini-cli, github-copilot, harness tracing, harness-engineering, LLM observability, MCP, Open Source, OpenTelemetry, phoenix

Coding agent tracing and evaluation: An open source tool to improve AI coding workflows

Announcing coding harness tracing for observing, evaluating, and improving coding agent workflows across Claude Code, Cursor, Codex, GitHub Copilot, and Gemini CLI.

The post Coding agent tracing and evaluation: An open source tool to improve AI coding workflows appeared first on Arize AI.
