AI Engineering

agent evaluation, agent observability, agent workflows, AI Agents, AI Engineering, AI Infrastructure, Arize AI, developer-tools, harness-engineering, LLM Evals, llm-applications, model drift, model-evaluation, observability

What we learned testing 7 models under the same agent harness

Model swaps look like configuration changes, but they behave more like product migrations. A new model may be cheaper, faster, easier to get capacity for, or stronger on public benchmarks…

The post What we learned testing 7 models under the same agent harness appeared first on Arize AI.

agent observability, Agent tracing, agent workflows, agent-memory, AI Agents, AI debugging, AI Engineering, AI Infrastructure, Arize AI, autonomous agents, context graphs, developer-tools, graph databases, llm-applications, Machine Learning, observability, Phoenix OSS, RAG, reasoning systems, retrieval augmented generation, Self-improving agent

Building a self-improving agent on a context graph of human disagreement

You can build a measurably better agent from data you already have, without retraining a thing. The data is what your experienced humans do when they correct the AI. Capture…

agent observability, agent traces, Agents, AI Agents, AI Engineering, Alyx, dogfooding, Evals, LLM agents, LLM observability, trace debugging

How we use Alyx to build Alyx: How to build an AI agent feedback loop

How Arize uses Alyx to debug Alyx: searching dense traces, aggregating failures, triaging dogfooding issues, and closing the AI engineering feedback loop.
