Author name: LangChain Accounts

Uncategorised

Open Models have crossed a threshold

💡
TL;DR: Open models like GLM-5 and MiniMax M2.7 now match closed frontier models on core agent tasks (file operations, tool use, and instruction following) at a fraction of the cost and latency. Here’s what our evals show and how to start using them.

How we build evals for Deep Agents

💡
TL;DR: The best agent evals directly measure an agent behavior we care about. Here’s how we source data, create metrics, and run well-scoped, targeted experiments over time to make agents more accurate and reliable.

Evals shape agent behavior

We’ve been curating evaluations to measure and
