how do you actually catch your agent breaking in prod before users do? [D]
we run an agent thing in production and we use langfuse for traces. last month our agent started refusing requests it should have answered. took us almost a week to notice. evals were all green. traces looked normal because each call by itself was fine…