MachineLearning

The Structured Output Benchmark (SOB) – validates both JSON parse and value accuracy [R]

Current structured output benchmarks only validate the pass rate for JSON schema and types; however, the more common issue tends to be inaccurate JSON values. For example, a hallucinated `total_price` number when extracting values from an invoice or an array o…
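
To make the distinction concrete, here is a minimal sketch of scoring one output on three levels: does it parse, do the types match, and are the values right. This is not the SOB implementation; `GOLD`, `EXPECTED_TYPES`, and `score_output` are hypothetical names invented for illustration, and exact-match comparison is an assumed (and probably too strict) accuracy metric.

```python
import json

# Hypothetical ground truth for one invoice (illustrative, not from SOB).
GOLD = {"invoice_id": "INV-001", "total_price": 149.99}

# Hypothetical schema: field name -> accepted Python type(s).
EXPECTED_TYPES = {"invoice_id": str, "total_price": (int, float)}


def score_output(raw: str) -> dict:
    """Score one model output on parsing, types, and value accuracy."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"parses": False, "types_ok": False, "value_accuracy": 0.0}

    if not isinstance(data, dict):
        return {"parses": True, "types_ok": False, "value_accuracy": 0.0}

    # Schema-level check: every field is present with an accepted type.
    # bool is excluded explicitly because it is a subclass of int in Python.
    types_ok = all(
        key in data
        and isinstance(data[key], expected)
        and not isinstance(data[key], bool)
        for key, expected in EXPECTED_TYPES.items()
    )

    # Value-level check: fraction of fields that exactly match the ground truth.
    correct = sum(1 for key, gold in GOLD.items() if data.get(key) == gold)
    return {
        "parses": True,
        "types_ok": types_ok,
        "value_accuracy": correct / len(GOLD),
    }


# Valid JSON with correct types, but the total has two digits transposed.
hallucinated = '{"invoice_id": "INV-001", "total_price": 194.99}'
result = score_output(hallucinated)
print(result)
```

The hallucinated output passes the parse and type checks yet gets only one of two values right, which is the kind of failure a schema-only pass rate would not surface.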