Jessica M. Lundin, Usman Nasir Nakakana, Guillaume Chabot-Couture

From Guidelines to Guarantees: A Graph-Based Evaluation Harness for Domain-Specific Evaluation of LLMs

Jessica M. Lundin, Usman Nasir Nakakana, Guillaume Chabot-Couture / March 26, 2026

arXiv:2508.20810v2
Abstract: Rigorous evaluation of domain-specific language models requires benchmarks that are comprehensive, contamination-resistant, and maintainable. Static, manually curated datasets do not satisfy these pr…
