Liang Chen, Qi Liu, Wenhuan Lin, Feng Liang

Criterion Validity of LLM-as-Judge for Business Outcomes in Conversational Commerce

Liang Chen, Qi Liu, Wenhuan Lin, Feng Liang / April 2, 2026

arXiv:2604.00022v1
Abstract: Multi-dimensional rubric-based dialogue evaluation is widely used to assess conversational AI, yet its criterion validity — whether quality scores are associated with the downstream outcomes they are me…
