cs.AI, cs.CL, cs.CY

Measuring What Matters — or What’s Convenient?: Robustness of LLM-Based Scoring Systems to Construct-Irrelevant Factors

arXiv:2603.25674v1 Announce Type: new
Abstract: Automated systems have been widely adopted across the educational testing industry for open-response assessment and essay scoring. These systems commonly achieve performance levels comparable to or super…