cs.AI

ScoringBench: A Benchmark for Evaluating Tabular Foundation Models with Proper Scoring Rules

arXiv:2603.29928v2 Announce Type: replace
Abstract: Tabular foundation models such as TabPFN and TabICL already produce full predictive distributions, yet prevailing regression benchmarks evaluate them almost exclusively via point-estimate metrics (RM…
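To illustrate the contrast the abstract draws, here is a minimal sketch of how a point-estimate metric can fail to separate two predictive distributions that a proper scoring rule tells apart. It uses the continuous ranked probability score (CRPS), estimated from samples, as one example of a proper scoring rule; the visible text does not say which scoring rules the benchmark uses, so that choice is an assumption. The function names and the two toy forecasts are hypothetical and are not taken from the paper, TabPFN, or TabICL.

```python
import math


def rmse(y_true, point_preds):
    """Root mean squared error of point predictions."""
    n = len(y_true)
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(point_preds, y_true)) / n)


def crps_from_samples(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    n = len(samples)
    dist_to_obs = sum(abs(x - y) for x in samples) / n
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return dist_to_obs - 0.5 * spread


# Two hypothetical predictive distributions (as samples) for one target value.
# Both have mean 0.0, so their point predictions are identical.
y_obs = 0.0
sharp_samples = [-0.1, 0.0, 0.1]   # confident forecast
wide_samples = [-3.0, 0.0, 3.0]    # very uncertain forecast

sharp_mean = sum(sharp_samples) / len(sharp_samples)
wide_mean = sum(wide_samples) / len(wide_samples)

# The point-estimate metric sees no difference between the two forecasts.
rmse_sharp = rmse([y_obs], [sharp_mean])
rmse_wide = rmse([y_obs], [wide_mean])

# The proper scoring rule rewards the sharper, equally well-centred forecast.
crps_sharp = crps_from_samples(sharp_samples, y_obs)
crps_wide = crps_from_samples(wide_samples, y_obs)

print("RMSE:", rmse_sharp, rmse_wide)
print("CRPS:", crps_sharp, crps_wide)
```

In this toy case both forecasts get the same RMSE because their means coincide, while the CRPS is lower for the sharper forecast. This is the kind of information that evaluation on point estimates alone discards.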