CausalReasoningBenchmark: A Real-World Benchmark for Disentangled Evaluation of Causal Identification and Estimation
arXiv:2602.20571v2
Abstract: Many benchmarks for automated causal inference evaluate a system’s performance based on a single numerical output, such as an Average Treatment Effect (ATE). This approach conflates two distinct step…