cs.AI

VCBench: Benchmarking LLMs in Venture Capital

arXiv:2509.14448v2 Announce Type: replace
Abstract: Benchmarks such as SWE-bench and ARC-AGI demonstrate how shared datasets accelerate progress toward artificial general intelligence (AGI). We introduce VCBench, the first benchmark for predicting fou…

cs.AI, cs.HC, cs.LG

Toward Human-AI Complementarity Across Diverse Tasks

arXiv:2605.04070v1 Announce Type: cross
Abstract: Human-AI complementarity, the idea that combining human and AI judgments can outperform either alone, offers a promising pathway toward robust oversight of advanced AI systems. However, whether human-A…
