TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
arXiv:2604.10291v1 Announce Type: new
Abstract: Large Language Models (LLMs) have shown promising performance in time series modeling tasks, but do they truly understand time series data? While multiple benchmarks have been proposed to answer this fun…