Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation

arXiv:2509.06337v2 Announce Type: replace Abstract: Questionnaire-based surveys are foundational to social science research and public policymaking, yet traditional survey methods remain costly, time-consuming, and often limited in scale. Although prior work has explored large language models (LLMs) as virtual survey respondents, existing studies often address narrow task settings, focus on single sociological domains, or lack a unified evaluation framework that enables systematic comparison across diverse datasets and models. To address these gaps, we introduce two complementary task abstractions: Partial Attribute Simulation (PAS), where LLMs predict missing attributes from incomplete respondent profiles, and Full Attribute Simulation (FAS), where LLMs generate complete synthetic datasets under zero-context and context-enhanced conditions. Both are framed as diagnostic and exploratory tools rather than replacements for human data collection. We curate LLM-S^3 (Large Language Model-based Sociodemographic Survey Simulation), a benchmark spanning 11 real-world public datasets across four sociological domains, and evaluate GPT-3.5/4 Turbo and LLaMA 3.0/3.1-8B under zero-shot and few-shot settings. Our evaluation reveals consistent performance trends across model families, highlights failure modes in structured output generation, and demonstrates how context and prompt design affect simulation fidelity. Our code and dataset are available at: https://github.com/dart-lab-research/LLM-S-Cube-Benchmark

Leave a Comment