An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval
arXiv:2503.22171v2 Announce Type: replace
Abstract: Data plays a pivotal role in Text-Based Person Retrieval (TBPR) research. The mainstream research paradigm requires real-world person images with manual textual annotations for training models, which raises privacy concerns and imposes annotation burdens. Several pioneering efforts explore synthetic data generation, yet they still depend on real data as a foundation and thus inherit the same limitations. The feasibility of purely synthetic TBPR data remains unexplored, and there is currently no systematic study of the effectiveness boundaries of synthetic data across various real-world scenarios. In this work, we present the first comprehensive empirical study of synthetic data for TBPR, with two key aspects. (1) We propose a unified data synthesis pipeline that can operate entirely without real person data. It combines an inter-class image generation module, which produces diverse identity-centric images by means of an automatic prompt construction strategy, with an intra-class augmentation module, which enhances identity variation through text-driven image editing. (2) Leveraging this pipeline and automatic textual description generation, we explore the effectiveness of synthetic data in diverse scenarios through extensive experiments, revealing its practical utility as either a standalone replacement for or a complementary augmentation to real data.
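
The two-module structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute pools, the edit instructions, the prompt template, and the names `ATTRIBUTES`, `EDITS`, `Identity`, `build_prompt`, and `synthesize_plan` are all hypothetical. The sketch only builds the text side of the plan, namely one generation prompt per identity (inter-class) and a few edit instructions per identity (intra-class). The actual text-to-image and image-editing models are left out.

```python
import random
from dataclasses import dataclass

# Hypothetical attribute pools; the abstract does not say which attributes are used.
ATTRIBUTES = {
    "gender": ["man", "woman"],
    "top": ["red jacket", "white t-shirt", "blue hoodie"],
    "bottom": ["black jeans", "grey shorts", "khaki trousers"],
    "accessory": ["backpack", "handbag", "suitcase"],
}

# Hypothetical edit instructions standing in for text-driven image editing.
EDITS = [
    "seen from behind",
    "walking on a street at night",
    "partially occluded by a bench",
]


@dataclass(frozen=True)
class Identity:
    identity_id: int
    prompt: str          # inter-class: one generation prompt per identity
    edit_prompts: tuple  # intra-class: edit instructions for extra views


def build_prompt(rng: random.Random) -> str:
    """Sample one value per attribute and fill a fixed (assumed) template."""
    picks = {name: rng.choice(values) for name, values in ATTRIBUTES.items()}
    return (
        f"a full-body photo of a {picks['gender']} wearing a {picks['top']} "
        f"and {picks['bottom']}, carrying a {picks['accessory']}"
    )


def synthesize_plan(num_identities: int, views_per_identity: int, seed: int = 0):
    """Return a list of Identity records describing what to generate and edit."""
    if views_per_identity > len(EDITS):
        raise ValueError("views_per_identity exceeds the number of edit instructions")
    rng = random.Random(seed)  # seeded so the plan is reproducible
    plan = []
    for identity_id in range(num_identities):
        prompt = build_prompt(rng)
        # sample() draws without replacement, so edits within an identity differ
        edits = tuple(rng.sample(EDITS, k=views_per_identity))
        plan.append(Identity(identity_id, prompt, edits))
    return plan
```

In such a setup, each `prompt` would be passed to a text-to-image model to create a new identity, and each entry of `edit_prompts` to an image-editing model to produce additional views of that same identity. Independently sampled prompts may repeat when the pools are small, so a real pipeline would likely need larger pools or deduplication.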