/u/Worried-Variety3397

We’ve resolved the data anonymization challenge, but data extraction is slow. What is your technology stack? [D]

/u/Worried-Variety3397 / April 13, 2026

I am currently building a RAG pipeline that needs to process a massive volume of messy legacy data—including outdated reports, poorly formatted emails, various PDFs, mobile phone photos, and more. While the retrieval and generation components are funct…
