SGOCR: A Spatially-Grounded OCR-focused Pipeline & V1 Dataset [P]
Hello everyone! I've been independently researching & developing small-but-powerful vision-language models (VLMs) and noticed a gap in visual datasets – none were teaching my model to simply ground text in imagery; instead, they were trying to get it to reaso…