| I've been running PaddleOCR-VL-1.5 via llama.cpp's server for OCR on book pages. It handles complex layouts, tables, and mixed text/figure pages surprisingly well. Setup: The pipeline can process an entire folder of page photos end-to-end. You can basically digitalise a book with a single command. Repo: https://github.com/akmalayari/ocr-book Has anyone else experimented with vision-language models for OCR? [link] [comments] |