In my line of work, PDF documents tend to be combinations of text, math formulas, tables and images.
llama.cpp added support for PDFs a few months ago, but I believe it treats a PDF either as text (discarding everything else) or as images. This seems suboptimal, since PDFs are inherently multimodal.
On the other hand, Gemma-4 lists PDF processing/parsing as one of its core features. How do I use that? Should I be using llama.cpp, llama-cpp-python, transformers or something else?