Scaling a Vespa Application: Feeding Fast and Furiously
A tutorial on scaling the resources of a Vespa application to increase feed throughput, using the metrics dashboard to make informed, optimised scaling decisions.
Documents are embedded once, which makes it worth the spend for maximum quality. Queries hit you on every request, and that is what drives your cost at scale. Asymmetric retrieval with Voyage AI and Vespa. Real numbers, real config.
Retrieval-Augmented Generation (RAG) allows an LLM to answer questions using your data at query time. On their own, LLMs are powerful but limited: they can hallucinate, they have a fixed knowledge cutoff, and they know nothing about your private documents, internal wikis, or proprietary systems.
Tensor-based retrieval preserves context across queries, maintains a “chain of thought”, and ranks relevance across multiple scientific factors simultaneously.