Scaling a Vespa Application: Feeding Fast and Furiously
A tutorial on scaling the resources in a Vespa application to increase feed throughput, using the metrics dashboard to make informed, optimised scaling decisions.
Retrieval-Augmented Generation (RAG) allows an LLM to answer questions using your data at query time. On their own, LLMs are powerful but limited: they can hallucinate, they have a fixed knowledge cutoff, and they know nothing about your private documents, internal wikis, or proprietary systems.
Tensor-based retrieval preserves context across queries and maintains both “chain of thought” and ranking relevance across multiple scientific factors simultaneously.