
A step-by-step guide to refactoring a RAG prototype into a modular, containerized Python application
As a data scientist working in a Jupyter Notebook, it is tempting to write quick Python scripts for fast results. However, when it is time to productionize your application, it is best practice to create a Python package.
Converting this from a simple script to a Python Package gives us several advantages:
- Structure & Organization — A package enforces a clear folder structure
- Reusability — Package modules can be imported anywhere
- Testability — Packages are designed to be imported, which makes unit testing straightforward
- Containerization — Packages have a clean entry point that Docker can call reliably
- Scalability — As your codebase grows, packages scale naturally — you just add modules
🤖This article will not go into the inner workings of RAG, vector databases, or LLMs.
🚨We will discuss some of the principles found in the 12 factor app, linked in the sources below.
🚧Use this guide as a starting point on your journey to productionize!
👉Check out my GitHub repo in the sources below for the full code!
The Problem
Let’s say this is what you are starting with in a Jupyter Notebook. At this point, the script lives only in your notebook, with no way for users or the outside world to interact with it.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document
from haystack.utils import Secret
CLAUDE_API_KEY = ''
document_store = InMemoryDocumentStore()
docs = [Document(content="Lake Como is in Italy"),
Document(content="Lake Como is nice")
]
document_store.write_documents(docs)
retriever = InMemoryBM25Retriever(document_store=document_store)
query = "Where is Lake Como?"
# Define the prompt template
template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{ query }}
"""
prompt_builder = PromptBuilder(template=template)
llm = AnthropicChatGenerator(api_key=Secret.from_token(CLAUDE_API_KEY))
rag_pipeline = Pipeline()
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
results = rag_pipeline.run({"retriever": {"query":query},
"prompt_builder": {"query":query}
})
#print(results["llm"]["replies"][0])
formatted_output = {"llm": {"replies": results["llm"]["replies"]}}
print(formatted_output)
This script may be acceptable in a Jupyter Notebook that will never see the light of day, but if you want to productionize this application, quite a few changes will have to be made.
Looking Ahead
Let’s do a quick rundown of what the final output will look like.
We are using the Chocolate Bar ratings dataset, linked in the sources below.
Example input query: “What brand makes praline chocolate?”
The final result:
DATE | INFO | rag_pipeline.runner | Received 1 reply(ies) from LLM.
Based on the data provided, **multiple brands make praline chocolate**:
1. **Cadbury** - makes several praline chocolates (P0007, P0023, P0041, P0048, P0071, P0077, P0137, P0188)
2. **Ferrero** - makes several praline chocolates (P0042, P0095, P0103, P0168, P0170, P0177, P0181)
3. **Godiva** - makes several praline chocolates (P0059, P0187)
4. **Hershey** - makes several praline chocolates (P0022, P0068, P0160, P0178)
5. **Lindt** - makes several praline chocolates (P0025, P0030, P0142, P0176)
6. **Mars** - makes several praline chocolates (P0067, P0127, P0134, P0182)
So the answer is: **Cadbury, Ferrero, Godiva, Hershey, Lindt, and Mars** all make praline chocolate products.
So we can see the response from the application.
Here is the Docker command to run this:
docker run --rm -t \
  -e PYTHONUNBUFFERED=1 --env-file .env \
  -v "$(pwd)/products.csv:/app/data.csv" \
  -v "$(pwd)/chroma_db:/app/chroma_db" \
  rag-pipeline \
  --question "What brand makes praline chocolate?" \
  --file /app/data.csv
Let’s break it down:
--rm deletes the container when it finishes (since this is for demo purposes only)
-t allocates a pseudo-terminal, which keeps the output readable in your terminal
-e PYTHONUNBUFFERED=1 makes logs and print statements appear immediately instead of being buffered
--env-file .env loads the variables in your .env file into the container
-v "$(pwd)/products.csv:/app/data.csv" mounts your local products.csv into the container at /app/data.csv. $(pwd) is your current directory on the host machine. This means the container reads your actual local file without needing to bake the data into the image.
-v "$(pwd)/chroma_db:/app/chroma_db" mounts your local chroma_db folder into the container. This is the persistence piece: ChromaDB reads and writes to your local disk through this mount.
rag-pipeline is the Docker image
--question "..." is an argument to our main.py file, consumed by argparse
--file /app/data.csv is the container path, which maps back to the local products.csv file
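To make the argparse step concrete, here is a minimal sketch of how main.py might parse those two flags. The function name `parse_args`, the help text, and making both flags required are assumptions for illustration, not the repo’s actual code.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags shown in the docker command (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Ask a question against a CSV knowledge base."
    )
    parser.add_argument("--question", required=True, help="Question to send to the pipeline")
    parser.add_argument("--file", required=True, help="Path to the CSV file inside the container")
    # argv=None makes argparse fall back to sys.argv[1:], which is what Docker fills in.
    return parser.parse_args(argv)


# An explicit list mimics the arguments Docker appends after the image name.
args = parse_args(["--question", "What brand makes praline chocolate?", "--file", "/app/data.csv"])
print(args.question, args.file)
```

Accepting an optional `argv` list keeps the parser easy to call from a unit test without touching the real command line.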
The Solution
These are the upgrades I will cover:
- Organize Python files into a Project Structure
- Configuration & Secrets in a .env file
- Logging and Unit Testing
- Cooperative Multitasking with async for API calls
- Persistent Data Storage with ChromaDB
- Containerization with Docker
The Upgrades
Project Structure
So we are taking our one-cell Python script and separating each function into its own file. Structure is how you manage complexity over time. A small project can survive chaos. A production RAG pipeline with multiple collaborators, tests, deployment, and ongoing changes cannot.
You may notice the src/ folder in the new project layout.
👉The src/ layout in particular is a deliberate Python packaging convention — it prevents accidental imports of your local code instead of the installed package during testing, which is a subtle but real bug that bites people in production!
So the task here is to separate every core unit of work into its own Python file. Below is what the final product will look like.
rag_project/
├── .env # Environment variables (API keys, secrets)
├── Dockerfile # Container image definition for deployment
├── config.yaml # Main configuration file (models, paths, settings)
├── conftest.py # Pytest fixtures and shared test configuration
├── products.csv # Source data file containing product records
├── pyproject.toml # Project metadata and build tool configuration
├── requirements.txt # Python package dependencies
├── src/
│   └── rag_pipeline/
│       ├── __init__.py # Package initializer, exposes public API
│       ├── config.py # Loads and validates config.yaml into typed settings
│       ├── documents.py # Handles document loading, chunking, and preprocessing
│       ├── logging_config.py # Configures logging format, levels, and handlers
│       ├── main.py # Entry point; parses args and launches the pipeline
│       ├── pipeline.py # Core RAG logic: embed, retrieve, and generate
│       ├── prompts.py # Prompt templates for LLM queries
│       ├── retriever.py # Vector store interface and similarity search
│       └── runner.py # Orchestrates pipeline execution and error handling
└── tests/
    ├── __init__.py # Makes tests a package; shared test imports
    └── test_config.py # Unit tests for config loading and validation
Configuration & Secrets
The core rule is: anything that changes between environments (dev, staging, production) or that is sensitive should never be hardcoded. Instead, it should live in a .env file that never gets committed and only exists on your machine.
There are many reasons not to hardcode secrets:
👉You never want your secrets in your Git history
👉If you are sharing your work with a colleague, they could have their own keys that could overwrite yours
👉Production will have its own keys
dotenv (installed as python-dotenv) is a common Python library for reading secrets from a .env file:
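from pathlib import Path
from dotenv import load_dotenv
# Assuming this file lives in src/rag_pipeline/, three parents up is the project root.
_PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
load_dotenv(dotenv_path=_PROJECT_ROOT / ".env", override=False)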
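Once the .env file has been loaded, the key still has to be read from the environment. Here is a minimal sketch of one way to do that and fail early when the key is missing. The helper name `require_env` is hypothetical, and `CLAUDE_API_KEY` is simply the name used in the notebook above; the real project may use a different variable. Because `override=False` is set, a variable that already exists in the environment (for example, one passed with `--env-file`) takes precedence over the file.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Illustration only: a dummy value stands in for what load_dotenv would provide.
os.environ.setdefault("CLAUDE_API_KEY", "dummy-key-for-illustration")
api_key = require_env("CLAUDE_API_KEY")
print("API key loaded:", bool(api_key))
```

Failing at startup with a named variable is usually easier to debug than an authentication error deep inside the pipeline.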
Unit Testing and Logging
Unit Tests
Code that isn’t tested is code you can’t safely change. Without tests:
- You refactor one of your scripts and silently break something three files away
- A collaborator changes a function signature and nothing catches it until production
- You can’t tell if a bug was always there or was introduced by a recent change
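As an illustration of the kind of test that could live in tests/test_config.py, here is a minimal sketch. The `Settings` fields and the `validate_settings` function are hypothetical stand-ins, not the project’s actual config code; the tests use plain `assert` statements so pytest can discover and run them as written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed settings; the real config.py may define other fields."""
    model: str
    top_k: int


def validate_settings(raw: dict) -> Settings:
    """Turn a raw dict (e.g. parsed from config.yaml) into validated Settings."""
    if "model" not in raw:
        raise ValueError("config is missing 'model'")
    top_k = int(raw.get("top_k", 5))
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return Settings(model=str(raw["model"]), top_k=top_k)


def test_valid_config_is_loaded():
    settings = validate_settings({"model": "some-model", "top_k": 3})
    assert settings == Settings(model="some-model", top_k=3)


def test_missing_model_is_rejected():
    try:
        validate_settings({"top_k": 3})
    except ValueError as exc:
        assert "model" in str(exc)
    else:
        raise AssertionError("expected ValueError for missing 'model'")
```

If a collaborator later renames `model` or changes the default for `top_k`, these tests fail immediately instead of the change surfacing in production.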
Logging
Logging gives visibility into what the program is actually doing. In production you can’t attach a debugger or add print statements. Logging is your only window into what the system is doing when something goes wrong.
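A minimal sketch of what logging_config.py might contain is below. The pipe-separated format is inferred from the sample log line shown earlier, and the function name `configure_logging` is an assumption. Writing to stdout fits the twelve-factor idea of treating logs as an event stream that the container runtime collects.

```python
import logging
import sys


def configure_logging(level: int = logging.INFO) -> None:
    """Send pipe-separated log lines to stdout (format inferred from the sample output)."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
        stream=sys.stdout,
        force=True,  # replace handlers a notebook or library may have installed
    )


configure_logging()
# Naming the logger after the module gives the "rag_pipeline.runner" column.
logger = logging.getLogger("rag_pipeline.runner")
logger.info("Received %d reply(ies) from LLM.", 1)
```

Inside a real module, `logging.getLogger(__name__)` produces the same dotted name automatically.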
Persistent Data Storage
Previously, we were storing our knowledge base in memory, which does not hold up in production. For a small dataset this may be okay, but an in-memory store is rebuilt on every run and lost when the process exits, so as the knowledge base grows it becomes slow, expensive, and fragile.
👉Chroma DB is an open-source vector database designed specifically to simplify the building of AI applications, particularly those using Large Language Models (LLMs).
The first time you run the pipeline, Chroma DB embeds and indexes everything. Every subsequent run just loads the existing index, which is nearly instant.
In this project we are using the Haystack integration with Chroma DB:
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
👉Following the 12 factor principle of backing services — ChromaDB becomes an attached resource your app connects to, not something it rebuilds!
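The “index once, then reuse” behavior can be sketched as a small guard around the write step. This is an illustration under assumptions, not the repo’s code: `index_if_empty` is a hypothetical helper, and it relies only on `count_documents()` and `write_documents()`, which Haystack document stores expose. The commented wiring at the bottom assumes `collection_name` and `persist_path` arguments pointing at the mounted chroma_db folder.

```python
from typing import Protocol, Sequence


class DocumentStore(Protocol):
    """The two methods this sketch relies on."""

    def count_documents(self) -> int: ...

    def write_documents(self, documents: Sequence) -> int: ...


def index_if_empty(store: DocumentStore, documents: Sequence) -> bool:
    """Write documents only when the store has none. Returns True if indexing ran."""
    if store.count_documents() > 0:
        return False  # an existing index was found, so skip the expensive step
    store.write_documents(documents)
    return True


# With the Haystack integration this might be wired up as (assumed arguments):
#   store = ChromaDocumentStore(collection_name="products", persist_path="chroma_db")
#   index_if_empty(store, docs)
```

Keeping the guard independent of Chroma also means it can be unit tested with a tiny fake store.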
Docker
Docker could be its own book chapter, so there is a lot to cover, but here we will keep it brief.
Docker allows the application to be containerized, meaning the software is packaged with everything it needs (code, libraries, dependencies) into a portable, isolated unit. It also helps when sharing a project with a colleague: you never have the excuse of “Well, it worked on my machine.”
👉Containerization is a software deployment process that bundles an application’s code with all the files and libraries it needs to run on any infrastructure.
Cooperative Multitasking
If this RAG application were in production, it wouldn’t be serving one person (hopefully); it would be handling many requests at the same time.
Async and Await
Using async defines a function that can pause, and await is the moment it actually does pause.
async def run_query_async(pipeline, query):
...
👉This tells Python: “this function participates in the event loop and is allowed to pause mid-execution.” Calling it doesn’t run it immediately — it returns a coroutine object, which is essentially a paused function waiting to be run.
results = await loop.run_in_executor(...)
👉This says two things simultaneously:
- “start this thing”
- “pause me here until it’s done, and let someone else run in the meantime”
Without await, the event loop never gets a chance to switch to another task. It's the mechanism that makes cooperation actually happen.
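Putting the two fragments together, here is a runnable sketch of the pattern. `FakePipeline` is a hypothetical stand-in for a pipeline whose `run()` blocks while waiting on the LLM API, and the input dictionary is simplified. Whether a real pipeline object is safe to share across threads is something to verify for your setup.

```python
import asyncio
import time


class FakePipeline:
    """Hypothetical stand-in for a pipeline whose run() blocks on the LLM API."""

    def run(self, data: dict) -> dict:
        time.sleep(0.1)  # simulate the network wait
        return {"llm": {"replies": [f"answer to: {data['query']}"]}}


async def run_query_async(pipeline, query: str) -> dict:
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool; awaiting frees the loop meanwhile.
    return await loop.run_in_executor(None, pipeline.run, {"query": query})


async def main() -> list:
    pipeline = FakePipeline()
    questions = ["q1", "q2", "q3"]
    # gather starts all three coroutines and returns results in input order.
    return await asyncio.gather(*(run_query_async(pipeline, q) for q in questions))


results = asyncio.run(main())
print([r["llm"]["replies"][0] for r in results])
```

`run_in_executor` is what lets a blocking, non-async call take part in the event loop: the blocking work happens on a worker thread while the coroutine waits.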
Still not convinced?
Without async and await
Request 1: ──[waiting for LLM API 2s]──► done
Request 2:                               ──[waiting 2s]──► done
Request 3:                                                 ──[waiting 2s]──► done
Total time: 6s
Without async, each request holds up every other request behind it!
With async and await
Request 1: ──[waiting]──────────────────► done
Request 2: ──[waiting]────────────────► done
Request 3: ──[waiting]──────────────► done
Total time: ~2s
You can’t beat these results!
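The two timelines can be reproduced with a small simulation. This is a sketch under assumptions: `fake_llm_call` uses `asyncio.sleep` with a shortened 0.2 second delay to stand in for the 2 second API wait, and no real LLM is called.

```python
import asyncio
import time


async def fake_llm_call(delay: float = 0.2) -> None:
    """Simulated API wait; asyncio.sleep yields control like a real network await."""
    await asyncio.sleep(delay)


async def one_at_a_time(n: int) -> float:
    start = time.perf_counter()
    for _ in range(n):
        await fake_llm_call()  # the next request starts only after this one finishes
    return time.perf_counter() - start


async def all_at_once(n: int) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_llm_call() for _ in range(n)))  # the waits overlap
    return time.perf_counter() - start


sequential = asyncio.run(one_at_a_time(3))
concurrent = asyncio.run(all_at_once(3))
print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

With three simulated requests, the sequential total is roughly three times the single delay, while the concurrent total stays close to one delay.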
🚨One thing to note: if you are running this application locally and entering one query at a time, you will not see the benefits of async and await. The real time savings come once it is in production, handling multiple calls at once.
🚨I highly recommend checking out the other concurrency options that Python has!
We covered so much in a short time but I hope this article helps you on your journey to not just be a Data Scientist, but a Data Scientist and a Python Developer!
Sources:
- What is Containerization? - Containerization Explained - AWS
- Python's asyncio: A Hands-On Walkthrough - Real Python
- The Twelve-Factor App
- ChromaDocumentStore | Haystack Documentation
- Chocolate Sales Dataset 2023 - 2024
https://github.com/a-rhodes-vcu/rag_to_production/tree/main/src/rag_pipeline
Beyond the Jupyter Notebook: How to Build a Dockerized RAG Pipeline in Python using Haystack was originally published in Towards AI on Medium.