Your RAG Agent Forgets Everything After One Message - Here’s How I Fixed It with Databricks Lakebase

Building a context-aware RAG system end-to-end: from PDF parsing to multi-turn conversations that actually remember

Most RAG tutorials show you how to build a system that answers questions from documents. You parse some PDFs, chunk them, embed them, retrieve relevant context, and feed it to an LLM. Works great — until the user asks a follow-up question.

“How does it handle overheating?”

It? What’s “it”? Your agent has no idea, because the previous turn where the user asked about Orion’s motion controller was stored in InMemorySaver() — which is gone the moment your Model Serving endpoint processes the next request.

This is the gap between a demo and a production RAG system. In this article, I’ll walk through the complete build — from parsing multimodal PDFs to deploying a context-aware agent that maintains conversation history across turns using Databricks Lakebase (Postgres).

The Architecture

Here’s what we’re building:

PDFs (text + images + diagrams)
  ↓ ai_parse_document() (version 2.0)
Parsed elements (text, tables, figure descriptions)
  ↓ RecursiveCharacterTextSplitter
Chunks table (Delta, with Change Data Feed)
  ↓ Delta Sync + GTE-Large
Vector Search Index
  ↓ VectorSearchRetrieverTool
LangChain Agent + PostgresSaver
  ↓                        ↓
LLM (Foundation Model)   Lakebase (conversation memory)

Model Serving Endpoint (MLflow ResponsesAgent)

Key components:

  • Document parsing: ai_parse_document() (Version 2.0) — Databricks' built-in multimodal parser
  • Embeddings: databricks-gte-large-en — 1024-dim, 8192-token context window
  • Vector store: Databricks Vector Search (Delta Sync, managed embeddings)
  • Conversation memory: Lakebase Autoscaling (managed Postgres 17)
  • Agent framework: LangChain + LangGraph + MLflow ResponsesAgent
  • Deployment: Model Serving via agents.deploy()

Let’s build it step by step.

Step 1: Parse Documents with ai_parse_document()

Our source documents are PDFs stored in a Unity Catalog Volume — they contain text, tables, images, and architecture diagrams. The ai_parse_document() function (version 2.0) handles all of this in a single call. It extracts text, renders tables as HTML, and generates AI descriptions for figures.

from pyspark.sql.functions import expr

# Volume path where PDFs are stored
docs_path = "/Volumes/<YOUR_CATALOG>/<YOUR_SCHEMA>/source_docs/"

# Read all files as binary
docs_df = spark.read.format("binaryFile").load(docs_path)

# Parse each document using ai_parse_document v2.0
parsed_df = docs_df.withColumn(
    "parsed_content",
    expr(f"""ai_parse_document(content, map(
        "version", "2.0",
        "imageOutputPath", "{docs_path}parsed_images/"
    ))""")
)

# Drop binary content (too large to display)
parsed_df = parsed_df.drop("content")

# Save to Delta table
output_table = "<YOUR_CATALOG>.<YOUR_SCHEMA>.docs_parsed"
parsed_df.write.format("delta").mode("overwrite").saveAsTable(output_table)
print(f"✅ Parsed results saved to: {output_table}")

The output is a VARIANT column with structured elements — each element has a type (text, table, figure, section_header, etc.), content, and optionally a description (for figures). This is significantly better than PyPDF2 or Unstructured for complex enterprise PDFs with mixed content.
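Before chunking, it's worth sanity-checking the parse. A minimal sketch, assuming the v2.0 output keeps its elements under document.elements and each element carries a type field (inspect a row first; the exact schema in your workspace may differ):

# Count parsed elements by type per document to verify that
# tables and figures were actually recognized
display(spark.sql(f"""
    SELECT
        path,
        el.value:type::string AS element_type,
        COUNT(*) AS n
    FROM {output_table},
        LATERAL variant_explode(parsed_content:document.elements) AS el
    GROUP BY path, element_type
    ORDER BY path, n DESC
"""))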

Step 2: Clean, Transform, and Chunk

The parsed JSON needs to be converted into clean text and then chunked for retrieval. This takes two passes:

Fast Plain Text Extraction

Concatenate all text elements into a single string with == page == tokens separating pages:

from pyspark.sql import functions as F

# Convert VARIANT to JSON string, then extract text content
safe_json_col = F.coalesce(
    F.to_json(F.col("parsed_content")),
    F.col("parsed_content").cast("string"),
)

plain_text_df = parsed_df.withColumn(
    "plain_text",
    extract_contents_udf()(safe_json_col)  # Custom UDF to join text elements
)
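extract_contents_udf is not a built-in; here's a minimal sketch of what such a UDF could look like. The element keys (content, description, page_id) are assumptions based on the v2.0 output described above, so adjust them to the schema you actually see:

import json
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def extract_contents_udf():
    @udf(returnType=StringType())
    def _extract(parsed_json: str) -> str:
        if not parsed_json:
            return ""
        doc = json.loads(parsed_json)
        elements = doc.get("document", {}).get("elements", [])
        parts, last_page = [], None
        for el in elements:
            page = el.get("page_id")  # assumed key; inspect your output
            if last_page is not None and page != last_page:
                parts.append("== page ==")
            last_page = page
            # Prefer the AI-generated description for figures, raw content otherwise
            text = el.get("description") or el.get("content") or ""
            if text.strip():
                parts.append(text)
        return "\n".join(parts)
    return _extract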

Chunk with LangChain

from langchain_text_splitters import RecursiveCharacterTextSplitter
from pyspark.sql.types import StructType, StructField, StringType
import pandas as pd

CHUNK_SIZE = 2000
CHUNK_OVERLAP = 200

splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP,
    separators=["\n== page ==\n", "== page ==", "\n\n", "\n", " ", ""],
)

schema = StructType([
    StructField("path", StringType(), True),
    StructField("chunk", StringType(), True),
])

def split_rows(iterator):
    for pdf in iterator:
        out = []
        for _, row in pdf.iterrows():
            path, text = row["path"], row["plain_text"]
            if isinstance(text, str) and text.strip():
                for c in splitter.split_text(text):
                    if c and c.strip():
                        out.append((path, c))
        yield pd.DataFrame(out, columns=["path", "chunk"])

df_chunks = (
    plain_text_df.select("path", "plain_text")
    .mapInPandas(split_rows, schema=schema)
)

# Add unique IDs and save
df_chunks = df_chunks.withColumn("id", F.monotonically_increasing_id())
chunked_table = "<YOUR_CATALOG>.<YOUR_SCHEMA>.docs_chunked"
df_chunks.write.format("delta") \
    .mode("overwrite") \
    .option("mergeSchema", "true") \
    .saveAsTable(chunked_table)

Step 3: Build Vector Search

Enable Change Data Feed

Vector Search Delta Sync requires CDF enabled on the source table:

ALTER TABLE <YOUR_CATALOG>.<YOUR_SCHEMA>.docs_chunked
SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
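You can confirm the property took effect before creating the index:

# Verify CDF is enabled on the chunks table
display(spark.sql(
    "SHOW TBLPROPERTIES <YOUR_CATALOG>.<YOUR_SCHEMA>.docs_chunked "
    "('delta.enableChangeDataFeed')"
))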

Create the Delta Sync Index

from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient(disable_notice=True)
index_name = "<YOUR_CATALOG>.<YOUR_SCHEMA>.docs_chunked_index"

vsc.create_delta_sync_index_and_wait(
    endpoint_name="<YOUR_VS_ENDPOINT>",
    index_name=index_name,
    source_table_name="<YOUR_CATALOG>.<YOUR_SCHEMA>.docs_chunked",
    primary_key="id",
    embedding_source_column="chunk",
    embedding_model_endpoint_name="databricks-gte-large-en",
    pipeline_type="TRIGGERED",
)

With managed embeddings, Databricks automatically handles embedding computation during both indexing (reads chunk, sends to GTE-Large, stores the 1024-dim vectors) and query time (auto-embeds your search string). You never call the embedding model directly.
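One consequence of pipeline_type="TRIGGERED": the index won't pick up new chunks on its own. After the source table changes, trigger a sync from the index object:

# TRIGGERED pipelines sync only on demand; call this after the
# chunks table is updated so new documents become searchable
index = vsc.get_index(index_name=index_name)
index.sync()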

Test Retrieval

index = vsc.get_index(index_name=index_name)

results = index.similarity_search(
    query_text="How does the system prevent overheating?",
    columns=["path", "chunk"],
    num_results=5,
)
display(results)

Step 4: Set Up Lakebase for Conversation Memory

This is where most RAG tutorials stop. They plug in InMemorySaver() and call it a day. But InMemorySaver means:

  • State lost between requests — every Model Serving invocation starts fresh
  • No multi-turn conversations — the agent can’t resolve “it”, “that”, “the one you mentioned”
  • No session persistence — kernel restart = amnesia

Lakebase (Databricks’ managed Postgres 17, GA on AWS since early 2026) solves this. We’ll use langgraph-checkpoint-postgres to swap InMemorySaver with a Postgres-backed checkpointer.
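In a notebook, that swap needs two extra packages (the same ones we pin at deployment time); restart Python afterwards with dbutils.library.restartPython() so the new versions load:

%pip install -U langgraph-checkpoint-postgres "psycopg[binary]"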

Provision a Lakebase Autoscaling Project

  1. Click Apps → Lakebase Postgres in your workspace
  2. Select Autoscaling → New project
  3. Name it (e.g., my-agent-memory) and select Postgres 17
  4. Provisioning takes about a minute

Get Connection Details Programmatically

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
project_id = "<YOUR_PROJECT_NAME>"

# Get branch and endpoint
branches = list(w.postgres.list_branches(parent=f"projects/{project_id}"))
branch_name = branches[0].name
endpoints = list(w.postgres.list_endpoints(parent=branch_name))
ep = endpoints[0]

HOST = ep.status.hosts.host
ENDPOINT = ep.name
USERNAME = "<YOUR_DATABRICKS_EMAIL>"
print(f"Host: {HOST}")
print(f"Endpoint: {ENDPOINT}")

Important: Lakebase Autoscaling uses the w.postgres.* API, not w.database.* (which is for legacy Provisioned instances). The docs still mix these — don't get tripped up.

Test the Connection

import psycopg2

cred = w.postgres.generate_database_credential(endpoint=ENDPOINT)
conn = psycopg2.connect(
    host=HOST,
    dbname="databricks_postgres",
    user=USERNAME,
    password=cred.token,
    port=5432,
    sslmode="require",
)
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone()[0])
conn.close()
print("✅ Connected to Lakebase!")

Create Checkpoint Tables

from urllib.parse import quote
from langgraph.checkpoint.postgres import PostgresSaver

cred = w.postgres.generate_database_credential(endpoint=ENDPOINT)
DB_URI = (
    f"postgresql://{quote(USERNAME, safe='')}:{quote(cred.token, safe='')}"
    f"@{HOST}:5432/databricks_postgres"
    f"?sslmode=require"
)
with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()
print("✅ Checkpoint tables created!")

Gotcha: If your Databricks username contains @ (e.g., user@company.com), you must URL-encode it with urllib.parse.quote(). Otherwise the @ is parsed as the URI separator between credentials and host, and the connection fails silently.

This creates four tables in Lakebase: checkpoints, checkpoint_writes, checkpoint_blobs, and checkpoint_migrations.
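A quick way to confirm they exist, reusing the psycopg2 connection pattern from above:

# List the LangGraph checkpoint tables just created in Lakebase
cred = w.postgres.generate_database_credential(endpoint=ENDPOINT)
conn = psycopg2.connect(
    host=HOST, dbname="databricks_postgres", user=USERNAME,
    password=cred.token, port=5432, sslmode="require",
)
with conn.cursor() as cur:
    cur.execute("""
        SELECT table_name
        FROM information_schema.tables
        WHERE table_name LIKE 'checkpoint%'
        ORDER BY table_name
    """)
    for (name,) in cur.fetchall():
        print(name)
conn.close()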

Step 5: Build the Context-Aware Agent

Now the critical swap — replace InMemorySaver with a Postgres-backed checkpointer in the agent.

Interactive Agent (Notebook)

from urllib.parse import quote
from langchain.agents import create_agent
from databricks_langchain import ChatDatabricks, VectorSearchRetrieverTool
from langgraph.checkpoint.postgres import PostgresSaver
from databricks.sdk import WorkspaceClient
import psycopg
from psycopg.rows import dict_row

def get_lakebase_checkpointer(host: str, endpoint: str, username: str):
    """Create a PostgresSaver backed by Lakebase Autoscaling."""
    w = WorkspaceClient()
    cred = w.postgres.generate_database_credential(endpoint=endpoint)
    db_uri = (
        f"postgresql://{quote(username, safe='')}:{quote(cred.token, safe='')}"
        f"@{host}:5432/databricks_postgres"
        f"?sslmode=require"
    )
    # IMPORTANT: Use psycopg.connect directly, not from_conn_string
    # from_conn_string returns a context manager, not a persistent instance
    conn = psycopg.connect(db_uri, autocommit=True, row_factory=dict_row)
    checkpointer = PostgresSaver(conn=conn)
    checkpointer.setup()
    return checkpointer

def build_agent(llm_endpoint: str, index_name: str, num_results: int = 3):
    model = ChatDatabricks(endpoint=llm_endpoint, max_tokens=500)
    vs_tool = VectorSearchRetrieverTool(
        name="knowledge_search",
        index_name=index_name,
        description="Search knowledge base for relevant information",
        num_results=num_results,
    )
    # Lakebase-backed checkpointer instead of InMemorySaver
    checkpointer = get_lakebase_checkpointer(HOST, ENDPOINT, USERNAME)
    system_prompt = """You are a Knowledge Assistant. Respond in a clear,
professional tone. Use only verified information from the provided documents.
If the answer cannot be found, clearly state that."""
    return create_agent(
        model=model,
        tools=[vs_tool],
        system_prompt=system_prompt,
        checkpointer=checkpointer,
    )

Test Multi-Turn Conversation

agent = build_agent("<YOUR_LLM_ENDPOINT>", "<YOUR_INDEX_NAME>", 3)

# STABLE thread_id - this is what enables context awareness
config = {"configurable": {"thread_id": "demo-session-001"}}

# Turn 1
r1 = agent.invoke(
    {"messages": [{"role": "user", "content": "What is the Orion system?"}]},
    config=config,
)
print("Turn 1:", r1["messages"][-1].content)

# Turn 2 - agent should know "it" = Orion
r2 = agent.invoke(
    {"messages": [{"role": "user", "content": "How does it handle overheating?"}]},
    config=config,
)
print("Turn 2:", r2["messages"][-1].content)

The magic is in the stable thread_id. With InMemorySaver, conversation state lives in the endpoint's process memory and is gone by the next request. With PostgresSaver and a consistent thread_id, the agent retrieves the full conversation history from Lakebase before generating a response.
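You can also verify what got persisted for a thread straight from the checkpointer. The list() method comes from LangGraph's BaseCheckpointSaver; the checkpoint payload layout can vary across langgraph versions, so treat this as a sketch:

# Inspect the stored checkpoints for our demo thread
checkpointer = get_lakebase_checkpointer(HOST, ENDPOINT, USERNAME)
for tup in checkpointer.list(
    {"configurable": {"thread_id": "demo-session-001"}}, limit=3
):
    messages = tup.checkpoint.get("channel_values", {}).get("messages", [])
    print(tup.checkpoint["id"], f"{len(messages)} messages")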

Step 6: Production Agent Code (agent.py)

For deployment, the agent code lives in a standalone Python file using MLflow’s “agent as code” pattern:

# agent.py
import os
from uuid import uuid4
from typing import Any, Dict, List
from urllib.parse import quote

import yaml
import mlflow
import psycopg
from psycopg.rows import dict_row
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse
from langchain.agents import create_agent
from databricks_langchain import ChatDatabricks, VectorSearchRetrieverTool
from langgraph.checkpoint.postgres import PostgresSaver
from databricks.sdk import WorkspaceClient


def _load_config(path: str = "agent-config.yaml") -> Dict[str, Any]:
    if not os.path.exists(path):
        raise FileNotFoundError(f"Config file not found at '{path}'")
    with open(path, "r", encoding="utf-8") as f:
        cfg = yaml.safe_load(f) or {}
    llm_endpoint = cfg.get("llm_endpoint_name")
    vs = cfg.get("vector_search", {}) or {}
    index_name = vs.get("index_name")
    num_results = int(vs.get("num_results", 3))
    lakebase = cfg.get("lakebase", {}) or {}
    return {
        "llm_endpoint_name": llm_endpoint,
        "vs_index_name": index_name,
        "vs_num_results": num_results,
        "lakebase_host": lakebase.get("host"),
        "lakebase_endpoint": lakebase.get("endpoint"),
        "lakebase_user": lakebase.get("user"),
    }


def get_lakebase_checkpointer(host, endpoint, user):
    w = WorkspaceClient()
    cred = w.postgres.generate_database_credential(endpoint=endpoint)
    db_uri = (
        f"postgresql://{quote(user, safe='')}:{quote(cred.token, safe='')}"
        f"@{host}:5432/databricks_postgres?sslmode=require"
    )
    conn = psycopg.connect(db_uri, autocommit=True, row_factory=dict_row)
    checkpointer = PostgresSaver(conn=conn)
    checkpointer.setup()
    return checkpointer


def build_agent(llm_endpoint, index_name, num_results,
                lakebase_host, lakebase_endpoint, lakebase_user):
    model = ChatDatabricks(endpoint=llm_endpoint, max_tokens=500)
    vs_tool = VectorSearchRetrieverTool(
        name="knowledge_search",
        index_name=index_name,
        description="Search knowledge base for relevant information",
        num_results=num_results,
    )
    checkpointer = get_lakebase_checkpointer(
        lakebase_host, lakebase_endpoint, lakebase_user
    )
    system_prompt = (
        "You are a Knowledge Assistant. Respond in a clear, professional tone. "
        "Use only verified information from the provided documents. "
        "If the answer cannot be found, clearly state that."
    )
    return create_agent(
        model=model, tools=[vs_tool],
        system_prompt=system_prompt, checkpointer=checkpointer,
    )


def _last_user_text(messages):
    user_msgs = [m for m in messages if m.get("role") == "user"]
    return str(user_msgs[-1].get("content", "")) if user_msgs else ""


class LangChainResponsesAgent(ResponsesAgent):
    def __init__(self):
        cfg = _load_config()
        self._agent = build_agent(
            cfg["llm_endpoint_name"], cfg["vs_index_name"],
            cfg["vs_num_results"], cfg["lakebase_host"],
            cfg["lakebase_endpoint"], cfg["lakebase_user"],
        )

    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        msgs = [m.model_dump() for m in request.input]
        custom_inputs = dict(request.custom_inputs or {})
        thread_id = custom_inputs.get("thread_id", f"session-{uuid4()}")
        result = self._agent.invoke(
            {"messages": msgs},
            config={"configurable": {"thread_id": thread_id}},
        )
        try:
            text = result["messages"][-1].content
        except Exception:
            text = str(result)
        return ResponsesAgentResponse(
            output=[self.create_text_output_item(text, str(uuid4()))],
            custom_outputs={"thread_id": thread_id},
        )


AGENT = LangChainResponsesAgent()
mlflow.models.set_model(AGENT)

Configuration (agent-config.yaml)

llm_endpoint_name: <YOUR_LLM_ENDPOINT>

vector_search:
  index_name: <YOUR_CATALOG>.<YOUR_SCHEMA>.docs_chunked_index
  num_results: 3

lakebase:
  host: <YOUR_LAKEBASE_HOST>
  endpoint: projects/<YOUR_PROJECT>/branches/production/endpoints/primary
  user: <YOUR_DATABRICKS_EMAIL>

Step 7: Log, Register, and Deploy

Log to MLflow

import mlflow
from importlib.metadata import version as get_version
from mlflow.models.resources import DatabricksVectorSearchIndex, DatabricksServingEndpoint

resources = [
    DatabricksVectorSearchIndex(index_name="<YOUR_CATALOG>.<YOUR_SCHEMA>.docs_chunked_index"),
    DatabricksServingEndpoint(endpoint_name="<YOUR_LLM_ENDPOINT>"),
]

with mlflow.start_run():
    mlflow.set_tags({
        "model_type": "retrieval_agent",
        "framework": "langchain",
        "memory": "lakebase_autoscaling",
    })
    logged_agent_info = mlflow.pyfunc.log_model(
        name="knowledge_assistant",
        python_model="agent.py",
        code_paths=["agent-config.yaml"],
        input_example={"input": [{"role": "user", "content": "What is Orion?"}]},
        pip_requirements=[
            f"databricks-vectorsearch=={get_version('databricks-vectorsearch')}",
            f"databricks-langchain=={get_version('databricks-langchain')}",
            f"langchain=={get_version('langchain')}",
            f"mlflow=={get_version('mlflow')}",
            "langgraph-checkpoint-postgres",
            "psycopg[binary]",
            "databricks-sdk>=0.89.0",
        ],
        resources=resources,
    )

model_uri = logged_agent_info.model_uri

Register to Unity Catalog

mlflow.set_registry_uri("databricks-uc")
UC_MODEL_NAME = "<YOUR_CATALOG>.<YOUR_SCHEMA>.knowledge_assistant"

uc_info = mlflow.register_model(model_uri=model_uri, name=UC_MODEL_NAME)
print(f"✅ Registered: {UC_MODEL_NAME} v{uc_info.version}")

Deploy

from databricks import agents

deployment = agents.deploy(
    model_name=UC_MODEL_NAME,
    model_version=uc_info.version,
    scale_to_zero_enabled=True,
)
print(f"✅ Endpoint: {deployment.query_endpoint}")

The Result

With the agent deployed, multi-turn conversations work exactly as you’d expect:

ws = WorkspaceClient()
client = ws.serving_endpoints.get_open_ai_client()

session = "user-session-042"

# Turn 1
r1 = client.responses.create(
    model="knowledge_assistant",
    input=[{"role": "user", "content": "What is the Orion motion controller?"}],
    extra_body={"custom_inputs": {"thread_id": session}},
)

# Turn 2 - "it" resolves correctly to Orion
r2 = client.responses.create(
    model="knowledge_assistant",
    input=[{"role": "user", "content": "How does it prevent overheating?"}],
    extra_body={"custom_inputs": {"thread_id": session}},
)
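To print the answers, the OpenAI SDK's Response object exposes the generated text via the output_text convenience property (if your client version lacks it, walk response.output instead):

print("Turn 1:", r1.output_text)
print("Turn 2:", r2.output_text)  # should reference Orion from turn 1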

And in Lakebase, you can see the checkpoint data accumulating across conversation threads — each thread preserving the full message history for context-aware follow-ups.

Quick Check for Validating Memory Persistence:

import json
from uuid import uuid4

thread_id = f"memory-test-{uuid4()}"

# --- 1. Use custom input instead of input_example ---
custom_input = {
    "input": [{"role": "user", "content": "What are the main components of Orion?"}],
    "custom_inputs": {"thread_id": thread_id},
}

print("=" * 50)
print("TEST 1: Custom input (no input_example needed)")
print("=" * 50)

# Use output_path to capture results (without it, mlflow.models.predict returns None)
mlflow.models.predict(
    model_uri=model_uri,
    input_data=custom_input,
    env_manager="uv",
    output_path="/tmp/result_1.json",
)

with open("/tmp/result_1.json", "r") as f:
    result_1 = json.load(f)

thread_id_1 = result_1["custom_outputs"]["thread_id"]
response_1 = result_1["output"][0]["content"][0]["text"]
print(f"Thread ID: {thread_id_1}")
print(f"Response: {response_1[:300]}...")

# --- 2. Test memory persistence with the same thread_id ---
follow_up_input = {
    "input": [
        {
            "role": "user",
            "content": "Can you elaborate more on the first component you mentioned?",
        }
    ],
    "custom_inputs": {"thread_id": thread_id},  # Reuse the same thread
}

print("\n" + "=" * 50)
print("TEST 2: Follow-up on same thread (memory test)")
print("=" * 50)

mlflow.models.predict(
    model_uri=model_uri,
    input_data=follow_up_input,
    env_manager="uv",
    output_path="/tmp/result_2.json",
)

with open("/tmp/result_2.json", "r") as f:
    result_2 = json.load(f)

thread_id_2 = result_2["custom_outputs"]["thread_id"]
response_2 = result_2["output"][0]["content"][0]["text"]
print(f"Thread ID: {thread_id_2}")
print(f"Response: {response_2[:300]}...")

# --- 3. Verify memory with actual conditions ---
print("\n" + "=" * 50)
print("MEMORY CHECK")
print("=" * 50)

# Check 1: Thread IDs match
if thread_id_1 == thread_id_2:
    print(f"✅ Thread ID match: {thread_id_1}")
else:
    print(f"❌ Thread ID mismatch! Call 1: {thread_id_1}, Call 2: {thread_id_2}")

# Check 2: Follow-up response references context from the first response
follow_up_lower = response_2.lower()
if len(response_2) > 50 and any(
    keyword in follow_up_lower
    for keyword in [
        "motion",
        "vision",
        "cognition",
        "communication",
        "subsystem",
        "component",
    ]
):
    print("✅ Follow-up response references components from the first answer — memory is intact!")
else:
    print("⚠️ Follow-up response may not reference the first answer. Manual review recommended.")
    print(f"   Follow-up preview: {response_2[:200]}")

print(
    "\n✅ Lakebase Postgres checkpointing is working correctly!"
    if thread_id_1 == thread_id_2
    else "\n❌ Memory persistence test FAILED."
)

Run it and you can see memory persistence working.

[Screenshots: Lakebase Postgres checkpoint tables]

Gotchas I Hit Along the Way

1. Lakebase Autoscaling vs Provisioned — completely different APIs. If you create a Lakebase project via the UI today, you get an Autoscaling project (the new default since March 2026). But most notebook examples in the docs use w.database.* — that's the Provisioned API. For Autoscaling, use w.postgres.*. I burned 30 minutes on NotFound: Resource not found before figuring this out.

2. PostgresSaver.from_conn_string() returns a context manager, not an instance. If you try checkpointer = PostgresSaver.from_conn_string(uri) and then call .setup(), you get AttributeError: '_GeneratorContextManager' object has no attribute 'setup'. The fix: use psycopg.connect() directly with autocommit=True and row_factory=dict_row, then pass the connection to PostgresSaver(conn=conn).

3. URL-encode your username. Databricks emails contain @, which breaks PostgreSQL connection URIs. Always use urllib.parse.quote(username, safe='').

4. The ep.host path is wrong. When inspecting Lakebase endpoint objects from the SDK, the host is at ep.status.hosts.host, not ep.host or ep.hosts.host.

5. OAuth tokens expire after 60 minutes. w.postgres.generate_database_credential() returns a token that's valid for one hour. For long-running endpoints, implement token rotation or regenerate on each agent initialization.
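A minimal rotation sketch, assuming the get_lakebase_checkpointer helper from Step 5 (the RotatingCheckpointer class and the 50-minute margin are my own construction, not a Databricks API):

import time

class RotatingCheckpointer:
    """Hypothetical wrapper: rebuilds the PostgresSaver before the
    one-hour Lakebase OAuth token expires."""

    def __init__(self, host, endpoint, user, ttl_seconds=50 * 60):
        self._args = (host, endpoint, user)
        self._ttl = ttl_seconds
        self._born = 0.0
        self._saver = None

    def get(self):
        # Regenerate the credential (and connection) once the token is stale
        if self._saver is None or time.time() - self._born > self._ttl:
            self._saver = get_lakebase_checkpointer(*self._args)
            self._born = time.time()
        return self._saver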

What Changed from a Standard RAG Agent

The entire modification boils down to a small set of surgical changes:

  • Checkpointer: InMemorySaver() → PostgresSaver(conn=conn)
  • thread_id: random UUID per request → stable ID from custom_inputs
  • custom_outputs: pass-through → returns thread_id for session continuity
  • Config: LLM + Vector Search → plus Lakebase host, endpoint, user
  • Dependencies: base LangChain → plus langgraph-checkpoint-postgres and psycopg[binary]
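In code, the before/after is essentially a two-line diff (illustrative; InMemorySaver imports from langgraph.checkpoint.memory):

# Before: in-process memory, lost between Model Serving requests
from langgraph.checkpoint.memory import InMemorySaver
checkpointer = InMemorySaver()

# After: durable memory in Lakebase Postgres (helper from Step 5)
checkpointer = get_lakebase_checkpointer(HOST, ENDPOINT, USERNAME)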

Everything else — parsing, chunking, Vector Search, system prompt, tool wiring, MLflow logging, deployment — stays identical.

Wrapping Up

The difference between a RAG demo and a production RAG system often comes down to state management. Databricks now provides every piece of this puzzle natively: ai_parse_document() (version 2.0) for multimodal PDF parsing, Vector Search with managed GTE-Large embeddings, Lakebase for conversation persistence, and the Agent Framework for deployment.

The key architectural decision is using Lakebase for conversation state while keeping Vector Search for document retrieval — each optimized for what it does best. Lakebase gives you a fully managed Postgres with scale-to-zero, so you’re only paying for compute when your agent is actively handling conversations.

If you’re building RAG on Databricks and your agent can’t handle “What did I just ask?”, this is the fix.

If you found this walkthrough useful, connect with me on LinkedIn or follow me on Medium; I regularly publish deep dives on Databricks, Lakehouse architecture, Data Engineering patterns, and AI Agents. I'm always happy to discuss the real-world tradeoffs behind these decisions.


#Databricks #RAG #Lakebase #PostgreSQL #GenAI #LangChain #VectorSearch #DataEngineering #AI


