
Two AI agents debate any topic you choose. One plays host, one plays expert. And the whole system remembers your taste.
I got tired of AI demos that do one thing impressively and nothing else.
Most “multi-agent” projects I’ve seen are basically two LLM calls dressed up in a flowchart. Real multi-agent behavior — where agents have distinct personalities, reason step-by-step, and actually adapt based on what you’ve told them you like — is genuinely hard to build well.
So I built PodcastBrain: a fully local AI podcast generator where a host agent and an expert agent debate any topic you give them, their conversation gets synthesized into audio via ElevenLabs with two distinct voices, and the whole system builds a preference model over time based on your ratings. Rate an episode 👍 or 👎, tell it your knowledge level and preferred depth, and the next episode shifts to match.
This post is the full technical breakdown — architecture, agent design decisions, the memory system, and the things that didn’t work before the things that did.
Quick Answer: PodcastBrain is an open-source Python app that uses two Agno-powered AI agents (Host + Expert) running on Google Gemini 2.5 Flash to generate dynamic podcast conversations on any topic. It features ElevenLabs TTS for dual-voice audio, SQLite-based preference learning, and a Streamlit UI — all runnable locally for free on the Gemini free tier.
Last updated: March 2026 | Stack: Agno 2.2+, Gemini 2.5 Flash, ElevenLabs, Streamlit, SQLite
Table of Contents
- What Is PodcastBrain?
- The Architecture: How Three Agents Divide the Work
- Why Agno? The Multi-Agent Framework Decision
- The Learning System: How PodcastBrain Gets Smarter
- Voice Output: Making Two AI Agents Sound Like Two Different People
- The Parts That Were Hard
- How to Run It Yourself (Step-by-Step)
- What I’d Build Next
- FAQ
- Key Takeaways
What Is PodcastBrain?
PodcastBrain is a multi-agent AI system that generates podcast episodes on demand — any topic, any depth, personalized to your taste.
Here’s what happens when you use it:
- You type a topic into the Streamlit sidebar (“the future of AI agents”, “why sleep debt is real”, “the economics of open-source software”)
- Two AI agents — Alex (the host) and Dr. Sam (the expert) — have a real back-and-forth conversation about it, turn by turn
- A third agent (the Summarizer) generates a structured episode summary with key takeaways
- If you’ve enabled ElevenLabs, the transcript gets synthesized into audio with two distinct voices — one per speaker — merged into a single MP3
- You rate the episode thumbs up or down, and that rating shapes every future episode
The agents aren’t just taking turns completing a template. Alex reads the listener’s preference profile before each episode and adapts — pushing back harder if you like debate, keeping it accessible if you’ve set knowledge level to beginner, going deeper into the weeds if you’ve set preferred depth to “deep-dive.”
This isn’t a gimmick. It’s a working demonstration of how preference learning, agentic memory, and structured multi-agent orchestration come together in a single coherent app. And it runs for free on Gemini’s free tier if you choose the right model.
Key Insight: The interesting engineering isn’t the podcast format — it’s the feedback loop. Most AI apps are stateless. PodcastBrain gets measurably better at serving you the more you use it.
The Architecture: How Three Agents Divide the Work

Let’s start with the full picture, then break down each piece.
User picks a topic
        │
        ▼
Podcast Orchestrator (podcast.py)
        │
    ┌───┴──────────────────────────┐
    ▼                              ▼
Host Agent (Alex)           Expert Agent (Dr. Sam)
 • Sharp questions           • Deep insights
 • Reacts naturally          • Reasoning Tools
 • 1–3 sentences             • 2–4 sentences
 • Agno memory               • Agno memory
    │                              │
    └──────────────┬───────────────┘
                   ▼
        Turn-by-Turn Transcript
                   │
                   ▼
           Summarizer Agent
     (Pydantic structured output)
                   │
           ┌───────┴────────┐
           ▼                ▼
      SQLite DB       ElevenLabs TTS
      (episodes,      (dual-voice MP3)
       ratings,
       prefs)
Three agents. Three jobs. Zero overlap.
This separation of concerns is the most important architectural decision in the whole project. The Host and Expert agents are optimized for conversation quality. The Summarizer agent is optimized for structured data extraction. Mixing these responsibilities into one agent would mean compromising all three jobs.
Agent 1: The Host (Alex)
Alex is designed to feel like a real podcast host — someone who asks sharp questions, reacts with genuine curiosity, and keeps the conversation moving without letting it sprawl.
The key constraint on Alex’s instructions is one I had to add through iteration: “BE BRIEF: 1–3 short sentences ONLY.”
This sounds obvious. It isn’t. Without an explicit brevity constraint, Gemini’s default behavior is to give thorough, comprehensive answers — which is exactly wrong for a podcast host. A host who says four paragraphs between questions kills the conversational rhythm. The explicit constraint forces the model to make editorial choices about what’s worth saying, which produces better podcast dynamics than leaving it unconstrained.
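To make the constraint concrete, here is a sketch of how the Host's instruction block might be assembled. The wording is illustrative, not the repo's exact strings; the brevity rule is the load-bearing line.

```python
# Illustrative sketch of the Host's instruction list (not the repo's
# exact wording). The explicit brevity rule is the load-bearing line.
host_instructions = [
    "You are Alex, a sharp, curious podcast host.",
    "Ask one pointed question per turn; react to the expert's last answer.",
    "BE BRIEF: 1-3 short sentences ONLY. No paragraphs.",
]

# Joined into the system prompt the model actually sees.
system_prompt = "\n".join(host_instructions)
```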
Alex also reads the listener preference context before each episode:
def _build_preference_context() -> str:
    prefs = get_all_preferences()
    liked = get_liked_topics()
    past = get_past_topics()
    lines = []
    if liked:
        lines.append(f"Topics the listener has enjoyed before: {', '.join(liked)}")
    if past:
        lines.append(f"Recent episode topics: {', '.join(past)}")
    if prefs.get("knowledge_level"):
        lines.append(f"Listener's knowledge level: {prefs['knowledge_level']}")
    if prefs.get("preferred_depth"):
        lines.append(f"Preferred depth: {prefs['preferred_depth']}")
    if prefs.get("preferred_tone"):
        lines.append(f"Preferred tone: {prefs['preferred_tone']}")
    return "\n".join(lines) if lines else "No listener preferences yet."
That context string gets injected into Alex’s system prompt as a --- Listener --- section. So if you've rated three AI episodes thumbs up and set your knowledge level to "advanced," Alex knows to skip the definitional questions and go straight to the edge cases.
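The injection step itself is simple string assembly. A minimal sketch, with a hypothetical helper name (the repo does this inside its agent setup, not necessarily via a function like this):

```python
# Hypothetical helper showing how the preference context might be
# spliced into an agent's system prompt. The function name is
# illustrative; only the "--- Listener ---" section comes from the app.
def inject_listener_context(base_prompt: str, context: str) -> str:
    """Append the listener section so the agent reads it before turn one."""
    return f"{base_prompt}\n\n--- Listener ---\n{context}"

prompt = inject_listener_context(
    "You are Alex, the host.",
    "Listener's knowledge level: advanced\nPreferred depth: deep-dive",
)
```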
Agent 2: The Expert (Dr. Sam)
Sam is the expert guest — a domain authority who gives punchy, insight-dense answers with vivid analogies, can push back on Alex’s framing, and occasionally cracks a joke.
The key difference from the Host is that Sam uses ReasoningTools:
tools=[ReasoningTools(enable_think=True, enable_analyze=True)]
think() and analyze() let the Expert agent reason step-by-step before generating a response — an internal chain-of-thought that isn't shown to the user but improves the quality of the output. For factual or analytical topics, this makes a meaningful difference. Sam's answers are more coherent and less prone to confident-sounding wrong turns.
The brevity constraint on Sam is slightly looser than Alex’s (2–4 sentences vs 1–3), because expert insight benefits from a bit more space to develop. But the constraint still exists. Without it, the “expert” becomes a lecturer, and podcast flow dies.
Agent 3: The Summarizer
The Summarizer is the simplest agent and also the most precisely engineered. Its job is to take the full transcript and return a structured JSON object — not prose, not bullet points, a guaranteed Pydantic schema.
Agent(
    name="Summarizer",
    output_schema=EpisodeSummary,
    structured_outputs=True,
)
The EpisodeSummary Pydantic model (defined in models.py) ensures the database always gets clean, typed data regardless of what the conversation contained. This is how episode history, searchability, and the learning system stay reliable — the Summarizer enforces schema compliance so downstream code never has to guess at data shape.
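For readers who haven't used Pydantic schemas this way, here is a plausible shape for the model. The real definition lives in models.py and its field names may differ; the point is that downstream code gets typed attributes, not markdown to parse.

```python
from pydantic import BaseModel

# Plausible shape for the schema in models.py. The actual field names
# may differ; this illustrates why typed output simplifies downstream code.
class EpisodeSummary(BaseModel):
    title: str
    key_takeaways: list[str]
    topics: list[str]

summary = EpisodeSummary(
    title="The Future of AI Agents",
    key_takeaways=["Specialization beats generalization"],
    topics=["AI agents"],
)
```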
The Orchestrator (podcast.py)
The orchestrator is the conductor. It runs the ping-pong loop: Host turn → Expert turn → Host turn → Expert turn, for however many turns the user configured. It handles rate limit retries with exponential backoff (important on Gemini’s free tier), collects the full transcript, then fires the Summarizer agent and the ElevenLabs audio pipeline in parallel.
Each 3-turn episode uses approximately 7 API calls — 1 per speaker per turn, plus the Summarizer. On the free tier of gemini-2.0-flash (1,500 requests/day), that's room for ~200 episodes per day. On gemini-2.5-flash (20 requests/day on the free tier), you get about 2–3 episodes before hitting limits. The README makes this clear and recommends switching models accordingly.
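The ping-pong loop reduces to a few lines once the agents are treated as callables. A minimal sketch (podcast.py does more: backoff, summarization, audio); the lambdas below are stubs so the loop runs without any API keys.

```python
# Minimal sketch of the orchestrator's turn loop. `host` and `expert`
# stand in for the real Agno agents; here they are any callables that
# take the topic and the transcript so far.
def run_episode(host, expert, topic: str, turns: int = 3) -> list[dict]:
    """Alternate Host -> Expert for `turns` rounds; return the transcript."""
    transcript: list[dict] = []
    for _ in range(turns):
        transcript.append({"speaker": "Alex", "text": host(topic, transcript)})
        transcript.append({"speaker": "Dr. Sam", "text": expert(topic, transcript)})
    return transcript

# Stub agents: 3 turns -> 6 utterances, matching the 7-call math above
# (6 speaker calls plus 1 Summarizer call).
episode = run_episode(
    host=lambda topic, hist: f"Question {len(hist) // 2 + 1} about {topic}?",
    expert=lambda topic, hist: f"Answer {(len(hist) + 1) // 2}.",
    topic="AI agents",
)
```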
Key Insight: Agent specialization — one agent per job, not one agent for everything — is what makes this feel like a real multi-agent system rather than a prompt engineering trick.
Why Agno? The Multi-Agent Framework Decision

The project uses Agno as its multi-agent framework, and I want to explain that choice because it’s not obvious.
The landscape for Python multi-agent frameworks in 2026 includes LangGraph, CrewAI, AutoGen, and Agno, among others. Each makes different trade-offs.
LangGraph is powerful but verbose — you define explicit state machines and node transitions. That’s the right choice for complex production workflows. For a two-agent conversational loop, it’s overkill.
CrewAI has a nice declarative API but its memory system is coarser than what PodcastBrain needed.
Agno hit the right level of abstraction for this project for three specific reasons:
1. enable_agentic_memory=True — one line, persistent cross-session memory. Both the Host and Expert agents get this flag. It uses SqliteDb under the hood. The agents don't just remember within a single episode — they build up context across all episodes they've been part of. After ten episodes on AI topics, they start feeling like they have a relationship with the listener.
2. ReasoningTools — baked-in chain-of-thought without extra infrastructure. The think() and analyze() tools in Agno's ReasoningTools give the Expert agent an internal reasoning step that costs nothing extra architecturally. In other frameworks, implementing this cleanly requires custom tool definitions or prompt engineering hacks.
3. output_schema + structured_outputs=True — guaranteed Pydantic schema from any agent. The Summarizer needs to return reliable structured data, not markdown prose that I then parse. Agno's native structured output support makes the Summarizer a first-class typed data producer rather than a text scraper.
The Agent() constructor in Agno is clean enough that each agent definition is self-documenting. Looking at agents.py, you can understand what each agent does, what it knows, and how it persists state — in under 20 lines per agent.
The Learning System: How PodcastBrain Gets Smarter
This is the part I’m most interested in from a product perspective, because it’s where PodcastBrain differentiates from “AI generates podcast” demos.
The learning system has three components:
1. Episode Ratings
After each episode, you rate it 👍 or 👎. That rating gets stored in SQLite with the episode topic, timestamp, and a summary. Over time, this builds a record of what you’ve liked and what you haven’t.
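The ratings store is a small append-only table. A sketch with an illustrative schema (the real table definitions live in the repo's database module), including the `get_liked_topics()` query the preference context relies on:

```python
import sqlite3

# Illustrative ratings schema; the repo's actual columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ratings (
           topic TEXT NOT NULL,
           rating INTEGER NOT NULL,   -- +1 thumbs up, -1 thumbs down
           summary TEXT,
           created_at TEXT DEFAULT CURRENT_TIMESTAMP
       )"""
)
conn.execute(
    "INSERT INTO ratings (topic, rating, summary) VALUES (?, ?, ?)",
    ("AI agents", 1, "Great debate on agent frameworks."),
)

def get_liked_topics(conn) -> list[str]:
    """Topics the listener has rated thumbs up."""
    rows = conn.execute("SELECT DISTINCT topic FROM ratings WHERE rating = 1")
    return [r[0] for r in rows]
```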
2. Preference Persistence
In the sidebar, you set three dimensions of preference:
| Setting | Options |
| --- | --- |
| Knowledge Level | beginner / intermediate / advanced |
| Preferred Depth | surface / balanced / deep-dive |
| Preferred Tone | casual / conversational / academic |
These get written to the preferences table in SQLite and persist across sessions. They're not inferred — you set them directly. This is intentional. Explicit preference signals are more reliable than trying to infer taste from rating data alone, especially early in the learning loop when you have few data points.
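Persisting a sidebar setting is a key-value upsert, so changing your depth preference overwrites the old row instead of duplicating it. A sketch with a hypothetical schema (the repo's actual table may differ):

```python
import sqlite3

# Hypothetical key-value preferences table with upsert semantics, so the
# sidebar can overwrite a setting without duplicating rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE preferences (key TEXT PRIMARY KEY, value TEXT)")

def set_preference(conn, key: str, value: str) -> None:
    conn.execute(
        "INSERT INTO preferences (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, value),
    )

set_preference(conn, "knowledge_level", "beginner")
set_preference(conn, "knowledge_level", "advanced")  # overwrites, no duplicate
```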
3. Dynamic Prompt Injection
Before every episode, _build_preference_context() queries the database for liked topics, recent episodes, and current preferences. It assembles these into a plain-text context block that gets injected into both the Host and Expert agent system prompts.
The result looks something like this in the agent’s context:
--- Listener ---
Topics the listener has enjoyed before: AI agents, quantum computing, sleep science
Recent episode topics: AI agent frameworks, autonomous vehicles
Listener's knowledge level: advanced
Preferred depth: deep-dive
Preferred tone: conversational
Both agents read this before generating their first turn. Alex adjusts her question style. Sam calibrates the depth of her analogies. The shift is real — an “advanced, deep-dive” listener gets a materially different episode on the same topic than a “beginner, surface” listener.
What the system doesn’t yet do: automatically infer preferences from ratings. That’s the next evolution — a fourth “Preference Learner” agent that periodically analyzes the ratings history and updates the preference profile without user intervention. The scaffolding for this (the ratings table, the preference table, the get_liked_topics() function) is all there. It's a few hundred lines of code from being a real recommendation engine.
Key Insight: Most AI apps treat every session as the first session. PodcastBrain’s learning system makes each episode informed by every episode that came before it — not because the LLM has memory, but because the application does.
Voice Output: Making Two AI Agents Sound Like Two Different People
The audio pipeline in audio.py is where PodcastBrain crosses from "demo" to "something you'd actually listen to."
The design challenge: ElevenLabs gives you a single TTS API call that returns one audio file. But a podcast has two speakers. You need to:
- Split the transcript by speaker (Alex vs Sam)
- Call ElevenLabs with a different voice ID per speaker
- Generate individual audio clips per turn
- Concatenate them in order into a single MP3
PodcastBrain handles steps 1–4 in audio.py. The orchestrator passes the full transcript (a list of {speaker, text} objects) to the audio module, which iterates through each turn, routes it to the correct ElevenLabs voice, and assembles the final MP3.
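The routing step can be sketched in a few lines. The voice IDs and the `tts` callable below are placeholders, not real ElevenLabs identifiers or the repo's actual function:

```python
# Sketch of the speaker-to-voice routing in audio.py. Voice IDs and the
# tts() call are placeholders standing in for real ElevenLabs values.
VOICE_IDS = {"Alex": "voice-warm", "Dr. Sam": "voice-precise"}

def synthesize_turns(transcript, tts):
    """Route each turn to its speaker's voice; return ordered audio clips."""
    return [tts(VOICE_IDS[turn["speaker"]], turn["text"]) for turn in transcript]

clips = synthesize_turns(
    [{"speaker": "Alex", "text": "Welcome back!"},
     {"speaker": "Dr. Sam", "text": "Glad to be here."}],
    tts=lambda voice, text: (voice, text),  # stand-in for the API call
)
```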
The voice distinction matters more than you’d think. When both speakers have the same synthetic voice, even a well-written conversation collapses into monotony. Two distinct voices — one warmer, one more precise — give the conversation the spatial quality that makes a podcast listenable rather than just readable.
If you don’t have an ElevenLabs API key, PodcastBrain runs in text-only mode. The full transcript still displays as chat bubbles in the Streamlit UI. The app is fully usable without audio — the voice layer is additive, not required.
The Parts That Were Hard
Building this taught me a few things that I didn’t expect to be problems.
Rate limits are a first-class design constraint, not an afterthought.
Gemini’s free tier limits on gemini-2.5-flash are brutal for multi-agent apps: 20 requests per day, 5 per minute. A single 5-turn podcast episode uses 11 API calls. Hit the rate limit mid-episode and you have a broken partial transcript. The orchestrator in podcast.py handles this with automatic retry and exponential backoff, but the bigger design insight is model selection: switching to gemini-2.0-flash (1,500 requests/day free) is the practical fix for most users.
Brevity constraints require explicit calibration.
The first version of the agents had no word count constraints. The Host gave 4-paragraph answers. The Expert turned into a Wikipedia article. Conversations that should take 3 minutes to read were taking 15. Adding explicit constraints (“1–3 short sentences ONLY. No paragraphs. Think Twitter, not essay.”) wasn’t sufficient on its own — I had to also add “Respond ONLY with your spoken words. No labels, no formatting, no stage directions.” Without that second constraint, agents would sometimes annotate their own responses (“Alex leans forward”) or prefix them with their name.
Structured output from LLMs needs schema enforcement at the framework level.
The early Summarizer was a regular agent that returned markdown. Parsing episode titles and key takeaways from markdown prose with regex is fragile and miserable. Switching to Agno’s output_schema=EpisodeSummary with structured_outputs=True eliminated an entire class of bugs. The Summarizer now returns a guaranteed Pydantic object every time, regardless of what the conversation contained.
Audio concatenation needs silence padding.
The first version of the audio pipeline concatenated clips with no gap. The result sounded like two people cutting each other off constantly. Adding 300ms of silence between speaker turns — a single short padding operation — made the conversation feel natural. This is one of those things that’s obvious in retrospect and invisible until you hear it.
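The padding operation is easiest to see on raw samples. The real pipeline works on MP3 clips (typically via an audio library); plain lists of samples keep the idea visible:

```python
# Sketch of 300 ms inter-turn padding on raw PCM-style samples. The real
# pipeline concatenates MP3 clips; lists of ints keep the idea visible.
def join_with_silence(clips, sample_rate=16_000, gap_ms=300):
    """Concatenate sample lists with gap_ms of silence between turns."""
    gap = [0] * (sample_rate * gap_ms // 1000)
    out: list[int] = []
    for i, clip in enumerate(clips):
        if i:
            out.extend(gap)  # silence only *between* turns, not at the ends
        out.extend(clip)
    return out

# Tiny demo at a 1 kHz sample rate: 300 ms gap = 300 zero samples.
merged = join_with_silence([[1, 2], [3, 4]], sample_rate=1000, gap_ms=300)
```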
How to Run It Yourself (Step-by-Step)
The full setup takes about 5 minutes.
Prerequisites
- Python 3.10+
- A Google AI Studio API key (free at aistudio.google.com)
- An ElevenLabs API key (optional — for voice output)
Setup
# Clone the repo
git clone https://github.com/shubh-vedi/podcastbrain.git
cd podcastbrain
# Install dependencies
pip install -r requirements.txt
# Configure API keys
cp .env.example .env
# Edit .env and add your keys:
# GOOGLE_API_KEY=your-google-api-key
# ELEVENLABS_API_KEY=your-elevenlabs-key (optional)
# Run
streamlit run app.py
Free Tier Recommendation
If you’re on Gemini’s free tier, open agents.py and change:
MODEL_ID = "gemini-2.5-flash" # 20 req/day free
to:
MODEL_ID = "gemini-2.0-flash" # 1500 req/day free
The quality difference is minimal for conversational content. The rate limit difference is enormous.
Sidebar Controls
Once running, configure these in the Streamlit sidebar before generating your first episode:
- Topic: Anything. “The case against sleep deprivation.” “Why Rust is eating Python’s lunch.” “The science of habit formation.”
- Turns: 3 turns (6 total exchanges) is the sweet spot for a ~5-minute episode. More turns = longer episode = more API calls.
- Knowledge Level / Preferred Depth / Preferred Tone: Set these before your first episode. They immediately shape the conversation style.
What I’d Build Next
PodcastBrain is a working foundation. Here’s what the natural extensions look like:
Automatic preference inference. A fourth “Preference Learner” agent that runs after every 5 episodes, analyzes the ratings history, and updates the preference profile without requiring manual input. The database schema already supports this.
Guest persona library. Right now Dr. Sam is a generic expert. A persona library would let you pick “Dr. Sam as a skeptical economist” or “Dr. Sam as a Silicon Valley optimist” — giving the same topic radically different framing.
Topic suggestion engine. A lightweight recommender that proposes topics based on your liked episode history. Uses the same get_liked_topics() function that already feeds the preference context.
Web search grounding. Connecting the Expert agent to a web search tool would let it pull real-time data, cite recent papers, or reference current events — moving the conversations from generally informed to actually current.
RSS feed export. Generate a valid RSS feed from your episode history so you can subscribe to your own personalized podcast in any podcast app.
FAQ
What is PodcastBrain and how does it work?
PodcastBrain is an open-source multi-agent AI app that generates podcast episodes on any topic using two AI agents — a Host (Alex) and an Expert (Dr. Sam). The agents take turns in a structured conversation, a third Summarizer agent creates episode notes, and the system builds a listener preference model over time based on your ratings and settings. Built with Agno, Google Gemini, ElevenLabs, and Streamlit.
Is PodcastBrain free to use?
The software is free and open-source. API costs depend on your configuration. Google Gemini’s free tier on gemini-2.0-flash allows 1,500 requests per day — enough for approximately 200 episodes daily. ElevenLabs is optional; the app runs in text-only mode without an ElevenLabs key. If you use ElevenLabs' free tier (10,000 characters/month), expect roughly 15–20 short podcast episodes per month before hitting the limit.
What is Agno and why does PodcastBrain use it?
Agno is a Python multi-agent framework that provides clean agent definitions, built-in cross-session memory via SqliteDb, reasoning tools (think/analyze), and native structured output support via Pydantic schemas. PodcastBrain uses Agno because it hits the right abstraction level for a two-agent conversational system — more ergonomic than LangGraph for this use case, with better memory primitives than CrewAI.
How does the learning system work?
After each episode, you rate it 👍 or 👎. You also set explicit preferences for knowledge level, depth, and tone in the sidebar. All of this is stored in a local SQLite database. Before every new episode, both agents read a preference context block assembled from your ratings history and settings — so the conversation style adapts to your taste over time.
Can I run PodcastBrain without an ElevenLabs API key?
Yes. ElevenLabs is optional. Without it, PodcastBrain runs in text-only mode — the full transcript displays as a chat bubble UI in Streamlit, but no audio is generated. All other features (preference learning, episode history, ratings, structured summaries) work identically.
How is this different from just asking ChatGPT to write a podcast script?
Several meaningful ways. First, the multi-agent architecture means you have two distinct agent personalities with different instructions, tools, and roles — not one agent roleplaying two characters. Second, the preference learning system makes every episode informed by your history; ChatGPT has no memory of past sessions. Third, the structured output pipeline gives you a real database of episodes with searchable summaries and ratings. Fourth, the voice output with two distinct ElevenLabs voices is a fundamentally different experience from reading text.
What topics work best for PodcastBrain?
Topics with genuine intellectual tension work best — where there’s something for the expert to push back on, debate, or reframe. “The future of X” topics, comparisons (“Python vs Rust for data pipelines”), counterintuitive science (“why more sleep makes you more productive”), and controversial takes (“remote work is overrated”) all generate more dynamic conversations than purely informational topics.
Key Takeaways
- Three-agent architecture with clear job separation — Host, Expert, and Summarizer each have a single job, which is why each one does its job well.
- Brevity constraints are non-negotiable for conversational agents — explicit sentence count limits in system instructions are what separate podcast-quality dialogue from LLM essay mode.
- Agno’s enable_agentic_memory=True + SqliteDb gives agents genuine cross-session persistence in a single line — this is the foundation of the learning system.
- ReasoningTools on the Expert agent produces measurably more coherent analytical responses without any additional infrastructure.
- Model selection on the free tier matters enormously — gemini-2.0-flash at 1,500 req/day is the practical choice; gemini-2.5-flash at 20 req/day will block you before your third episode.
- The learning loop is the product — the podcast format is the delivery mechanism, but the preference model that improves over time is what makes this genuinely useful rather than just impressive.
Tried PodcastBrain? Tell me what topic you generated first — I’m genuinely curious what topics people reach for. Leave a comment or hit the clap button if this breakdown was useful. And if you build something on top of this, I want to see it.
GitHub: github.com/shubh-vedi/podcastbrain (don't forget to star the repo)
About the Author
I build AI tools and write about what I learn building them: agent frameworks, LLM orchestration patterns, and the gap between what AI demos promise and what production systems actually look like. Follow for weekly posts on generative AI, agent architecture, and the AI tools that are actually worth your time.
I Built an AI Podcast That Learns What You Like — Here’s Exactly How It Works was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.