I've spent the last year building a tool that indexes AI safety and alignment podcast conversations at the idea level. The corpus now covers 392 episodes, 1,259 hours, and 75,534 searchable moments across Dwarkesh Patel, Lex Fridman, 80,000 Hours, AXRP, The Inside View, AI Safety Talks, Future of Life Institute, and Robert Miles AI Safety.
Below are a few things I found interesting when running searches across the full corpus. Every claim links directly to the search or moment in question.
Deceptive alignment dominates the conversation
Searching deceptive alignment (https://leita.io?q=deceptive+alignment) returns 8 relevant moments across 4 podcasts. Evan Hubinger appears twice — once in his AI Safety Talks lecture at 1:26:40 on how a deceptively aligned model figures out the training objective in its own activations, and once at 2:40 using the analogy of hiring someone to run a business you don't understand. Victoria Krakovna on The Inside View adds an important dimension at 1:33:20: we don't have a good solution for goal misgeneralization yet, and increasing training data diversity only partially addresses it.
Mesa-optimization (https://leita.io?q=mesa-optimization), by contrast, surfaces mostly in technical contexts — Evan Hubinger on AXRP at 35:20 walking through exactly what makes something a mesa-optimizer, and Robert Miles at 22:00 explaining that the optimal strategy for a mesa-optimizer is to appear aligned during training.
The pattern: deceptive alignment has become the operationally useful framing — the failure mode people are actually building evals around. Mesa-optimization lives more in the theoretical background.
The Christiano-Yudkowsky disagreement is about verifiability, not just p(doom)
Searching Christiano Yudkowsky (https://leita.io?q=Christiano+Yudkowsky), the most relevant moment is Yudkowsky at 1:14:40 on Dwarkesh: "I say stuff. Paul Christiano says stuff. People argue about it. They can't figure out who's right." The surrounding context is specifically about whether alignment solutions can be verified before deployment — that's the deeper crux, not the extinction probability numbers.
Scott Aaronson on AXRP at 1:58:40 adds another angle: he mentions having taught Christiano, and frames part of the disagreement as being about how much weight to give formal versus empirical approaches.
Corrigibility keeps coming up in unexpected places
Searching corrigibility (https://leita.io?q=corrigibility), Victoria Krakovna on The Inside View at 15:20 gives a clean formulation: a system doesn't have to perfectly understand what we want — it just has to be trying to do the thing we want. Evan Hubinger at 1:04:00 in AI Safety Talks pushes back: corrigibility in the basic behavioral sense isn't enough.
These two moments sit well together as a compressed version of the debate.
Why I built this
I kept losing track of where specific arguments were made. I knew someone had explained eliciting latent knowledge (https://leita.io?q=eliciting+latent+knowledge) clearly somewhere — Vikrant Varma does it at 2:02:00 on AXRP: figuring out how easy it is to extract truth-like things from models. That's the kind of moment this tool is built to surface.
Spotify searches episode titles. Listen Notes searches descriptions. This searches what's actually being said — semantically, so it understands meaning rather than just matching keywords.
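For anyone curious what "semantic" means in practice, here is a minimal sketch of the general pattern behind embedding-based search over transcript chunks. This is illustrative only, not Leita's actual implementation: the model name, the example moments, and the scoring are assumptions.

```python
# Minimal sketch of embedding-based semantic search over transcript chunks.
# Illustrative only: model, data, and scoring are assumptions, not Leita's code.
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Each "moment" is a transcript chunk plus episode metadata and a timestamp.
moments = [
    {"episode": "AXRP (Evan Hubinger)", "t": "35:20",
     "text": "What makes something a mesa-optimizer is that it is itself running an optimization process."},
    {"episode": "The Inside View (Victoria Krakovna)", "t": "15:20",
     "text": "The system doesn't have to perfectly understand what we want, it just has to be trying to do it."},
]

# Embed every chunk once; in practice these vectors live in a vector index.
chunk_vecs = model.encode([m["text"] for m in moments], normalize_embeddings=True)

def search(query: str, top_k: int = 5):
    """Rank moments by cosine similarity between the query and chunk embeddings."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # cosine similarity, since all vectors are unit-normalized
    order = np.argsort(-scores)[:top_k]
    return [(float(scores[i]), moments[i]["episode"], moments[i]["t"]) for i in order]

# A query like "corrigibility" can retrieve a chunk that never contains the word,
# because ranking happens in embedding space rather than on keyword overlap.
print(search("corrigibility"))
```

The point of the sketch is the ranking step: because query and chunks meet in the same vector space, a search needn't share any keywords with the moment it surfaces.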
I have no background in software development. I built this entirely with AI tools over the last year.
The tool is free, no login required: https://leita.io
A few more searches worth trying:
scaling hypothesis (https://leita.io?q=scaling+hypothesis) — Demis Hassabis and Dario Amodei both address this directly
sharp left turn (https://leita.io?q=sharp+left+turn)
Neil Lawrence (https://leita.io?q=Neil+Lawrence)
Just added based on community feedback: "what Yudkowsky thinks about Paul's approach" (https://leita.io?q=what+Yudkowsky+thinks+about+Paul)
I'm posting here because I want feedback from people who know this content well enough to tell me when the search gets it wrong. Happy to answer questions in the comments.