MachineLearning

Codebase-scale retrieval using AST-derived graphs + BM25 — reducing LLM context from 100K to 5K tokens [D]

Wanted to share an approach I've been using for retrieval-augmented generation over large codebases and get feedback from people thinking about similar problems.

The problem

Naive codebase RAG typically works by chunking files into text segments an…