Artificial Intelligence, data-science, llm, Machine Learning, naturallanguageprocessing

I Built a RAG Evaluation Framework from Scratch. Here’s What Broke It.

30 aviation safety reports. 150 evaluation questions. Three experiments. And one surprisingly humbling detour with a broken ruler.