Am I misunderstanding RAG? I thought it basically meant separate retrieval + generation

Disclaimer: sorry if this post is worded a bit oddly; English is not my first language.

I’m a bit confused by how people use the term RAG.

I thought the basic idea was:

  • use an embedding model / retriever to find relevant chunks
  • maybe rerank them
  • pass those chunks into the main LLM
  • let the LLM generate the final answer

So in my head, RAG is mostly about having a retrieval component and a generator component, often with different models doing different jobs.
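For concreteness, here is a minimal Python sketch of that baseline. It is only an illustration under loud assumptions: the bag-of-words `embed` stands in for a real embedding model, the word-overlap `rerank` stands in for a real reranker (e.g. a cross-encoder), and `llm` is any function that takes a prompt string and returns text (a call to a local server, for example). All of these names are hypothetical, not from any library.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words counts. A real setup would call an
    # embedding model here; this stand-in only keeps the sketch runnable.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    # Step 1: score every chunk against the query, keep the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def rerank(query: str, candidates: list[str], n: int = 2) -> list[str]:
    # Step 2 (optional): re-score the shortlist with a second, usually
    # more expensive model. Hypothetical stand-in: count shared words.
    q_words = set(query.lower().split())
    return sorted(candidates,
                  key=lambda c: len(q_words & set(c.lower().split())),
                  reverse=True)[:n]


def build_prompt(query: str, context: list[str]) -> str:
    # Step 3: "stuff" the selected chunks into the generator's prompt.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def rag_answer(query: str, chunks: list[str], llm: Callable[[str], str],
               k: int = 3, n: int = 2) -> str:
    # Step 4: the main LLM (any prompt -> text function) writes the answer.
    candidates = retrieve(query, chunks, k)
    context = rerank(query, candidates, n)
    return llm(build_prompt(query, context))


docs = [
    "The cat sat on the mat.",
    "RAG combines retrieval with generation.",
    "Retrieval quality depends on the reranker.",
    "Bananas are yellow.",
]
# Echo "LLM": returns the prompt unchanged so you can see the stuffed context.
prompt = rag_answer("what is rag retrieval", docs, llm=lambda p: p)
print(prompt)
```

With the echo function as `llm`, the call just returns the stuffed prompt, which is a handy way to check which chunks actually made it into the context before wiring in a real model.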

But then I see people talk about RAG as if it also implies extra steps like summarization, compression, query rewriting, context fusion, etc.

So what’s the practical definition people here use?

Is “normal RAG” basically just:
retrieve --> rerank --> stuff chunks into prompt --> answer

And are the other things just enhancements on top?
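One way to picture the "enhancements on top" idea is as optional hooks around the same retrieve-then-generate core. In this hypothetical sketch, `rewrite` (query rewriting) runs before retrieval and `compress` (compression or summarization of chunks) runs after it; with both omitted it reduces to plain retrieve-and-stuff. The function names and the toy components at the bottom are assumptions for illustration only.

```python
from typing import Callable, Optional

Retriever = Callable[[str], list[str]]


def pipeline(query: str,
             retrieve: Retriever,
             llm: Callable[[str], str],
             rewrite: Optional[Callable[[str], str]] = None,
             compress: Optional[Callable[[str, list[str]], list[str]]] = None) -> str:
    # Optional pre-retrieval stage: query rewriting / expansion.
    search_query = rewrite(query) if rewrite else query
    chunks = retrieve(search_query)
    # Optional post-retrieval stage: compress or summarize the chunks.
    if compress:
        chunks = compress(query, chunks)
    context = "\n".join(f"- {c}" for c in chunks)
    # The generator still sees the user's original question.
    return llm(f"Context:\n{context}\n\nQuestion: {query}")


# Toy components for illustration only (all hypothetical).
notes = {
    "retrieval": "Retrieval finds relevant chunks. It usually uses an index.",
    "generation": "Generation writes the answer. It uses the main LLM.",
}


def keyword_retrieve(q: str) -> list[str]:
    # Return every note whose key word appears in the query.
    return [text for key, text in notes.items() if key in q.lower()]


def expand_acronym(q: str) -> str:
    return q.replace("RAG", "retrieval generation")


def first_sentence(q: str, chunks: list[str]) -> list[str]:
    # Crude "compression": keep only the first sentence of each chunk.
    return [c.split(". ")[0] + "." for c in chunks]


def echo(p: str) -> str:
    return p


baseline = pipeline("explain RAG", keyword_retrieve, echo)
enhanced = pipeline("explain RAG", keyword_retrieve, echo,
                    rewrite=expand_acronym, compress=first_sentence)
print(baseline)
print(enhanced)
```

Here the plain version retrieves nothing for "explain RAG", while the hooked version finds both notes and passes only their first sentences to the prompt. Structurally the hooks don't change the retrieve-then-generate shape; they only transform what goes into and comes out of retrieval.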

Also, if a model just searches the web or calls tools, does that count as RAG too, or not really?

Curious what people who actually build local setups consider the real baseline.

submitted by /u/shironekoooo
