Advanced RAG Techniques

Basic RAG works well as a baseline, but advanced techniques can significantly improve accuracy, relevance, and user experience. The approaches below cover retrieval variations, query transformation, re-ranking, agentic orchestration, and evaluation.

RAG Variations

1. Naive RAG

Basic: Retrieve → Augment → Generate. A simple but effective baseline; sketched below.

Use when: Simple Q&A, small knowledge base
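
A minimal sketch of the Retrieve → Augment → Generate loop, assuming hypothetical embed(), vector_search(), and llm() helpers that wrap whatever embedding model, vector store, and chat model you use:

    # Naive RAG sketch. embed(), vector_search(), and llm() are hypothetical
    # helpers standing in for an embedding model, vector store, and chat model.
    def naive_rag(query: str, top_k: int = 4) -> str:
        # 1. Retrieve: embed the query and fetch the nearest chunks
        query_vec = embed(query)
        chunks = vector_search(query_vec, k=top_k)

        # 2. Augment: stuff the retrieved chunk text into the prompt
        context = "\n\n".join(chunk.text for chunk in chunks)
        prompt = (
            "Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}"
        )

        # 3. Generate: let the LLM answer from the augmented prompt
        return llm(prompt)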

2. Sentence-Window RAG

Retrieve small chunks (often single sentences) for precise matching, but pass the surrounding window of text to the LLM; see the sketch below.

Use when: Need precise retrieval + broader context
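
One way to implement this, assuming embed() and vector_search() helpers and a pre-split list of sentences: index individual sentences but store the surrounding window alongside each one, then return the window at query time.

    # Sentence-window sketch: match on single sentences, hand the LLM the
    # surrounding window. Helper functions are assumed, not a specific library.
    def build_sentence_windows(sentences, window=2):
        records = []
        for i, sent in enumerate(sentences):
            records.append({
                "text": sent,  # embedded and matched at retrieval time
                "window": " ".join(
                    sentences[max(0, i - window): i + window + 1]
                ),             # what the LLM actually sees
            })
        return records

    def retrieve_with_window(query, k=4):
        hits = vector_search(embed(query), k=k)  # hits are the records above
        return [hit["window"] for hit in hits]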

3. Auto-Merging RAG

Index documents as a hierarchy of chunks. If several child chunks from the same parent are retrieved, replace them with their parent chunk (sketched below).

Use when: Long documents, need coherent sections
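
A rough sketch of the merge step, assuming each chunk record carries a parent_id and a hypothetical get_parent() lookup is available:

    # Auto-merging sketch: if enough children of one parent are retrieved,
    # replace them with the parent chunk so the LLM sees a coherent section.
    from collections import defaultdict

    def auto_merge(retrieved_chunks, merge_threshold=2):
        by_parent = defaultdict(list)
        for chunk in retrieved_chunks:
            by_parent[chunk["parent_id"]].append(chunk)

        merged = []
        for parent_id, children in by_parent.items():
            if len(children) >= merge_threshold:
                merged.append(get_parent(parent_id))  # swap fragments for the parent
            else:
                merged.extend(children)               # keep isolated hits as-is
        return merged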

4. Fusion RAG

Generate multiple query variations, retrieve for each, and fuse the ranked results; a sketch follows.

Use when: Complex queries, need comprehensive coverage
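
A sketch of the fusion step using Reciprocal Rank Fusion (RRF), assuming hypothetical llm() and search() helpers and documents identified by doc_id:

    # Fusion sketch: rewrite the query a few ways, retrieve per variation,
    # then fuse the ranked lists with Reciprocal Rank Fusion.
    def fusion_retrieve(query, n_variations=3, k=10):
        variations = [query] + llm(
            f"Write {n_variations} alternative phrasings of: {query}"
        ).splitlines()[:n_variations]

        scores = {}
        for q in variations:
            for rank, doc in enumerate(search(q, k=k)):
                # RRF: each list contributes 1 / (60 + rank) to a doc's score;
                # 60 is the conventional smoothing constant
                scores[doc.doc_id] = scores.get(doc.doc_id, 0.0) + 1.0 / (60 + rank)

        # Return the fused top-k document IDs
        return sorted(scores, key=scores.get, reverse=True)[:k]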

Query Transformation

Query Rewriting

Use an LLM to rephrase the query so it better matches the wording of the indexed documents; a sketch follows the example below.

"How do I...?" → "Steps to..."

HyDE

Hypothetical Document Embeddings: generate a hypothetical answer document with the LLM, then use its embedding for retrieval (sketched below).

Query → LLM generates hypothetical answer → Embed → Search
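
A sketch of that flow, again with assumed llm(), embed(), and vector_search() helpers:

    # HyDE sketch: embed a hypothetical answer instead of the query itself.
    def hyde_retrieve(query, k=5):
        hypothetical = llm(
            f"Write a short passage that plausibly answers: {query}"
        )
        # The generated passage is usually closer in embedding space to real
        # answer passages than the short question is.
        return vector_search(embed(hypothetical), k=k)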

Re-ranking

After initial retrieval, use a specialized model to re-rank results for better relevance.

  1. Retrieve top-100 candidates (fast, approximate)
  2. Re-rank with cross-encoder (slow, accurate)
  3. Return top-5 to LLM

Popular re-rankers: Cohere Rerank, bge-reranker; a cross-encoder sketch follows.
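
A sketch of the two-stage pipeline, assuming the sentence-transformers CrossEncoder interface, a bge-reranker checkpoint, and a hypothetical search() first-stage retriever:

    # Two-stage retrieval: fast approximate search, then cross-encoder re-ranking.
    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("BAAI/bge-reranker-base")

    def retrieve_and_rerank(query, first_stage_k=100, final_k=5):
        candidates = search(query, k=first_stage_k)        # fast, approximate
        scores = reranker.predict(
            [(query, c.text) for c in candidates]          # slow, accurate scoring
        )
        ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
        return [c for c, _ in ranked[:final_k]]            # top-5 to the LLM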

Agentic RAG

Combine RAG with agent capabilities: the agent decides when to retrieve, what to retrieve, and how to use it.

Routing: Direct query to appropriate knowledge source

Multi-step: Break complex query into sub-queries

Self-correction: Retrieve again if the answer is insufficient (a sketch combining routing and self-correction follows)
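
A simplified sketch combining routing and self-correction, with hypothetical llm(), search_docs(), and search_tickets() helpers standing in for your model and knowledge sources:

    # Agentic RAG sketch: route the query to a source, then re-retrieve with a
    # rewritten query if the draft answer looks insufficient. Helpers are assumed.
    SOURCES = {"docs": search_docs, "tickets": search_tickets}

    def agentic_rag(query, max_rounds=2):
        # Routing: let the model pick the knowledge source
        choice = llm(f"Which source answers this, 'docs' or 'tickets'? {query}").strip()
        retriever = SOURCES.get(choice, search_docs)

        current_query = query
        for _ in range(max_rounds):
            context = "\n".join(d.text for d in retriever(current_query, k=5))
            answer = llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

            # Self-correction: retry with a rewritten query if the answer is weak
            verdict = llm(f"Does this answer the question? yes/no\nQ: {query}\nA: {answer}")
            if verdict.strip().lower().startswith("yes"):
                return answer
            current_query = llm(f"Rewrite this query to find better evidence: {query}")
        return answer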

Evaluation Metrics

  • Retrieval Precision: % of retrieved docs that are relevant
  • Retrieval Recall: % of relevant docs that were retrieved (both computed in the sketch after this list)
  • Answer Relevance: Does answer address the question?
  • Faithfulness: Is answer grounded in retrieved context?
  • Context Relevance: Is retrieved context actually useful?
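
Retrieval precision and recall fall out directly once you have hand-labeled relevant document IDs for a set of test queries; a minimal sketch:

    # Retrieval precision / recall for one query, given labeled relevant doc IDs.
    def retrieval_precision_recall(retrieved_ids, relevant_ids):
        retrieved, relevant = set(retrieved_ids), set(relevant_ids)
        hits = retrieved & relevant
        precision = len(hits) / len(retrieved) if retrieved else 0.0  # relevant share of retrieved
        recall = len(hits) / len(relevant) if relevant else 0.0       # retrieved share of relevant
        return precision, recall

Answer relevance, faithfulness, and context relevance are usually judged by an LLM or human grader rather than computed from labels.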

Framework: Use LlamaIndex or LangChain for implementing these advanced RAG patterns.