Advanced RAG Techniques
Basic RAG works well, but advanced techniques can significantly improve accuracy, relevance, and user experience. Here are the cutting-edge approaches.
RAG Variations
1. Naive RAG
Basic: Retrieve → Augment → Generate. Simple but effective baseline.
Use when: Simple Q&A, small knowledge base
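The Retrieve → Augment → Generate pipeline can be sketched end to end. This is a toy illustration: the bag-of-words "embedding" and cosine scorer stand in for a real dense embedding model, and the final prompt would be sent to an LLM rather than returned.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list, k: int = 2) -> list:
    # Rank all docs by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def augment(query: str, contexts: list) -> str:
    # Build the grounded prompt the LLM would receive (the "Augment" step).
    ctx = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

docs = [
    "RAG retrieves documents and feeds them to an LLM.",
    "Paris is the capital of France.",
    "Embeddings map text to vectors.",
]
prompt = augment("What is RAG?", retrieve("What does RAG retrieve?", docs))
```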
2. Sentence-Window RAG
Retrieve small chunks, but provide larger context window to LLM.
Use when: Need precise retrieval + broader context
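A minimal sketch of the sentence-window idea: score individual sentences for precise matching, but hand the LLM the best sentence plus its neighbors. The term-overlap scorer is a placeholder for real embedding similarity.

```python
def sentence_window_retrieve(query_terms, sentences, window=1):
    """Score each sentence alone, but return it with `window` neighbors per side."""
    def score(s):
        return len(set(query_terms) & set(s.lower().split()))
    best = max(range(len(sentences)), key=lambda i: score(sentences[i]))
    lo = max(0, best - window)
    hi = min(len(sentences), best + window + 1)
    return " ".join(sentences[lo:hi])

sentences = [
    "Chunking splits documents.",
    "Small chunks retrieve precisely.",
    "But the LLM needs surrounding context.",
]
ctx = sentence_window_retrieve(["chunks", "precisely"], sentences)
```

The middle sentence matches best, but the returned context includes both neighbors, giving the LLM the broader window.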
3. Auto-Merging RAG
Hierarchical chunks: when multiple retrieved child chunks share the same parent, merge them into the full parent chunk.
Use when: Long documents, need coherent sections
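The merging rule can be sketched as follows, assuming a simple two-level hierarchy where `parent_of` maps each child chunk to its parent section (names are illustrative, not any particular library's API):

```python
from collections import defaultdict

def auto_merge(retrieved, parent_of, parent_text, threshold=2):
    """Replace sibling chunks with their whole parent section when
    `threshold` or more siblings were retrieved together."""
    groups = defaultdict(list)
    for chunk in retrieved:
        groups[parent_of.get(chunk)].append(chunk)
    merged = []
    for parent, children in groups.items():
        if parent is not None and len(children) >= threshold:
            merged.append(parent_text[parent])  # coherent full section
        else:
            merged.extend(children)             # keep lone chunks as-is
    return merged

parent_of = {"c1": "p1", "c2": "p1", "c3": "p2"}
parent_text = {"p1": "Full section one.", "p2": "Full section two."}
out = auto_merge(["c1", "c2", "c3"], parent_of, parent_text)
```

Here `c1` and `c2` share parent `p1`, so they collapse into the full section, while the lone `c3` passes through unchanged.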
4. Fusion RAG
Generate multiple query variations, retrieve for each, fuse results.
Use when: Complex queries, need comprehensive coverage
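The fusion step is commonly done with reciprocal rank fusion (RRF), which rewards documents that rank well across several query variations. A sketch, with the per-variation retrieval results assumed already computed:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each doc scores the sum of 1/(k + rank) over all lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Ranked results for three hypothetical variations of the same query.
rankings = [
    ["docA", "docB", "docC"],
    ["docB", "docA"],
    ["docB", "docD"],
]
fused = reciprocal_rank_fusion(rankings)
```

`docB` appears in all three lists, so it wins the fused ranking even though it was not first everywhere.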
Query Transformation
Query Rewriting
Use LLM to rephrase query for better retrieval.
"How do I...?" → "Steps to..."
HyDE
Generate hypothetical document, use it for retrieval.
Query → LLM generates answer → Embed → Search
Re-ranking
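Both query transformations can be sketched compactly. `fake_llm` and `fake_embed` below are stand-ins for a real LLM call and embedding model; the point is the shape of each pipeline, not the stubs.

```python
def rewrite_query(llm, query: str) -> str:
    # Query rewriting: restate a conversational query in retrieval-friendly form.
    return llm(f"Rewrite for search: {query}")

def hyde_embedding(llm, embed, query: str):
    # HyDE: generate a hypothetical answer, then embed *that* instead of the query.
    hypothetical_doc = llm(f"Write a short passage answering: {query}")
    return embed(hypothetical_doc)

# Stubs so the sketch runs; swap in real models in practice.
fake_llm = lambda prompt: "Steps to reset a password: open settings, choose reset."
fake_embed = lambda text: [len(w) for w in text.split()[:3]]

rewritten = rewrite_query(fake_llm, "How do I reset my password?")
vec = hyde_embedding(fake_llm, fake_embed, "How do I reset my password?")
```

HyDE works because a hypothetical answer often sits closer in embedding space to real answer documents than the question itself does.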
After initial retrieval, use a specialized model to re-rank results for better relevance.
- Retrieve top-100 candidates (fast, approximate)
- Re-rank with cross-encoder (slow, accurate)
- Return top-5 to LLM
Popular: Cohere Rerank, bge-reranker
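The two-stage pipeline above looks like this in outline. The toy scorers stand in for a fast bi-encoder and a slow cross-encoder; in practice you would plug in something like `bge-reranker` or the Cohere Rerank API for the second stage.

```python
def two_stage_retrieve(query, docs, fast_score, rerank_score,
                       n_candidates=100, top_k=5):
    """Stage 1: cheap score over all docs to get candidates.
    Stage 2: expensive re-rank of just those candidates."""
    candidates = sorted(docs, key=lambda d: fast_score(query, d),
                        reverse=True)[:n_candidates]
    return sorted(candidates, key=lambda d: rerank_score(query, d),
                  reverse=True)[:top_k]

# Toy scorers: term overlap as the "bi-encoder", phrase-aware scoring
# as the "cross-encoder".
fast = lambda q, d: len(set(q.split()) & set(d.split()))
accurate = lambda q, d: fast(q, d) + (2 if q in d else 0)

docs = ["rerankers score query-document pairs",
        "what is a reranker",
        "unrelated text"]
top = two_stage_retrieve("what is a reranker", docs, fast, accurate,
                         n_candidates=3, top_k=1)
```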
Agentic RAG
Combine RAG with agent capabilities. Agent decides when to retrieve, what to retrieve, and how to use it.
✓ Routing: Direct query to appropriate knowledge source
✓ Multi-step: Break complex query into sub-queries
✓ Self-correction: Retrieve again if answer is insufficient
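The three capabilities above fit into one control loop: route the query, retrieve, generate, and retry with a refined query when the answer judged insufficient. Every component below is a stub standing in for an LLM-driven decision.

```python
def agentic_answer(route, retrieve, generate, judge, query, max_rounds=3):
    """Agent loop: route → retrieve → generate, with self-correction:
    if the judge rejects the answer, refine the query and retry."""
    q = query
    for _ in range(max_rounds):
        source = route(q)                  # routing: pick a knowledge source
        context = retrieve(source, q)
        answer = generate(q, context)
        if judge(answer):                  # self-check the draft answer
            return answer
        q = q + " (more detail)"           # naive query refinement
    return answer                          # best effort after max_rounds

# Illustrative stubs; real agents would call an LLM for each step.
route = lambda q: "docs"
kb = {"docs": "RAG grounds LLM answers in retrieved text."}
retrieve = lambda src, q: kb[src]
generate = lambda q, ctx: f"Based on context: {ctx}"
judge = lambda a: "retrieved" in a

result = agentic_answer(route, retrieve, generate, judge, "What does RAG do?")
```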
Evaluation Metrics
- Retrieval Precision: % of retrieved docs that are relevant
- Retrieval Recall: % of relevant docs that were retrieved
- Answer Relevance: Does answer address the question?
- Faithfulness: Is answer grounded in retrieved context?
- Context Relevance: Is retrieved context actually useful?
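The two retrieval metrics are straightforward set arithmetic, given ground-truth relevance labels:

```python
def retrieval_precision(retrieved, relevant):
    """Fraction of retrieved docs that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def retrieval_recall(retrieved, relevant):
    """Fraction of relevant docs that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d9"]
p = retrieval_precision(retrieved, relevant)  # 2 of 4 retrieved are relevant
r = retrieval_recall(retrieved, relevant)     # 2 of 3 relevant were found
```

The LLM-judged metrics (answer relevance, faithfulness, context relevance) have no closed formula; they are typically scored by a grader model, as in frameworks like Ragas.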
Framework: Use LlamaIndex or LangChain for implementing these advanced RAG patterns.