Advanced Re-Ranking Techniques in Modern Retrieval Systems
Re-ranking is the second-stage optimization layer in multi-stage retrieval pipelines (e.g., BM25 → ANN → Re-ranker → LLM).
Its purpose is to improve ranking metrics such as precision@k, nDCG, and MRR, and to reduce hallucination risk in RAG systems.
1. MMR (Maximal Marginal Relevance)
Concept
Maximal Marginal Relevance (MMR) balances:
- Query relevance
- Inter-document diversity
It prevents near-duplicate documents from dominating top-k results.
Formula
$$
\text{MMR} = \arg\max_{D_i \in R \setminus S} \Big[ \lambda \cdot \text{Sim}(D_i, Q) \;-\; (1-\lambda) \cdot \max_{D_j \in S} \text{Sim}(D_i, D_j) \Big]
$$
Where:
- Q = query
- R = candidate set
- S = selected documents so far
- λ = relevance/diversity tradeoff (0–1)
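A minimal NumPy sketch of the greedy selection loop (function and argument names are illustrative, and cosine similarity over unit-normalized embeddings is assumed):

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, lambda_=0.7, top_k=5):
    """Greedy MMR over unit-normalized embeddings.

    query_vec: (d,) query embedding; doc_vecs: (n, d) candidate embeddings.
    Returns candidate indices in selection (rank) order.
    """
    query_vec = query_vec / np.linalg.norm(query_vec)
    doc_vecs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)

    rel = doc_vecs @ query_vec        # Sim(D_i, Q) for every candidate
    sim = doc_vecs @ doc_vecs.T       # Sim(D_i, D_j) for every pair

    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        if selected:
            redundancy = sim[candidates][:, selected].max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))  # first pick: relevance only
        scores = lambda_ * rel[candidates] - (1 - lambda_) * redundancy
        selected.append(candidates.pop(int(np.argmax(scores))))
    return selected
```

Setting λ = 1 recovers pure relevance ranking; λ = 0 maximizes diversity alone. Values around 0.5–0.7 are common starting points in RAG pipelines.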
Characteristics
- Lightweight (no model inference required)
- Works on embedding similarity
- Improves contextual coverage in RAG
Limitations
- Greedy heuristic
- Not learned
- Pairwise only (no global structure)
2. Cross-Encoder Re-Ranking (Neural Rerankers)
Concept
Unlike bi-encoders, which embed the query and document separately, cross-encoders process the concatenated input [CLS] Query [SEP] Document in a single transformer forward pass.
Architecture
- Typically BERT / RoBERTa style encoder
- Classification head outputs relevance score
Strengths
- Superior ranking quality
- Captures fine-grained token interactions
- Strong performance on MS MARCO benchmarks
Weaknesses
- Expensive (O(k) forward passes)
- Not scalable for large candidate pools
Common Models
- cross-encoder/ms-marco-MiniLM-L-6-v2
- MonoBERT
- MonoT5
Typical Pipeline
Stage 1: Retrieve top-100 via BM25/ANN
Stage 2: Re-rank top-100 via cross-encoder
Stage 3: Select top-5 for LLM context
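A sketch of Stage 2 using the sentence-transformers CrossEncoder wrapper (the query and candidates below are placeholders):

```python
from sentence_transformers import CrossEncoder

# Stage 2: re-rank the candidate pool produced by Stage 1.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "what causes rain"
candidates = [
    "Rain forms when water vapor condenses into droplets that grow heavy.",
    "The stock market closed higher today after a volatile session.",
]

# One full transformer forward pass per (query, document) pair.
scores = model.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in
            sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)]
top_5 = reranked[:5]  # Stage 3: hand the best few documents to the LLM
```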
3. Neural Diversity Rerankers (Beyond MMR)
Concept
Learned models that jointly optimize:
- Relevance
- Diversity
- Subtopic coverage
Instead of applying a fixed heuristic like MMR, these models learn diversity from data.
Methods
Determinantal Point Processes (DPP)
Encourage diverse subset selection by maximizing the determinant of a similarity kernel over the selected set (sketched below).
xQuAD
Explicitly models query subtopics.
Neural Subtopic Modeling
Transformer-based diversity scoring.
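To make the DPP objective concrete, here is a naive greedy MAP-inference sketch. The kernel construction is an assumption, and production implementations use incremental Cholesky updates rather than recomputing determinants at every step:

```python
import numpy as np

def greedy_dpp(kernel, top_k):
    """Greedy MAP inference for a DPP: repeatedly add the item that most
    increases log det of the kernel restricted to the selected set.

    kernel: (n, n) PSD matrix, e.g. quality-weighted cosine similarities.
    """
    selected, candidates = [], list(range(len(kernel)))
    for _ in range(min(top_k, len(kernel))):
        best_item, best_gain = None, -np.inf
        for i in candidates:
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            gain = logdet if sign > 0 else -np.inf
            if gain > best_gain:
                best_item, best_gain = i, gain
        if best_item is None:
            break
        selected.append(best_item)
        candidates.remove(best_item)
    return selected
```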
Strengths
- Better subtopic coverage
- Useful for exploratory search
- Reduces redundancy in RAG
Limitations
- More complex
- Harder to train
- Requires labeled data with subtopics
4. Transformer-Based Generative Retrieval
(GenRE / SEAL / SPLADE / ColBERTv2)
These approaches blur the boundary between retrieval and generation: GenRE and SEAL generate identifiers directly, while SPLADE and ColBERTv2 are transformer-native retrievers that replace classic bi-encoder scoring.
4.1 GenRE (Generative Retrieval)
Concept
Model directly generates document IDs given a query. Query → Transformer → DocID tokens
Pros
- No vector DB needed
- Fully differentiable
- End-to-end training
Cons
- Scaling issues
- Requires retraining for corpus updates
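A toy sketch of constrained docid decoding with Hugging Face transformers. The base checkpoint, the docid scheme, and the constraint function are all illustrative stand-ins for a model actually fine-tuned on (query, docid) pairs:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# "t5-small" is a stand-in; a real system fine-tunes on (query, docid) pairs.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

docids = ["doc-001", "doc-042", "doc-107"]           # hypothetical corpus IDs
docid_tokens = [tokenizer.encode(d) for d in docids]

def allowed_tokens(batch_id, prefix):
    # Constrain decoding so only valid docid continuations can be emitted
    # (large corpora use a prefix trie instead of this linear scan).
    generated = prefix.tolist()[1:]                  # drop decoder start token
    allowed = {ids[len(generated)] for ids in docid_tokens
               if len(ids) > len(generated) and ids[:len(generated)] == generated}
    return sorted(allowed) or [tokenizer.pad_token_id]  # pad for finished beams

inputs = tokenizer("query: what causes rain", return_tensors="pt")
out = model.generate(**inputs, prefix_allowed_tokens_fn=allowed_tokens, num_beams=3)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```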
4.2 SEAL
SEAL retrieves by generating substrings (n-grams) that occur in corpus documents, using an FM-index to constrain generation to real corpus text.
- Jointly learns retrieval
- Handles long-tail better than sparse models
4.3 SPLADE
Sparse lexical expansion via transformer.
- Produces sparse vectors
- Retains inverted-index compatibility
- Strong BM25 replacement
Advantages:
- Efficient
- Hybrid lexical + semantic behavior
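A sketch of SPLADE-style document encoding: masked-LM logits pass through a log-saturated ReLU and are max-pooled over positions, yielding a sparse vocabulary-sized vector. The checkpoint name is one public release from the naver/splade family; verify it against the version you use:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "naver/splade-cocondenser-ensembledistil"  # one public SPLADE checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

def splade_encode(text):
    tokens = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**tokens).logits                   # (1, seq_len, vocab)
    # Log-saturated ReLU, max-pooled over positions: sparse vocab-sized vector.
    weights = torch.log1p(torch.relu(logits)).max(dim=1).values.squeeze(0)
    nonzero = torch.nonzero(weights).squeeze(1)
    return {int(i): float(weights[i]) for i in nonzero}   # term id -> weight

vec = splade_encode("rain formation")  # ready for an inverted index
```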
4.4 ColBERTv2
Late-interaction architecture:
- Query and doc encoded separately
- Token-level max-sim interaction
- Efficient approximate search
Advantages:
- Higher precision than bi-encoder
- More scalable than cross-encoder
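The late-interaction scoring rule itself is compact. A NumPy sketch of MaxSim, assuming pre-normalized token embeddings:

```python
import numpy as np

def maxsim_score(query_tokens, doc_tokens):
    """ColBERT-style late interaction.

    query_tokens: (q, d) and doc_tokens: (t, d), both unit-normalized.
    Each query token takes its best-matching document token; scores sum.
    """
    sim = query_tokens @ doc_tokens.T   # (q, t) token-level cosine similarities
    return sim.max(axis=1).sum()        # MaxSim per query token, then sum
```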
5. Learned Global Reranking Frameworks
Concept
Instead of scoring each document independently, these frameworks model:
- Entire candidate list
- Global ranking order
Techniques
Listwise Learning-to-Rank
- LambdaMART (sketched below)
- ListNet
- ListMLE
Transformer Listwise Models
- Encode entire document list jointly
- Optimize nDCG directly
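As a concrete example of the first family, a LambdaMART sketch via LightGBM's lambdarank objective. The features and labels below are synthetic placeholders:

```python
import numpy as np
from lightgbm import LGBMRanker

# Synthetic placeholders: real features are query-document signals
# (BM25 score, embedding similarity, freshness, click statistics, ...).
rng = np.random.default_rng(0)
X = rng.random((300, 10))                  # 300 query-doc feature vectors
y = rng.integers(0, 4, size=300)           # graded relevance labels 0-3
groups = [20] * 15                         # 15 queries x 20 candidates each

ranker = LGBMRanker(objective="lambdarank", n_estimators=100)
ranker.fit(X, y, group=groups)

scores = ranker.predict(X[:20])            # score one query's candidate list
order = np.argsort(-scores)                # final ranking for that query
```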
Benefits
- Captures inter-document competition
- Better global ordering
- Optimizes ranking metrics explicitly
Challenges
- Computationally heavy
- Complex training pipeline
6. LLM-Centric Reranking
(In-Context / Prompt-Based Rerankers)
Concept
Use a Large Language Model to rank candidate documents via prompting.
Example:
Given query Q and documents D1…D5, rank them by relevance.
Methods
Score-based Prompting
The LLM outputs a numeric relevance score per document (sketched below).
Pairwise Comparison
LLM compares D1 vs D2 iteratively.
Chain-of-Thought Ranking
LLM explains ranking before output.
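A minimal sketch of score-based prompting. call_llm is a placeholder for whatever client you use, and the prompt and parsing are illustrative:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: wire up your LLM client (OpenAI, Anthropic, local model, ...).
    raise NotImplementedError

PROMPT = (
    "Rate how relevant the document is to the query on a 0-10 scale.\n"
    "Answer with a single integer.\n\n"
    "Query: {query}\nDocument: {doc}\nScore:"
)

def llm_rerank(query, docs):
    scores = []
    for doc in docs:
        reply = call_llm(PROMPT.format(query=query, doc=doc))
        try:
            scores.append(int(reply.strip().split()[0]))
        except (ValueError, IndexError):
            scores.append(0)  # LLM output is not guaranteed to parse
    return [d for _, d in sorted(zip(scores, docs), key=lambda p: -p[0])]
```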
Strengths
- Strong reasoning capability
- No task-specific training required
- Adapts to domain via prompt engineering
Weaknesses
- Expensive
- Latency heavy
- Not deterministic
- Hard to calibrate scores
Comparative Summary
| Method | Learning-Based | Scalable | Diversity-Aware | Precision | Cost |
|---|---|---|---|---|---|
| MMR | ❌ | ✅ | ✅ | Medium | Low |
| Cross-Encoder | ✅ | ❌ | ❌ | High | High |
| Neural Diversity | ✅ | ⚠️ | ✅ | High | Medium |
| GenRE / SEAL | ✅ | ❌ | ❌ | High | High |
| SPLADE | ✅ | ✅ | ❌ | High | Medium |
| ColBERTv2 | ✅ | ✅ | ❌ | High | Medium |
| Global Listwise | ✅ | ❌ | ⚠️ | Very High | High |
| LLM Reranking | ❌ (usually) | ❌ | ⚠️ | Very High | Very High |
When to Use What (RAG Engineering View)
- Small system, need diversity → MMR
- Production RAG, high precision → Cross-Encoder
- Hybrid lexical-semantic → SPLADE
- High-performance scalable retrieval → ColBERTv2
- Research / end-to-end retrieval → GenRE
- Multi-document reasoning → LLM reranking
- Search engine optimization → Listwise learning-to-rank
Final Engineering Principle
Retrieval quality impacts hallucination rate more than model size.
Re-ranking is not an optimization detail —
it is a core reliability mechanism in modern LLM systems.