



A two-stage retrieval process where initial candidates from vector search are reordered using more sophisticated models such as cross-encoders. Reranking improves result quality by applying computationally expensive models to only a small set of candidates. It is commonly used in RAG systems and search applications.
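Reranking is a two-stage retrieval approach: a fast first stage retrieves a broad candidate set, and a slower, more accurate second stage reorders it. For example: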
Query: "How do transformers work?"

1. Initial Retrieval (bi-encoder):
   - Retrieve top-100 candidates
   - Fast (~10ms)
2. Reranking (cross-encoder):
   - Score each of 100 candidates
   - Reorder by relevance
   - Return top-10
   - Slower (~1s for 100 docs)
3. Final Results:
   - Highest quality top-10 results
   - Much better than initial retrieval alone
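The flow above can be sketched end to end with toy stand-ins. This is only an illustration under stated assumptions: `embed`, `cosine`, `pair_score`, and `retrieve_then_rerank` are hypothetical names, the bag-of-words vector stands in for a real bi-encoder, and the bigram-aware scorer stands in for a real cross-encoder. What it aims to show is the structure: a cheap score over the whole corpus, then a more expensive joint score over the shortlist only.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a bi-encoder: a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: scores query and document jointly.

    Besides shared terms, it rewards query bigrams that appear in the
    document in the same order, which the independent embeddings cannot see.
    """
    q, d = query.lower().split(), doc.lower().split()
    shared_terms = len(set(q) & set(d))
    shared_bigrams = len(set(zip(q, q[1:])) & set(zip(d, d[1:])))
    return shared_terms + 2.0 * shared_bigrams


def retrieve_then_rerank(query, corpus, retrieve_k=100, final_k=10):
    # Stage 1: cheap similarity against every document, keep a shortlist.
    q_vec = embed(query)
    candidates = sorted(
        corpus, key=lambda doc: cosine(q_vec, embed(doc)), reverse=True
    )[:retrieve_k]
    # Stage 2: costlier joint scoring, applied to the shortlist only.
    reranked = sorted(
        candidates, key=lambda doc: pair_score(query, doc), reverse=True
    )
    return reranked[:final_k]


corpus = [
    "work transformers how do",               # same words, scrambled order
    "how do transformers work in practice",
    "cooking pasta at home",
    "transformers are neural networks",
]
results = retrieve_then_rerank(
    "how do transformers work", corpus, retrieve_k=3, final_k=2
)
```

In this toy corpus the scrambled document wins stage 1 because its word counts match the query exactly, while stage 2 moves the well-ordered document to the top. Real bi-encoders and cross-encoders differ in many ways, but the division of labor is the same.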
Typical improvements:
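# 1. Initial retrieval: fetch a broad candidate set from the vector index
# (`vector_db`, `query_embedding`, `query`, and `reranker` are assumed to be set up elsewhere)
candidates = vector_db.search(query_embedding, top_k=100)

# 2. Reranking: score each candidate against the query and keep the best 10
reranked = reranker.rank(
    query=query,
    documents=[c.text for c in candidates],
    top_k=10,
)

# 3. Use top results: keep the 5 highest-ranked documents as context
context = reranked[:5]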
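The `reranker` object in the snippet above is not defined in the source. One way such an object could look is sketched below, as an assumption rather than any particular library's API: `Reranker`, `RankedDocument`, and `overlap_score` are hypothetical names, and the term-overlap scorer is a placeholder for a real model that scores a (query, document) pair.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RankedDocument:
    text: str
    score: float
    index: int  # position of the document in the input list


class Reranker:
    """Hypothetical wrapper exposing rank(query, documents, top_k)."""

    def __init__(self, score_fn: Callable[[str, str], float]):
        # score_fn takes (query, document) and returns a relevance score.
        self.score_fn = score_fn

    def rank(
        self, query: str, documents: Sequence[str], top_k: int = 10
    ) -> List[RankedDocument]:
        scored = [
            RankedDocument(text=doc, score=self.score_fn(query, doc), index=i)
            for i, doc in enumerate(documents)
        ]
        # Highest score first; the sort is stable, so ties keep input order.
        scored.sort(key=lambda r: r.score, reverse=True)
        return scored[:top_k]


def overlap_score(query: str, doc: str) -> float:
    """Placeholder scorer: fraction of query terms present in the document."""
    q_terms = set(query.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & set(doc.lower().split())) / len(q_terms)


reranker = Reranker(overlap_score)
reranked = reranker.rank(
    query="how do transformers work",
    documents=["gardening tips", "how transformers work", "transformers overview"],
    top_k=2,
)
context = [r.text for r in reranked]
```

Keeping the scorer behind a plain callable means the toy function can later be swapped for a cross-encoder or a hosted reranking API without changing the calling code.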
Supported by:
Cost varies with the reranker API's pricing or, for self-hosted models, with compute costs.