
BM25
Best Matching 25 is a ranking function for information retrieval that scores documents by query term frequency with length normalization. It is a core component of hybrid search RAG systems that combine keyword and semantic search.
About this tool
Overview
BM25 (Best Matching 25) is a probabilistic ranking function used in information retrieval that has become a cornerstone of hybrid search systems in RAG applications. It scores each document by how often the query terms occur in it, with diminishing returns for repeated occurrences (term-frequency saturation) and an adjustment for document length.
How It Works
The BM25 scoring formula considers:
- Term Frequency (TF): How often query terms appear in documents
- Inverse Document Frequency (IDF): Rarity of terms across corpus
- Document Length Normalization: Prevents bias toward longer documents
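The three factors above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function name `bm25_scores`, the pre-tokenized list inputs, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for the example, and the IDF uses the always-positive "+1 inside the log" variant, which is one of several formulations in use.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    k1 controls term-frequency saturation; b controls how strongly
    document length is normalized (0 = not at all, 1 = fully).
    Both defaults are illustrative assumptions.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF: rarer terms get a larger weight (always-positive variant).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are damped.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            # TF saturation: the ratio grows with tf but levels off.
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        ["cat", "sat"],
        ["cat", "sat", "on", "the", "mat", "all", "day"],
        ["dog", "barked"],
    ]
    print(bm25_scores(["cat"], corpus))
```

In this toy corpus the first two documents each contain "cat" once, so the shorter one receives the higher score, and the third document, which lacks the term, scores zero.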
In Hybrid RAG Systems (2026)
Hybrid search addresses limitations of single-strategy retrieval by combining:
- Dense vector embeddings (semantic meaning)
- Sparse keyword-based retrieval like BM25 (exact term matching)
- Graph-based traversal (structural relationships)
Implementation Pattern
Run dense and BM25 queries in parallel, fuse ranked lists via RRF (Reciprocal Rank Fusion), then apply cross-encoder or ColBERT re-ranking over the fused top-k (typically k=50-200).
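The fusion step of this pattern can be sketched as follows. It is a minimal illustration under stated assumptions: the function name `reciprocal_rank_fusion` and the parameter names are hypothetical, the smoothing constant `rrf_k=60` is a commonly used default rather than something this page specifies (and is distinct from the top-k cutoff mentioned above), and the retrieval and re-ranking stages are left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, rrf_k=60, top_n=None):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives 1 / (rrf_k + rank) from every list it appears in,
    with rank starting at 1. `rrf_k` is the RRF smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)

    # Highest fused score first; ties are broken by document id for determinism.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_n is None else fused[:top_n]


if __name__ == "__main__":
    dense_hits = ["d1", "d2", "d3"]   # hypothetical dense-retriever output
    bm25_hits = ["d3", "d1", "d4"]    # hypothetical BM25 output
    print(reciprocal_rank_fusion([dense_hits, bm25_hits], top_n=3))
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and embedding similarities, which live on different scales. The fused list would then be truncated to the top-k candidates and passed to the re-ranker.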
Performance Benefits
Combining BM25 full-text search with vector search significantly improves nDCG over pure vector search.
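For readers unfamiliar with the metric, nDCG (normalized discounted cumulative gain) rewards rankings that place highly relevant documents near the top. The sketch below is a simplified illustration: the function names are hypothetical, it uses the linear-gain form, and it computes the ideal ordering from the retrieved list only, whereas a full evaluation would use all judged documents for the query.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance labels."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the best possible ordering of the same labels.

    Simplifying assumption: the ideal ranking is built from `relevances` alone.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance labels in ranked order; the values are made up for illustration.
    print(ndcg_at_k([3, 2, 1], k=3))  # already in ideal order
    print(ndcg_at_k([1, 2, 3], k=3))  # best document ranked last
```

Comparing this value for a pure vector ranking against a hybrid ranking of the same query set is one way to measure the benefit on a given corpus.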
Use Cases
- Hybrid RAG systems
- Customer support knowledge bases
- Document search where exact term matching matters
- Legal and medical document retrieval
- Systems requiring explainable relevance
Vector Database Support
Many search engines and vector databases support BM25 or BM25-style keyword retrieval:
- Elasticsearch
- OpenSearch
- Weaviate
- Qdrant
- Vespa
- Typesense
