
    BM25

    Best Matching 25 (BM25) is a ranking function for information retrieval that scores documents by query term frequency, term rarity, and document length normalization. It is a core component of hybrid search RAG systems that combine keyword and semantic search.


    About this tool

    Overview

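    BM25 (Best Matching 25) is a probabilistic ranking function used in information retrieval that has become a cornerstone of hybrid search systems in RAG applications. It scores each document by how often the query terms occur in it, weights each term by how rare it is across the corpus, and normalizes for document length so that long documents are not favored simply for containing more words.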

    How It Works

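    The BM25 scoring formula considers:

    • Term Frequency (TF): How often each query term appears in a document, with diminishing returns for repeated occurrences
    • Inverse Document Frequency (IDF): How rare each term is across the corpus, so that rare terms count for more than common ones
    • Document Length Normalization: Scales term frequency by document length relative to the corpus average, which prevents bias toward longer documents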
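    The three components above can be sketched in a few lines of Python. This is a minimal illustration, not a production scorer: the function name `bm25_scores`, the whitespace tokenization, and the toy documents are assumptions made for the example, the parameters k1 = 1.2 and b = 0.75 are conventional defaults rather than values stated on this page, and the IDF uses the always non-negative variant log(1 + (N - df + 0.5) / (df + 0.5)).

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenized document for a tokenized query.

    k1 controls term-frequency saturation and b controls the strength of
    length normalization; both defaults are conventional, not prescribed.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF: rarer terms contribute more to the score.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating TF: repeated occurrences give diminishing returns.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Toy corpus, invented for illustration only.
docs = [
    "bm25 ranks documents by term frequency".split(),
    "vector search uses dense embeddings".split(),
    "hybrid search combines bm25 and vector search".split(),
]
scores = bm25_scores("bm25 search".split(), docs)
```

    In this toy corpus the third document contains both query terms and therefore receives the highest score, even though it is the longest of the three.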

    In Hybrid RAG Systems (2026)

    Hybrid search addresses limitations of single-strategy retrieval by combining:

    • Dense vector embeddings (semantic meaning)
    • Sparse keyword-based retrieval like BM25 (exact term matching)
    • Graph-based traversal (structural relationships)

    Implementation Pattern

    Run the dense and BM25 queries in parallel, fuse the two ranked lists with Reciprocal Rank Fusion (RRF), then apply cross-encoder or ColBERT re-ranking to the fused top-k candidates (typically k = 50 to 200).
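    The fusion step can be sketched as follows. This is a minimal illustration that assumes both retrievers have already returned ranked lists of document IDs; the function name `reciprocal_rank_fusion`, the parameter names, and the document IDs are hypothetical. The smoothing constant `rrf_k = 60` is a commonly used default, not a value given on this page, and it is unrelated to the top-k cutoff mentioned above.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, rrf_k=60, top_k=None):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (rrf_k + rank) from every list it appears in
    (ranks start at 1); the contributions are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)
    # Sort by fused score, highest first; break ties by ID for determinism.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_k is None else fused[:top_k]


# Hypothetical results from the two retrievers.
bm25_hits = ["d3", "d1", "d7"]
dense_hits = ["d1", "d9", "d3"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
candidates = reciprocal_rank_fusion([bm25_hits, dense_hits], top_k=2)
```

    Because RRF uses only ranks, it does not need the BM25 and vector similarity scores to be on a comparable scale. The `candidates` list is what would be handed to the re-ranker.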

    Performance Benefits

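    Combining BM25 full-text search with vector search can improve nDCG over pure vector search, particularly for queries that depend on exact terms such as identifiers, product codes, or rare names. The size of the gain depends on the corpus and the query mix.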
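    For readers who want to measure this on their own data, nDCG@k can be computed as sketched below. The function names and the graded relevance labels are invented for illustration and are not measurements of any system. The sketch uses linear gain and, as a simplification, derives the ideal ordering from the labels of the retrieved list itself, which assumes every relevant document was retrieved.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance labels."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the best possible ordering of the labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Toy relevance labels (0 = irrelevant, 2 = highly relevant) in ranked order.
ranking_a = [0, 2, 1, 0]  # best document ranked second
ranking_b = [2, 1, 0, 0]  # best document ranked first

ndcg_a = ndcg_at_k(ranking_a, k=4)
ndcg_b = ndcg_at_k(ranking_b, k=4)
```

    Here `ranking_b` reaches the maximum of 1.0 because its labels are already in ideal order, while `ranking_a` scores lower for placing the most relevant document second.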

    Use Cases

    • Hybrid RAG systems
    • Customer support knowledge bases
    • Document search where exact term matching matters
    • Legal and medical document retrieval
    • Systems requiring explainable relevance

    Vector Database Support

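    Many search engines and vector databases offer BM25 or a comparable keyword-scoring mode alongside vector search; the exact scoring and configuration vary by engine and version: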

    • Elasticsearch
    • OpenSearch
    • Weaviate
    • Qdrant
    • Vespa
    • Typesense

    Information

    Website: en.wikipedia.org
    Published: Mar 11, 2026

    Categories

    Concepts & Definitions

    Tags

    #Information Retrieval #Ranking #Keyword Search

    Similar Products

    BM25 (Okapi BM25)

    Probabilistic ranking function for estimating document relevance to search queries. Industry standard for keyword search, combining term frequency, rarity, and length normalization into a single scoring model.

    Hybrid Search with Reciprocal Rank Fusion

    Search technique combining BM25 lexical search and semantic vector search using Reciprocal Rank Fusion (RRF) to merge results, balancing precision of keyword matching with contextual understanding of neural embeddings.

    MaxSim Operator

    Scoring function used in late interaction models like ColBERT that computes query-document relevance by finding maximum similarity between each query token and document tokens, then summing.

    Reciprocal Rank Fusion (RRF)

    Hybrid search algorithm combining results from multiple ranking systems by computing reciprocal ranks, commonly used to merge dense vector search with sparse keyword search for improved retrieval.

    MaxSim

    Maximum Similarity late interaction function introduced by ColBERT for ranking. Calculates cosine similarity between query and document token embeddings, keeping maximum score per query token for highly effective long-document retrieval.

    Reciprocal Rank Fusion

    Method for combining ranked lists from multiple retrieval systems in hybrid search. Standard technique in RAG pipelines for fusing BM25 and dense vector results before reranking, creating diverse high-confidence candidate sets.

    All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this repository, related repositories, and associated websites are for identification purposes only. The use of these names, logos, and brands does not imply endorsement, affiliation, or sponsorship. This directory may include content generated by artificial intelligence.
    Copyright © 2025 Awesome Vector Databases. All rights reserved.