
BM42
Experimental sparse embedding approach from Qdrant that combines exact keyword matching with transformer-derived token weights, intended for hybrid (sparse plus dense) vector search to improve RAG results.
About this tool
Overview
BM42 is a sparse embedding approach that aims to combine the precision of exact keyword search with the contextual understanding of transformers. Qdrant developed it as an alternative to classic BM25 keyword scoring for hybrid search, where keyword-style sparse retrieval is paired with dense vector search to produce better results in RAG pipelines.
Key Features
Hybrid Search Approach
BM42 is designed for hybrid search, in which its sparse vectors are used alongside dense vectors:
- Sparse vectors handle exact term matching
- Dense vectors handle semantic relevance and deeper meaning
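A hybrid search produces two ranked lists, one from the sparse (BM42) search and one from the dense search, which then have to be merged. The listing does not say how; one common choice, also offered by Qdrant's Query API, is Reciprocal Rank Fusion (RRF). The sketch below is a minimal illustration of RRF in plain Python, assuming the conventional constant k=60; the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. k=60 is the constant commonly used for RRF.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results: sparse (BM42) hits favour exact terms,
# dense hits favour semantic similarity.
sparse_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)
```

RRF only uses ranks, not raw scores, so it does not need the sparse and dense scores to be on comparable scales.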
Technical Innovation
As a sparse retrieval technique, BM42 keeps the inverse document frequency (IDF) component of BM25, which gives it the core ability to match rare and out-of-vocabulary terms. The key change is how it measures a token's relevance within a document: instead of BM25's term-frequency and document-length formula, BM42 uses attention weights from a transformer.
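The scoring idea can be sketched as score(D, Q) = sum over query terms q found in D of IDF(q) × w_D(q), where w_D(q) is the attention weight of q in document D and takes the place of BM25's term-frequency part. This is a minimal sketch under those assumptions, not Qdrant's implementation: it uses the standard BM25 IDF variant, and the document vectors and their attention weights are made-up values.

```python
import math


def bm25_idf(term, corpus):
    """BM25-style IDF: ln((N - n + 0.5) / (n + 0.5) + 1)."""
    n_docs = len(corpus)
    n_with_term = sum(1 for doc in corpus if term in doc)
    return math.log((n_docs - n_with_term + 0.5) / (n_with_term + 0.5) + 1.0)


def bm42_score(query_terms, doc, corpus):
    """Sum IDF(term) * attention weight of the term in doc, over matched terms."""
    return sum(
        bm25_idf(term, corpus) * doc[term]
        for term in set(query_terms)
        if term in doc
    )


# Hypothetical sparse document vectors: word -> attention weight.
corpus = [
    {"qdrant": 0.50, "vector": 0.30, "database": 0.20},
    {"vector": 0.60, "search": 0.40},
    {"keyword": 0.70, "search": 0.30},
]

query = ["qdrant", "search"]
scores = [bm42_score(query, doc, corpus) for doc in corpus]
for i, score in enumerate(scores):
    print(i, round(score, 4))
```

Because "qdrant" appears in only one document, its IDF is higher than that of "search", so the document containing it ranks first even though its attention weight is similar.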
Transformer Integration
After obtaining the attention weights, BM42 reverses the subword tokenization: the attention weights of a word's subword tokens are summed to give the attention weight of the whole word.
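The merging step can be illustrated with a short sketch. It assumes BERT-style WordPiece output, where a token starting with "##" continues the previous token; the tokens and weights are hypothetical, and any other preprocessing BM42 may apply is left out.

```python
def merge_subwords(tokens, weights):
    """Merge WordPiece subwords into words, summing their attention weights.

    Assumes BERT-style WordPiece output, where a token starting with "##"
    continues the previous token (e.g. "embed", "##ding", "##s").
    Returns a list of (word, weight) pairs in their original order.
    """
    words, word_weights = [], []
    for token, weight in zip(tokens, weights):
        if token.startswith("##") and words:
            words[-1] += token[2:]        # glue the fragment onto the current word
            word_weights[-1] += weight    # sum the subword attention weights
        else:
            words.append(token)
            word_weights.append(weight)
    return list(zip(words, word_weights))


# Hypothetical tokenizer output and per-token attention weights.
tokens = ["sparse", "embed", "##ding", "##s", "work"]
weights = [0.30, 0.20, 0.15, 0.05, 0.30]

merged = merge_subwords(tokens, weights)
for word, weight in merged:
    print(word, round(weight, 3))
```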
Important Considerations
Experimental status: Recent evaluations have questioned the validity of BM42's reported results, and future developments may address these concerns. BM42 does not outperform the BM25 implementations of other vendors and should be treated as an experimental approach that needs further research and development before production use.
Implementation
Starting from Qdrant v1.10.0, BM42 can be used in Qdrant via FastEmbed inference.
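On the storage side, a collection for hybrid BM42 search needs both a dense vector and a sparse vector. The sketch below only builds (does not send) the REST request for creating such a collection, assuming a local Qdrant server; the URL, the collection name "docs", the vector names "dense" and "bm42", and the 384-dimension size are hypothetical choices. The sparse vector uses Qdrant's `"modifier": "idf"` option, added in v1.10, so that IDF is computed by the server rather than stored in the document vectors. Generating the BM42 vectors themselves with FastEmbed is not shown.

```python
import json
import urllib.request

# Hypothetical server URL and collection name.
QDRANT_URL = "http://localhost:6333"
COLLECTION = "docs"


def build_create_collection_request(url: str, name: str) -> urllib.request.Request:
    """Build a PUT /collections/{name} request with dense + sparse vectors."""
    body = {
        # Dense vectors for semantic similarity; 384 dimensions is an
        # assumption and must match the dense embedding model actually used.
        "vectors": {"dense": {"size": 384, "distance": "Cosine"}},
        # Sparse BM42 vectors; "idf" asks Qdrant to apply IDF weighting itself.
        "sparse_vectors": {"bm42": {"modifier": "idf"}},
    }
    return urllib.request.Request(
        f"{url}/collections/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_collection_request(QDRANT_URL, COLLECTION)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it to a running server: urllib.request.urlopen(req)
```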
Use Cases
- Research and experimentation with hybrid search
- Development of new sparse retrieval methods
- Evaluation of sparse-dense search combinations
Pricing
Free to use as part of Qdrant.