



A versatile embedding model from BAAI that simultaneously supports dense retrieval, sparse retrieval, and multi-vector retrieval, with multilingual support for 100+ languages and multi-granularity processing from short sentences to 8192-token documents.
The M3 in BGE-M3 stands for Multi-Functionality, Multi-Linguality, and Multi-Granularity. It is an embedding model from BAAI that can perform three common retrieval functionalities simultaneously within a single model:
Dense retrieval: uses the normalized hidden state of the [CLS] token as a single embedding for semantic similarity search.
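A minimal sketch of how dense scoring works, using made-up 2-dimensional vectors (real BGE-M3 dense embeddings are much higher-dimensional and are L2-normalized by the model):

```python
import numpy as np

# Hypothetical, already-normalized dense embeddings standing in for the
# [CLS]-token vectors a real model would produce.
query = np.array([0.6, 0.8])
doc = np.array([0.8, 0.6])

# For normalized vectors, cosine similarity reduces to a plain dot product.
score = float(np.dot(query, doc))
```

In a vector database, this dot product is exactly what an inner-product or cosine index computes at scale.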
Sparse retrieval: generates vocabulary-sized sparse vectors (mostly zeros) by computing weights only for the tokens that actually appear in the text; conceptually similar to BM25, but with learned term weights.
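Because most entries are zero, a sparse score only needs the tokens the query and document share. The sketch below uses invented token ids and weights purely for illustration:

```python
# Toy learned lexical weights keyed by token id; only tokens present in each
# text carry a nonzero weight (all values here are made up).
query_weights = {101: 0.9, 742: 0.5, 2311: 0.3}
doc_weights = {101: 0.7, 2311: 0.6, 9000: 0.2}

# Relevance score: sum the weight products over tokens shared by both texts,
# i.e. a dot product between two sparse vocabulary-sized vectors.
shared = query_weights.keys() & doc_weights.keys()
score = sum(query_weights[t] * doc_weights[t] for t in shared)
```

Token 742 and token 9000 contribute nothing, since each appears on only one side, which is what makes this behave like exact-term matching.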
Multi-vector retrieval: represents the text with multiple token-level vectors, enabling fine-grained similarity matching at the token level.
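Multi-vector scoring is typically done ColBERT-style: each query token is matched against its most similar document token, and those per-token maxima are aggregated. A sketch with toy 2-dimensional token vectors (real ones are high-dimensional):

```python
import numpy as np

# Hypothetical per-token embeddings; rows are tokens.
query_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])            # 2 query tokens
doc_vecs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])  # 3 doc tokens

# MaxSim: every query token takes its best-matching doc token, and the
# per-token maxima are summed into one relevance score.
sim = query_vecs @ doc_vecs.T        # (2, 3) token-to-token similarities
score = float(sim.max(axis=1).sum())
```

The `max` over document tokens is what gives the fine-grained behavior: each query token is free to align with a different part of the document.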
In practice, the strongest setup combines the modes: retrieve candidates with hybrid dense + sparse retrieval, then re-rank them using the multi-vector scores.
BGE-M3 has also been evaluated on the MLDR test set, a long-document retrieval benchmark covering 13 languages.
The model is free and open-source; it is also available through various commercial API providers with usage-based pricing.