BGE-M3

A versatile embedding model from BAAI that supports dense, sparse, and multi-vector retrieval in a single model, covers 100+ languages, and handles inputs ranging from short sentences to documents of up to 8192 tokens.

About this tool

Overview

BGE-M3 stands for Multi-Functionality, Multi-Linguality, and Multi-Granularity. A single BGE-M3 model can perform the three common retrieval functionalities: dense, sparse, and multi-vector retrieval.

Three Retrieval Methods

1. Dense Retrieval

Uses the normalized hidden state of the [CLS] token as the dense embedding for semantic similarity search.
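The dense mode can be illustrated with a minimal sketch: normalize the two [CLS] vectors and take their inner product (which equals cosine similarity), then rank passages by that score. The 4-dimensional vectors below are hypothetical stand-ins; real BGE-M3 dense embeddings are 1024-dimensional and come from the model itself.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 norm, as done for the [CLS] hidden state."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dense_score(q_cls, p_cls):
    """Inner product of normalized [CLS] vectors, i.e. cosine similarity."""
    q, p = normalize(q_cls), normalize(p_cls)
    return sum(a * b for a, b in zip(q, p))

# Hypothetical toy [CLS] states (not real model outputs).
query = [0.2, 0.9, 0.1, 0.3]
passages = {"p1": [0.1, 0.8, 0.0, 0.4], "p2": [0.9, 0.0, 0.3, 0.1]}

# Rank passage ids from most to least similar to the query.
ranked = sorted(passages, key=lambda k: dense_score(query, passages[k]), reverse=True)
print(ranked)
```

In a real deployment the passage vectors would be precomputed and stored in a vector index, so only the query needs to be encoded at search time.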

2. Sparse Retrieval

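Generates sparse, vocabulary-sized vectors that are mostly zeros: weights are computed only for the tokens that actually appear in the text. This is similar in spirit to BM25, except that the term weights are learned rather than derived from term statistics.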
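A minimal sketch of sparse (lexical) matching, under stated assumptions: each text is reduced to a token-to-weight mapping, and the relevance score is the sum of weight products over tokens shared by query and passage. The raw per-token scores here are hypothetical; in the model they come from a learned projection of each token's hidden state, and real keys are tokenizer vocabulary IDs rather than strings. Clipping negatives to zero and keeping the maximum weight for a repeated token are assumptions of this sketch.

```python
def lexical_weights(tokens, raw_scores):
    """Map each token to a non-negative weight.

    raw_scores stands in for the model's per-token output (hypothetical values).
    Negative scores are clipped to zero (ReLU); a repeated token keeps its max weight.
    Zero-weight tokens are omitted, which is what keeps the representation sparse.
    """
    weights = {}
    for tok, s in zip(tokens, raw_scores):
        w = max(0.0, s)
        if w > 0.0:
            weights[tok] = max(weights.get(tok, 0.0), w)
    return weights

def sparse_score(q_weights, p_weights):
    """Sum of weight products over tokens that appear in both texts."""
    return sum(w * p_weights[t] for t, w in q_weights.items() if t in p_weights)

q = lexical_weights(["what", "is", "bge", "m3"], [0.05, -0.2, 0.30, 0.25])
p = lexical_weights(
    ["bge", "m3", "is", "an", "embedding", "model", "bge"],
    [0.25, 0.20, 0.01, -0.1, 0.15, 0.1, 0.35],
)
print(q)
print(sparse_score(q, p))
```

Because only overlapping tokens contribute, this score behaves like keyword matching and can be served from an inverted index, much like BM25.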

3. Multi-Vector Retrieval (ColBERT-style)

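Represents text with multiple vectors, one per token, and scores a query against a passage by late interaction between the two sets of token vectors, enabling fine-grained similarity matching at the token level.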
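ColBERT-style late interaction can be sketched as follows: normalize every token vector, match each query token to its most similar passage token ("MaxSim"), and average those best-match scores over the query tokens. The 2-dimensional token vectors are hypothetical toy values, not real model outputs.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def late_interaction_score(q_vecs, p_vecs):
    """MaxSim: best passage-token match for each query token, averaged over query tokens."""
    q_vecs = [normalize(v) for v in q_vecs]
    p_vecs = [normalize(v) for v in p_vecs]
    return sum(max(dot(q, p) for p in p_vecs) for q in q_vecs) / len(q_vecs)

# Hypothetical token vectors: two query tokens, two passage tokens.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
passage_tokens = [[1.0, 0.0], [0.6, 0.8]]
print(late_interaction_score(query_tokens, passage_tokens))
```

Storing one vector per token costs far more memory than a single dense vector, which is one reason this mode is often reserved for re-ranking a short candidate list.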

Key Features

Multilingual: Supports 100+ languages
Multi-Granularity: Handles inputs from short sentences to long documents (up to 8192 tokens)
Hybrid Ranking: Combines multiple retrieval methods for improved accuracy
Self-Knowledge Distillation: Trained so that the combined relevance scores of the three retrieval methods act as a teacher signal for each individual method

Recommended Pipeline

The recommended setup is hybrid retrieval followed by re-ranking:

Use dense or sparse retrieval to get candidate results
Rerank the candidates using a weighted combination of dense, sparse, and multi-vector scores
Apply a BGE reranker for the final ranking
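The first two stages of this pipeline can be sketched with precomputed scores: retrieve top-k candidates by dense score alone, then re-sort them by a weighted average of the dense, sparse, and multi-vector scores. The per-passage scores and the weights (0.4, 0.2, 0.4) are illustrative assumptions, not values prescribed by BGE-M3; the final BGE reranker stage is not modeled here.

```python
from typing import Dict, List, Tuple

# passage id -> (dense, sparse, multi_vector) scores for one query (hypothetical values)
Scores = Dict[str, Tuple[float, float, float]]

def retrieve_candidates(scores: Scores, k: int) -> List[str]:
    """Stage 1: keep the top-k passages by dense score alone."""
    return sorted(scores, key=lambda pid: scores[pid][0], reverse=True)[:k]

def hybrid_score(dense: float, sparse: float, multi: float,
                 weights: Tuple[float, float, float] = (0.4, 0.2, 0.4)) -> float:
    """Stage 2: weighted average of the three scores (weights are illustrative)."""
    w_d, w_s, w_m = weights
    return (w_d * dense + w_s * sparse + w_m * multi) / (w_d + w_s + w_m)

def rerank(scores: Scores, candidates: List[str]) -> List[str]:
    """Re-sort the stage-1 candidates by their hybrid score."""
    return sorted(candidates, key=lambda pid: hybrid_score(*scores[pid]), reverse=True)

scores: Scores = {
    "p1": (0.80, 0.05, 0.70),
    "p2": (0.78, 0.40, 0.85),
    "p3": (0.60, 0.30, 0.55),
    "p4": (0.30, 0.02, 0.25),
}
candidates = retrieve_candidates(scores, k=3)
print(candidates)                  # ['p1', 'p2', 'p3']
print(rerank(scores, candidates))  # ['p2', 'p1', 'p3']
```

Note that dense, sparse, and multi-vector scores can live on different scales, so the weights usually need tuning on held-out queries. In the full pipeline, the reranked list would then be passed to a cross-encoder such as a BGE reranker.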

Performance

On the MLDR test set (13-language long document retrieval):

Sparse retrieval scored roughly 10 nDCG@10 points higher than dense retrieval
Dense+sparse hybrid provided further gains
Strong performance across diverse benchmarks

Platform Support

Hugging Face Transformers
NVIDIA NIM
DeepInfra
Ollama
Vespa and Milvus for hybrid retrieval

Use Cases

Multilingual semantic search
Hybrid search combining keyword and semantic matching
Long document retrieval
Cross-lingual information retrieval
RAG applications requiring multiple retrieval strategies

Pricing

Free and open-source. Available through various commercial API providers with usage-based pricing.

Information

Website: huggingface.co

Published: Mar 20, 2026

Tags

#Embedding Model #Hybrid Search #Multilingual

Similar Products

EmbeddingGemma

Google's 308M-parameter multilingual text embedding model based on Gemma 3. It runs in less than 200 MB of RAM with quantization, generates embeddings in under 22 ms on EdgeTPU, and ranks highest on MTEB among models under 500M parameters.

Jina-CLIP v2

A 0.9B multimodal embedding model with multilingual support for 89 languages, 512x512 image resolution, and Matryoshka representations that enable dimensional flexibility from 1024 down to 64 dimensions while maintaining strong performance.

BGE-M3

A versatile multilingual text embedding model from BAAI that supports 100+ languages and can handle inputs up to 8192 tokens. BGE-M3 is unique in supporting three retrieval methods simultaneously: dense retrieval, multi-vector retrieval, and sparse retrieval.

pinecone-sparse-english-v0

Learned sparse embedding model built on the DeepImpact architecture, outperforming BM25 by up to 44% on TREC benchmarks for high-precision keyword search and hybrid retrieval.

voyage-3-large

State-of-the-art general-purpose and multilingual embedding model from Voyage AI that ranks first across eight domains spanning 100 datasets, outperforming OpenAI and Cohere models by significant margins.

Qwen3 Embedding

Multilingual embedding model supporting over 100 languages and ranking #1 on the MTEB multilingual leaderboard. Offers model sizes from 0.6B to 8B parameters and supports user-defined instructions.
