
    BGE-M3

    A versatile embedding model from BAAI that simultaneously supports dense retrieval, sparse retrieval, and multi-vector retrieval, with multilingual support for 100+ languages and multi-granularity processing from short sentences to 8192-token documents.

    About this tool

    Overview

    BGE-M3 stands for Multi-Functionality, Multi-Linguality, and Multi-Granularity. It is a groundbreaking embedding model that can simultaneously perform three common retrieval functionalities in a single model.

    Three Retrieval Methods

    1. Dense Retrieval

    Uses the normalized hidden state of the [CLS] token as the dense embedding for semantic similarity search.
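Because the [CLS] embedding is L2-normalized, the dense score reduces to a plain dot product, which equals cosine similarity. A minimal sketch with toy stand-in vectors (the numbers are illustrative, not real model outputs):

```python
import math

def normalize(v):
    # L2-normalize so that a dot product equals cosine similarity
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dense_score(query_emb, doc_emb):
    # Dot product of normalized embeddings = cosine similarity
    return sum(a * b for a, b in zip(normalize(query_emb), normalize(doc_emb)))

# Stand-ins for [CLS] hidden states (toy 3-dim vectors)
query = [0.2, 0.7, 0.1]
doc = [0.25, 0.65, 0.05]
score = dense_score(query, doc)
```

In practice the embeddings come from the model and are typically 1024-dimensional; ranking candidates by this score is the dense-retrieval step.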

    2. Sparse Retrieval

    Generates sparse vectors (vocabulary-sized, with mostly zero entries), assigning learned weights only to tokens present in the text; similar in spirit to BM25, but with weights learned by the model rather than computed from corpus statistics.
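Since only tokens that occur in a text receive nonzero weights, the sparse score reduces to a sum over the tokens shared by query and document. A minimal sketch, with hypothetical token ids and toy weights:

```python
def sparse_score(q_weights, d_weights):
    # Sum w_q[t] * w_d[t] over tokens present in BOTH texts;
    # all other vocabulary entries are implicitly zero.
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

# Token-id -> learned-weight maps (toy values for illustration)
query = {101: 0.8, 2054: 0.5}
doc = {101: 0.6, 999: 0.3}
score = sparse_score(query, doc)  # only token 101 overlaps: 0.8 * 0.6
```

Representing the vectors as token-to-weight maps rather than full vocabulary-length arrays is what makes this efficient despite the nominal dimensionality.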

    3. Multi-Vector Retrieval (ColBERT-style)

    Uses multiple vectors to represent text, enabling fine-grained similarity matching at the token level.
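Token-level matching in the ColBERT style is usually computed as "MaxSim": each query token vector picks its best-matching document token vector, and the maxima are averaged. A sketch assuming the per-token vectors are already normalized:

```python
def maxsim_score(query_vecs, doc_vecs):
    # Late interaction: for each query token vector, take its best
    # dot-product match among document token vectors, then average.
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
    return total / len(query_vecs)

# Toy 2-dim token vectors: the first query token matches a doc token
# exactly, the second matches nothing well.
score = maxsim_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
```

This is why multi-vector retrieval captures fine-grained matches that a single pooled embedding can miss: each query token is scored independently.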

    Key Features

    • Multilingual: Supports 100+ languages
    • Multi-Granularity: Handles inputs from short sentences to long documents (up to 8192 tokens)
    • Hybrid Ranking: Combines multiple retrieval methods for improved accuracy
    • Self-Knowledge Distillation: Trained using advanced distillation techniques

    Recommended Pipeline

    The optimal setup is hybrid retrieval + re-ranking:

    1. Use dense or sparse retrieval to get candidate results
    2. Rerank candidates using a weighted combination of dense, sparse, and multi-vector scores
    3. Apply BGE reranker for final ranking
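The reranking step above can be sketched as a weighted sum of the three scores. The weights here are illustrative defaults, not values prescribed by BGE-M3; in practice they are tuned per task:

```python
def hybrid_score(dense, sparse, colbert, weights=(0.4, 0.2, 0.4)):
    # Weighted combination of the three retrieval scores
    # (weights are an assumption for illustration)
    w_d, w_s, w_c = weights
    return w_d * dense + w_s * sparse + w_c * colbert

def rerank(candidates, weights=(0.4, 0.2, 0.4)):
    # candidates: list of (doc_id, dense, sparse, colbert) tuples,
    # returned in descending order of combined score
    return sorted(
        candidates,
        key=lambda c: hybrid_score(c[1], c[2], c[3], weights),
        reverse=True,
    )

ranked = rerank([("a", 0.9, 0.1, 0.8), ("b", 0.5, 0.9, 0.4)])
```

A dedicated cross-encoder reranker (step 3) would then rescore only the top few results from this combined ranking.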

    Performance

    On the MLDR test set (13-language long document retrieval):

    • Sparse retrieval achieved ~10 NDCG@10 points higher than dense mode
    • Dense+sparse hybrid provided further gains
    • Strong performance across diverse benchmarks

    Platform Support

    • Hugging Face Transformers
    • NVIDIA NIM
    • DeepInfra
    • Ollama
    • Vespa and Milvus for hybrid retrieval

    Use Cases

    • Multilingual semantic search
    • Hybrid search combining keyword and semantic matching
    • Long document retrieval
    • Cross-lingual information retrieval
    • RAG applications requiring multiple retrieval strategies

    Pricing

    Free and open-source. Available through various commercial API providers with usage-based pricing.

    Information

    Website: huggingface.co
    Published: Mar 20, 2026

    Categories

    Machine Learning Models

    Tags

    #Embedding Model #Hybrid Search #Multilingual

    Similar Products

    EmbeddingGemma

    Google's 308M parameter multilingual text embedding model based on Gemma 3 that runs in less than 200MB RAM with quantization, generates embeddings in under 22ms on EdgeTPU, and ranks highest on MTEB for models under 500M parameters.

    Jina-CLIP v2

    A 0.9B multimodal embedding model with multilingual support for 89 languages, 512x512 image resolution, and Matryoshka representations that enable dimensional flexibility from 1024 down to 64 dimensions while maintaining strong performance.

    pinecone-sparse-english-v0

    Learned sparse embedding model built on DeepImpact architecture, outperforming BM25 by up to 44% on TREC benchmarks for high-precision keyword search and hybrid retrieval.

    voyage-3-large

    State-of-the-art general-purpose and multilingual embedding model from Voyage AI that ranks first across eight domains spanning 100 datasets, outperforming OpenAI and Cohere models by significant margins.

    Qwen3 Embedding

    Multilingual embedding model supporting over 100 languages and ranking #1 on MTEB multilingual leaderboard. Offers flexible model sizes from 0.6B to 8B parameters with user-defined instructions.

    Copyright © 2025 Awesome Vector Databases. All rights reserved.