
    FlagEmbedding

    Open-source retrieval and RAG framework from BAAI featuring the BGE embedding model series. BGE-M3 supports multi-functionality (dense, sparse, multi-vector), multi-linguality (100+ languages), and multi-granularity (up to 8192 tokens).


    About this tool

    Overview

    FlagEmbedding is a comprehensive retrieval and retrieval-augmented LLM framework developed by the Beijing Academy of Artificial Intelligence (BAAI). It includes the BGE (BAAI General Embedding) series of state-of-the-art embedding models.

    BGE Model Series

    BGE v1.5 (bge-*-v1.5)

    An improved version that alleviates the compressed similarity-score distribution of BGE v1.0:

    • Enhanced retrieval ability without requiring instructions
    • Available in large, base, and small sizes
    • Top performance on MTEB and C-MTEB benchmarks

    BGE-M3 (Multi-Functionality, Multi-Linguality, Multi-Granularity)

    The flagship model with unique versatility:

    Multi-Functionality

    • Dense Retrieval: Traditional vector similarity search
    • Multi-Vector Retrieval: Multiple vector representations per document
    • Sparse Retrieval: Keyword-based retrieval like BM25
    • Supports all three retrieval modes simultaneously
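The three signals above can be combined at query time. Below is a minimal, self-contained sketch of such hybrid score fusion; the vectors, token weights, and fusion weights are made-up toy values for illustration, not real BGE-M3 outputs.

```python
import math

def dense_score(q_vec, d_vec):
    """Cosine similarity between one query vector and one document vector."""
    dot = sum(q * d for q, d in zip(q_vec, d_vec))
    norm = math.sqrt(sum(q * q for q in q_vec)) * math.sqrt(sum(d * d for d in d_vec))
    return dot / norm

def sparse_score(q_weights, d_weights):
    """Lexical match: sum of weight products for tokens shared by query and doc."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def hybrid_score(q, d, w_dense=0.6, w_sparse=0.4):
    """Weighted fusion of the dense and sparse signals."""
    return (w_dense * dense_score(q["dense"], d["dense"])
            + w_sparse * sparse_score(q["sparse"], d["sparse"]))

# Toy query/document representations (illustrative values only).
query = {"dense": [0.1, 0.9, 0.2], "sparse": {"bge": 1.2, "embedding": 0.8}}
doc = {"dense": [0.2, 0.8, 0.1], "sparse": {"embedding": 0.5, "model": 0.3}}

print(round(hybrid_score(query, doc), 3))
```

In practice both representations would come from a single BGE-M3 forward pass, and the fusion weights are tuned per application.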

    Multi-Linguality

    • Supports over 100 languages
    • Trained on balanced multilingual datasets
    • Strong cross-lingual capabilities

    Multi-Granularity

    • Short Texts: Single sentences
    • Medium Texts: Paragraphs
    • Long Documents: Up to 8,192 tokens
    • Handles various input lengths effectively
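For inputs beyond the 8,192-token limit, a common pattern is to embed overlapping windows and aggregate the results. A toy sketch of the windowing step (integer IDs stand in for real tokenizer output; the helper name is ours, not part of FlagEmbedding):

```python
def chunk_tokens(tokens, max_len=8192, overlap=256):
    """Split a token sequence into windows that fit the model's context,
    with a small overlap so boundary sentences keep some context."""
    if len(tokens) <= max_len:
        return [tokens]
    step = max_len - overlap
    return [tokens[i:i + max_len] for i in range(0, len(tokens), step)]

# A 20,000-"token" document becomes three overlapping windows.
chunks = chunk_tokens(list(range(20000)), max_len=8192, overlap=256)
print([len(c) for c in chunks])  # [8192, 8192, 4128]
```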

    Training Methodology

    • Pre-training: RetroMAE approach
    • Fine-tuning: Large-scale pairs data with contrastive learning
    • Data Quality: Curated high-quality training datasets
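Contrastive fine-tuning of this kind typically uses an InfoNCE-style objective: the query's positive passage must outscore negative passages. A framework-free toy sketch of that loss for a single query (assumed form; the specific similarities below are invented for illustration):

```python
import math

def info_nce_loss(sim_pos, sim_negs, temperature=0.05):
    """InfoNCE-style contrastive loss for one (query, positive) pair.

    sim_pos:  similarity of the query to its positive passage
    sim_negs: similarities of the query to negative passages
    Lower loss means the positive outscores the negatives more clearly.
    """
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    # Numerically stable log-sum-exp for the softmax denominator.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

# A well-separated positive yields a small loss...
low = info_nce_loss(0.95, [0.2, 0.1])
# ...while an ambiguous one yields a larger loss.
high = info_nce_loss(0.5, [0.48, 0.45])
print(low < high)
```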

    Additional Models

    Reranker Models

    • bge-reranker-base: Cross-encoder for reranking
    • bge-reranker-large: Larger, more powerful reranker
    • bge-reranker-v2-m3: Latest reranker with multilingual support
    • More accurate than embedding-only approaches
    • Recommended for re-ranking top-k retrieved documents
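The recommended pattern is retrieve-then-rerank: a cheap bi-encoder scores the whole corpus, then the expensive cross-encoder rescores only the top-k candidates. The sketch below shows only the control flow, using trivial stand-in scorers; a real pipeline would swap in a BGE embedding model for stage 1 and a bge-reranker model for stage 2.

```python
def embed_score(query, doc):
    """Stage 1 stand-in: cheap token-overlap score (plays the bi-encoder role)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

def rerank_score(query, doc):
    """Stage 2 stand-in: 'expensive' score that also rewards exact phrase matches."""
    return embed_score(query, doc) + (0.5 if query.lower() in doc.lower() else 0.0)

def search(query, corpus, k=3, top_n=2):
    # Stage 1: score the whole corpus cheaply, keep the top-k candidates.
    candidates = sorted(corpus, key=lambda d: embed_score(query, d), reverse=True)[:k]
    # Stage 2: rescore only those k candidates with the expensive model.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:top_n]

corpus = [
    "bge m3 supports dense and sparse retrieval",
    "recipes for sourdough bread",
    "dense retrieval with transformer encoders",
]
print(search("dense retrieval", corpus, k=2, top_n=1))
```

Because the reranker only ever sees k documents, its higher per-pair cost stays bounded regardless of corpus size.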

    Performance

    • MTEB Leaderboard: Ranked #1 for English embeddings at time of release
    • C-MTEB: Top performance on Chinese benchmark
    • Multilingual Tasks: Strong performance across 100+ languages
    • Retrieval Quality: Superior recall and precision metrics

    Key Features

    • Open Source: Fully open-source under permissive license
    • Easy Integration: Compatible with popular frameworks
    • Fine-tuning Support: Can be adapted to specific domains
    • Production Ready: Battle-tested in real-world applications

    Use Cases

    • Semantic search across languages
    • RAG (Retrieval-Augmented Generation) systems
    • Document retrieval and ranking
    • Clustering and classification
    • Cross-lingual information retrieval
    • Question answering systems
    • Recommendation engines

    Integration

    Framework Support

    • LangChain
    • LlamaIndex
    • Haystack
    • HuggingFace Transformers
    • Sentence Transformers

    Deployment Options

    • Local inference
    • Cloud APIs
    • Together AI platform
    • Amazon Bedrock (fine-tuning support)

    Model Sizes

    • Large: Maximum accuracy, higher compute
    • Base: Balanced performance and efficiency
    • Small: Lightweight, faster inference

    Technical Details

    • Architecture: Transformer-based encoders
    • Context Window: Up to 8,192 tokens (BGE-M3)
    • Embedding Dimensions: Model-dependent (typically 768-1024)
    • Batch Processing: Optimized for throughput
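Since BGE embeddings are typically compared with cosine similarity, L2-normalizing them reduces scoring to a plain dot product. A tiny sketch (2-dimensional vectors for illustration only; the real models emit 768-1024 dimensions):

```python
import math

def normalize(v):
    """L2-normalize a vector so its dot products equal cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = normalize([3.0, 4.0])
b = normalize([6.0, 8.0])   # same direction, different magnitude
c = normalize([-4.0, 3.0])  # orthogonal direction

print(round(dot(a, b), 6), round(dot(a, c), 6))  # 1.0 0.0
```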

    Pricing

    Free and open-source. Available on HuggingFace Hub for self-hosting or through commercial API providers.


    Information

    Website: github.com
    Published: Mar 11, 2026

    Categories

    SDKs & Libraries

    Tags

    #Open Source #Embeddings #Multilingual

    Similar Products

    jina-embeddings-v5

    Jina AI's latest embedding model achieving the highest multilingual performance among models under 1B parameters with 71.7 average MTEB score and 67.7 MMTEB score.

    Nomic Embed Text v2

    Open-source multilingual embedding model using Mixture-of-Experts architecture, achieving excellent semantic performance with efficient inference and full offline support.

    GTE Embeddings

    General Text Embeddings from Alibaba DAMO Academy trained on large-scale relevance pairs. Available in three sizes (large, base, small) with GTE-v1.5 supporting 8192 context length.

    Nomic Embed Text

    First fully reproducible open-source text embedding model with 8,192 context length. v2 introduces Mixture-of-Experts architecture for multilingual embeddings. Outperforms OpenAI models on benchmarks. This is an OSS model under Apache 2.0 license.

    jina-embeddings-v3

    Frontier multilingual text embedding model with 570M parameters and 8192 token-length, featuring task-specific LoRA adapters and outperforming OpenAI and Cohere embeddings on MTEB benchmark.

    BGE-reranker-v2-m3

    Open-source multilingual reranking model from BAAI supporting 100+ languages with Apache 2.0 licensing, matching Cohere's latency on GPU with zero ongoing costs for production deployments.

    Copyright © 2025 Awesome Vector Databases. All rights reserved.