A versatile multilingual text embedding model from BAAI that supports 100+ languages and can handle inputs up to 8192 tokens. BGE-M3 is unique in supporting three retrieval methods simultaneously: dense retrieval, multi-vector retrieval, and sparse retrieval.
BGE-M3 is a text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence); the "M3" stands for its three "Multi" capabilities: multi-functionality, multi-linguality, and multi-granularity.
BGE-M3 is the first embedding model to simultaneously support all three common retrieval functionalities:
- Dense retrieval: one 1024-dimensional vector per text, compared by cosine similarity
- Sparse retrieval: learned per-token lexical weights, comparable to a neural BM25
- Multi-vector retrieval: ColBERT-style per-token embeddings scored with late interaction
This unique capability eliminates the need for multiple separate models.
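For illustration, here is a minimal sketch of requesting all three representations in a single call via the FlagEmbedding library (the model's official tooling); flags and output keys follow its documented encode API:

```python
from FlagEmbedding import BGEM3FlagModel

# use_fp16=True roughly halves memory use and speeds up inference with minor precision loss
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

sentences = ["What is BGE-M3?", "BGE-M3 is a multilingual embedding model."]

# One forward pass can return all three representation types.
output = model.encode(
    sentences,
    return_dense=True,         # 1024-d dense vectors
    return_sparse=True,        # per-token lexical weights
    return_colbert_vecs=True,  # ColBERT-style multi-vectors
)

print(output["dense_vecs"].shape)       # (2, 1024)
print(output["lexical_weights"][0])     # {token_id: weight, ...}
print(output["colbert_vecs"][0].shape)  # (num_tokens, 1024)
```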
Trained on datasets covering 170+ languages, BGE-M3 works with over 100 languages in production. In the original paper's evaluations it achieves state-of-the-art results on multilingual retrieval (MIRACL), cross-lingual retrieval (MKQA), and long-document retrieval (MLDR) benchmarks, and its reported performance surpasses OpenAI's embedding models in both English and other languages.
Processes inputs of varying granularities:
- Short units: sentences and passages, e.g. queries or FAQ entries
- Long units: full documents of up to 8192 tokens
This flexibility makes it suitable for diverse use cases, from FAQ search to full-document retrieval, as in the sketch below.
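A small example of mixing granularities, assuming the same FlagEmbedding setup as above; max_length caps the tokenized input length per call (up to 8192), and the texts here are placeholders:

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

short_query = "refund policy"
long_doc = " ".join(["Refunds are issued within 14 days of purchase."] * 400)  # thousands of tokens

# Raise max_length (up to 8192) for long documents; keep it small for short queries to save time.
q_emb = model.encode([short_query], max_length=64)["dense_vecs"]
d_emb = model.encode([long_doc], max_length=8192)["dense_vecs"]

# Dense vectors are L2-normalized, so the dot product is a cosine similarity.
print(q_emb @ d_emb.T)
```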
Base Model: XLM-RoBERTa
Output Dimensions: 1024-dimensional embeddings as primary output
Training: Massive multilingual corpora with contrastive learning objectives covering all three retrieval modes
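As a rough sketch of where the 1024-dimensional dense vector comes from: it is the encoder's L2-normalized [CLS] hidden state, which can be extracted with plain Hugging Face transformers. The FlagEmbedding wrapper does this internally; the snippet below is illustrative, not the official extraction code:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
model = AutoModel.from_pretrained("BAAI/bge-m3")

inputs = tokenizer("What is BGE-M3?", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, 1024)

# Dense embedding = L2-normalized hidden state of the [CLS] (first) token.
dense = torch.nn.functional.normalize(hidden[:, 0], dim=-1)
print(dense.shape)  # torch.Size([1, 1024])
```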
The BGE-M3 developers recommend a pipeline of hybrid retrieval + re-ranking. This combination:
- Retrieves first-stage candidates with dense and sparse scores together, capturing both semantic matches and exact keyword matches
- Re-ranks the shortlist with a cross-encoder for higher final accuracy
A sketch of this pipeline appears below.
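A minimal sketch of that pipeline using FlagEmbedding; the 0.7/0.3 fusion weights and the bge-reranker-v2-m3 cross-encoder are illustrative choices, not prescribed values:

```python
from FlagEmbedding import BGEM3FlagModel, FlagReranker

embedder = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)

query = "how do I reset my password?"
corpus = [
    "Password reset: click 'Forgot password' on the login page.",
    "Our office hours are 9am to 5pm on weekdays.",
    "Account recovery requires your registered email address.",
]

q = embedder.encode([query], return_dense=True, return_sparse=True)
d = embedder.encode(corpus, return_dense=True, return_sparse=True)

# Stage 1: hybrid score = weighted sum of dense and sparse similarities.
scored = []
for i, doc in enumerate(corpus):
    dense = float(q["dense_vecs"][0] @ d["dense_vecs"][i])
    sparse = embedder.compute_lexical_matching_score(
        q["lexical_weights"][0], d["lexical_weights"][i]
    )
    scored.append((0.7 * dense + 0.3 * sparse, doc))  # weights are illustrative

candidates = [doc for _, doc in sorted(scored, reverse=True)[:2]]

# Stage 2: re-rank the shortlist with a cross-encoder for the final ordering.
scores = reranker.compute_score([[query, doc] for doc in candidates])
print(candidates[max(range(len(candidates)), key=lambda i: scores[i])])
```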
Open-source and available through:
- The Hugging Face Hub (BAAI/bge-m3)
- The FlagEmbedding library and its GitHub repository (FlagOpen/FlagEmbedding)
Free and open-source under a permissive (MIT) license, making it cost-effective for commercial deployments compared to proprietary multilingual embedding APIs.