
multilingual-e5-large
Microsoft's state-of-the-art multilingual text embedding model supporting 100 languages with 1024-dimensional embeddings, trained on 1 billion multilingual text pairs for robust cross-lingual retrieval.
About this tool
Overview
The multilingual-e5-large model is a sophisticated embedding model developed at Microsoft, supporting 100 languages from xlm-roberta. It's designed for robust text representation across diverse languages and tasks.
Model Specifications
- Architecture: 24 layers based on XLM-RoBERTa-large
- Embedding Size: 1024 dimensions
- 560M Parameters: Optimal balance of performance and efficiency
- Multilingual Support: 100 languages
Training Methodology
The training procedure adheres to the English E5 model recipe:
- Contrastive pre-training on 1 billion multilingual text pairs
- Fine-tuning on a combination of labeled datasets
Performance
- Achieves 51.4 on BEIR benchmark
- Strong cross-lingual retrieval capabilities
- Robust performance across various text representation tasks
Model Family
The E5 family includes:
- multilingual-e5-small: 12 layers, 384 dimensions
- multilingual-e5-base: 12 layers, 768 dimensions
- multilingual-e5-large: 24 layers, 1024 dimensions (this model)
- multilingual-e5-large-instruct: Instruction-tuned version with 52.5 BEIR score
Use Cases
- Information retrieval
- Semantic textual similarity
- Text reranking
- Cross-lingual search
- Document classification
- Clustering
Resources
- Hugging Face: intfloat/multilingual-e5-large
- GitHub: microsoft/unilm/e5
- Technical Report: arXiv:2402.05672
Pricing
Free and open-source model available on Hugging Face.
Surveys
Loading more......
Information
Websitehuggingface.co
PublishedMar 14, 2026
Categories
Tags
Similar Products
6 result(s)