    multilingual-e5-large

    Microsoft's multilingual text embedding model. It supports 100 languages, produces 1024-dimensional embeddings, and was contrastively pre-trained on 1 billion multilingual text pairs for cross-lingual retrieval.

    About this tool

    Overview

    The multilingual-e5-large model is a text embedding model developed at Microsoft. It is built on xlm-roberta-large and supports the 100 languages that XLM-RoBERTa covers. It is designed to produce robust text representations across diverse languages and tasks.

    Model Specifications

    • Architecture: 24 layers based on XLM-RoBERTa-large
    • Embedding Size: 1024 dimensions
    • Parameters: 560M
    • Multilingual Support: 100 languages
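As a rough illustration of what the parameter count above means in practice, the sketch below estimates the memory needed just to hold the weights. The bytes-per-parameter values are the standard sizes of each numeric format; activations, tokenizer data, and framework overhead are not counted, so treat the output as a lower bound rather than a measured figure.

```python
# Rough weight-memory estimate from the parameter count in the spec list.
PARAMS = 560_000_000  # "560M" from the specifications above

# Standard storage size of one parameter in each numeric format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(params: int, dtype: str) -> float:
    """Approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.2f} GB")
```

Half precision roughly halves the footprint, which is often the deciding factor when serving the model on a single GPU.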

    Training Methodology

    The training procedure adheres to the English E5 model recipe:

    1. Contrastive pre-training on 1 billion multilingual text pairs
    2. Fine-tuning on a combination of labeled datasets
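The page does not state which loss is used in the contrastive pre-training step. A common choice for this kind of training is an InfoNCE objective with in-batch negatives, and the sketch below shows that idea in plain Python. The temperature value and the toy 2-d vectors are illustrative assumptions, not settings taken from the model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries, passages, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    passages[i] is the positive for queries[i]; every other passage
    in the batch serves as a negative. The temperature is an assumed
    illustrative value.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in passages]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


# Toy batch: each query should match the passage at the same index.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]

print("aligned loss:", info_nce_loss(queries, aligned))
print("swapped loss:", info_nce_loss(queries, swapped))
```

The loss is small when each query is closest to its own passage and large when the pairs are swapped, which is the signal that pulls matching text pairs together during pre-training.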

    Performance

    • Scores 51.4 on the BEIR benchmark
    • Strong cross-lingual retrieval capabilities
    • Robust performance across various text representation tasks
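The page does not say which metric the BEIR figure refers to. BEIR results are conventionally reported as nDCG@10 averaged over datasets, so, assuming that convention applies here, the sketch below shows how nDCG@k is computed for a single query. The relevance labels are made-up examples.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """nDCG@k for one query.

    ranked_relevances holds the graded relevance of each returned
    document, in the order the system ranked them. The ideal ordering
    is derived from the same labels, which is a simplification: a full
    evaluation would also count relevant documents that were never
    retrieved.
    """
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical rankings: 1 = relevant, 0 = not relevant.
print(ndcg_at_k([1, 0, 0]))  # relevant document ranked first
print(ndcg_at_k([0, 1, 0]))  # relevant document ranked second
```

A benchmark-level score is then the mean of this value over all queries and datasets, usually shown on a 0 to 100 scale.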

    Model Family

    The multilingual E5 family includes:

    • multilingual-e5-small: 12 layers, 384 dimensions
    • multilingual-e5-base: 12 layers, 768 dimensions
    • multilingual-e5-large: 24 layers, 1024 dimensions (this model)
    • multilingual-e5-large-instruct: Instruction-tuned version with 52.5 BEIR score
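One practical consequence of the dimensions listed above is index size. The sketch below estimates raw storage for uncompressed float32 vectors, using only the dimensions given in the list (the instruct variant is left out because its dimension is not stated here). Index structures and metadata add overhead that is not counted.

```python
# Embedding dimensions taken from the model family list above.
DIMENSIONS = {
    "multilingual-e5-small": 384,
    "multilingual-e5-base": 768,
    "multilingual-e5-large": 1024,
}


def index_size_mb(num_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage in megabytes (1 MB = 1e6 bytes), float32 by default."""
    return num_vectors * dim * bytes_per_value / 1e6


for name, dim in DIMENSIONS.items():
    size = index_size_mb(1_000_000, dim)
    print(f"{name}: {size:,.0f} MB per million vectors")
```

At the same vector count, the large model needs a bit under three times the storage of the small one, so the smaller variants can be worth evaluating when index size is the constraint.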

    Use Cases

    • Information retrieval
    • Semantic textual similarity
    • Text reranking
    • Cross-lingual search
    • Document classification
    • Clustering
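Most of the use cases above reduce to comparing embedding vectors. As a minimal sketch of retrieval or cross-lingual search, the code below ranks documents by cosine similarity to a query. The 3-d vectors and document names are hypothetical stand-ins for real 1024-dimensional embeddings.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank_by_cosine(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted by cosine similarity, best first."""
    q = normalize(query_vec)
    scored = []
    for doc_id, vec in doc_vecs.items():
        d = normalize(vec)
        scored.append((doc_id, sum(a * b for a, b in zip(q, d))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real ones would come from the model.
docs = {
    "doc_en": [0.9, 0.1, 0.0],
    "doc_de": [0.8, 0.2, 0.1],
    "doc_other": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_cosine(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Because a multilingual model maps texts from different languages into one vector space, the same ranking code serves monolingual and cross-lingual search alike.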

    Resources

    • Hugging Face: intfloat/multilingual-e5-large
    • GitHub: microsoft/unilm/e5
    • Technical Report: arXiv:2402.05672
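The Hugging Face identifier above can be loaded with the sentence-transformers package. The sketch below assumes that package is installed and that the weights can be downloaded. It also assumes the convention from the model's Hugging Face card that inputs are prefixed with "query: " or "passage: ", which this page does not mention. The helper names are hypothetical.

```python
def add_prefix(texts, kind):
    """Prepend the role prefix ("query" or "passage") to each text.

    The prefix convention is an assumption taken from the model card,
    not from this page.
    """
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {text}" for text in texts]


def embed(texts, kind, model_name="intfloat/multilingual-e5-large"):
    """Encode texts into unit-length vectors (downloads the model on first use)."""
    # Imported lazily so add_prefix works without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(add_prefix(texts, kind), normalize_embeddings=True)


print(add_prefix(["how do text embeddings work?"], "query"))
```

Calling `embed(["..."], "passage")` for documents and `embed(["..."], "query")` for searches yields vectors that can be compared with a dot product, since they are normalized.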

    Pricing

    Free and open-source model available on Hugging Face.

    Information

    Website: huggingface.co
    Published: Mar 14, 2026

    Categories

    Machine Learning Models

    Tags

    #Multilingual #Embedding #Microsoft

    Similar Products

    Nomic Embed Text

    First fully reproducible open-source text embedding model with 8,192 context length. v2 introduces Mixture-of-Experts architecture for multilingual embeddings. Outperforms OpenAI models on benchmarks. This is an OSS model under Apache 2.0 license.

    jina-embeddings-v3

    Frontier multilingual text embedding model with 570M parameters and 8192 token-length, featuring task-specific LoRA adapters and outperforming OpenAI and Cohere embeddings on MTEB benchmark.

    Jina ColBERT v2

    Groundbreaking multilingual information retrieval model supporting 89 languages with token-level embeddings and late interaction. Features Matryoshka embeddings for flexible efficiency-precision tradeoffs and 8192 token input context.

    E5 Embeddings

    Open-source text embedding models from Microsoft supporting 100+ languages. Features small, base, and large variants with weakly-supervised contrastive pre-training. This is an OSS model family released by Microsoft Research.

    voyage-3-large

    State-of-the-art general-purpose and multilingual embedding model from Voyage AI that ranks first across eight domains spanning 100 datasets, outperforming OpenAI and Cohere models by significant margins.

    Qwen3 Embedding

    Multilingual embedding model supporting over 100 languages and ranking #1 on MTEB multilingual leaderboard. Offers flexible model sizes from 0.6B to 8B parameters with user-defined instructions.

    Copyright © 2025 Awesome Vector Databases. All rights reserved.