Copyright © 2025 Awesome Vector Databases. All rights reserved.

    MTEB

The Massive Text Embedding Benchmark (MTEB) is a comprehensive benchmark for evaluating text embedding models across 8 embedding tasks and 58 datasets in 112 languages. It provides a standardized leaderboard for comparing embedding quality across classification, clustering, retrieval, reranking, semantic textual similarity, and summarization tasks.


Information

Website: huggingface.co
Published: Mar 22, 2026

Categories

Benchmarks & Evaluation

Tags

#benchmark #embeddings #multilingual

    Similar Products


    MTEB Leaderboard

    Massive Text Embedding Benchmark leaderboard covering 58 datasets across 112 languages and 8 embedding tasks. Industry-standard benchmark for comparing text embedding models.


    MMTEB

    Massive Multilingual Text Embedding Benchmark covering over 500 quality-controlled evaluation tasks across 250+ languages, representing the largest multilingual collection of embedding model evaluation tasks.

    Qwen3 Embedding

    Multilingual embedding model supporting over 100 languages and ranking #1 on MTEB multilingual leaderboard. Offers flexible model sizes from 0.6B to 8B parameters with user-defined instructions.


    Cohere Embed Multilingual v3

    High-performance multilingual embedding model from Cohere supporting 100+ languages with 1024 dimensions, optimized for semantic search, RAG, and cross-lingual retrieval tasks.

    Mistral Embed

    State-of-the-art embedding model from Mistral AI that generates 1024-dimensional vectors for text, supporting semantic search, clustering, and retrieval-augmented generation applications.

    BGE-M3

    A versatile multilingual text embedding model from BAAI that supports 100+ languages and can handle inputs up to 8192 tokens. BGE-M3 is unique in supporting three retrieval methods simultaneously: dense retrieval, multi-vector retrieval, and sparse retrieval.

    Overview

    MTEB (Massive Text Embedding Benchmark) is the most comprehensive benchmark for evaluating text embedding models. It covers 8 different embedding tasks across 58 datasets in 112 languages, providing a standardized way to compare embedding quality.

    Evaluated Tasks

    MTEB evaluates models across 8 core tasks:

    1. Classification: Text categorization and labeling
    2. Clustering: Grouping similar texts
    3. Pair Classification: Determining if text pairs are related
    4. Reranking: Ordering candidates by relevance
    5. Retrieval: Finding relevant documents for queries
    6. Semantic Textual Similarity (STS): Measuring text similarity
    7. Summarization: Evaluating summary quality
8. Bitext Mining: Cross-lingual sentence alignment
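To make the STS task concrete, here is a hand-rolled sketch of how such a task is scored: the model's similarity scores (here, cosine similarity over made-up embedding vectors standing in for real model output) are compared against human judgments using Spearman rank correlation. This illustrates the scoring idea only; it is not MTEB's actual evaluation code.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

# Hypothetical embeddings for three sentence pairs (stand-ins for model output)
pairs = [
    (np.array([1.0, 0.0, 0.0]), np.array([0.9, 0.1, 0.0])),  # near-duplicates
    (np.array([1.0, 0.0, 0.0]), np.array([0.5, 0.5, 0.0])),  # related
    (np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])),  # unrelated
]
gold = [5.0, 3.0, 0.0]  # human similarity judgments on a 0-5 scale

model_scores = [cosine_similarity(a, b) for a, b in pairs]
print(spearman(model_scores, gold))  # 1.0: the model ranks pairs like humans do
```

A Spearman score of 1.0 means the model orders the pairs exactly as the human annotators did, even if the raw similarity values differ in scale.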

    Key Features

    • Comprehensive Coverage: 58 datasets across diverse domains
    • Multilingual: Supports 112 languages
    • Standardized Evaluation: Consistent metrics across all models
    • Public Leaderboard: Compare models transparently
    • Easy Integration: Simple Python API

    Top Models (as of 2026)

    Historically strong performers include:

    • Voyage embeddings (voyage-3, voyage-4)
    • Jina embeddings (jina-embeddings-v3, v4)
    • Cohere embed-v3 and v4
    • BGE-M3 (multilingual)
    • E5 models (Microsoft)
    • GTE models (Alibaba)

    Metrics

MTEB reports metrics appropriate to each task:

    • Classification/Clustering: Accuracy, F1
    • Retrieval: NDCG@10, MAP, Recall
    • STS: Spearman correlation
    • Reranking: MAP, MRR
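For concreteness, here is a minimal, hand-rolled implementation of NDCG@k, the headline retrieval metric: gains of retrieved documents are discounted by rank position and normalized against the ideal ordering. This sketches the metric itself, not MTEB's evaluation code.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = sorted(relevances, reverse=True)
    best = dcg_at_k(ideal, k)
    return dcg_at_k(relevances, k) / best if best > 0 else 0.0

# Relevance labels of the top-5 documents a model retrieved, in rank order.
# The ideal order would be [3, 2, 1, 0, 0].
retrieved = [3, 2, 0, 1, 0]
print(round(ndcg_at_k(retrieved, k=5), 4))  # → 0.9854
```

The near-1.0 score reflects that only one mildly relevant document (relevance 1) is ranked lower than ideal; a perfect ranking scores exactly 1.0.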

    Why MTEB Matters

    • Industry Standard: De facto benchmark for embedding models
    • Comprehensive: Tests multiple capabilities, not just retrieval
    • Practical: Tasks reflect real-world applications
    • Transparent: Public leaderboard with reproducible results
    • Updated: Regular addition of new models and datasets

    Using MTEB

    # pip install mteb sentence-transformers
    from mteb import MTEB
    from sentence_transformers import SentenceTransformer

    # Load any embedding model with an encode() method
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Evaluate the model on a single task
    evaluation = MTEB(tasks=["Banking77Classification"])
    results = evaluation.run(model, output_folder="results")


    Limitations

    • Bias toward certain types of tasks
    • Some datasets may overlap with models' training data
    • Static evaluation (no dynamic updates)
    • Resource-intensive to run the full benchmark

    Leaderboard

    The official leaderboard is maintained on Hugging Face: https://huggingface.co/spaces/mteb/leaderboard

    Resources

    • GitHub: github.com/embeddings-benchmark/mteb
    • Paper: Available on arXiv
    • Documentation: Comprehensive guides

    Use Cases

    • Model Selection: Choose best embedding for your use case
    • Model Development: Benchmark new architectures
    • Research: Compare against state-of-the-art
    • Production: Validate model performance

    Pricing

    Free and open-source benchmark.