
    MTEB (Massive Text Embedding Benchmark)

Comprehensive benchmark suite for evaluating embedding models across 58 datasets spanning 112 languages and eight task types, including retrieval, clustering, and semantic similarity. It is the standard for comparing embedding quality.


    About this tool

    Overview

    MTEB (Massive Text Embedding Benchmark) is a comprehensive benchmark suite for evaluating embedding models across diverse NLP tasks like retrieval, classification, clustering, reranking, and semantic similarity.

    Coverage

    • 58 datasets across multiple domains
    • 112 languages for multilingual evaluation
    • 8 task types (see the sketch after this list):
      1. Retrieval
      2. Clustering
      3. Semantic Textual Similarity (STS)
      4. Classification
      5. Reranking
      6. Pair Classification
      7. Bitext Mining
      8. Summarization
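
    Recent versions of the mteb package expose this task taxonomy programmatically. A minimal sketch, assuming the current get_tasks API and its task metadata layout:

    import mteb

    # Enumerate registered tasks of a given type; task_types uses the
    # category names above ("Retrieval", "Clustering", "STS", ...).
    tasks = mteb.get_tasks(task_types=["Clustering"], languages=["eng"])
    for task in tasks:
        print(task.metadata.name)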

    Evaluation Metrics

    Task-Appropriate Metrics

    • Retrieval: nDCG@10, Recall@K
    • Similarity: Spearman correlation (see the sketch after this list)
    • Clustering: Normalized Mutual Information (NMI)
    • Classification: Accuracy, F1-score
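
    As an illustration of the similarity metric, the sketch below correlates a model's cosine similarities on sentence pairs with human ratings. The model name and the toy data are placeholders, not MTEB fixtures:

    from scipy.stats import spearmanr
    from sentence_transformers import SentenceTransformer, util

    # Toy STS data: sentence pairs with human similarity ratings (0-5 scale).
    pairs = [("A man is playing a guitar.", "A person plays the guitar."),
             ("A dog runs in the park.", "The stock market fell today."),
             ("Two kids are kicking a ball.", "Children play soccer outside.")]
    gold = [4.8, 0.2, 3.9]

    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
    emb1 = model.encode([a for a, _ in pairs])
    emb2 = model.encode([b for _, b in pairs])

    # Cosine similarity per pair, then Spearman rank correlation against
    # the gold ratings: the score MTEB reports for STS tasks.
    preds = [util.cos_sim(e1, e2).item() for e1, e2 in zip(emb1, emb2)]
    print(spearmanr(preds, gold).correlation)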

    Retrieval Process

    1. Embed queries and documents
    2. Rank documents using cosine similarity
    3. Score with nDCG@10
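
    A minimal end-to-end sketch of these three steps on a toy corpus, with nDCG@10 computed by hand (actual MTEB runs delegate scoring to their internal evaluators):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

    corpus = ["MTEB evaluates embedding models.", "BEIR covers retrieval.",
              "Bananas are yellow."]
    query = "Which benchmark evaluates embedding models?"
    relevant = {0}  # indices of documents relevant to the query

    # 1. Embed queries and documents (normalized, so dot product = cosine).
    doc_emb = model.encode(corpus, normalize_embeddings=True)
    q_emb = model.encode(query, normalize_embeddings=True)

    # 2. Rank documents by cosine similarity.
    ranking = np.argsort(-(doc_emb @ q_emb))

    # 3. Score with nDCG@10: discounted gain over the ranking, normalized
    #    by the ideal DCG achievable with the known relevant set.
    gains = [1.0 if idx in relevant else 0.0 for idx in ranking[:10]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), 10)))
    print("nDCG@10:", dcg / idcg)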

    BEIR Integration

    For its retrieval suite, MTEB directly incorporates the open-source BEIR benchmark, which contributes 15 datasets covering the areas below (a sketch of running one such task follows the list):

    • Scientific papers (SCIDOCS)
    • Question answering (Natural Questions)
    • Fact verification (FEVER)
    • News (TREC NEWS)
    • Biomedical (TREC COVID)
    • Argumentative search
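
    Because these BEIR datasets are registered as ordinary MTEB retrieval tasks, they run through the same interface as any other task. A minimal sketch, assuming "SciFact" as the registered task name:

    from mteb import MTEB
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

    # "SciFact" is one of the BEIR datasets surfaced as an MTEB retrieval
    # task; consult the MTEB docs for the full list of task names.
    evaluation = MTEB(tasks=["SciFact"])
    results = evaluation.run(model)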

    Leaderboard

    MTEB maintains a public leaderboard showing:

    • Overall scores across all tasks
    • Task-specific performance
    • Multilingual capabilities
    • Model rankings

    Top models include:

    • NVIDIA NV-Embed-v2
    • Cohere embed-v4
    • Jina embeddings-v3
    • OpenAI text-embedding-3-large

    Practical Usage Workflow

    1. Screening: Use MTEB for fast model filtering (e.g., >60 on retrieval; see the sketch after this list)
    2. Robustness Check: Run BEIR to verify domain transfer
    3. Production Validation: Test on labeled production data
    4. A/B Testing: Deploy and measure real-world performance
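
    A sketch of the screening step, assuming a hand-collected dict of leaderboard retrieval averages (the scores below are made up for illustration):

    # Hypothetical retrieval averages, copied from the leaderboard by hand.
    retrieval_scores = {
        "model-a": 62.3,
        "model-b": 55.1,
        "model-c": 60.8,
    }

    # Keep only models clearing the >60 retrieval bar, then move the
    # shortlist on to the BEIR robustness check.
    shortlist = [name for name, score in retrieval_scores.items() if score > 60]
    print(shortlist)  # ['model-a', 'model-c']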

    Evaluation with MTEB

    from mteb import MTEB
    from sentence_transformers import SentenceTransformer

    # Load any Sentence Transformers-compatible model; "all-MiniLM-L6-v2"
    # stands in here for whichever model is being evaluated.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Select one or more MTEB tasks by name and run the evaluation.
    evaluation = MTEB(tasks=["Banking77Classification"])
    results = evaluation.run(model)
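
    Each run typically writes per-task JSON result files to an output folder ("results/" unless overridden via the output_folder argument), which makes it straightforward to inspect scores or compare models after the fact.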
    

    Why MTEB Matters

    • Standardized Comparison: Fair comparison across models
    • Comprehensive Coverage: Multiple tasks and languages
    • Community Standard: Industry-wide adoption
    • Reproducibility: Consistent evaluation methodology
    • Model Selection: Data-driven model choice for RAG and search

    Limitations

    • May not reflect specific domain performance
    • Benchmark tasks may not match production use cases
    • Static datasets may become outdated
    • High scores don't guarantee production success

    Resources

    • GitHub: https://github.com/embeddings-benchmark/mteb
    • Hugging Face: https://huggingface.co/mteb
    • Leaderboard: https://huggingface.co/spaces/mteb/leaderboard
    • Documentation: https://embeddings-benchmark.github.io/mteb/

    Integration

    • Sentence Transformers
    • Hugging Face
    • Evaluation frameworks
    • Research papers

    Pricing

    Free and open-source benchmark framework.


    Information

    Website: github.com
    Published: Mar 14, 2026

    Categories

    Benchmarks & Evaluation

    Tags

    #Benchmark #Evaluation #Leaderboard

    Similar Products

    MTEB Leaderboard

    Massive Text Embedding Benchmark leaderboard covering 58 datasets across 112 languages and 8 embedding tasks. Industry-standard benchmark for comparing text embedding models.

    ViDoRe Benchmark

    Visual Document Retrieval benchmark designed to evaluate embedding models and retrieval systems on visually rich documents containing tables, charts, diagrams, and complex layouts. The standard benchmark for assessing multi-modal document understanding and retrieval performance.

    MMTEB

    Massive Multilingual Text Embedding Benchmark covering over 500 quality-controlled evaluation tasks across 250+ languages, representing the largest multilingual collection of embedding model evaluation tasks.

    SISAP Indexing Challenge

    An annual competition focused on similarity search and indexing algorithms, including approximate nearest neighbor methods and high-dimensional vector indexing, providing benchmarks and results relevant to vector database research.

    BEIR

    BEIR (Benchmarking IR) is a benchmark suite for evaluating information retrieval and vector search systems across multiple tasks and datasets. Useful for comparing vector database performance.

    IntelLabs' Vector Search Datasets

    A collection of datasets curated by Intel Labs specifically for evaluating and benchmarking vector search algorithms and databases.
