
MMTEB
Massive Multilingual Text Embedding Benchmark covering over 500 quality-controlled evaluation tasks across 250+ languages, representing the largest multilingual collection of embedding model evaluation tasks.
Overview
MMTEB (Massive Multilingual Text Embedding Benchmark) is a large-scale, community-driven expansion of MTEB. Its 500+ quality-controlled evaluation tasks across 250+ languages make it the largest multilingual collection of evaluation tasks for embedding models to date.
Key Features
Diverse Task Set
The benchmark spans challenging, novel task types alongside established ones:
- Instruction following
- Long-document retrieval
- Code retrieval
- Traditional NLP tasks (classification, clustering, etc.)
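Retrieval tasks like those above are typically scored by embedding queries and documents, ranking documents by cosine similarity, and computing a ranking metric such as recall@k. The following is an illustrative sketch with toy vectors standing in for real model embeddings; the function names and data layout are assumptions, not MMTEB's actual evaluation code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def recall_at_k(query_vec, doc_vecs, relevant_ids, k):
    """Rank documents by similarity to the query; return the fraction
    of relevant documents that appear in the top k results.

    doc_vecs: dict mapping document id -> embedding vector.
    relevant_ids: set of document ids judged relevant to the query.
    """
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]),
                    reverse=True)
    top = set(ranked[:k])
    return len(top & relevant_ids) / len(relevant_ids)

# Toy example: two of three documents are relevant to the query.
docs = {"d1": [1.0, 0.1], "d2": [0.0, 1.0], "d3": [0.9, 0.0]}
print(recall_at_k([1.0, 0.0], docs, {"d1", "d3"}, k=2))  # both relevant docs ranked in top 2
```

In a real benchmark run, the vectors would come from the embedding model under evaluation, and metrics such as nDCG@10 are commonly reported alongside recall.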
Community-Driven
Created through a large-scale, open collaboration, with contributors including:
- Native speakers from diverse linguistic backgrounds
- NLP practitioners
- Academic and industry researchers
- Enthusiasts
Regional Benchmarks
From the extensive collection of tasks in MMTEB, several representative benchmarks were developed:
- MTEB(Multilingual): broad-coverage benchmark spanning languages across the full collection
- MTEB(Europe): regional benchmark for European languages
- MTEB(Indic): regional benchmark for Indic languages
Performance Findings
While large language models (LLMs) with billions of parameters can achieve state-of-the-art performance on certain language subsets and task categories, the best-performing publicly available model overall, multilingual-e5-large-instruct, has only 560 million parameters.
Computational Efficiency
MMTEB introduces a novel downsampling method based on inter-task correlation: highly correlated tasks add little new signal, so a diverse subset can be selected that preserves relative model rankings at a fraction of the computational cost of running the full collection.
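The paper's exact selection procedure is not reproduced here, but the core idea can be sketched as a greedy selection over per-task score vectors: repeatedly keep the task least correlated with those already kept, so redundant (highly correlated) tasks are dropped first. All names and the greedy strategy below are illustrative assumptions:

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length score vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def downsample_tasks(scores, k):
    """Greedily select k tasks that are mutually weakly correlated.

    scores: dict mapping task name -> list of per-model scores
            (one entry per model, same model order for every task).
    Illustrative sketch only, not the MMTEB paper's exact algorithm.
    """
    tasks = list(scores)

    def avg_abs_corr(t, pool):
        others = [u for u in pool if u != t]
        return sum(abs(pearson(scores[t], scores[u])) for u in others) / max(len(others), 1)

    # Seed with the task least correlated with all others on average.
    selected = [min(tasks, key=lambda t: avg_abs_corr(t, tasks))]
    while len(selected) < k:
        remaining = [t for t in tasks if t not in selected]
        # Add the task whose strongest correlation with the kept set is weakest.
        best = min(remaining,
                   key=lambda t: max(abs(pearson(scores[t], scores[s])) for s in selected))
        selected.append(best)
    return selected

# Toy example: tasks A and B score three models identically (redundant),
# so only one of them survives downsampling to two tasks.
scores = {"A": [0.1, 0.2, 0.3], "B": [0.1, 0.2, 0.3], "C": [0.3, 0.1, 0.2]}
print(downsample_tasks(scores, 2))
```

A selection like this keeps model rankings stable because the dropped tasks were statistically redundant: a model's relative standing on them is already predicted by its standing on the retained tasks.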
Pricing
Free to use; MMTEB is an open benchmark, published in February 2025.