



MMTEB (Massive Multilingual Text Embedding Benchmark) is a large-scale, community-driven expansion of MTEB, covering over 500 quality-controlled evaluation tasks across 250+ languages. It represents the largest multilingual collection of evaluation tasks for embedding models to date.
Includes a diverse set of challenging, novel tasks.
Created through a large-scale, open collaboration with a broad community of contributors.
From the extensive collection of tasks in MMTEB, several representative benchmarks were derived.
While large language models (LLMs) with billions of parameters can achieve state-of-the-art performance on certain language subsets and task categories, the best-performing publicly available model is multilingual-e5-large-instruct with only 560 million parameters.
Introduces a novel downsampling method based on inter-task correlation, ensuring a diverse selection while preserving relative model rankings at a fraction of the computational cost.
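As a rough illustration of the idea behind correlation-based downsampling, the sketch below greedily keeps tasks that are least correlated with those already selected, so that a small subset of tasks still discriminates between models. This is a hypothetical minimal sketch, not the exact MMTEB procedure: the function names (`pearson`, `downsample_tasks`) and the toy score matrix are invented for illustration.

```python
# Hypothetical sketch of correlation-based task downsampling (NOT the
# exact MMTEB algorithm): greedily select tasks whose scores are least
# correlated with the already-selected tasks, so model rankings stay
# informative at a fraction of the evaluation cost.

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def downsample_tasks(scores, k):
    """scores: dict mapping task name -> per-model scores (same model order).
    Returns k task names chosen greedily for low inter-task correlation."""
    names = list(scores)
    selected = [names[0]]  # seed with an arbitrary first task
    while len(selected) < k:
        # Pick the remaining task whose strongest correlation with any
        # already-selected task is smallest (most "new information").
        best = min(
            (t for t in names if t not in selected),
            key=lambda t: max(abs(pearson(scores[t], scores[s])) for s in selected),
        )
        selected.append(best)
    return selected

# Toy example: 4 tasks scored by 5 models; task_b nearly duplicates task_a,
# so the downsampler should drop it.
scores = {
    "task_a": [0.10, 0.20, 0.30, 0.40, 0.50],
    "task_b": [0.12, 0.21, 0.31, 0.41, 0.52],  # highly correlated with task_a
    "task_c": [0.90, 0.10, 0.50, 0.20, 0.70],
    "task_d": [0.30, 0.80, 0.20, 0.90, 0.10],
}
picked = downsample_tasks(scores, 2)
```

Because `task_b` is almost perfectly correlated with `task_a`, it contributes little ranking information and is excluded from the selected subset.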
Free to use: an open benchmark, published in February 2025.