
GTE Embeddings
General Text Embeddings from Alibaba DAMO Academy trained on large-scale relevance pairs. Available in three sizes (large, base, small) with GTE-v1.5 supporting 8192 context length.
Overview
The GTE (General Text Embeddings) models are developed by Alibaba DAMO Academy and are based primarily on the BERT framework. They are trained on a large-scale corpus of relevance text pairs covering a wide range of domains and scenarios.
Model Sizes
GTE offers three different sizes to balance performance and efficiency:
- GTE-large: Highest performance
- GTE-base: Balanced performance and size
- GTE-small: Smallest and fastest, optimized for efficiency
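The three sizes can be loaded through the sentence-transformers library. A minimal sketch follows; the model IDs below are the checkpoints commonly published on the Hugging Face Hub under the `thenlper` namespace, and the dimensions match the standard BERT hidden sizes — verify both before relying on them.

```python
# Sketch: choosing a GTE size and encoding text with sentence-transformers.
# Model IDs and output dimensions are assumptions based on the checkpoints
# published on the Hugging Face Hub; verify the exact names before use.
GTE_MODELS = {
    "large": ("thenlper/gte-large", 1024),  # highest quality, 1024-dim
    "base":  ("thenlper/gte-base", 768),    # balanced, 768-dim
    "small": ("thenlper/gte-small", 384),   # fastest, 384-dim
}

def embed(texts, size="small"):
    """Load the chosen GTE checkpoint and return L2-normalized embeddings."""
    # Imported lazily so the helper can be defined without the package installed.
    from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
    model_id, _dim = GTE_MODELS[size]
    model = SentenceTransformer(model_id)
    # Normalized vectors let downstream code use a plain dot product as cosine similarity.
    return model.encode(texts, normalize_embeddings=True)
```

Calling `embed(["what is a text embedding?"], size="small")` would download the checkpoint on first use and return a 384-dimensional vector per input string.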
Benchmark Performance
GTE models have been evaluated against other popular text embedding models on the MTEB benchmark:
- Detailed comparison results are available on the MTEB leaderboard
- Even GTE-small achieves an overall MTEB score of 61.36
- Competitive performance across a wide range of embedding tasks
Recent Developments
GTE-v1.5 Series
Upgraded GTE embeddings with:
- Support for context length up to 8192 tokens
- Enhanced model performance
- Built upon transformer++ encoder backbone (BERT + RoPE + GLU)
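Using the extended 8192-token context can be sketched as below. The model ID `Alibaba-NLP/gte-large-en-v1.5` is the v1.5 checkpoint published on the Hugging Face Hub (verify before use), and `trust_remote_code=True` is assumed to be required because v1.5 ships the custom BERT + RoPE + GLU architecture rather than a stock transformers model class.

```python
def embed_long(texts, model_id="Alibaba-NLP/gte-large-en-v1.5"):
    """Encode long documents (up to 8192 tokens) with a GTE-v1.5 checkpoint.

    Assumptions: the model ID matches the Hub checkpoint, and the custom
    transformer++ architecture needs trust_remote_code to load.
    """
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_id, trust_remote_code=True)
    model.max_seq_length = 8192  # raise the default truncation limit
    return model.encode(texts, normalize_embeddings=True)
```

Without raising `max_seq_length`, long inputs would be silently truncated at the library's default limit even though the model itself supports longer sequences.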
GTE-Multilingual (mGTE) Series
Introduced by Alibaba's Tongyi Lab featuring:
- High performance across languages
- Long-context handling
- Multilingual support
- Elastic embedding capabilities
- Significantly improved retrieval and ranking efficiency
- Strong results across multilingual benchmark datasets
Applications
GTE models enable various downstream tasks:
- Information retrieval
- Semantic textual similarity
- Text reranking
- RAG (Retrieval-Augmented Generation) systems
- Cross-lingual search
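All of the tasks above reduce to comparing embedding vectors, usually by cosine similarity. A minimal retrieval sketch follows, using tiny synthetic vectors in place of real GTE outputs so the mechanics are visible.

```python
def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def rank(query_vec, doc_vecs):
    """Return document indices sorted by similarity to the query, best first."""
    scores = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, reverse=True)]

# Toy 2-dim vectors standing in for GTE embeddings of a query and three docs.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
print(rank(query, docs))  # -> [1, 2, 0]
```

In a real pipeline the vectors would come from a GTE model (and, since GTE embeddings are typically L2-normalized, the cosine step collapses to a dot product); an approximate nearest-neighbor index replaces the exhaustive scan at scale.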
Technical Details
- Based on BERT framework
- Trained on diverse relevance text pairs
- Covers wide range of domains and scenarios
- Supports both English and multilingual variants
Availability
- Hugging Face Model Hub
- DeepInfra deployment platform
- Various cloud inference services
- Open-source with permissive licensing
Evolution Path
- Original GTE: BERT-based, standard context
- GTE-v1.5: Extended context (8192), transformer++ backbone
- GTE-Multilingual: Multilingual support, elastic embeddings
- GTE-Qwen: Next-generation models built on the Qwen foundation models
Comparison with Competitors
GTE models provide strong performance while maintaining efficiency, making them suitable for production deployments where both quality and resource constraints matter.