Copyright © 2025 Awesome Vector Databases. All rights reserved.

    GTE Embeddings

General Text Embeddings from Alibaba DAMO Academy, trained on large-scale relevance pairs. Available in three sizes (large, base, small), with GTE-v1.5 supporting an 8192-token context length.


    Information

Website: huggingface.co
Published: Mar 8, 2026

    Categories

1 item: Machine Learning Models

    Tags

3 items: #Embeddings #Open Source #Multilingual

    Similar Products

6 results
    Qwen3 Embedding
    Featured

    Multilingual embedding model supporting over 100 languages and ranking #1 on MTEB multilingual leaderboard. Offers flexible model sizes from 0.6B to 8B parameters with user-defined instructions.

    Nomic Embed Text
    Featured

First fully reproducible open-source text embedding model with an 8,192-token context length. v2 introduces a Mixture-of-Experts architecture for multilingual embeddings and outperforms OpenAI models on benchmarks. Released as open source under the Apache 2.0 license.

    ModernBERT Embed

    Open-source embedding model from Nomic AI based on ModernBERT-base with 149M parameters. Supports 8192 token sequences and Matryoshka Representation Learning for 3x memory reduction.

    E5 Embeddings

Open-source text embedding model family from Microsoft Research supporting 100+ languages. Features small, base, and large variants with weakly-supervised contrastive pre-training.

    BGE-VL
    Featured

    State-of-the-art multimodal embedding model from BAAI supporting text-to-image, image-to-text, and compositional visual search. Trained on the MegaPairs dataset with over 26 million retrieval triplets.

    ColBERT
    Featured

    Late interaction architecture for efficient and effective passage search. Encodes queries and documents independently using BERT, then performs token-level similarity via maxsim operator for strong generalization.

    Overview

The GTE (General Text Embeddings) models, developed by Alibaba DAMO Academy, are mainly based on the BERT framework and trained on a large-scale corpus of relevance text pairs covering a wide range of domains and scenarios.

    Model Sizes

    GTE offers three different sizes to balance performance and efficiency:

    • GTE-large: Highest performance
    • GTE-base: Balanced performance and size
    • GTE-small: Optimized for efficiency (MTEB score: 61.36)
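The three sizes above can be pulled from the Hugging Face Hub. A minimal sketch assuming the sentence-transformers package and the `thenlper/gte-*` model ids; confirm the current ids on the Hub before depending on this mapping:

```python
# Sketch: loading a GTE checkpoint by size. The thenlper/gte-* ids below
# are the Hugging Face model names at the time of writing; verify them
# on the Hub before relying on this mapping.
GTE_SIZES = {
    "large": "thenlper/gte-large",  # highest performance
    "base": "thenlper/gte-base",    # balanced performance and size
    "small": "thenlper/gte-small",  # optimized for efficiency
}

def load_gte(size: str = "small"):
    """Load one of the three GTE sizes by short name.

    The import is deferred so the size mapping can be inspected even
    without sentence-transformers installed
    (pip install sentence-transformers).
    """
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(GTE_SIZES[size])

# Example usage (requires network access on first run):
#   model = load_gte("small")
#   vecs = model.encode(["how do vector databases work?"])
```

Keeping the size behind a short name makes swapping between the quality/efficiency points a one-word change.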

    Benchmark Performance

    GTE models were compared with other popular text embedding models on the MTEB benchmark:

• Detailed comparison results are available on the MTEB leaderboard
• GTE-small achieves an overall score of 61.36 on MTEB
• Competitive performance across a range of embedding tasks
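Running such a comparison yourself is straightforward with the `mteb` package; the sketch below assumes its long-standing `MTEB(tasks=...).run(model)` interface (the API has evolved, so check the current mteb docs). The averaging helper simply mirrors how per-task scores are rolled into one headline number:

```python
def mean_main_score(scores):
    """Average a list of per-task main scores into a single headline
    number, loosely mirroring how leaderboard averages are reported."""
    return sum(scores) / len(scores)

def evaluate_on_mteb(model, tasks=("STSBenchmark", "Banking77Classification")):
    """Run a model on a small MTEB task subset.

    Deferred import so the scoring helper above stays usable without
    the package installed (pip install mteb). Task names here are
    illustrative; pick tasks relevant to your workload.
    """
    from mteb import MTEB
    evaluation = MTEB(tasks=list(tasks))
    return evaluation.run(model, output_folder="results/gte-small")
```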

    Recent Developments

    GTE-v1.5 Series

    Upgraded GTE embeddings with:

• Support for context lengths up to 8192 tokens
• Enhanced model performance
• Built on the transformer++ encoder backbone (BERT + RoPE + GLU)
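Of those backbone ingredients, RoPE is the one tied to the longer context: it encodes position by rotating pairs of embedding dimensions by position-dependent angles rather than adding learned position vectors. A pure-Python sketch with illustrative dimensions and base frequency:

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply Rotary Position Embedding to one token vector.

    Each consecutive dimension pair (2i, 2i+1) is rotated by the angle
    pos * base**(-2i/d). Rotation preserves the vector's norm, so RoPE
    injects position without changing token magnitudes."""
    d = len(vec)  # assumed even
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

v = [1.0, 0.0, 1.0, 0.0]
assert rope(v, pos=0) == v  # position 0 leaves the vector unchanged
rotated = rope(v, pos=5)
# The rotation preserves the squared norm.
assert abs(sum(x * x for x in rotated) - sum(x * x for x in v)) < 1e-9
```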

    GTE-Multilingual (mGTE) Series

Introduced by Alibaba's Tongyi Lab, featuring:

    • High performance across languages
    • Long-context handling
    • Multilingual support
    • Elastic embedding capabilities
    • Significantly improved retrieval and ranking efficiency
    • Outstanding results across datasets
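The "elastic embedding" capability means a full-length vector can be cut down to a shorter prefix when storage or latency matters. A toy sketch of the Matryoshka-style truncate-and-renormalize step (the 4-d vector stands in for a real GTE embedding):

```python
import math

def truncate_embedding(vec, dim):
    """Elastic/Matryoshka-style shortening: keep the first `dim`
    components, then re-normalize to unit length so cosine similarity
    still behaves sensibly on the shorter vector."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5]  # toy stand-in for a full-size embedding
short = truncate_embedding(full, 2)
print(short)  # two equal components on the unit circle
```

The trade-off is resolution for footprint: a 256-d prefix of a 768-d vector needs a third of the index memory at some cost in retrieval quality.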

    Applications

    GTE models enable various downstream tasks:

    • Information retrieval
    • Semantic textual similarity
    • Text reranking
    • RAG (Retrieval-Augmented Generation) systems
    • Cross-lingual search
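All of these applications share the same core loop: embed the query and the candidates, then rank by cosine similarity. A self-contained sketch in which hand-made toy vectors stand in for GTE output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank(query_vec, doc_vecs):
    """Return document indices sorted best-match-first by cosine
    similarity to the query - the core loop of embedding retrieval."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: -scores[i])

# Toy 3-d vectors standing in for real GTE sentence embeddings.
query = [1.0, 0.2, 0.0]
docs = [
    [0.0, 1.0, 0.0],  # off-topic
    [0.9, 0.3, 0.1],  # close to the query
    [0.1, 0.1, 1.0],  # off-topic
]
print(rank(query, docs))  # [1, 0, 2]
```

In a RAG system the top-ranked documents are then passed to the generator as context; a vector database replaces the brute-force loop with an approximate nearest-neighbor index.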

    Technical Details

    • Based on BERT framework
    • Trained on diverse relevance text pairs
    • Covers wide range of domains and scenarios
    • Supports both English and multilingual variants
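BERT-style encoders emit one vector per token, so a pooling step turns per-token states into a single sentence embedding. Mean pooling over non-padding tokens is the common choice for this model family (a simplification; check the model card for the exact pooling a given GTE checkpoint uses):

```python
def mean_pool(token_vecs, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0).

    This is the usual step that converts BERT-style per-token hidden
    states into one fixed-size sentence embedding."""
    dim = len(token_vecs[0])
    summed = [0.0] * dim
    count = 0
    for vec, m in zip(token_vecs, attention_mask):
        if m:
            count += 1
            for j in range(dim):
                summed[j] += vec[j]
    return [s / count for s in summed]

# Toy 2-d token states; the last row is padding and must be ignored.
tokens = [[1.0, 3.0], [3.0, 1.0], [9.0, 9.0]]
print(mean_pool(tokens, [1, 1, 0]))  # [2.0, 2.0]
```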

    Availability

    • Hugging Face Model Hub
    • DeepInfra deployment platform
    • Various cloud inference services
    • Open-source with permissive licensing

    Evolution Path

    1. Original GTE: BERT-based, standard context
    2. GTE-v1.5: Extended context (8192), transformer++ backbone
    3. GTE-Multilingual: Multilingual support, elastic embeddings
    4. GTE-Qwen: Next-generation models based on Qwen foundation

    Comparison with Competitors

    GTE models provide strong performance while maintaining efficiency, making them suitable for production deployments where both quality and resource constraints matter.