
    stella_en

    A family of English text embedding models distilled from state-of-the-art embedding models using a novel multi-stage distillation framework. Stella models support multiple dimensions (512 to 8192) through Matryoshka Representation Learning, offering flexible embedding sizes for different use cases.


    About this tool

    Overview

    The stella_en model family represents a breakthrough in embedding model distillation, created by researcher dunzhang. These models are distilled from Alibaba's state-of-the-art GTE embedding models using an innovative multi-stage distillation framework.

    Key Innovation: Multi-Stage Distillation

    Introduced in the paper "Jasper and Stella: distillation of SOTA embedding models" (arXiv:2412.19048), the approach enables a smaller student embedding model to distill knowledge from multiple larger teacher embedding models through three carefully designed losses.

    Teacher Models

    Stella models are distilled from:

    • Alibaba-NLP/gte-large-en-v1.5
    • Alibaba-NLP/gte-Qwen2-1.5B-instruct

    This multi-teacher approach allows the student model to learn diverse strengths from different architectures.

    Matryoshka Representation Learning (MRL)

    Utilizes MRL to support multiple embedding dimensions:

    • 512 dimensions: Compact, fast, lower storage
    • 768, 1024 dimensions: Balanced performance and efficiency
    • 2048, 4096 dimensions: Higher quality for demanding tasks
    • 6144, 8192 dimensions: Maximum quality

    Performance Note: The MTEB score at 1024d is only 0.001 lower than 8192d, making 1024d a sweet spot for most applications.
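The mechanics of MRL at inference time can be sketched in a few lines: an MRL-trained embedding is truncated to a prefix of its components and re-normalized. The code below is a minimal illustration with a random stand-in vector, not a call to an actual stella_en model.

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of an MRL embedding and re-normalize."""
    truncated = vec[:dim]
    return truncated / np.linalg.norm(truncated)

# Toy 8192-d unit vector standing in for a real stella_en embedding.
rng = np.random.default_rng(0)
full = rng.standard_normal(8192)
full /= np.linalg.norm(full)

for dim in (512, 1024, 4096):
    small = truncate_embedding(full, dim)
    print(dim, small.shape, round(float(np.linalg.norm(small)), 6))
```

Because MRL trains the model so that leading components carry the most information, this simple slice-and-renormalize step is all that is needed to trade dimension for quality.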

    Model Variants

    • stella_en_1.5B_v5: 1.5 billion parameters, higher quality
    • stella_en_400M_v5: 400 million parameters, smaller and faster

    Both variants support the full range of dimensions through MRL.

    Simplified Prompting

    Stella models simplify prompt usage by providing two prompts for most general tasks:

    • s2p (sentence-to-passage): For query-document retrieval
    • s2s (sentence-to-sentence): For similarity comparison

    This reduces complexity compared to models requiring extensive prompt engineering.
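In practice the two prompts are short instruction templates prepended to the query text. The template strings below follow the pattern published on the stella_en model cards, but should be verified against the card for the specific variant you deploy; note that in the s2p setting only queries are typically prompted, while passages are embedded as-is.

```python
# Prompt templates as documented on the stella_en model cards (verify before use).
PROMPTS = {
    "s2p": ("Instruct: Given a web search query, retrieve relevant passages "
            "that answer the query.\nQuery: {query}"),
    "s2s": "Instruct: Retrieve semantically similar text.\nQuery: {query}",
}

def format_query(query: str, task: str = "s2p") -> str:
    """Wrap a raw query in the s2p or s2s instruction template."""
    return PROMPTS[task].format(query=query)

print(format_query("what is matryoshka representation learning?"))
print(format_query("the cat sat on the mat", task="s2s"))
```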

    Performance Benefits

    Competitive Quality: Through distillation, achieves performance close to much larger teacher models

    Flexible Sizing: MRL allows trading off quality vs. speed/storage based on application needs

    Efficiency: Smaller models (400M) offer fast inference while maintaining good quality

    Use Cases

    • High-throughput applications: Use 512 or 768 dimensions for speed
    • Balanced deployments: Use 1024 dimensions for optimal quality/efficiency
    • Quality-critical tasks: Use 4096 or 8192 dimensions
    • Resource-constrained environments: stella_en_400M_v5 with lower dimensions
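The storage side of this trade-off is simple arithmetic: a flat float32 index costs dimensions × 4 bytes per vector. A quick sketch (raw vector storage only, ignoring index overhead and any quantization):

```python
def index_size_bytes(n_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for a flat float32 vector index (no index overhead)."""
    return n_vectors * dim * bytes_per_value

# Compare dimension choices for a corpus of 10 million documents.
for dim in (512, 1024, 8192):
    gib = index_size_bytes(10_000_000, dim) / 1024**3
    print(f"{dim:>5}d: {gib:.1f} GiB for 10M vectors")
```

At 10M vectors, moving from 8192d to 1024d cuts raw storage eightfold, which is why the near-identical MTEB score at 1024d matters so much in deployment.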

    Technical Details

    The distillation framework addresses key challenges:

    • Efficiently transferring knowledge from multiple large teachers
    • Maintaining performance across different embedding dimensions
    • Balancing model size reduction with quality preservation
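One common building block for this kind of transfer is a cosine-alignment loss that pulls student embeddings toward teacher embeddings. The sketch below is a generic illustration of that idea, not the paper's exact three-loss formulation; in practice a linear projection would map the student's dimension onto the (possibly concatenated) teachers' dimension.

```python
import numpy as np

def cosine_alignment_loss(student: np.ndarray, teacher: np.ndarray) -> float:
    """Mean (1 - cosine similarity) between student and teacher embeddings.

    student, teacher: (batch, dim) arrays of matching shape.
    """
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 1024))
print(cosine_alignment_loss(batch, batch))  # identical inputs -> loss near 0
```

Minimizing this quantity over a large corpus drives the student to reproduce the teachers' embedding geometry at a fraction of the parameter count.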

    Availability

    Open-source models available on Hugging Face:

    • dunzhang/stella_en_1.5B_v5
    • dunzhang/stella_en_400M_v5
    • Also available through Marqo and other platforms

    Research

    Based on "Jasper and Stella: distillation of SOTA embedding models" by Dun Zhang, Jiacheng Li, Ziyang Zeng, and Fulong Wang (2025).


    Information

    Website: huggingface.co
    Published: Mar 20, 2026

    Categories

    Machine Learning Models

    Tags

    #Embeddings #Matryoshka #distillation #Open Source

    Similar Products

    mxbai-embed-large

    State-of-the-art large embedding model from Mixedbread AI, ranked first among similar-sized models, supporting Matryoshka Representation Learning and binary quantization with 700M+ training pairs.

    Qwen3 Embedding

    Multilingual embedding model supporting over 100 languages and ranking #1 on MTEB multilingual leaderboard. Offers flexible model sizes from 0.6B to 8B parameters with user-defined instructions.

    BGE-M3

    A versatile multilingual text embedding model from BAAI that supports 100+ languages and can handle inputs up to 8192 tokens. BGE-M3 is unique in supporting three retrieval methods simultaneously: dense retrieval, multi-vector retrieval, and sparse retrieval.

    gte-Qwen2-1.5B-instruct

    A state-of-the-art multilingual text embedding model from Alibaba's GTE (General Text Embedding) series, built on the Qwen2-1.5B LLM. The model supports up to 8192 tokens and incorporates bidirectional attention mechanisms for enhanced contextual understanding across diverse domains.

    INSTRUCTOR

    A task-specific text embedding model that generates customized embeddings based on natural language instructions. INSTRUCTOR achieves state-of-the-art performance on 70 diverse embedding tasks by allowing users to specify the task objective and domain.

    Snowflake Arctic Embed

    Suite of high-quality multilingual text embedding models optimized for retrieval performance, developed by Snowflake and available as open-source for commercial use.

    Copyright © 2025 Awesome Vector Databases. All rights reserved.