A family of English text embedding models distilled from state-of-the-art embedding models using a novel multi-stage distillation framework. Stella models support multiple dimensions (512 to 8192) through Matryoshka Representation Learning, offering flexible embedding sizes for different use cases.
The stella_en model family was created by Dun Zhang (Hugging Face user dunzhang). The models are distilled from Alibaba's state-of-the-art GTE embedding models using a multi-stage distillation framework.
Introduced in the paper "Jasper and Stella: distillation of SOTA embedding models" (arXiv:2412.19048), the approach enables a smaller student embedding model to distill knowledge from multiple larger teacher embedding models through three carefully designed losses.
Stella models are distilled from:
Alibaba-NLP/gte-Qwen2-1.5B-instruct
Alibaba-NLP/gte-large-en-v1.5
This multi-teacher approach allows the student model to learn diverse strengths from different architectures.
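The exact loss definitions are not reproduced in this overview, but the general shape of a multi-teacher objective with three loss terms can be sketched in PyTorch. This is an illustration only: the function name, loss terms, and margin value below are assumptions for the sketch, not the precise formulation from arXiv:2412.19048.

```python
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(student_vecs, teacher_vecs_list, margin=0.015):
    """Illustrative multi-teacher objective; NOT the paper's exact formulation.

    student_vecs:      (batch, dim) student embeddings, assumed already
                       projected to the teacher's dimension.
    teacher_vecs_list: list of (batch, dim) embeddings from frozen teachers.
    """
    total = 0.0
    for teacher_vecs in teacher_vecs_list:
        s = F.normalize(student_vecs, dim=-1)
        t = F.normalize(teacher_vecs, dim=-1)

        # (1) Cosine loss: pull each student vector toward its teacher vector.
        cosine_loss = (1.0 - (s * t).sum(dim=-1)).mean()

        # (2) Similarity loss: match the in-batch similarity matrices so the
        #     student preserves the teacher's relative geometry.
        sim_loss = F.mse_loss(s @ s.T, t @ t.T)

        # (3) Triplet-style relative-similarity loss: the pair the teacher
        #     ranks highest for each anchor should also rank above the pair
        #     it ranks lowest for the student, by at least `margin`.
        ts, ss = t @ t.T, s @ s.T
        eye = torch.eye(ts.size(0), dtype=torch.bool, device=ts.device)
        pos = ts.masked_fill(eye, float("-inf")).argmax(dim=1)
        neg = ts.masked_fill(eye, float("inf")).argmin(dim=1)
        idx = torch.arange(ts.size(0), device=ts.device)
        triplet_loss = F.relu(margin - (ss[idx, pos] - ss[idx, neg])).mean()

        total = total + cosine_loss + sim_loss + triplet_loss
    return total / len(teacher_vecs_list)

# Example shapes: batch of 4, dimension 1024, two teachers.
loss = multi_teacher_distill_loss(
    torch.randn(4, 1024), [torch.randn(4, 1024), torch.randn(4, 1024)]
)
```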
Utilizes MRL to support multiple embedding dimensions: 512, 768, 1024, 2048, 4096, 6144, and 8192.
Performance Note: The MTEB score at 1024d is only 0.001 lower than 8192d, making 1024d a sweet spot for most applications.
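To make the dimension flexibility concrete, here is a minimal sketch of how MRL-style truncation works on the consumer side, assuming unit-normalized embeddings. Note that the official stella repositories also ship dimension-specific projection heads, so this slice-and-renormalize view is an approximation of the general MRL idea:

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of an MRL-trained embedding and
    re-normalize to unit length so cosine similarity still behaves."""
    truncated = vec[:dim]
    return truncated / np.linalg.norm(truncated)

full = np.random.randn(8192)             # stand-in for a full 8192d embedding
small = truncate_embedding(full, 1024)   # 1024d version, the sweet spot noted above
print(small.shape)                       # (1024,)
```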
stella_en_1.5B_v5: 1.5 billion parameters, higher quality
stella_en_400M_v5: 400 million parameters, smaller and faster
Both variants support the full range of dimensions through MRL.
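As a usage sketch, assuming the sentence-transformers library: truncate_dim is that library's generic Matryoshka truncation argument, and the official model cards may instead select dimensions through per-dimension dense heads.

```python
from sentence_transformers import SentenceTransformer

# Smaller, faster variant; swap in "dunzhang/stella_en_1.5B_v5" for higher quality.
model = SentenceTransformer(
    "dunzhang/stella_en_400M_v5",
    trust_remote_code=True,  # the stella repos ship custom modeling code
    truncate_dim=1024,       # MRL: request 1024d output (the sweet spot above)
)
embeddings = model.encode(["An example sentence."])
print(embeddings.shape)  # (1, 1024)
```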
Stella models simplify prompt usage by providing just two prompts for most general tasks: s2p_query for sentence-to-passage tasks such as retrieval, and s2s_query for sentence-to-sentence tasks such as semantic similarity.
This reduces complexity compared to models requiring extensive prompt engineering.
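A minimal retrieval sketch with sentence-transformers, assuming the two prompt names s2p_query and s2s_query are registered in the model configuration; queries get the prompt, while passages are encoded without one:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dunzhang/stella_en_400M_v5", trust_remote_code=True)

queries = ["What are some ways to reduce stress?"]
passages = [
    "There are many effective ways to reduce stress, such as exercise and meditation.",
    "Stella models are distilled from larger teacher embedding models.",
]

# Queries use the s2p_query prompt; passages need no prompt.
query_emb = model.encode(queries, prompt_name="s2p_query")
passage_emb = model.encode(passages)

# Cosine-similarity matrix between queries and passages.
print(model.similarity(query_emb, passage_emb))
```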
Competitive Quality: Through distillation, achieves performance close to much larger teacher models
Flexible Sizing: MRL allows trading off quality vs. speed/storage based on application needs
Efficiency: Smaller models (400M) offer fast inference while maintaining good quality
The distillation framework addresses key challenges of embedding distillation: aligning a single student with multiple teachers whose architectures and output dimensions differ, and preserving quality as the embedding dimension is reduced.
Open-source models available on Hugging Face: dunzhang/stella_en_1.5B_v5 and dunzhang/stella_en_400M_v5.
Based on "Jasper and Stella: distillation of SOTA embedding models" by Dun Zhang, Jiacheng Li, Ziyang Zeng, and Fulong Wang (arXiv:2412.19048, 2024).