



SFR-Embedding is Salesforce's family of state-of-the-art embedding models, including SFR-Embedding-Mistral for text and SFR-Embedding-Code for code retrieval. SFR-Embedding-Mistral reached #1 on the MTEB benchmark with a 67.6 average score, surpassing models from OpenAI and Cohere.
SFR-Embedding is Salesforce AI Research's groundbreaking family of embedding models that have achieved state-of-the-art performance on major benchmarks. The family includes models for both general text and code retrieval.
MTEB Benchmark Leadership:
SFR-Embedding-Mistral is built upon E5-mistral-7b-instruct and Mistral-7B-v0.1, with significant training enhancements.
Excels in retrieval tasks, making it ideal for Retrieval Augmented Generation (RAG) workflows where finding and selecting relevant information from large datasets is crucial.
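The retrieval step of such a RAG workflow can be sketched as follows. In practice the embeddings would come from the model (the commented lines assume the Hugging Face model id Salesforce/SFR-Embedding-Mistral and the sentence-transformers library); here the ranking logic runs standalone on small placeholder vectors.

```python
import numpy as np

# In a real pipeline, embeddings would come from the model, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("Salesforce/SFR-Embedding-Mistral")
#   doc_vecs = model.encode(documents)
# Placeholder vectors are used below so the ranking logic is runnable as-is.

def top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query; return top-k indices."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores)[:k]

# Placeholder embeddings: 3 documents, 4 dimensions (the real model is much larger).
docs = np.array([[0.9, 0.1, 0.0, 0.1],
                 [0.1, 0.8, 0.2, 0.0],
                 [0.0, 0.1, 0.9, 0.2]])
query = np.array([0.85, 0.15, 0.05, 0.1])

print(top_k(query, docs, k=2))  # indices of the nearest documents, best first
```

The top-ranked documents would then be passed to the generator as context, which is the "finding and selecting relevant information" step described above.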
An advanced version with multi-stage training that further improves on the original SFR-Embedding-Mistral.
SFR-Embedding-Code redefines the state of the art in code retrieval.
Available in three sizes to balance performance and efficiency:
SFR-Embedding-Code-400M_R: 400 million parameters
SFR-Embedding-Code-2B_R: 2 billion parameters
SFR-Embedding-Code-7B: 7 billion parameters (if available)
Supports 12 programming languages.
Unlike code-only models, SFR-Embedding-Code handles both natural-language text and source code, supporting text-to-code as well as code-to-code retrieval.
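A small sketch of how a text-to-code query might be prepared. Instruction-tuned embedding models in this family prefix queries with a task instruction while embedding candidate documents (here, code snippets) as-is; the exact task wording below is an illustrative assumption, not taken from the model card.

```python
# Hypothetical query formatting for text-to-code retrieval.
# Queries carry a task instruction; candidate code snippets are embedded unprefixed.

def format_query(task: str, query: str) -> str:
    """Prefix a retrieval query with its task instruction."""
    return f"Instruct: {task}\nQuery: {query}"

task = "Given a natural-language description, retrieve the matching code snippet."
q = format_query(task, "reverse a linked list in place")
print(q)
```

The formatted query and the raw code snippets would then be embedded with the same model and compared by cosine similarity, exactly as in text retrieval.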
SFR-Embedding leverages transfer learning and advanced training techniques to push beyond previous state-of-the-art models, demonstrating that careful training methodology can achieve significant improvements.
Hugging Face Models:
License: Released for research purposes
Collection: All models available in the SFR-Embedding collection on Hugging Face
SFR-Embedding represents a significant milestone in embedding model development, proving that focused research can achieve meaningful improvements over existing state-of-the-art models from well-funded labs.