
E5-Mistral-7B-Instruct
Open-source embeddings model from Microsoft initialized from Mistral-7B-v0.1, achieving state-of-the-art BEIR score of 56.9 for English text embedding and retrieval tasks with 4096-dimensional vectors.
About this tool
Overview
E5-Mistral is an open-source embeddings model developed by Microsoft, released under the MIT license. This E5 embedding model by Microsoft is initialized from Mistral-7B-v0.1 and fine-tuned on a mixture of multilingual datasets.
Technical Specifications
- Layers: 32 layers
- Embedding Size: 4096 dimensions
- Performance: Achieves a BEIR score of 56.9
- Model Size: 14GB (the biggest on the MTEB leaderboard but also top performing)
Key Features
Instruction-Based Customization
The task definition should be a one-sentence instruction that describes the task. This is a way to customize text embeddings for different scenarios through natural language instructions.
High-Quality Representations
Built with PyTorch, it generates high-quality vector representations useful for:
- Semantic search
- Information retrieval
- Clustering tasks
- Text similarity
Language Support
Since Mistral-7B-v0.1 is mainly trained on English data, it's recommended to use this model for English only.
Requirements
For e5-mistral-7b-instruct, it would require transformers>=4.34 to load Mistral model.
Microsoft Integration
The model is available on:
- Microsoft's AI Model Catalog as second-state-e5-mistral-7b-instruct-embedding-gguf
- Part of Microsoft's UniLM project
- Hugging Face model hub
Use Cases
- Enterprise semantic search
- Document retrieval systems
- Question answering pipelines
- Embedding-based classification
Pricing
Free and open-source under MIT license.
Loading more......
Information
Categories
Tags
Similar Products
6 result(s)