    mxbai-embed-large

    State-of-the-art large embedding model from Mixedbread AI, ranked first among similar-sized models. It supports Matryoshka Representation Learning and binary quantization and was trained on more than 700 million pairs.

    About this tool

    Overview

    mxbai-embed-large is a state-of-the-art large embedding model from mixedbread.ai. It's part of the crispy sentence embedding family from Mixedbread.

    Performance

    The model:

    • Ranked first among embedding models of similar size
    • Outperforms the new OpenAI embedding model, text-embedding-3-large
    • Matches the performance of 20x larger models like echo-mistral-7b
    • As of March 2024, achieves SOTA performance for BERT-large sized models on the MTEB
    • Trained with no overlap with the MTEB data, indicating good generalization across domains, tasks and text lengths

    Training

    The model was:

    • Trained on over 700 million pairs using contrastive training
    • Fine-tuned on over 30 million high-quality triplets using the AnglE loss

    Key Features

    Matryoshka Representation Learning

    The model supports Matryoshka Representation Learning, allowing vector truncation to smaller dimensions without retraining.
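
As a rough illustration of what truncation looks like in practice, the sketch below keeps the leading components of each vector and re-normalizes them so cosine similarity still behaves as expected. The helper name `truncate_embedding`, the 1024-dimensional input size, and the 512-dimensional target are assumptions made for this example, and random vectors stand in for real model output.

```python
import numpy as np

def truncate_embedding(vectors: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row, then L2-normalize."""
    if dim <= 0 or dim > vectors.shape[-1]:
        raise ValueError("dim must be between 1 and the embedding size")
    truncated = vectors[..., :dim]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return truncated / np.clip(norms, 1e-12, None)

# Random stand-ins for real embeddings; 1024 is an assumed full size.
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 1024))
small = truncate_embedding(full, 512)
```

Smaller vectors reduce storage and search cost, usually at some loss in retrieval quality, so the target dimension is worth validating on your own data.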

    Binary Quantization

    Supports binary quantization for reduced storage and faster similarity search.
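
One common way to binarize embeddings, shown here as a minimal sketch rather than the vendor's own procedure, is to keep only the sign of each component and pack the resulting bits into bytes. The function names, the zero threshold, and the 1024-dimensional float32 input are assumptions for illustration; random vectors stand in for real embeddings.

```python
import numpy as np

def binarize(vectors: np.ndarray) -> np.ndarray:
    """Map each float to one bit (1 if > 0, else 0) and pack 8 bits per byte."""
    bits = (vectors > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Count the differing bits between two packed binary vectors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Random stand-ins for real embeddings; 1024 float32 values is an assumed size.
rng = np.random.default_rng(0)
docs = rng.normal(size=(3, 1024)).astype(np.float32)
packed = binarize(docs)

# Each packed vector takes 1024 / 8 = 128 bytes instead of 4096.
ratio = docs[0].nbytes // packed[0].nbytes
```

Under these assumptions the packed form is 32 times smaller than float32, and candidates can be ranked by Hamming distance before an optional rescoring step with the original vectors.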

    Task-Specific Prompting

    For retrieval tasks, prepend the prompt "Represent this sentence for searching relevant passages:" to each query before embedding it.
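
A small helper can make the query-side prompt hard to forget. The sketch below is hypothetical: the names `QUERY_PROMPT`, `prepare_query`, and `prepare_inputs` are invented for this example, and the single space after the colon is an assumption about how the prompt and query are joined.

```python
# Prompt text quoted from the section above; the trailing space is an assumption.
QUERY_PROMPT = "Represent this sentence for searching relevant passages: "

def prepare_query(query: str) -> str:
    """Prefix a search query with the retrieval prompt."""
    return QUERY_PROMPT + query.strip()

def prepare_inputs(queries, passages):
    """Return prompted queries and unchanged passages, ready for encoding."""
    return [prepare_query(q) for q in queries], list(passages)

queries, passages = prepare_inputs(
    ["  what is matryoshka representation learning? "],
    ["Matryoshka Representation Learning trains nested embeddings."],
)
```

The prepared strings would then be passed to whichever client is used to run the model; only the query side receives the prompt here.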

    Technical Specifications

    • Suggested maximum sequence length: 512 tokens
    • Supports tasks: retrieval, classification, clustering, reranking, and summarization

    Availability

    • Hugging Face
    • Ollama
    • Docker Hub
    • Multiple integration platforms

    Pricing

    Free and open-source.

    Information

    Website: huggingface.co
    Published: Mar 13, 2026

    Categories

    1 Item
    Machine Learning Models

    Tags

    3 Items
    #Embeddings #Open Source #Matryoshka

    Similar Products

    stella_en

    A family of English text embedding models distilled from state-of-the-art embedding models using a novel multi-stage distillation framework. Stella models support multiple dimensions (512 to 8192) through Matryoshka Representation Learning, offering flexible embedding sizes for different use cases.

    Qwen3 Embedding

    Multilingual embedding model supporting over 100 languages and ranking #1 on MTEB multilingual leaderboard. Offers flexible model sizes from 0.6B to 8B parameters with user-defined instructions.

    BGE-M3

    A versatile multilingual text embedding model from BAAI that supports 100+ languages and can handle inputs up to 8192 tokens. BGE-M3 is unique in supporting three retrieval methods simultaneously: dense retrieval, multi-vector retrieval, and sparse retrieval.

    gte-Qwen2-1.5B-instruct

    A state-of-the-art multilingual text embedding model from Alibaba's GTE (General Text Embedding) series, built on the Qwen2-1.5B LLM. The model supports up to 8192 tokens and incorporates bidirectional attention mechanisms for enhanced contextual understanding across diverse domains.

    INSTRUCTOR

    A task-specific text embedding model that generates customized embeddings based on natural language instructions. INSTRUCTOR achieves state-of-the-art performance on 70 diverse embedding tasks by allowing users to specify the task objective and domain.

    Snowflake Arctic Embed

    Suite of high-quality multilingual text embedding models optimized for retrieval performance, developed by Snowflake and available as open-source for commercial use.

    All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this repository, related repositories, and associated websites are for identification purposes only. The use of these names, logos, and brands does not imply endorsement, affiliation, or sponsorship. This directory may include content generated by artificial intelligence.
    Copyright © 2025 Awesome Vector Databases. All rights reserved.