    Nomic Embed Text

    First fully reproducible open-source text embedding model with an 8,192-token context length. Version 2 introduces a Mixture-of-Experts architecture for multilingual embeddings. Reported to outperform OpenAI text-embedding-ada-002 and text-embedding-3-small on short- and long-context benchmarks. Released as open source under the Apache 2.0 license.

    About this tool

    Overview

    Nomic Embed Text is the first fully reproducible, open-source text embedding model with an 8,192-token context length. It outperforms both OpenAI text-embedding-ada-002 and text-embedding-3-small on short- and long-context benchmarks.

    Key Features

    • Fully Open Source: Training code, model weights, and complete training data released
    • Apache 2.0 License: Free for commercial use
    • 8,192-Token Context Length: Accepts long inputs of up to 8,192 tokens
    • Reproducible: Complete replication possible with released data and code
    • High Performance: Outperforms OpenAI text-embedding-ada-002 and text-embedding-3-small on MTEB benchmarks

    Model Versions

    V1 (nomic-embed-text-v1)

    • First fully reproducible embedding model
    • 8,192-token context length
    • Trained on weakly related text pairs and high-quality labeled datasets
    • English-focused

    V1.5 (nomic-embed-text-v1.5)

    • Matryoshka Representation Learning support
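    • Flexible embedding dimensions: vectors can be truncated to a smaller size after encoding
    • Lets you trade embedding size (storage and search cost) against retrieval quality
    • Minimal performance reduction at smaller dimensions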
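
Truncating a Matryoshka-style embedding can be sketched as below. This is a minimal illustration rather than Nomic's reference code: the helper `truncate_embedding` and the toy vector `full` are hypothetical, and the v1.5 model card may prescribe extra steps (for example a normalization pass before truncation), so treat this as the general idea only.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale for an all-zero prefix
    return [x / norm for x in head]

# Toy stand-in for a model output; real embeddings have hundreds of dimensions.
full = [0.5, 0.5, 0.5, 0.5, 0.1, -0.1, 0.05, 0.0]
short = truncate_embedding(full, 4)
print(len(short), short)
```

Renormalizing after truncation keeps cosine and dot-product scores comparable across vectors of the same reduced size. Vectors truncated to different sizes should not be compared with each other.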

    V2 (nomic-embed-text-v2)

    • Mixture-of-Experts (MoE) Architecture: Presented by Nomic as the first MoE text embedding model
    • Multilingual: Trained on 1.6 billion contrastive pairs across ~100 languages
    • Expanded Dataset: Broader multilingual coverage
    • Production-Ready: Optimized for real-world applications
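
For readers unfamiliar with Mixture-of-Experts layers, the routing idea can be sketched as follows. This is a generic top-k gating illustration, not Nomic's architecture: the number of experts, the value of `k`, the hand-written `router_logits`, and the toy scaling experts are all assumptions made for the example. In a real model the router scores come from a learned layer applied to the input.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_logits, experts, k=2):
    """Send x to the k highest-scoring experts and mix their outputs."""
    order = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Gate weights are renormalized over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out, top

# Four toy experts that simply scale the input by a fixed factor.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 2.0]  # hypothetical router scores for one token
output, chosen = moe_forward([1.0, 2.0], router_logits, experts, k=2)
print(chosen, output)
```

Only the selected experts run for a given input, which is why an MoE model can hold more parameters in total than it uses per token.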

    Training Approach

    1. Stage 1 - Unsupervised Contrastive Pre-training: Trains on weakly related text pairs drawn from sources such as StackExchange, Quora, Amazon reviews, and news articles
    2. Stage 2 - Supervised Fine-tuning: Uses high-quality labeled datasets, including search queries paired with web search answers
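
Contrastive training on text pairs is commonly expressed with an InfoNCE-style loss using in-batch negatives. The sketch below shows that generic objective on toy vectors. It is not taken from Nomic's training code, and the function name `info_nce` and the temperature value are assumptions chosen for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(queries, docs, temperature=0.05):
    """Mean cross-entropy where docs[i] is the positive for queries[i]
    and every other document in the batch serves as a negative."""
    losses = []
    for i, q in enumerate(queries):
        logits = [dot(q, d) / temperature for d in docs]
        m = max(logits)
        # log-sum-exp with the max subtracted for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

# Toy unit vectors: each query matches the document at the same index.
pairs = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = info_nce(pairs, pairs)
loss_mismatched = info_nce(pairs, pairs[::-1])
print(loss_matched < loss_mismatched)
```

The loss is low when each text sits closest to its own pair and high when it sits closer to another item in the batch, which is the signal that pulls related texts together during training.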

    Access Methods

    • Hugging Face: Direct model download and inference
    • Ollama: ollama pull nomic-embed-text
    • Nomic API: Managed API endpoint
    • LlamaIndex Integration: Native support
    • Qdrant Integration: Built-in connector
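
As one example of the access paths above, a locally running Ollama server can be queried over HTTP once the model has been pulled. The sketch below uses only the Python standard library. It assumes Ollama's default local address (`localhost:11434`) and its `/api/embed` endpoint; the helper names `build_request` and `embed` are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/embed"

def build_request(texts, model="nomic-embed-text"):
    """Build a POST request asking Ollama to embed a list of strings."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(texts, model="nomic-embed-text", timeout=30):
    """Send the request and return one embedding (list of floats) per input."""
    with urllib.request.urlopen(build_request(texts, model), timeout=timeout) as resp:
        return json.load(resp)["embeddings"]
```

With the server running, `embed(["search_document: Nomic Embed Text supports long inputs."])` would return a list containing one vector. The `search_document: ` prefix follows the task-prefix convention described in the model card, so check the card for the prefix that matches your task.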

    Use Cases

    • Long-context semantic search
    • Multilingual retrieval applications
    • Document embedding and clustering
    • RAG systems requiring long context
    • Research requiring reproducibility
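
Most of these use cases reduce to comparing embedding vectors. A minimal semantic search step, ranking documents by cosine similarity to a query, might look like the sketch below. The two-dimensional vectors and document ids are made up for illustration, and a production system would typically use a vector database instead of a linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]

# Toy embeddings; real ones would come from the embedding model.
docs = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
results = top_k([1.0, 0.1], docs, k=2)
print(results)
```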

    Performance Highlights

    • Outperforms OpenAI text-embedding-ada-002
    • Outperforms OpenAI text-embedding-3-small
    • Strong performance on both short- and long-context tasks
    • Multilingual retrieval across ~100 languages (v2)

    Pricing

    The model is free and open source under the Apache 2.0 license, with no licensing costs. For convenience, the Nomic API offers managed hosting with usage-based pricing.

    Information

    Website: www.nomic.ai
    Published: Mar 6, 2026

    Categories

    Machine Learning Models

    Tags

    #Open Source
    #Embedding
    #Multilingual

    Similar Products

    E5 Embeddings

    Open-source text embedding models from Microsoft supporting 100+ languages. Features small, base, and large variants with weakly-supervised contrastive pre-training. This is an OSS model family released by Microsoft Research.

    pgai

    Open-source PostgreSQL extension and Python library that automates embedding generation and synchronization for RAG and semantic search applications. Features pgai Vectorizer for declarative embedding pipelines. This is an OSS solution.

    puck

    Puck is an open-source vector search engine designed for fast similarity search and retrieval of embedding vectors.

    Jina Embeddings v4

    Universal multimodal embedding model from Jina AI supporting text and images through a unified pathway. Built on Qwen2.5-VL-3B-Instruct, it outperforms proprietary models on visually rich document retrieval. This is a commercial API with a free tier, though open-source weights are also available.

    Cohere Embed v3

    Commercial text embedding model from Cohere with multilingual support and 1,024-dimensional vectors. Optimized for semantic search and retrieval tasks. This is a commercial API service with pay-per-use pricing.

    Voyage AI Embeddings

    Commercial embedding models built for enterprise-grade semantic search and RAG applications. Features voyage-3 and voyage-3-large models with multimodal support. This is a commercial API service with usage-based pricing.

    All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this repository, related repositories, and associated websites are for identification purposes only. The use of these names, logos, and brands does not imply endorsement, affiliation, or sponsorship. This directory may include content generated by artificial intelligence.
    Copyright © 2025 Awesome Vector Databases. All rights reserved.