Nomic Embed Text
The first fully reproducible open-source text embedding model with an 8,192-token context length. v2 introduces a Mixture-of-Experts architecture for multilingual embeddings and outperforms OpenAI's embedding models on short- and long-context benchmarks. This is an OSS model under the Apache 2.0 license.
Overview
Nomic-embed-text is the first fully reproducible, open-source text embedding model with an 8,192-token context length that outperforms both OpenAI's text-embedding-ada-002 and text-embedding-3-small on short- and long-context benchmarks.
Key Features
- Fully Open Source: Training code, model weights, and complete training data released
- Apache 2.0 License: Free for commercial use
- 8,192-Token Context Length: Embeds long documents in a single pass
- Reproducible: Complete replication possible with released data and code
- High Performance: Outperforms OpenAI models on MTEB benchmarks
Model Versions
V1 (nomic-embed-text-v1)
- First fully reproducible embedding model
- 8,192 context length
- Trained on weakly related text pairs and high-quality labeled datasets
- English-focused
V1.5 (nomic-embed-text-v1.5)
- Matryoshka Representation Learning support
- Flexible embedding dimensions
- Trade-off between size and performance
- Minimal performance reduction with smaller dimensions
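Matryoshka Representation Learning trains the model so that a prefix of each embedding is itself a usable embedding: shrinking a vector is just truncate-and-renormalize. A minimal numpy sketch with a random stand-in vector (v1.5 produces 768-dimensional embeddings; the 256 below is one commonly used smaller size, chosen here for illustration):

```python
import numpy as np

def shrink(embedding, dim):
    """Truncate a Matryoshka embedding to its first `dim` components
    and re-normalize to unit length so cosine similarity still works."""
    truncated = np.asarray(embedding)[:dim]
    return truncated / np.linalg.norm(truncated)

rng = np.random.default_rng(0)
full = rng.normal(size=768)            # stand-in for a v1.5 embedding
full = full / np.linalg.norm(full)     # model output is unit-normalized

small = shrink(full, 256)              # 3x smaller index footprint
```

The trade-off named above is exactly this: smaller `dim` means smaller vector stores and faster search, at the cost of a modest drop in retrieval quality.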
V2 (nomic-embed-text-v2)
- Mixture-of-Experts (MoE) Architecture: First MoE text embedding model
- Multilingual: Trained on 1.6 billion contrastive pairs across ~100 languages
- Expanded Dataset: Broader multilingual coverage
- Production-Ready: Optimized for real-world applications
Training Approach
- Stage 1 - Unsupervised Contrastive: Training on weakly related text pairs from StackExchange, Quora, Amazon reviews, news articles
- Stage 2 - Fine-tuning: Leverages high-quality labeled datasets including search queries and web search answers
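Both stages optimize a contrastive objective: paired texts should embed close together, unpaired texts far apart. A toy numpy sketch of an InfoNCE-style loss of the kind used in contrastive embedding training (random vectors stand in for encoder outputs; the function name and temperature value are illustrative, not Nomic's exact recipe):

```python
import numpy as np

def info_nce_loss(queries, docs, temperature=0.07):
    """InfoNCE-style contrastive loss: the positive for each query is the
    doc at the same batch index; all other docs act as in-batch negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = q @ d.T / temperature                  # (batch, batch) cosine sims
    # Row-wise log-softmax; positives sit on the diagonal.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 8))
aligned = info_nce_loss(batch, batch)               # perfect pairs: low loss
shuffled = info_nce_loss(batch, rng.normal(size=(4, 8)))  # random pairs: higher
```

Training then pushes the model toward the `aligned` regime on billions of such pairs.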
Access Methods
- Hugging Face: Direct model download and inference
- Ollama: ollama pull nomic-embed-text
- Nomic API: Managed API endpoint
- LlamaIndex Integration: Native support
- Qdrant Integration: Built-in connector
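After pulling the model, a locally running Ollama server exposes an embeddings endpoint. A stdlib-only sketch, assuming Ollama's `/api/embeddings` route on the default port 11434 (building the payload needs nothing; actually sending it requires `ollama serve` with the model pulled):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"

def build_request(text, model="nomic-embed-text"):
    """Build the JSON payload Ollama expects for an embedding call."""
    return {"model": model, "prompt": text}

def embed(text):
    """POST to a locally running Ollama server; returns the embedding vector.
    Requires `ollama serve` with nomic-embed-text already pulled."""
    data = json.dumps(build_request(text)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

payload = build_request("How long can the input context be?")
```

The Hugging Face and Nomic API routes follow the same pattern: send text, get back a float vector.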
Use Cases
- Long-context semantic search
- Multilingual retrieval applications
- Document embedding and clustering
- RAG systems requiring long context
- Research requiring reproducibility
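At retrieval time, every use case above reduces to nearest-neighbor lookup over embedding vectors. A self-contained sketch with tiny toy vectors standing in for model output (in practice each row would be a 768-dimensional document embedding):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(scores)[::-1][:k]   # indices of the k best matches
    return order, scores[order]

# Toy 4-dim "embeddings"; docs 0 and 1 point the same way, doc 2 is unrelated.
docs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
query = np.array([1.0, 0.05, 0.0, 0.0])

idx, scores = top_k(query, docs)
```

A RAG system would feed the top-ranked documents into a generator; vector stores such as Qdrant run this same ranking at scale.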
Performance Highlights
- Outperforms OpenAI text-embedding-ada-002
- Competitive with text-embedding-3-small
- Strong performance on both short and long context tasks
- Excellent multilingual capabilities (v2)
Pricing
Free and open-source under Apache 2.0 license. No licensing costs. Nomic API offers managed hosting with usage-based pricing for convenience.
Similar Products
- Open-source text embedding models from Microsoft supporting 100+ languages. Features small, base, and large variants with weakly-supervised contrastive pre-training. This is an OSS model family released by Microsoft Research.
- Open-source PostgreSQL extension and Python library that automates embedding generation and synchronization for RAG and semantic search applications. Features pgai Vectorizer for declarative embedding pipelines. This is an OSS solution.
- Puck is an open-source vector search engine designed for fast similarity search and retrieval of embedding vectors.
- Universal multimodal embedding model from Jina AI supporting text and images through a unified pathway. Built on Qwen2.5-VL-3B-Instruct, it outperforms proprietary models on visually rich document retrieval. This is a commercial API with a free tier, though OSS weights are available.
- Commercial text embedding model from Cohere with multilingual support and 1,024-dimensional vectors. Optimized for semantic search and retrieval tasks. This is a commercial API service with pay-per-use pricing.
- Commercial embedding models built for enterprise-grade semantic search and RAG applications. Features voyage-3 and voyage-3-large models with multimodal support. This is a commercial API service with usage-based pricing.