Jina Embeddings v4

Universal multimodal embedding model from Jina AI that supports text and images through a unified pathway. Built on Qwen2.5-VL-3B-Instruct, it outperforms proprietary models on visually rich document retrieval. It is offered as a commercial API with a free tier, and open weights are also available.

Overview

jina-embeddings-v4 is a 3.8B-parameter model that embeds text and images through a unified pathway and supports both dense and late-interaction retrieval. It is particularly strong on visually rich document retrieval, where it outperforms proprietary models from Google, OpenAI, and Voyage AI.

Key Features

Multimodal: Unified embedding for text and images
3.8B Parameters: Built on Qwen2.5-VL-3B-Instruct foundation
Dense + Late Interaction: Supports multiple retrieval modes
1,536 Dimensions: Compatible with many vector databases
Open Weights: Available on Hugging Face for self-hosting
API Access: Managed API with multiple tiers
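
The two retrieval modes listed above differ in how a query is scored against a document. The sketch below contrasts them on toy vectors. It assumes that late interaction means ColBERT-style MaxSim scoring over per-token vectors, which this page does not state; the function names are illustrative and not part of any Jina API.

```python
import math

def cosine(a, b):
    """Dense retrieval: one vector per item, scored by cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def maxsim(query_vecs, doc_vecs):
    """Late interaction (ColBERT-style MaxSim, an assumption here): for each
    query token vector, take its best dot product against any document token
    vector, then sum those maxima."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
        for qv in query_vecs
    )

# Toy 2-dimensional vectors; real embeddings have far more dimensions.
dense_score = cosine([1.0, 0.0], [1.0, 0.0])
late_score = maxsim([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]])
```

Dense mode stores a single vector per item, which keeps the index small. Late interaction keeps several vectors per item and trades storage for finer-grained matching.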

Pricing

Token-Based Pricing

Cost: Approximately $0.02 per million tokens
Free Trial: 10 million tokens for new users with auto-generated API key
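
For budgeting, the figures above can be turned into a rough estimate. This is a minimal sketch: it assumes the approximate $0.02 rate is flat across all usage and that any remaining free-trial tokens simply offset billable tokens. Neither assumption is confirmed on this page, and the function name is hypothetical.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.02  # approximate rate quoted above
FREE_TRIAL_TOKENS = 10_000_000       # free trial allowance quoted above

def estimate_cost_usd(total_tokens, free_tokens_remaining=0):
    """Rough cost estimate. Assumes a flat rate and that free tokens
    are consumed before any billable tokens."""
    billable = max(0, total_tokens - free_tokens_remaining)
    return billable * PRICE_PER_MILLION_TOKENS_USD / 1_000_000

# Example: 50 million tokens for a new user who still has the full free trial.
cost = estimate_cost_usd(50_000_000, FREE_TRIAL_TOKENS)
```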

Rate Limits by Tier

Free: 100 RPM, 100K TPM, 2 concurrent requests
Paid: 500 RPM, 2M TPM, 50 concurrent requests
Premium: 5,000 RPM, 50M TPM, 500 concurrent requests
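
The limits above give a lower bound on how long a batch job takes on each tier. The sketch below computes that bound from the RPM and TPM figures only. It ignores concurrency, latency, and retries, so real jobs will take longer; the tier keys and function name are illustrative.

```python
# Limits copied from the tier list above.
TIERS = {
    "free": {"rpm": 100, "tpm": 100_000, "concurrency": 2},
    "paid": {"rpm": 500, "tpm": 2_000_000, "concurrency": 50},
    "premium": {"rpm": 5_000, "tpm": 50_000_000, "concurrency": 500},
}

def min_minutes(tier, total_tokens, total_requests):
    """Lower bound on job duration: whichever of the token limit or the
    request limit is hit first determines the minimum time."""
    limits = TIERS[tier]
    return max(total_tokens / limits["tpm"], total_requests / limits["rpm"])

# Example: 1 million tokens sent as 100 requests on the free tier.
free_tier_minutes = min_minutes("free", 1_000_000, 100)
```

In this example the token limit is the bottleneck, so sending fewer, larger requests would not shorten the job.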

Image Token Calculation

Each tile costs 10 tokens
Tiles are 28x28 pixels
Image processing cost varies with image size
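
The tile rule above can be sketched as a small calculation. It assumes that partial tiles at the image edges are rounded up to whole tiles and that the image is not resized before tiling. Both are assumptions for illustration, so check the result against the usage reported by the API.

```python
import math

TILE_SIZE_PX = 28     # tile edge length quoted above
TOKENS_PER_TILE = 10  # token cost per tile quoted above

def estimate_image_tokens(width_px, height_px):
    """Estimate token cost for an image. Assumes partial edge tiles count
    as full tiles and that no resizing happens before tiling."""
    tiles_wide = math.ceil(width_px / TILE_SIZE_PX)
    tiles_high = math.ceil(height_px / TILE_SIZE_PX)
    return tiles_wide * tiles_high * TOKENS_PER_TILE

tokens = estimate_image_tokens(560, 560)
```

Under these assumptions a 560x560 image covers 20x20 tiles, or 4,000 tokens.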

Pricing Model Update

A new pricing model was introduced on May 6, 2025. Users who enabled auto-recharge before this date keep the old pricing. The new pricing applies to new purchases or modifications.

Important Note on Throughput

Jina intentionally throttles API throughput for jina-embeddings-v4 to manage infrastructure costs. For production workloads requiring high throughput:

Use jina-embeddings-v3 API, or
Deploy jina-embeddings-v4 on your own infrastructure via Hugging Face

Payment Methods

Payments are processed through Stripe, which supports:

Credit cards
Google Pay
PayPal

Model Access

API: https://jina.ai/embeddings/
Hugging Face: jinaai/jina-embeddings-v4
Self-Hosting: Deploy on your infrastructure
Cloud Marketplaces: Azure Marketplace
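
For the managed API route, a request might be assembled as in the sketch below. The endpoint URL, the `model` and `input` payload fields, and the bearer-token header are assumptions that do not appear on this page, so verify them against Jina's API documentation before use. The helper name is hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

# ASSUMPTION: endpoint and payload shape are illustrative, not taken from this page.
API_URL = "https://api.jina.ai/v1/embeddings"

def build_embedding_request(api_key, texts, model="jina-embeddings-v4"):
    """Build (but do not send) a POST request for text embeddings."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_embedding_request("YOUR_API_KEY", ["a scanned invoice with a table"])
# Sending is left out on purpose; urllib.request.urlopen(req) would perform the call.
```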

Use Cases

Visually rich document retrieval
Multimodal semantic search
Document understanding with layout
Cross-modal retrieval (text→image, image→text)
RAG systems with visual content

Comparison to v3

v4 adds multimodal capabilities (text + images) and produces 1,536-dimensional vectors, while v3 is text-only and produces 1,024-dimensional vectors. v3 offers higher API throughput, which suits production text-only workloads.
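
The dimension difference also affects index size. The sketch below compares raw vector storage for the two models, assuming uncompressed float32 vectors and ignoring index overhead. Both are assumptions; actual storage depends on the vector database and any quantization in use.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as uncompressed float32

def raw_vector_bytes(num_vectors, dims):
    """Raw storage for the vectors alone, excluding index structures."""
    return num_vectors * dims * BYTES_PER_FLOAT32

v3_bytes = raw_vector_bytes(1_000_000, 1024)  # v3: 1,024 dimensions
v4_bytes = raw_vector_bytes(1_000_000, 1536)  # v4: 1,536 dimensions
ratio = v4_bytes / v3_bytes
```

Under these assumptions, moving from v3 to v4 increases raw vector storage by half.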

Information

Website: jina.ai

Published: Mar 6, 2026

Tags

#commercial #multimodal #open-source

Similar Products

Elasticsearch Vector Search

Lucene KNN vector plugin for the Elasticsearch search engine, enabling hybrid lexical+vector search, BM25 fusion, and HNSW/IVF indexes for ANN. Used for enterprise search, RAG, and multimodal apps. As an integrated engine, it offers superior hybrid text handling compared with standalone vector databases such as Weaviate, but has a higher resource footprint.

BGE-VL

State-of-the-art multimodal embedding model from BAAI supporting text-to-image, image-to-text, and compositional visual search. Trained on the MegaPairs dataset with over 26 million retrieval triplets.

Deep Lake 4.0

AI data lake with revolutionary index-on-the-lake technology enabling sub-second queries from S3. Features 10x cost efficiency vs in-memory DBs and 2x faster than alternatives. This is a commercial platform with OSS components.

Deep Lake

Open-source database specializing in unstructured and multimodal data for AI/ML applications. Handles images, videos, and other data with decent vector operations, high recall for multimodal integration, and tight compatibility with PyTorch and TensorFlow.

Supabase Vector

Managed serverless Postgres with pgvector for vector similarity search, featuring real-time subscriptions, Edge Functions, auto-HNSW indexing, serverless scaling, and RLS for multi-tenant isolation. Built for full-stack AI apps with auth, storage, and realtime. Postgres SQL + vectors outperforms dedicated DBs in integrated app development and cost for RAG/multi-tenant use cases.

ClickHouse

ClickHouse is a columnar OLAP database with vector indexes (ANN via AMM, brute-force), supporting SQL queries over vectors + structured data at petabyte scale. Excels in aggregations with vectors. For analytics workloads with embeddings; faster ingestion than Postgres pgvector for big data.
