
    Gemini Embedding 2

Google's first natively multimodal embedding model, mapping text, images, video, audio, and documents into a single embedding space. It supports more than 100 languages and offers flexible output dimensions via Matryoshka Representation Learning.


    About this tool

    Overview

Gemini Embedding 2 is Google's first natively multimodal embedding model, released on March 10, 2026. It maps text, images, video, audio, and documents into a unified embedding space, capturing semantic intent across more than 100 languages.

    Key Features

    Multimodal Support

    • Text: Up to 8,192 input tokens
    • Images: Up to 6 images per request (PNG or JPEG)
    • Audio: Maximum 80 seconds per request (MP3 or WAV)
    • Video: Maximum 128 seconds per request (MP4 or MOV)
    • Documents: PDF files up to 6 pages directly
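
A request that exceeds these per-modality limits will be rejected, so it can be worth validating inputs client-side before calling the API. The following sketch encodes the limits listed above; the `LIMITS` table and `check_image_batch` helper are illustrative assumptions, not part of any official SDK.

```python
# Hypothetical pre-flight check mirroring the per-request limits listed above.
# The numbers come from this page; the helper itself is an illustrative sketch.

LIMITS = {
    "text":     {"max_tokens": 8192},
    "image":    {"max_items": 6, "formats": {"png", "jpeg"}},
    "audio":    {"max_seconds": 80, "formats": {"mp3", "wav"}},
    "video":    {"max_seconds": 128, "formats": {"mp4", "mov"}},
    "document": {"max_pages": 6, "formats": {"pdf"}},
}

def check_image_batch(paths):
    """Return True if a batch of image paths fits the documented limits."""
    if len(paths) > LIMITS["image"]["max_items"]:
        return False
    return all(p.rsplit(".", 1)[-1].lower() in LIMITS["image"]["formats"]
               for p in paths)
```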

    Flexible Dimensions

    Supports multiple output dimensions:

    • 3,072 dimensions (default)
    • 1,536 dimensions
    • 768 dimensions

Matryoshka Representation Learning lets embeddings be truncated to the smaller dimensions without significant accuracy loss.
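
Because Matryoshka-trained models pack the most informative components into the prefix of the vector, truncation is simply slicing plus L2-renormalization. A minimal sketch, assuming the model returns unit-normalized vectors (the function name is ours, not from any SDK):

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and L2-renormalize.

    With Matryoshka Representation Learning, a prefix of the full vector
    (e.g. the first 768 of 3,072 components) is itself a usable embedding.
    """
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # guard all-zero input
    return [x / norm for x in head]
```

In practice you would request a 3,072-dimension embedding once and store truncated 768-dimension copies where index size matters.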

    Language Support

    Captures semantic relationships across over 100 languages, making it suitable for global applications.

    Performance

    Gemini Embedding 2 establishes new performance standards:

    • Outperforms leading models in text, image, and video tasks
    • Strong speech capabilities for audio processing
    • Superior multimodal depth in cross-modal retrieval

    Use Cases

    • Semantic Search: Multi-language and multimodal search applications
    • RAG Systems: Retrieval-Augmented Generation with diverse data types
    • Sentiment Analysis: Text and audio sentiment understanding
    • Data Clustering: Grouping similar content across modalities
    • Recommendation Systems: Cross-modal content recommendations
    • Content Moderation: Multi-format content classification
    • Video Understanding: Temporal and visual content analysis
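
Most of these use cases reduce to nearest-neighbor search over embedding vectors. A minimal semantic-search sketch using cosine similarity over toy vectors (stand-ins for real embeddings returned by the model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank(query_vec, corpus):
    """corpus: {doc_id: embedding}. Return ids sorted by similarity, best first."""
    return sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]), reverse=True)
```

Because the embedding space is shared across modalities, the same ranking works when `query_vec` comes from text and the corpus vectors come from images or video frames.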

    Integration

    Available through:

    • Google Gemini API
    • Vertex AI
    • LangChain
    • LlamaIndex
    • Haystack
    • Weaviate
    • Qdrant
    • ChromaDB
    • Vertex AI Vector Search
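
For direct API access, a request body can be built like the sketch below. The field names follow the current Gemini embedding REST API (`embedContent` with `content.parts` and `outputDimensionality`); the model id `gemini-embedding-2` is a placeholder guess, and the exact multimodal payload shape for this model is not confirmed by this page.

```python
import json

def build_embed_request(text, output_dim=768):
    """Assemble an embedContent-style request body (sketch, unverified fields)."""
    return {
        "model": "models/gemini-embedding-2",  # placeholder model id
        "content": {"parts": [{"text": text}]},
        "outputDimensionality": output_dim,
    }

body = build_embed_request("multimodal embeddings", output_dim=1536)
print(json.dumps(body))
```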

    Technical Specifications

    • Model Type: Multimodal embedding transformer
    • Context Window: 8,192 tokens for text
    • Embedding Dimensions: 768 / 1,536 / 3,072
    • Languages: 100+
    • Modalities: Text, Image, Video, Audio, Documents
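
The dimension choice directly determines vector-store footprint. At float32 (4 bytes per component), the arithmetic is:

```python
def bytes_per_vector(dim, bytes_per_float=4):
    """Storage cost of one float32 embedding vector."""
    return dim * bytes_per_float

# 3,072 dims -> 12,288 bytes (~12 KiB) per vector;
# truncating to 768 dims -> 3,072 bytes, a 4x index-size saving.
```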

    Pricing

    Pricing varies based on:

    • Number of API calls
    • Input modality and size
    • Output dimension selection

Details are available on the Google Cloud pricing page.


    Information

Website: ai.google.dev
Published: Mar 11, 2026

    Categories

Machine Learning Models

    Tags

#Multimodal #Embeddings #Google

    Similar Products

    Voyage Multimodal 3.5

    Next-generation multimodal embedding model built for retrieval over text, images, and videos, supporting Matryoshka embeddings with 4.56% higher accuracy than Cohere Embed v4 on visual document retrieval.

    UForm

    Pocket-sized multimodal AI for content understanding across multilingual texts, images, and video. Up to 5x faster than OpenAI CLIP with quantization-aware embeddings and support for 20+ languages.

    Cohere Embed v4

    Multilingual, multimodal enterprise embedding model supporting over 100 programming languages and primary business languages with advanced quantization for cost optimization.

    voyage-multimodal-3

    Voyage AI's first all-in-one multimodal embedding model supporting interleaved text and content-rich images including screenshots, PDFs, slide decks, tables, and figures.

    Multimodal Embeddings

    Vector representations mapping different data types (text, images, audio, video) into a shared embedding space. Enables cross-modal search and understanding.

    Mastering Multimodal RAG

    A course focused on mastering multimodal Retrieval Augmented Generation (RAG) and embeddings, which are fundamental components often stored and managed by vector databases.

Copyright © 2025 Awesome Vector Databases. All rights reserved.