    All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this repository, related repositories, and associated websites are for identification purposes only. The use of these names, logos, and brands does not imply endorsement, affiliation, or sponsorship. This directory may include content generated by artificial intelligence.
    Copyright © 2025 Awesome Vector Databases. All rights reserved.

    PageIndex

    Open-source tool by VectifyAI for page-wise document indexing that converts PDF pages into image representations for downstream multimodal embedding and retrieval. It is designed to support late-interaction retrieval approaches such as ColPali by preserving the original document layout and visual structure.
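    For readers unfamiliar with late interaction, the scoring step used by ColPali-style retrievers can be sketched as "MaxSim": each query token embedding is matched against its best page-patch embedding, and those best matches are summed. This is a generic, minimal illustration of that technique, not code from PageIndex. The function names and toy 2-dimensional vectors are hypothetical; real systems use learned multi-vector embeddings from a vision-language model and vectorized math.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], page_patches: Sequence[Vector]) -> float:
    """Late-interaction score: for every query token, take the similarity of its
    best-matching page patch, then sum those maxima over all query tokens."""
    if not page_patches:
        raise ValueError("page has no patch embeddings")
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(query_tokens: Sequence[Vector], pages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return page indices ordered from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_tokens, patches) for patches in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy embeddings: two query tokens, two pages with two patches each.
    query = [[1.0, 0.0], [0.0, 1.0]]
    page_a = [[1.0, 0.0], [0.5, 0.5]]  # best matches: 1.0 + 0.5 = 1.5
    page_b = [[0.0, 1.0], [0.9, 0.1]]  # best matches: 0.9 + 1.0 = 1.9
    print(rank_pages(query, [page_a, page_b]))  # prints [1, 0]
```

    Because each query token picks its own best patch, a page can score well when different parts of the query match different regions of the page image, which is why preserving layout and visual structure matters for this style of retrieval.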

    https://github.com/VectifyAI/PageIndex

    Information

    Website: github.com
    Published: Apr 4, 2026

    Categories

    Data Processing

    Tags

    #open-source #multimodal #document-parsing

    Similar Products

    BGE-VL

    State-of-the-art multimodal embedding model from BAAI supporting text-to-image, image-to-text, and compositional visual search. Trained on the MegaPairs dataset with over 26 million retrieval triplets.

    Jina Embeddings v4

    Universal multimodal embedding model from Jina AI that handles text and images through a unified pathway. Built on Qwen2.5-VL-3B-Instruct, it outperforms proprietary models on visually rich document retrieval. It is offered as a commercial API with a free tier, and the model weights are also published.

    Nomic Embed Text v1.5

    Text embedding model with 137M parameters (its embedding space is shared with Nomic Embed Vision v1.5 for multimodal use) that outperforms OpenAI text-embedding-3-small on both short- and long-context tasks. It uses Matryoshka Representation Learning to support flexible embedding dimensions.

    BigVectorBench

    A benchmark suite for evaluating vector database performance on heterogeneous data embeddings and compound queries in real-world multimodal applications.

    Docling

    Open-source document parsing framework from IBM with 97.9% accuracy in complex table extraction and excellent text fidelity. Self-hostable solution for converting PDFs, spreadsheets, and scanned images into structured data for RAG pipelines.

    Deep Lake

    Deep Lake is a vector database designed as a data lake for AI, capable of storing and managing vector embeddings, text, images, and videos. It utilizes a tensor format for efficient querying and integration with AI algorithms, making it suitable for similarity search and machine learning workflows. It is open-source and tailored for handling unstructured and multimodal data, with seamless integration with frameworks like PyTorch and TensorFlow.