DocArray

DocArray is an open-source Python library designed for representing, storing, and retrieving multimodal data, making it suitable for AI and machine learning workflows involving complex data types such as images, text, audio, and video.

Features

Multimodal Data Representation: Define and work with documents containing various data types (images, text, audio, video) using Python classes.
Pydantic Compatibility: Built on top of Pydantic, allowing type validation and integration with other Pydantic-based tools.
Custom Data Models: Create custom document schemas using BaseDoc, specifying fields for different modalities and types.
Tensor Shape Specification: Declare the expected tensor shape of a data field in its type annotation, with support for PyTorch, NumPy, and TensorFlow tensors.
Nested Documents: Compose complex, nested document structures for handling multimodal datasets.
Batch Processing: Process and manipulate batches of documents via DocVec and DocList collections, enabling bulk operations and efficient workflows.
Bulk Field Access: Retrieve and manipulate fields across all documents in a collection with simple syntax.
Flexible Embedding Storage: Store and manage vector embeddings computed from any model, facilitating downstream search and retrieval tasks.
Open Source: Distributed under the Apache License 2.0 and part of the LF AI & Data Foundation as a sandbox project.
Python Ecosystem Integration: Works alongside common Python and machine learning tools, including the tensor frameworks and Pydantic-based libraries noted above.
Installation via pip: Install or upgrade from PyPI with pip install docarray.

Pricing

DocArray is open-source software and free to use under the Apache License 2.0.
