IDEA
IDEA is an inverted, deduplication-aware index structure designed to improve storage efficiency and query performance for similarity search workloads. It is implemented as research code and targets high-dimensional vector and content-addressable data, making it relevant to large-scale vector database and ANN indexing systems.
About this tool
Category: SDKs & Libraries
Website/Source: https://github.com/asaflevi0812/IDEA
Description
IDEA (Inverted Deduplication-Aware Index) is a research implementation of an index structure designed to improve storage efficiency and query performance for similarity search workloads. It targets high-dimensional vector data and content-addressable data, making it applicable to large-scale vector databases and approximate nearest neighbor (ANN) indexing systems. The codebase accompanies the paper “Physical vs. Logical Indexing with IDEA: Inverted Deduplication-Aware Index” (FAST ’24) and builds on the Destor open-source storage system.
Features
- Inverted, deduplication-aware index structure for similarity search workloads.
- Support for high-dimensional vector data and content-addressable data.
- Research implementation of naïve and deduplication-aware indexes, based on the Destor storage system.
- Designed for large-scale vector database / ANN indexing scenarios, focusing on storage efficiency and query performance.
- Configurable via repository-provided configs, dataset details, and keyword definitions.
- Build and automation scripts for compiling the code and generating build commands.
- Dependency installation script (install_dependencies.sh) tailored to a clean Ubuntu Server 22.04 LTS environment.
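The contrast between a naïve index and a deduplication-aware one, as listed in the features above, can be illustrated with a toy sketch. This is not code from the IDEA repository or from Destor. The class names (NaiveIndex, DedupAwareIndex), the fixed-size chunking, the SHA-1 fingerprints, and the byte-substring keyword matching are all assumptions made for illustration. The sketch only shows the general idea of indexing each unique, content-addressed chunk once and resolving chunks back to files at query time.

```python
import hashlib
from collections import defaultdict


def fingerprint(chunk: bytes) -> str:
    """Content address of a chunk (SHA-1 hex digest; an assumed choice)."""
    return hashlib.sha1(chunk).hexdigest()


def chunk_bytes(data: bytes, size: int = 8) -> list:
    """Split data into fixed-size chunks (a simplification for this sketch)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class NaiveIndex:
    """Hypothetical baseline: keyword -> files, scanning every chunk of every file."""

    def __init__(self):
        self.postings = defaultdict(set)  # keyword -> file names
        self.chunks_scanned = 0

    def add_file(self, name, data, keywords):
        for chunk in chunk_bytes(data):
            self.chunks_scanned += 1
            for kw in keywords:
                if kw in chunk:
                    self.postings[kw].add(name)

    def lookup(self, kw):
        return set(self.postings.get(kw, ()))


class DedupAwareIndex:
    """Hypothetical dedup-aware variant: keyword -> unique chunks -> files."""

    def __init__(self):
        self.chunk_postings = defaultdict(set)  # keyword -> chunk fingerprints
        self.chunk_to_files = defaultdict(set)  # fingerprint -> file names
        self.chunks_scanned = 0

    def add_file(self, name, data, keywords):
        for chunk in chunk_bytes(data):
            fp = fingerprint(chunk)
            if fp not in self.chunk_to_files:
                # First time this content is seen: scan it for keywords once.
                self.chunks_scanned += 1
                for kw in keywords:
                    if kw in chunk:
                        self.chunk_postings[kw].add(fp)
            # Duplicate chunks only add a file reference; they are not rescanned.
            self.chunk_to_files[fp].add(name)

    def lookup(self, kw):
        files = set()
        for fp in self.chunk_postings.get(kw, ()):
            files |= self.chunk_to_files[fp]
        return files


if __name__ == "__main__":
    keywords = [b"alpha", b"beta", b"gamma"]
    naive, dedup = NaiveIndex(), DedupAwareIndex()
    for index in (naive, dedup):
        index.add_file("a.txt", b"alpha---beta----", keywords)
        index.add_file("b.txt", b"alpha---gamma---", keywords)
    print(sorted(dedup.lookup(b"alpha")))
    print(naive.chunks_scanned, dedup.chunks_scanned)
```

In this example the two files share one 8-byte chunk, so the naïve index scans four chunks while the deduplication-aware index scans three and still returns the same files for each keyword. Keywords that straddle a chunk boundary are missed by both toy classes, which a real implementation would have to handle.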
Technical Requirements
- Operating system: Ubuntu Server 22.04 LTS (clean image recommended).
- Dependencies: Installed via the provided install_dependencies.sh script.
- Version control: Git required to clone the repository.
Pricing
- Open-source research code (no pricing information provided).
Similar Products
NMSLIB is an efficient similarity search library and toolkit for high-dimensional vector spaces, supporting a variety of indexing algorithms for vector database use cases.
OasysDB is an open-source vector database focused on efficient similarity search and management of high-dimensional data.
Vexvault is an open-source vector database designed for efficient storage, management, and similarity search of high-dimensional vector data.
K-means Tree is a clustering-based data structure that organizes high-dimensional vectors for fast similarity search and retrieval. It is used as an indexing method in some vector databases to optimize performance for vector search operations.
Locality-Sensitive Hashing (LSH) is an algorithmic technique for approximate nearest neighbor search in high-dimensional vector spaces, commonly used in vector databases to speed up similarity search while reducing memory footprint.
Ruby gem for approximate nearest neighbor search that can integrate with pgvector and other backends to power vector similarity search in Ruby applications.