DET-LSH
DET-LSH is a locality-sensitive hashing scheme that introduces a dynamic encoding tree structure to accelerate approximate nearest neighbor (ANN) search in high-dimensional spaces. While it is a research algorithm rather than a production database, it directly targets the core operation behind vector databases—efficient ANN search over vector embeddings—and is relevant for designing or optimizing vector indexing components within vector database systems.
About this tool
DET-LSH
Category: Research Papers & Surveys
Brand: arXiv
Source: Paper PDF
Code & Artifacts: GitHub – WeiJiuQi/DET-LSH
Overview
DET-LSH is a locality-sensitive hashing (LSH) scheme designed for approximate nearest neighbor (ANN) search in high-dimensional Euclidean spaces. It introduces a Dynamic Encoding Tree (DE-Tree) structure to accelerate both indexing and querying, with a focus on improving indexing efficiency—an aspect often underemphasized in traditional LSH-based methods.
Key Concepts
- Approximate Nearest Neighbor (ANN) Search: Targets efficient retrieval of points whose distances to a query are within a factor (c) of the true nearest neighbor distance in high-dimensional spaces.
- Locality-Sensitive Hashing (LSH): A hashing framework that increases collision probability for nearby points to enable sublinear-time ANN search.
Features
Dynamic Encoding Tree (DE-Tree)
- Encoding-based index tree designed specifically for ANN tasks.
- Focuses on improving indexing efficiency rather than only query-time optimizations.
- Avoids directly partitioning the raw multi-dimensional space, mitigating performance degradation as dimensionality increases.
- Supports efficient range queries based on Euclidean distance.
DET-LSH Scheme
- Builds multiple independent DE-Tree indexes over the data.
- Employs a novel query strategy that:
- Performs range queries across multiple DE-Trees.
- Reduces the probability of missing exact nearest neighbor points, thereby improving recall and overall query accuracy.
- Designed as a general LSH-based framework that can be used to inspire or implement vector indexing components in vector databases.
Theoretical Properties
- Provides probabilistic guarantees on query accuracy, in line with classical LSH theory.
- Formally analyzed within the approximate nearest neighbor ((c)-ANN) framework for Euclidean distance.
Performance Characteristics
- Targets high-dimensional Euclidean spaces where the curse of dimensionality makes exact NN search impractical.
- Experimental results on real-world datasets (as reported in the paper):
- Up to 6× speedup in indexing time compared to state-of-the-art LSH-based methods.
- Up to 2× speedup in query time.
- Higher query accuracy than competing LSH-based approaches under comparable settings.
Use Cases
- Designing or optimizing vector indexing components in vector databases and search systems.
- High-dimensional ANN search applications in:
- Databases and data management
- Information retrieval
- Data mining
- Machine learning and embedding-based similarity search
Publication & Licensing
- Venue: PVLDB, Vol. 17, No. 9 (2024), pp. 2241–2254.
- DOI: 10.14778/3665844.3665854
- License: Creative Commons BY-NC-ND 4.0 (non-commercial, no derivatives for the paper text).
Pricing
- Not applicable. DET-LSH is a research algorithm with an academic publication and open-source artifacts rather than a commercial product or service with pricing plans.
Loading more......
Information
Categories
Tags
Similar Products
6 result(s)This work by Jingfan Meng is a comprehensive research thesis on efficient locality-sensitive hashing (LSH), covering algorithmic solutions, core primitives, and applications for approximate nearest neighbor search. It is relevant to vector databases because LSH-based indexing is a foundational technique for scalable similarity search over high-dimensional vectors, informing the design of vector indexes, retrieval engines, and similarity search modules in modern vector database systems.
An influential paper analyzing and improving approximate nearest neighbor search methods for high-dimensional data, highly relevant for developing and understanding vector databases.
RaBitQ is an open-source library implementing the "Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search" method, providing vector quantization and compression techniques designed to improve efficiency and accuracy of ANN search engines and vector databases operating in high-dimensional spaces.
vsag is an Alibaba open-source library implementing efficient vector search algorithms, including approximate nearest neighbor search for high-dimensional vectors.
MRPT (Multi-Resolution Proximity Trees) is an open-source library for fast approximate nearest neighbor search in high-dimensional vector spaces, applicable to vector database backends.
An open-source library for approximate nearest neighbor search in high-dimensional spaces, often used as a backend for vector databases and search engines.