
Locality-Sensitive Hashing

Locality-Sensitive Hashing (LSH) is an algorithmic technique for approximate nearest neighbor search in high-dimensional vector spaces, commonly used in vector databases to speed up similarity search while reducing memory footprint.


About this tool

Category: Concepts & Definitions
Tags: ann, similarity-search, high-dimensional, optimization
Source: Wikipedia - Locality-sensitive hashing


Description

Locality-Sensitive Hashing (LSH) is an algorithmic technique in computer science designed for approximate nearest neighbor search in high-dimensional spaces. It is a form of fuzzy hashing that hashes similar input items into the same "buckets" with high probability, facilitating efficient similarity search and data clustering. Unlike traditional hash functions, which minimize collisions, LSH maximizes collisions for similar items, making it especially useful for reducing the dimensionality of data while preserving relative distances.
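To make the "similar items collide" behavior concrete, here is a minimal sketch of one classic LSH family, random hyperplanes (SimHash) for cosine similarity, using only the Python standard library. The dimension, bit count, seed, and function names are illustrative choices, not part of the source:

```python
import random

rng = random.Random(7)
DIM, N_BITS = 64, 16

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Each hash bit records which side of a random hyperplane the vector falls on.
hyperplanes = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_BITS)]

def simhash(v):
    return tuple(dot(h, v) >= 0 for h in hyperplanes)

base = [rng.gauss(0, 1) for _ in range(DIM)]
near = [x + 0.05 * rng.gauss(0, 1) for x in base]   # slightly perturbed copy
far = [rng.gauss(0, 1) for _ in range(DIM)]         # unrelated random vector

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

print(hamming(simhash(base), simhash(near)))  # close vectors: few differing bits
print(hamming(simhash(base), simhash(far)))   # unrelated vectors: many more
```

The probability that two vectors agree on any one bit is 1 − θ/π, where θ is the angle between them, so nearby vectors land in the same bucket far more often than distant ones.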


Features

  • Approximate Nearest Neighbor Search: Efficiently finds points in high-dimensional spaces that are close to a given query point.
  • Similarity Preservation: Hashes similar items to the same buckets with high probability, enabling efficient similarity search and clustering.
  • Dimensionality Reduction: Reduces high-dimensional data to a lower-dimensional representation while maintaining proximity relationships.
  • Flexible Hash Families: Supports various hash function families for different distance or similarity measures (e.g., Hamming distance, Jaccard index, cosine similarity).
  • Multiple Construction Methods:
    • Bit Sampling: For Hamming distance.
    • Min-wise Independent Permutations (MinHash): For Jaccard similarity.
    • Random Projections (SimHash): For cosine similarity.
    • Stable Distributions: For Lp norms.
    • Semantic Hashing: Uses neural networks to learn hash codes with semantic similarity.
  • Amplification Techniques: Combines multiple hash functions via AND-constructions (all components must collide) and OR-constructions (any component may collide) to widen the gap between collision probabilities for near and far points.
  • Algorithmic Guarantees: Provides theoretical bounds on preprocessing and query time, space usage, and success probability for approximate nearest neighbor queries.
  • Open-Source Implementations: Notable LSH-based fuzzy hashes with open-source implementations include Nilsimsa (spam detection) and TLSH (security and digital forensics).
  • Scalability: Suitable for large datasets due to efficient preprocessing and query algorithms with sub-linear or near-linear time complexity.
  • Applications:
    • Near-duplicate detection
    • Hierarchical clustering
    • Genome-wide association studies
    • Image, audio, and video similarity identification
    • Shared memory and physical data organization in computing
    • Neural network training optimization
    • Computer security and digital forensics
    • Large-scale machine learning
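Two of the bullets above, MinHash for Jaccard similarity and AND-/OR-amplification via banding, can be sketched together in a few lines of pure Python. The helper names (`h64`, `minhash_signature`, `band_keys`) and the 20-hash / 10-band / 2-row parameters are illustrative choices, not anything prescribed by the source:

```python
import hashlib

def h64(salt, item):
    """Deterministic 64-bit hash of (salt, item); stands in for a hash family."""
    digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_signature(items, n_hashes=20):
    """The minimum hash under each salt matches across two sets with
    probability equal to their Jaccard similarity."""
    return [min(h64(salt, x) for x in items) for salt in range(n_hashes)]

def band_keys(sig, bands=10, rows=2):
    """AND within a band (all rows must match), OR across bands
    (sharing any one band key makes a pair a candidate)."""
    return {(i, tuple(sig[i * rows:(i + 1) * rows])) for i in range(bands)}

doc_a = set("the quick brown fox jumps over the lazy dog".split())
doc_b = set("the quick brown fox leaps over the lazy cat".split())  # similar
doc_c = set("completely unrelated words appear in this sentence".split())

sig_a, sig_b, sig_c = (minhash_signature(d) for d in (doc_a, doc_b, doc_c))

est = lambda s, t: sum(x == y for x, y in zip(s, t)) / len(s)
print(est(sig_a, sig_b))  # true Jaccard(a, b) is 0.6; estimate is near that
print(est(sig_a, sig_c))  # disjoint sets: estimate near 0

shared_ab = band_keys(sig_a) & band_keys(sig_b)  # likely non-empty
shared_ac = band_keys(sig_a) & band_keys(sig_c)  # almost surely empty
```

With r rows per band and b bands, a pair with similarity s becomes a candidate with probability 1 − (1 − s^r)^b, which is the S-curve that banding uses to separate similar pairs from dissimilar ones.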

References and Further Reading

  • "Mining of Massive Datasets" by Rajaraman, A.; Ullman, J.
  • "Similarity Search in High Dimensions via Hashing" by Gionis, A.; Indyk, P.; Motwani, R.
  • "Similarity Estimation Techniques from Rounding Algorithms" by Charikar, Moses S.

See Also

  • Bloom filter, Feature hashing, Random projection, Principal component analysis

Pricing

Not applicable (concept/algorithm, not a commercial product).


Information

Website: en.wikipedia.org
Published: May 13, 2025


Similar Products

Spectral Hashing

Spectral Hashing is a method for approximate nearest neighbor search that uses spectral graph theory to generate compact binary codes, often applied in vector databases to enhance retrieval efficiency on large-scale, high-dimensional data.

NMSLIB

NMSLIB is an efficient similarity search library and toolkit for high-dimensional vector spaces, supporting a variety of indexing algorithms for vector database use cases.

K-means Tree

K-means Tree is a clustering-based data structure that organizes high-dimensional vectors for fast similarity search and retrieval. It is used as an indexing method in some vector databases to optimize performance for vector search operations.

Optimized Product Quantization (OPQ)

Optimized Product Quantization (OPQ) enhances Product Quantization by optimizing space decomposition and codebooks, leading to lower quantization distortion and higher accuracy in vector search. OPQ is widely used in advanced vector databases for improving recall and search quality.

Neighbor

Ruby gem for approximate nearest neighbor search that can integrate with pgvector and other backends to power vector similarity search in Ruby applications.

AiSAQ

AiSAQ is an all-in-storage approximate nearest neighbor search system that uses product quantization to enable DRAM-free vector similarity search, serving as a specialized vector search/indexing approach for large-scale information retrieval.
