
    Locality-Sensitive Hashing

    Locality-Sensitive Hashing (LSH) is an algorithmic technique for approximate nearest neighbor search in high-dimensional vector spaces, commonly used in vector databases to speed up similarity search while reducing memory footprint.


    About this tool

    Category: Concepts & Definitions
    Tags: ann, similarity-search, high-dimensional, optimization
    Source: Wikipedia - Locality-sensitive hashing


    Description

    Locality-Sensitive Hashing (LSH) is an algorithmic technique in computer science designed for approximate nearest neighbor search in high-dimensional spaces. It is a form of fuzzy hashing that hashes similar input items into the same "buckets" with high probability, facilitating efficient similarity search and data clustering. Unlike traditional hash functions, which aim to minimize collisions, LSH deliberately maximizes collisions for similar items, making it especially useful for reducing the dimensionality of data while preserving relative distances.
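    The bucket idea above can be sketched with random-projection (SimHash-style) hashing: each random hyperplane contributes one signature bit, so vectors pointing in similar directions agree on most bits. This is a minimal illustration; the vectors, seed, dimension, and bit width are arbitrary choices for the demo, not part of any standard API.

```python
import random

def simhash(vec, planes):
    """Random-projection signature: one bit per hyperplane, set to 1
    when the vector lies on the hyperplane's positive side."""
    bits = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

random.seed(42)
DIM, NBITS = 8, 16
# Random Gaussian hyperplanes define the hash family.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NBITS)]

a = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, 1.5, -0.5]
b = [v + random.uniform(-0.05, 0.05) for v in a]   # near-duplicate of a
c = [-v for v in a]                                # opposite direction

sig_a, sig_b, sig_c = (simhash(v, planes) for v in (a, b, c))

def hamming(x, y):
    # Number of differing signature bits.
    return bin(x ^ y).count("1")

# Similar vectors agree on most bits; dissimilar ones do not.
print(hamming(sig_a, sig_b), hamming(sig_a, sig_c))
```

    For cosine similarity, two vectors separated by angle θ collide on each bit with probability 1 − θ/π, which is what makes the Hamming distance between signatures a usable proxy for angular distance.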


    Features

    • Approximate Nearest Neighbor Search: Efficiently finds points in high-dimensional spaces that are close to a given query point.
    • Similarity Preservation: Hashes similar items to the same buckets with high probability, enabling efficient similarity search and clustering.
    • Dimensionality Reduction: Reduces high-dimensional data to a lower-dimensional representation while maintaining proximity relationships.
    • Flexible Hash Families: Supports various hash function families for different distance or similarity measures (e.g., Hamming distance, Jaccard index, cosine similarity).
    • Multiple Construction Methods:
      • Bit Sampling: For Hamming distance.
      • Min-wise Independent Permutations (MinHash): For Jaccard similarity.
      • Random Projections (SimHash): For cosine similarity.
      • Stable Distributions: For Lp norms.
      • Semantic Hashing: Uses neural networks to learn hash codes with semantic similarity.
    • Amplification Techniques: Combines multiple hash functions using AND-/OR-constructions to improve selectivity and collision probabilities.
    • Algorithmic Guarantees: Provides theoretical bounds on preprocessing and query time, space usage, and success probability for approximate nearest neighbor queries.
    • Open Source Implementations: Notable LSH-based hashes include Nilsimsa (for spam detection) and TLSH (for security and digital forensics), with available open-source code.
    • Scalability: Suitable for large datasets due to efficient preprocessing and query algorithms with sub-linear or near-linear time complexity.
    • Applications:
      • Near-duplicate detection
      • Hierarchical clustering
      • Genome-wide association studies
      • Image, audio, and video similarity identification
      • Shared memory and physical data organization in computing
      • Neural network training optimization
      • Computer security and digital forensics
      • Large-scale machine learning
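    The MinHash construction mentioned above can be shown in a few lines of Python. The document sets, seed, and number of hash functions here are illustrative assumptions; Python's built-in hash() is salted per process, which is fine for a single-run similarity estimate but not for persisted signatures.

```python
import random

def minhash(items, seeds):
    """MinHash signature: for each seed, keep the minimum hash value
    over the set's items; hash((seed, item)) plays the role of one
    member of an independent hash family per seed."""
    return [min(hash((s, x)) for x in items) for s in seeds]

random.seed(0)
seeds = [random.getrandbits(32) for _ in range(128)]

docs = {
    "d1": {"the", "quick", "brown", "fox"},
    "d2": {"the", "quick", "brown", "wolf"},   # Jaccard(d1, d2) = 3/5
    "d3": {"lorem", "ipsum", "dolor", "sit"},  # Jaccard(d1, d3) = 0
}
sigs = {k: minhash(v, seeds) for k, v in docs.items()}

def est_jaccard(a, b):
    # The fraction of agreeing signature positions is an unbiased
    # estimator of the Jaccard index of the underlying sets.
    return sum(x == y for x, y in zip(a, b)) / len(a)

print(est_jaccard(sigs["d1"], sigs["d2"]))
print(est_jaccard(sigs["d1"], sigs["d3"]))
```

    With 128 hash functions the estimate for d1 vs. d2 lands near the true value of 0.6, with standard error around 1/√128 ≈ 0.09.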
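    The AND-/OR-amplification listed among the features has a closed form worth seeing numerically: split a signature into b bands of r rows, require all r rows to match within a band (AND), and report a candidate pair if any band matches (OR). The parameter values below are examples, not prescribed settings.

```python
def collision_prob(p, r, b):
    """Probability two items become candidates under banding:
    OR over b bands of AND over r rows, P = 1 - (1 - p^r)^b,
    where p is the per-hash collision probability."""
    return 1 - (1 - p ** r) ** b

# With r=5 rows per band and b=20 bands the curve becomes an S-shape:
# highly similar pairs almost always share a bucket, dissimilar
# pairs almost never do.
for p in (0.2, 0.5, 0.8, 0.9):
    print(p, round(collision_prob(p, r=5, b=20), 4))
```

    This threshold behavior (steep near p ≈ (1/b)^(1/r)) is what lets LSH trade a tunable false-positive/false-negative balance for sub-linear query time.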

    References and Further Reading

    • "Mining of Massive Datasets" by Rajaraman, A.; Ullman, J.
    • "Similarity Search in High Dimensions via Hashing" by Gionis, A.; Indyk, P.; Motwani, R.
    • "Similarity Estimation Techniques from Rounding Algorithms" by Charikar, Moses S.

    See Also

    • Bloom filter, Feature hashing, Random projection, Principal component analysis

    Pricing

    Not applicable (concept/algorithm, not a commercial product).


    Information

    Website: en.wikipedia.org
    Published: May 13, 2025


    Similar Products

    6 results
    Spectral Hashing

    Spectral Hashing is a method for approximate nearest neighbor search that uses spectral graph theory to generate compact binary codes, often applied in vector databases to enhance retrieval efficiency on large-scale, high-dimensional data.

    NMSLIB

    NMSLIB is an efficient similarity search library and toolkit for high-dimensional vector spaces, supporting a variety of indexing algorithms for vector database use cases.

    K-means Tree

    K-means Tree is a clustering-based data structure that organizes high-dimensional vectors for fast similarity search and retrieval. It is used as an indexing method in some vector databases to optimize performance for vector search operations.

    Optimized Product Quantization (OPQ)

    Optimized Product Quantization (OPQ) enhances Product Quantization by optimizing space decomposition and codebooks, leading to lower quantization distortion and higher accuracy in vector search. OPQ is widely used in advanced vector databases for improving recall and search quality.

    ANN Library

    A C++ library for approximate nearest neighbor searching in arbitrarily high dimensions, developed by David Mount and Sunil Arya at the University of Maryland. Provides data structures and algorithms for both exact and approximate nearest neighbor searching.

    LoRANN

    Low-Rank Matrix Factorization algorithm for Approximate Nearest Neighbor Search, offering competitive performance with faster query times than leading libraries at various recall levels.

    All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this repository, related repositories, and associated websites are for identification purposes only. The use of these names, logos, and brands does not imply endorsement, affiliation, or sponsorship. This directory may include content generated by artificial intelligence.
    Copyright © 2025 Awesome Vector Databases. All rights reserved.