
    Product Quantization (PQ)


    Overview

    Product Quantization (PQ) is a vector compression technique that splits high-dimensional vectors into subvectors and quantizes each subvector independently. This achieves significant memory reduction (often 32x or more) while enabling approximate similarity search.

    How Product Quantization Works

    Compression Process

    1. Split: Divide each d-dimensional vector into m subvectors of dimension d/m
    2. Learn Codebooks: Train a codebook (a set of k centroids) for each subvector position using k-means
    3. Quantize: Replace each subvector with the index of its nearest centroid
    4. Store: Keep only the m compact codes per vector instead of the full floats
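The compression pipeline above can be sketched in NumPy; the helper names `train_codebooks` and `encode` are illustrative, not from any particular library:

```python
import numpy as np

def train_codebooks(X, m, k, iters=20, seed=0):
    """Learn one k-means codebook (k centroids) per subvector position."""
    n, d = X.shape
    ds = d // m                                  # subvector dimensionality
    rng = np.random.default_rng(seed)
    codebooks = []
    for j in range(m):
        sub = X[:, j * ds:(j + 1) * ds]          # training subvectors at position j
        cent = sub[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(iters):                   # plain Lloyd iterations
            assign = ((sub[:, None, :] - cent[None]) ** 2).sum(-1).argmin(1)
            for c in range(k):
                members = sub[assign == c]
                if len(members):
                    cent[c] = members.mean(0)
        codebooks.append(cent)
    return np.stack(codebooks)                   # shape (m, k, ds)

def encode(X, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    m, k, ds = codebooks.shape
    codes = np.empty((len(X), m), dtype=np.uint8)  # uint8 assumes k <= 256
    for j in range(m):
        sub = X[:, j * ds:(j + 1) * ds]
        codes[:, j] = ((sub[:, None, :] - codebooks[j][None]) ** 2).sum(-1).argmin(1)
    return codes

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 32)).astype(np.float32)
codebooks = train_codebooks(X, m=4, k=16)        # 32 dims -> 4 subvectors of 8 dims
codes = encode(X, codebooks)                     # (500, 4) uint8: 4 bytes per vector
```

Each 128-byte float32 vector here shrinks to 4 bytes of codes; production libraries use far faster k-means and batched distance kernels, but the logic is the same.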

    Search Process

    1. Split the query into the same m subvectors; the query itself is not quantized (this is asymmetric distance computation, ADC)
    2. Pre-compute the squared distance from each query subvector to every codebook centroid, yielding an m × k lookup table
    3. Approximate each stored vector's distance by summing m table lookups, one per code
    4. Return the top-k nearest results
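The search side can be sketched the same way. Note that only the database side is compressed; the query is compared directly against centroids. `adc_search` and the random stand-in index are illustrative:

```python
import numpy as np

def adc_search(query, codes, codebooks, topk=5):
    """Approximate nearest neighbours via an m x k distance lookup table."""
    m, k, ds = codebooks.shape
    # Pre-compute squared distances from each query subvector to every centroid
    table = np.empty((m, k))
    for j in range(m):
        table[j] = ((codebooks[j] - query[j * ds:(j + 1) * ds]) ** 2).sum(-1)
    # Each stored vector's distance is the sum of m table lookups
    dists = table[np.arange(m), codes].sum(axis=1)   # codes: (n, m) integer array
    return np.argsort(dists)[:topk], dists

# Random codebooks/codes as stand-ins for a trained PQ index
rng = np.random.default_rng(0)
m, k, ds, n = 4, 16, 8, 1000
codebooks = rng.normal(size=(m, k, ds))
codes = rng.integers(0, k, size=(n, m))
query = rng.normal(size=m * ds)
idx, dists = adc_search(query, codes, codebooks)
```

The m × k table costs one pass over the codebooks per query; after that, scoring each of the n stored vectors is just m additions, regardless of the original dimensionality.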

    Memory Reduction

    Typical compression for 768-dimensional float32 vectors:

    • Original: 768 dimensions × 4 bytes = 3,072 bytes per vector
    • PQ (m=96, nbits=8, so k=256): 96 codes × 1 byte = 96 bytes per vector
    • Compression ratio: 3,072 / 96 = 32x
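The arithmetic above, spelled out:

```python
# 768-dim float32 vectors, split into m=96 subvectors with 8-bit codes (k=256)
d, m, nbits = 768, 96, 8
original = d * 4                    # float32: 4 bytes per dimension
compressed = m * nbits // 8         # one 8-bit code per subvector = 1 byte each
print(original, compressed, original // compressed)   # 3072 96 32
```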

    Variants

    IVF-PQ

    Combines Inverted File clustering with Product Quantization for both speed and compression.

    OPQ (Optimized Product Quantization)

    Applies a learned rotation before quantization to reduce quantization error.

    Additive Quantization

    Represents each vector as a sum of entries from multiple codebooks, trading extra compute for better accuracy.

    Trade-offs

    Advantages:

    • Significant memory reduction
    • Faster similarity computation
    • Enables larger datasets in memory

    Disadvantages:

    • Loss of accuracy (quantization error)
    • Requires training phase
    • Not suitable for exact search

    Configuration Parameters

    • m: Number of subvectors (segments); must evenly divide the vector dimension d
    • nbits: Bits per code (determines codebook size: k = 2^nbits; nbits=8 gives k=256)
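One consequence worth noting: per-vector code size depends on both parameters, while total codebook storage depends only on nbits and d. A small illustrative helper (`pq_memory` is hypothetical, assuming float32 codebooks and m dividing d):

```python
def pq_memory(d, m, nbits):
    """Return (bytes per compressed vector, one-off codebook bytes)."""
    k = 2 ** nbits                          # centroids per codebook
    code_bytes = m * nbits // 8             # per-vector code size
    codebook_bytes = m * k * (d // m) * 4   # = k * d * 4, independent of m
    return code_bytes, codebook_bytes

for m, nbits in [(8, 8), (96, 8), (96, 4)]:
    print(m, nbits, pq_memory(768, m, nbits))
```

So raising m improves accuracy and per-vector cost linearly, while raising nbits doubles codebook memory and table-build time per extra bit.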

    Use Cases

    • Large-scale vector search (billions of vectors)
    • Memory-constrained environments
    • When some accuracy loss is acceptable
    • Reducing infrastructure costs

    Pricing

    Implemented in open-source libraries such as FAISS and ScaNN.


    Information

    Website: ieeexplore.ieee.org
    Published: Mar 13, 2026

    Categories

    Concepts & Definitions

    Tags

    #Quantization #Compression #Optimization

    Similar Products

    Locally-Adaptive Vector Quantization

    Advanced quantization technique that applies per-vector normalization and scalar quantization, adapting the quantization bounds individually for each vector. Achieves four-fold reduction in vector size while maintaining search accuracy with 26-37% overall memory footprint reduction.

    Binary Quantization

    Extreme vector compression technique converting each dimension to a single bit (0 or 1), achieving 32x memory reduction and enabling ultra-fast Hamming distance calculations with acceptable accuracy trade-offs.

    Scalar Quantization

    Vector compression technique reducing precision of each vector component from 32-bit floats to 8-bit integers, achieving 4x memory reduction with minimal accuracy loss for vector search.

    AWQ

    Activation-aware Weight Quantization method that preserves model accuracy at 4-bit quantization by identifying and skipping important weights. Maintains 99%+ of original performance with moderate inference speed improvements.

    GPTQ

    Post-training quantization method for 4-bit weight compression that focuses on GPU inference performance. First quantization method to compress LLMs to 4-bit range while maintaining accuracy, minimizing mean squared error to weights.

    BBQ Binary Quantization

    Elasticsearch and Lucene's implementation of RaBitQ algorithm for 1-bit vector quantization, renamed as BBQ. Provides 32x compression with asymptotically optimal error bounds, enabling efficient vector search at massive scale with minimal accuracy loss.


    Copyright © 2025 Awesome Vector Databases. All rights reserved.