puck

Puck is an open-source vector search engine designed for fast similarity search and retrieval of embedding vectors.

About this tool

puck

Source: https://github.com/baidu/puck

Category: open-sources

Tags: open-source, vector-search, similarity-search, embedding


Description

Puck is an open-source, high-performance vector search engine designed for fast similarity search and retrieval of embedding vectors. It is intended for large-scale industrial applications where memory constraints, computational resources, and database size are critical factors.


Features

  • Approximate Nearest Neighbor (ANN) Search: Supports fast and efficient similarity search.
  • Two Algorithms:
    • Puck: Optimized for large-scale datasets, with memory efficiency and high recall-vs-latency performance. Uses a two-layered architectural design for inverted indices and multi-level quantization.
    • Tinker: Designed for smaller datasets (e.g., 10M, 100M). Offers better performance than Nmslib in benchmarks but uses more memory than Puck.
  • Written in C++: Provides Python 3 wrappers for integration with Python projects.
  • Similarity Metrics Supported:
    • Cosine similarity
    • L2 (Euclidean) distance
    • Inner Product (IP, with transformation to cosine distance)
  • Memory Efficiency:
    • Puck uses compressed vectors (after product quantization), reducing memory usage to about 1/4 of the original size by default.
    • Tinker requires more memory to store similarity point relationships.
  • Benchmark Results:
    • Puck demonstrated top performance on multiple 1B-datasets in NeurIPS'21 competition track.
    • Performance improvements since initial release (up to 70% increase).
  • Flexible Configuration:
    • Configurable via files for training, building, and searching.
    • Supports different vector formats (.fvecs) and raw little endian storage.
  • Build & Deployment:
    • Requires MKL, Python 3.6+, and CMake 3.21+ for building.
    • Includes demos and tools for training, building, and searching.
  • Benchmark Tools:
    • Includes scripts and configs for benchmarking against other ANN libraries (e.g., Faiss, Nmslib).

Pricing

Puck is open-source and available for free under its respective license.


Installation & Usage

  • Requires Intel MKL, Python 3.6+, CMake 3.21+.
  • Build instructions and demos provided in the repository.
  • Tools and configs are included for formatting datasets, training, building indices, and performing searches.

Additional Resources

Information

PublisherFox
Websitegithub.com
PublishedJun 7, 2025

Categories

1 item