puck
Puck is an open-source vector search engine designed for fast similarity search and retrieval of embedding vectors.
About this tool
Source: https://github.com/baidu/puck
Category: open-sources
Tags: open-source, vector-search, similarity-search, embedding
Description
Puck is an open-source, high-performance vector search engine designed for fast similarity search and retrieval of embedding vectors. It is intended for large-scale industrial applications where memory constraints, computational resources, and database size are critical factors.
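To make the memory constraint concrete, the following back-of-the-envelope sketch estimates the footprint of a raw float32 dataset and of the same dataset at the roughly 1/4 compression ratio quoted for Puck in the feature list. The one-billion-vector, 128-dimensional dataset, the 4-byte component size, and the function names are illustrative assumptions, not values taken from the Puck repository, and the estimate ignores index structures and other overhead.

```python
def raw_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Size of the uncompressed vectors, assuming float32 components by default."""
    return num_vectors * dim * bytes_per_component


def compressed_bytes(num_vectors: int, dim: int, ratio: float = 0.25) -> int:
    """Approximate size after compression at the given ratio (assumed to be 1/4)."""
    return int(raw_bytes(num_vectors, dim) * ratio)


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


# Hypothetical dataset: one billion 128-dimensional float32 vectors.
n, d = 1_000_000_000, 128
raw = raw_bytes(n, d)
packed = compressed_bytes(n, d)
print(f"raw:        {to_gib(raw):.1f} GiB")
print(f"compressed: {to_gib(packed):.1f} GiB")
```

Under these assumptions, the compressed vectors take a quarter of the raw size, which can decide whether a billion-scale dataset fits in the memory of a single machine.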
Features
- Approximate Nearest Neighbor (ANN) Search: Supports fast and efficient similarity search.
- Two Algorithms:
  - Puck: Optimized for large-scale datasets, with high memory efficiency and a strong recall-vs-latency trade-off. It uses a two-layer inverted-index architecture and multi-level quantization.
  - Tinker: Designed for smaller datasets (e.g., 10M, 100M). It outperforms Nmslib in benchmarks but uses more memory than Puck.
- Written in C++: Provides Python 3 wrappers for integration with Python projects.
- Similarity Metrics Supported:
  - Cosine similarity
  - L2 (Euclidean) distance
  - Inner Product (IP), handled by transforming IP distance to cosine distance
- Memory Efficiency:
  - Puck stores compressed vectors (after product quantization), reducing memory usage to about 1/4 of the original size by default.
  - Tinker requires more memory because it stores the relationships between similar points.
- Benchmark Results:
  - Puck demonstrated top performance on multiple 1B-scale datasets in the NeurIPS'21 competition track.
  - Performance has improved by up to 70% since the initial release.
- Flexible Configuration:
  - Training, building, and searching are configured through files.
  - Supports vector file formats such as .fvecs, as well as raw little-endian storage.
- Build & Deployment:
  - Requires MKL, Python 3.6+, and CMake 3.21+ for building.
  - Includes demos and tools for training, building, and searching.
- Benchmark Tools:
  - Includes scripts and configs for benchmarking against other ANN libraries (e.g., Faiss, Nmslib).
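The three supported metrics are closely related once vectors are scaled to unit length: the inner product then equals the cosine similarity, and the squared L2 distance equals 2 - 2 * cos. The sketch below checks this identity in plain Python. It illustrates the general relationship only; it is not Puck's API, and the helper names are made up for the example.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Arbitrary example vectors (illustrative only).
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]
a_unit, b_unit = normalize(a), normalize(b)

cos = cosine_similarity(a, b)
ip_unit = dot(a_unit, b_unit)          # inner product of the unit vectors
l2_sq_unit = squared_l2(a_unit, b_unit)

# On unit vectors: inner product == cosine, squared L2 == 2 - 2 * cosine.
print(math.isclose(ip_unit, cos), math.isclose(l2_sq_unit, 2 - 2 * cos))
```

Because of this identity, ranking unit vectors by smallest L2 distance and by largest cosine similarity gives the same order, which is why normalizing embeddings up front lets one index serve both metrics.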
Pricing
Puck is open-source and free to use under the license published in its repository.
Installation & Usage
- Requires Intel MKL, Python 3.6+, CMake 3.21+.
- Build instructions and demos provided in the repository.
- Tools and configs are included for formatting datasets, training, building indices, and performing searches.
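As an illustration of the dataset-formatting step, here is a minimal sketch of reading and writing the widely used .fvecs layout, in which each vector is stored as a little-endian 32-bit integer dimension followed by that many 32-bit floats. The function names are hypothetical and this is not one of Puck's bundled tools; check the repository's own scripts for the exact layout its training and build steps expect.

```python
import os
import struct
import tempfile


def write_fvecs(path, vectors):
    """Write vectors as records of [int32 dim][dim x float32], little-endian."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack("<i", len(vec)))
            f.write(struct.pack(f"<{len(vec)}f", *vec))


def read_fvecs(path):
    """Read every vector from an .fvecs file into a list of float lists."""
    vectors = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) != 4:
                raise ValueError("truncated dimension header")
            (dim,) = struct.unpack("<i", header)
            if dim <= 0:
                raise ValueError(f"invalid dimension: {dim}")
            payload = f.read(4 * dim)
            if len(payload) != 4 * dim:
                raise ValueError("truncated vector payload")
            vectors.append(list(struct.unpack(f"<{dim}f", payload)))
    return vectors


# Round trip with values that are exactly representable as float32.
sample = [[1.0, 2.0, 0.5], [-4.0, 0.25, 8.0]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.fvecs")
    write_fvecs(path, sample)
    restored = read_fvecs(path)
    size = os.path.getsize(path)

print(restored == sample, size)
```

Each record carries its own dimension header, so a file of N vectors with dimension D occupies N * (4 + 4 * D) bytes; a mismatch against that size is a quick sign that a file is truncated or in a different layout.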
Similar Products
Arroy is an open-source library for efficient similarity search and management of vector embeddings, useful in vector database systems.
KGraph is an open-source library for fast approximate nearest neighbor search in high-dimensional vector spaces, applicable to vector database solutions.
PostgreSQL supports vector indexing and similarity search via the pgvector extension, allowing relational databases to manage and retrieve vector embeddings efficiently.
Qdrant is a dedicated vector database and similarity search engine supporting advanced filtering and efficient retrieval, suitable for faceted search and retrieval-augmented generation. It offers self-hosted and cloud deployment options, making it highly relevant for vector search applications.
RediSearch is a Redis module that provides high-performance vector search and similarity search capabilities on top of Redis, enabling advanced search and retrieval features for AI and data applications.
Bleve is an open-source search library with experimental support for vector search, enabling hybrid search and retrieval in applications.