Annoy

An open-source library for approximate nearest neighbor search in high-dimensional spaces, often used as a backend for vector databases and search engines.

About this tool

Annoy (Approximate Nearest Neighbors Oh Yeah) is an open-source C++ library with Python bindings for performing approximate nearest neighbor search in high-dimensional spaces. It is optimized for memory usage and supports efficient loading and saving of indexes to disk, enabling large-scale vector search.

Features

  • Multiple Distance Metrics: Supports Euclidean, Manhattan, Cosine (called "angular" in the API), Hamming, and Dot (Inner) Product distances.
  • Efficient Memory Usage: Indexes are designed to be small and can be memory-mapped, allowing multiple processes to share the same data.
  • Index Persistence: Can create large, read-only file-based data structures that are mmapped into memory, enabling fast loading and sharing.
  • Decoupled Index Creation and Loading: Indexes can be built once, saved as files, and loaded quickly by many processes.
  • On-Disk Index Building: Can build indexes directly on disk, making it possible to handle datasets larger than RAM.
  • Cross-Language Bindings: Native C++ library with Python bindings. Additional community bindings are available for Go, Lua, Java and Scala (JVM), Ruby, Rust, .NET, and Dart; some of these are read-only.
  • Threaded Index Building: Supports building indexes using multiple CPU cores.
  • Customizable Parameters: The number of trees (n_trees, set at build time) and the number of nodes to inspect (search_k, set at query time) trade speed for accuracy. Larger values give more accurate results, at the cost of a larger index for n_trees and slower queries for search_k.
  • Static Indexes: Once built, an index is read-only; items cannot be added after build() is called. This keeps indexes immutable and efficient to share.
  • Prefaulting Support: load(fn, prefault=True) optionally pre-reads the entire file into memory for faster subsequent searches.
  • Python API:
    • AnnoyIndex(f, metric): Create index for vectors of dimension f with specified metric.
    • add_item(i, v): Add item with integer id i and vector v.
    • build(n_trees, n_jobs=-1): Build a forest of n_trees trees using n_jobs threads; n_jobs=-1 uses all available CPU cores.
    • save(fn), load(fn), unload(): Save/load/unload index from disk.
    • get_nns_by_item(i, n, ...), get_nns_by_vector(v, n, ...): Retrieve nearest neighbors.
    • get_item_vector(i), get_distance(i, j), get_n_items(), get_n_trees(): Various utility functions.
    • on_disk_build(fn): Build the index in the given file instead of in RAM. Call it before adding items; no save() is needed afterwards.
    • set_seed(seed): Set random seed for reproducible builds.
  • C++ API: Similar to Python, via #include "annoylib.h".
  • Cross-Platform Support: Works on Linux, OS X, Windows.
  • Community and Ecosystem: Widely used in production (e.g., by Spotify for music recommendations), with a large and active community.
  • Open Source: Licensed under Apache-2.0.
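
To make the speed/accuracy trade-off above concrete, one common approach is to compare approximate results against an exact brute-force baseline and compute recall. The sketch below uses only the standard library and does not call Annoy itself. The helper names (angular_distance, exact_neighbors, recall) are hypothetical, introduced here for illustration. The distance follows the formula Annoy documents for its angular metric, sqrt(2(1 - cos(u, v))); the handling of zero vectors is an assumption of this sketch.

```python
import math


def angular_distance(u, v):
    """sqrt(2 * (1 - cos(u, v))): Euclidean distance between the normalized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption of this sketch: treat a zero vector as orthogonal to everything.
        return math.sqrt(2.0)
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp rounding error
    return math.sqrt(2.0 * (1.0 - cos))


def exact_neighbors(vectors, query, n):
    """Return the ids of the n vectors closest to `query`, by exhaustive search."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: angular_distance(vectors[i], query))
    return ranked[:n]


def recall(approx_ids, exact_ids):
    """Fraction of the true neighbors that the approximate result recovered."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.1]

truth = exact_neighbors(vectors, query, 2)
print(truth)                   # ids of the two exact nearest neighbors
print(recall([0, 3], truth))   # score a made-up approximate answer
```

In practice the approx_ids argument would come from get_nns_by_vector(query, n, search_k=...), and recall could be tracked while varying n_trees and search_k to pick a setting that meets a latency budget.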

Pricing

Annoy is open-source software and is free to use under the Apache-2.0 license.

Information

Publisher: Fox
Website: github.com
Published: May 13, 2025