Annoy
An open-source library for approximate nearest neighbor search in high-dimensional spaces, often used as a backend for vector databases and search engines.
About this tool
Annoy (Approximate Nearest Neighbors Oh Yeah) is an open-source C++ library with Python bindings for performing approximate nearest neighbor search in high-dimensional spaces. It is optimized for memory usage and supports efficient loading and saving of indexes to disk, enabling large-scale vector search.
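As a rough illustration of the build-once, load-anywhere workflow described above, the sketch below builds a small index, saves it to disk, and memory-maps it back for a query. It assumes the annoy package is installed (for example via pip install annoy). The function name build_and_query, the file name demo.ann, the vector dimension, and the random data are all made up for the example.

```python
import os
import random
import tempfile


def build_and_query(dim=16, n_items=200, n_trees=10, seed=42):
    """Build a small Annoy index, save it, reload it, and run one query."""
    # Imported inside the function so this file still loads where Annoy is absent.
    from annoy import AnnoyIndex

    # Illustrative random data; a real application would supply its own vectors.
    rng = random.Random(seed)
    vectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_items)]

    index = AnnoyIndex(dim, "angular")   # "angular" is Annoy's cosine-style metric
    index.set_seed(seed)                 # set before building for reproducible trees
    for item_id, vector in enumerate(vectors):
        index.add_item(item_id, vector)  # ids are non-negative integers
    index.build(n_trees)                 # after this, no more items can be added

    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "demo.ann")  # hypothetical file name
        index.save(path)
        index.unload()                   # release the file held by the builder

        # A separate process could run just this part: load() memory-maps the file.
        loaded = AnnoyIndex(dim, "angular")
        loaded.load(path)
        neighbors = loaded.get_nns_by_item(0, 5)  # ids of the 5 nearest items to item 0
        loaded.unload()                  # release the file before the directory is removed
    return neighbors
```

Calling build_and_query() returns a list of five integer item ids. In a real deployment the saved file would live at a stable path so that several worker processes can load the same index.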
Features
- Multiple Distance Metrics: Supports Euclidean, Manhattan, Cosine (named "angular" in the API), Hamming, and Dot (Inner) Product distances.
- Efficient Memory Usage: Indexes are designed to be small and can be memory-mapped, allowing multiple processes to share the same data.
- Index Persistence: Can create large, read-only file-based data structures that are mmapped into memory, enabling fast loading and sharing.
- Decoupled Index Creation and Loading: Indexes can be built once, saved as files, and loaded quickly by many processes.
- On-Disk Index Building: Can build indexes directly on disk, making it possible to handle datasets larger than RAM.
- Cross-Language Bindings: Native C++ library with Python bindings, plus community bindings for Go, Lua, Java, Scala, Ruby, Rust, .NET, JVM, and Dart (some of them read-only).
- Threaded Index Building: Supports building indexes using multiple CPU cores.
- Customizable Parameters: The number of trees (n_trees) and the number of nodes to inspect during searching (search_k) can be tuned to trade off speed against accuracy.
- Static Indexes: Once built, indexes are read-only (no items can be added after the build), promoting immutability and efficiency.
- Prefaulting Support: Optionally pre-reads the entire file into memory for faster subsequent searches.
- Python API:
  - AnnoyIndex(f, metric): Create an index for vectors of dimension f with the specified metric.
  - add_item(i, v): Add an item with integer id i and vector v.
  - build(n_trees, n_jobs=-1): Build the index with the specified number of trees and jobs.
  - save(fn), load(fn), unload(): Save, load, or unload the index from disk.
  - get_nns_by_item(i, n, ...), get_nns_by_vector(v, n, ...): Retrieve nearest neighbors.
  - get_item_vector(i), get_distance(i, j), get_n_items(), get_n_trees(): Various utility functions.
  - on_disk_build(fn): Build the index directly on disk.
  - set_seed(seed): Set the random seed for reproducible builds.
- C++ API: Similar to Python, via #include "annoylib.h".
- Cross-Platform Support: Works on Linux, OS X, and Windows.
- Community and Ecosystem: Widely used in production (e.g., by Spotify for music recommendations), with a large and active community.
- Open Source: Licensed under Apache-2.0.
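To make the distance metrics in the feature list concrete, here is a minimal pure-Python sketch of what each one computes for a pair of vectors, together with the exact brute-force search that an approximate index trades accuracy against. It is not Annoy's implementation, and all function names are made up for illustration. The angular form sqrt(2(1 - cos(u, v))) follows Annoy's documented convention for its cosine-style metric. How Annoy scales or signs the values it reports for the other metrics is not covered here.

```python
import math


def euclidean(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def dot(u, v):
    """Inner product; larger values mean the vectors are more similar."""
    return sum(a * b for a, b in zip(u, v))


def angular(u, v):
    """sqrt(2 * (1 - cos(u, v))); assumes both vectors are non-zero."""
    cos = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))


def hamming(u, v):
    """Number of positions at which u and v differ."""
    return sum(1 for a, b in zip(u, v) if a != b)


def exact_neighbors(query, vectors, n, distance=euclidean):
    """Brute-force baseline: ids of the n vectors closest to query."""
    ranked = sorted(range(len(vectors)), key=lambda i: distance(query, vectors[i]))
    return ranked[:n]
```

The brute-force function scans every vector on each query. An approximate index avoids that full scan at the cost of occasionally missing a true neighbor, and comparing its results with exact_neighbors on a sample of queries is one simple way to estimate recall when tuning n_trees and search_k.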
Pricing
Annoy is open-source software and is free to use under the Apache-2.0 license.