    PyNNDescent

    Python implementation of Nearest Neighbor Descent for k-neighbor-graph construction and approximate nearest neighbor (ANN) search. It targets 80%-100% accuracy with fast performance and supports a wide variety of distance metrics. PyNNDescent is an open-source library.


    About this tool

    Overview

    PyNNDescent provides a Python implementation of Nearest Neighbor Descent for k-neighbor-graph construction and approximate nearest neighbor search. It is based on a 2011 ACM paper on efficient k-nearest-neighbor graph construction and focuses on high-accuracy ANN search.

    Key Features

    • Fast Performance: Among the fastest ANN libraries
    • Easy Installation: pip and conda installable, no platform issues
    • Flexible: Supports a wide variety of distance metrics
    • High Accuracy: Targets 80%-100% accuracy rate
    • Scikit-learn Integration: Provides a transformer that follows scikit-learn's KNeighborsTransformer API
    • Pure Python: No compilation step at install time (the code is JIT-compiled at runtime with Numba)

    Performance Characteristics

    • Ranks solidly among the top-performing libraries in ann-benchmarks
    • Fast approximate nearest neighbor queries
    • Efficient k-neighbor-graph construction
    • Good accuracy/speed trade-off
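The accuracy figures quoted on this page are most naturally read as recall: the fraction of the true k nearest neighbors that an approximate search actually returns. The sketch below shows that calculation under this assumption. The function name `recall_at_k` and the sample neighbor lists are hypothetical and are not part of PyNNDescent.

```python
def recall_at_k(approx_neighbors, exact_neighbors):
    """Mean fraction of true neighbors recovered per query."""
    if len(approx_neighbors) != len(exact_neighbors):
        raise ValueError("need one approximate list per exact list")
    total = 0.0
    for approx, exact in zip(approx_neighbors, exact_neighbors):
        exact_set = set(exact)
        # Count how many of the true neighbors the approximate search found.
        total += len(exact_set & set(approx)) / len(exact_set)
    return total / len(exact_neighbors)


# Illustrative neighbor index lists for two queries with k=4.
exact = [[1, 2, 3, 4], [0, 2, 5, 6]]
approx = [[1, 2, 3, 9], [0, 2, 5, 6]]
print(recall_at_k(approx, exact))  # 0.875
```

In practice the exact lists would come from a brute-force search over the same data, which is affordable on a sample of queries even when it is too slow for production use.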

    Technical Approach

    Nearest Neighbor Descent

    • Core algorithm from the 2011 ACM paper by Wei Dong, Moses Charikar, and Kai Li
    • Efficient graph construction
    • Iterative refinement
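The idea behind iterative refinement is that a neighbor of a neighbor is likely to be a neighbor too. The following is a deliberately simplified, standard-library sketch of that idea, not PyNNDescent's implementation: it omits the sampling and bookkeeping the published algorithm uses, and the function name `nn_descent` and its parameters are assumptions made for illustration.

```python
import math
import random


def nn_descent(points, k, n_iters=10, seed=0):
    """Simplified NN-descent: per point, a sorted list of (distance, index)."""
    rng = random.Random(seed)
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Start from k random neighbors per point.
    graph = []
    for i in range(n):
        others = rng.sample([j for j in range(n) if j != i], k)
        graph.append(sorted((dist(i, j), j) for j in others))

    for _ in range(n_iters):
        # Treat edges as undirected when generating candidates.
        adjacent = [{j for _, j in graph[i]} for i in range(n)]
        for i in range(n):
            for _, j in graph[i]:
                adjacent[j].add(i)

        updates = 0
        for i in range(n):
            current = {j for _, j in graph[i]}
            # Candidates are the neighbors of this point's neighbors.
            candidates = set()
            for j in adjacent[i]:
                candidates |= adjacent[j]
            candidates -= current
            candidates.discard(i)
            if not candidates:
                continue
            # Keep the k closest of the old neighbors and the candidates.
            merged = sorted(graph[i] + [(dist(i, c), c) for c in candidates])[:k]
            updates += len({j for _, j in merged} - current)
            graph[i] = merged
        if updates == 0:
            break  # no neighbor list changed, so the graph has converged
    return graph


# Illustrative data: 200 random points in the unit square.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(200)]
graph = nn_descent(points, k=8)
```

Each pass only compares a point against a small candidate set rather than every other point, which is what makes the approach cheaper than brute force on large datasets.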

    Enhancements

    • Random Projection Trees: Used for initialization
    • Graph Diversification: Prunes longest edges of triangles
    • Optimized Search: Efficient query algorithms
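Graph diversification can be pictured on a single node: if a node already keeps a close neighbor, and a farther neighbor is nearer to that kept neighbor than to the node itself, the edge to the farther neighbor is the longest side of the triangle and can be dropped. The sketch below illustrates this rule with the standard library only. It is an assumption-laden simplification, not the library's code, and the function name `diversify` is hypothetical.

```python
import math


def diversify(points, node, neighbors):
    """Drop a neighbor when a closer kept neighbor is nearer to it than `node` is."""
    ordered = sorted(neighbors, key=lambda j: math.dist(points[node], points[j]))
    kept = []
    for candidate in ordered:
        d_node = math.dist(points[node], points[candidate])
        # (node, r, candidate) form a triangle. Because neighbors are visited
        # in order of distance, node-r is no longer than node-candidate, so
        # node-candidate is the longest edge whenever r-candidate is shorter.
        occluded = any(
            math.dist(points[r], points[candidate]) < d_node for r in kept
        )
        if not occluded:
            kept.append(candidate)
    return kept


# Node 0 sits at the origin; node 2 lies directly behind node 1.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.5)]
print(diversify(points, 0, [1, 2, 3]))  # [1, 3]
```

Node 2 is pruned because a search can still reach it through node 1, while node 3 is kept because it lies in a different direction.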

    Distance Metrics

    Supports an extensive list of metrics:

    • Euclidean, Manhattan, Chebyshev
    • Minkowski, Hamming, Cosine
    • Correlation, Jaccard, Dice
    • And many more specialized metrics
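For readers unfamiliar with some of these names, here are textbook definitions of a few of the listed metrics, written with the standard library only. These are illustrative reference versions, not the library's own implementations. The cosine and Jaccard functions assume non-zero inputs, and Jaccard assumes binary vectors.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def cosine(x, y):
    # Cosine distance: 1 minus the cosine of the angle between x and y.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def jaccard(x, y):
    # x and y are binary vectors marking set membership.
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 1.0 - both / either


p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
print(chebyshev(p, q))  # 4.0
```

The same pair of points is 5, 7, or 4 units apart depending on the metric, which is why the choice of metric can change which points count as nearest neighbors.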

    Installation

    PyPI

    pip install pynndescent
    

    Conda

    conda install -c conda-forge pynndescent
    

    Scikit-learn Integration

    • Transformer that follows the KNeighborsTransformer API
    • Compatible with sklearn pipelines
    • Fits into existing ML workflows
    • Can stand in for sklearn's KNeighborsTransformer in pipelines that consume a precomputed neighbor graph

    Use Cases

    • High-accuracy ANN search (80%+ recall)
    • K-neighbor graph construction
    • Dimensionality reduction
    • Clustering preprocessing
    • Manifold learning
    • Similarity search

    API

    Simple Python interface:

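    import numpy as np
    from pynndescent import NNDescent

    data = np.random.random((1000, 20))     # placeholder: points to index
    query_data = np.random.random((5, 20))  # placeholder: points to look up

    index = NNDescent(data)
    neighbors, distances = index.query(query_data, k=10)  # indices, distances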
    

    Comparison to Alternatives

    Advantages

    • Easy installation (pure Python)
    • No platform-specific issues
    • Flexible distance metrics
    • High accuracy focus
    • Good scikit-learn integration

    Trade-offs

    • May be slower than C++-based libraries for some workloads
    • Memory usage may be higher than that of compiled alternatives
    • Best for accuracy-focused applications

    Documentation

    • Comprehensive ReadTheDocs documentation
    • Example notebooks on GitHub
    • API reference
    • Tutorial materials

    Community

    • GitHub: lmcinnes/pynndescent
    • Active maintenance
    • Responsive to issues
    • Regular updates

    License

    2-clause BSD licensed - permissive open-source license

    Related Projects

    From the same author:

    • UMAP (dimensionality reduction)
    • HDBSCAN (clustering)

    Pricing

    Free and open-source under BSD license. No licensing costs.


    Information

    Website: github.com
    Published: Mar 6, 2026

    Categories

    Sdks & Libraries

    Tags

    #Open Source
    #Python
    #Ann

    Similar Products

    VectorDB

    Lightweight Python package for storing and retrieving text using chunking, embeddings, and vector search. Powers AI features in Kagi Search with low latency and small memory footprint. This is an OSS library.

    Voyager

    Voyager is a Spotify open-source vector search library and service for efficient nearest neighbor search on large-scale vector datasets.

    Gensim

    Gensim is a Python library for topic modeling and vector space modeling, providing tools to generate high-dimensional vector embeddings from text data. These embeddings can be stored and efficiently searched in vector databases, making Gensim directly relevant to vector search use cases.

    spaCy

    spaCy is an industrial-strength NLP library in Python that provides advanced tools for generating word, sentence, and document embeddings. These embeddings are commonly stored and searched in vector databases for NLP and semantic search applications.

    Word2vec

    Word2vec is a popular machine learning technique for generating vector embeddings based on the distributional properties of words in large corpora. It is directly relevant to vector databases as it produces the high-dimensional vector representations stored and indexed by these databases for vector search and similarity tasks.

    Annoy

    An open-source library for approximate nearest neighbor search in high-dimensional spaces, often used as a backend for vector databases and search engines.
