    FusionANNS

    An efficient CPU/GPU cooperative processing architecture for billion-scale approximate nearest neighbor search (ANNS). FusionANNS achieves up to 13.1× higher QPS than SPANN and sustains over 12,000 QPS on billion-vector datasets at 15 ms latency, using only one entry-level GPU.

    About this tool

    Overview

    FusionANNS is a high-throughput, low-latency, cost-efficient, and high-accuracy ANNS system designed for billion-scale datasets. The system uniquely combines CPU and GPU resources along with SSD storage to achieve exceptional performance with minimal hardware requirements.

    Key Innovation: CPU/GPU Collaboration

    The core innovation is a pair of CPU/GPU collaborative filtering and re-ranking mechanisms. They significantly reduce I/O operations among the CPUs, the GPU, and the SSDs, breaking through the I/O performance bottleneck that typically limits billion-scale vector search.

    Three Novel Design Components

    1. Multi-Tiered Indexing

    FusionANNS employs a carefully designed three-tier architecture:

    • SSDs: Store the raw vectors
    • GPU HBM: Holds the vectors compressed with Product Quantization (PQ)
    • Host Memory: Maintains only the vector IDs and the navigation graph

    This design avoids expensive data swapping between CPUs and GPU while maximizing the utility of each storage tier.
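
To make the tier split concrete, the sketch below estimates how much data each tier would hold. It is a back-of-the-envelope illustration, not part of FusionANNS: the function name, the 32-byte PQ code size, and the 4-byte vector ID are assumptions chosen for the example, and the navigation graph is not modeled.

```python
def tier_footprints_gib(num_vectors, dim, bytes_per_component,
                        pq_code_bytes, id_bytes=4):
    """Estimate per-tier storage in GiB for a three-tier layout.

    All parameters are hypothetical inputs for illustration:
      pq_code_bytes - size of one PQ-compressed vector (assumed)
      id_bytes      - size of one vector ID (assumed)
    The navigation graph kept in host memory is not included.
    """
    gib = 1024 ** 3
    return {
        # Raw vectors stay on SSD.
        "ssd_raw_vectors": num_vectors * dim * bytes_per_component / gib,
        # PQ codes are small enough to be candidates for GPU HBM.
        "gpu_pq_codes": num_vectors * pq_code_bytes / gib,
        # Host memory keeps only IDs (plus the graph, not modeled here).
        "host_vector_ids": num_vectors * id_bytes / gib,
    }


if __name__ == "__main__":
    # Example: one billion 128-dimensional vectors with 1-byte components.
    for tier, size in tier_footprints_gib(10**9, 128, 1, 32).items():
        print(f"{tier}: {size:.1f} GiB")
```

Under these assumed sizes the raw vectors are several times larger than the PQ codes, which is the gap that motivates keeping raw data on SSD and only compressed data on the GPU.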

    2. Heuristic Re-Ranking

    Eliminates unnecessary I/Os and computations during re-ranking while still guaranteeing high accuracy.
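
This listing does not spell out the heuristic, so the following is only one plausible form of it: re-rank candidates in small batches and stop once a batch no longer changes the top-k. The function name, `batch_size`, `patience`, and the `fetch_raw` callback (a stand-in for an SSD read) are all hypothetical.

```python
import heapq
import math


def rerank_with_early_stop(query, candidate_ids, fetch_raw, k,
                           batch_size=4, patience=1):
    """Re-rank candidates by exact distance in mini-batches, stopping early.

    candidate_ids must already be ordered by approximate (e.g. PQ) distance.
    fetch_raw(vec_id) is a hypothetical stand-in for reading a raw vector.
    Stops after `patience` consecutive batches that leave a full top-k
    unchanged. Returns (top-k ids, number of raw-vector reads).
    """
    heap = []   # max-heap on distance, stored as (-distance, id)
    reads = 0
    stable = 0
    for start in range(0, len(candidate_ids), batch_size):
        before = {vid for _, vid in heap}
        for vid in candidate_ids[start:start + batch_size]:
            dist = math.dist(query, fetch_raw(vid))
            reads += 1
            if len(heap) < k:
                heapq.heappush(heap, (-dist, vid))
            elif dist < -heap[0][0]:
                # Closer than the current worst member of the top-k.
                heapq.heapreplace(heap, (-dist, vid))
        after = {vid for _, vid in heap}
        stable = stable + 1 if (len(heap) == k and after == before) else 0
        if stable >= patience:
            break   # remaining candidates are skipped, saving their reads
    ranked = sorted((-neg, vid) for neg, vid in heap)
    return [vid for _, vid in ranked], reads


if __name__ == "__main__":
    raw = {i: (float(i),) for i in range(12)}   # toy 1-D "raw vectors"
    order = [1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    print(rerank_with_early_stop((0.0,), order, raw.__getitem__, k=2))
```

The trade-off in a scheme like this sits in `patience`: a larger value reads more raw vectors but is less likely to stop before a late, truly close candidate is seen.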

    3. Redundant-Aware I/O Deduplication

    Further improves I/O efficiency by identifying and eliminating redundant data transfers during query processing.
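
As a rough illustration of I/O deduplication in general (not the paper's exact scheme), the sketch below groups requested vector IDs by the SSD page that holds them, so vectors sharing a page cost one read, and skips pages that are already buffered. The contiguous-by-ID layout, the 4 KiB page size, and all names are assumptions.

```python
def plan_page_reads(vector_ids, vector_bytes, page_bytes=4096,
                    cached_pages=frozenset()):
    """Map requested vector IDs to the distinct SSD pages that must be read.

    Assumes (hypothetically) fixed-size vectors stored contiguously by ID,
    with no vector spanning two pages. Pages listed in cached_pages are
    treated as already in memory and are left out of the plan.
    Returns {page_number: [vector ids served by that page]}.
    """
    if not 0 < vector_bytes <= page_bytes:
        raise ValueError("vector_bytes must be in (0, page_bytes]")
    per_page = page_bytes // vector_bytes
    pages = {}
    for vid in vector_ids:
        pages.setdefault(vid // per_page, []).append(vid)
    # Drop pages that need no SSD read because they are already buffered.
    return {p: ids for p, ids in pages.items() if p not in cached_pages}


if __name__ == "__main__":
    wanted = [0, 3, 7, 8, 9, 100]
    plan = plan_page_reads(wanted, vector_bytes=512, cached_pages={1})
    print(f"{len(wanted)} vectors requested, {len(plan)} page reads: {plan}")
```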

    Performance Metrics

    Throughput:

    • Up to 13.1× higher QPS compared to SPANN
    • 2-4.9× higher QPS compared to RUMMY
    • Over 12,000 QPS for billion-vector datasets

    Latency: As low as 15 milliseconds for billion-scale queries
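
The throughput and latency figures can be related through Little's law (average concurrency = arrival rate × time in system). The snippet below is a worked calculation only, and it assumes the quoted QPS and latency hold at the same operating point, which this listing does not state.

```python
def in_flight_queries(qps, latency_ms):
    """Little's law: average number of queries being processed at once."""
    return qps * latency_ms / 1000


if __name__ == "__main__":
    # 12,000 QPS at 15 ms implies about 180 queries in flight on average.
    print(in_flight_queries(12_000, 15))
```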

    Cost Efficiency:

    • Up to 8.8× better cost efficiency vs. SPANN
    • Up to 6.8× better cost efficiency vs. RUMMY
    • Requires only one entry-level GPU

    Hardware Requirements

    Unlike systems requiring expensive high-end GPUs or multiple GPUs, FusionANNS achieves exceptional performance with:

    • One entry-level GPU
    • Standard CPUs
    • SSD storage

    This makes billion-scale vector search accessible to organizations with moderate hardware budgets.

    Use Cases

    • Large-scale semantic search
    • Real-time recommendation systems
    • High-throughput RAG applications
    • Production deployments requiring cost-effective billion-scale search

    Research Origin

    Developed by researchers from Huazhong University of Science and Technology and Huawei Technologies Co., Ltd., published September 2024 (arXiv:2409.16576).

    Information

    Website: arxiv.org
    Published: Mar 20, 2026

    Categories

    1 Item
    Research Papers & Surveys

    Tags

    5 Items
    #Gpu Acceleration  #Cpu  #Hybrid  #High Performance  #Scalable

    Similar Products

    6 result(s)
    RUMMY

    A GPU-accelerated vector query processing system that supports large vector datasets beyond GPU memory. RUMMY uses reordered pipelining to efficiently overlap data transmission and GPU computation, achieving up to 135× better performance than traditional GPU-based approaches.

    Breaking the Storage-Compute Bottleneck in Billion-Scale ANNS

    A 2025 research paper presenting a GPU-driven asynchronous I/O framework for billion-scale approximate nearest neighbor search. The system addresses the fundamental bottleneck of data movement between storage and compute in large-scale vector search.

    BANG

    BANG is a billion-scale approximate nearest neighbor search system optimized for single GPU execution, enabling high-performance vector search in vector database environments at massive scale.

    PilotANN

    PilotANN is a memory-bounded GPU-accelerated framework for large-scale vector search, designed to improve performance and efficiency of approximate nearest neighbor (ANN) search workloads, making it relevant as a high-performance engine/component in vector database and vector search systems.

    cuVS

    cuVS is an open-source library from RAPIDS for fast, GPU-accelerated vector search, useful for building high-performance vector databases.

    NVIDIA CAGRA

    NVIDIA CAGRA is a GPU-accelerated graph-based library for approximate nearest neighbor searches, optimized for high-performance vector search leveraging modern GPU parallelism. It is suitable for scenarios requiring rapid, large-scale vector retrieval.

    All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this repository, related repositories, and associated websites are for identification purposes only. The use of these names, logos, and brands does not imply endorsement, affiliation, or sponsorship. This directory may include content generated by artificial intelligence.
    Copyright © 2025 Awesome Vector Databases. All rights reserved.·Terms of Service·Privacy Policy·Cookies