
FusionANNS
An efficient CPU/GPU cooperative processing architecture for billion-scale approximate nearest neighbor search. FusionANNS achieves up to 13.1× higher QPS compared to SPANN and can handle billion-vector datasets with over 12,000 QPS while maintaining 15ms latency using only one entry-level GPU.
About this tool
Overview
FusionANNS is a high-throughput, low-latency, cost-efficient, and high-accuracy ANNS system designed for billion-scale datasets. The system uniquely combines CPU and GPU resources along with SSD storage to achieve exceptional performance with minimal hardware requirements.
Key Innovation: CPU/GPU Collaboration
The core innovation lies in CPU/GPU collaborative filtering and re-ranking mechanisms, which significantly reduce I/O operations across CPUs, GPU, and SSDs to break through the I/O performance bottleneck that typically limits billion-scale vector search.
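The division of labor described above can be illustrated with a toy end-to-end sketch. All names and the in-memory "database" here are illustrative stand-ins (not FusionANNS's actual interfaces): the CPU narrows the search using an in-memory structure, the GPU scores compressed vectors, and only a small shortlist ever touches raw vectors on SSD.

```python
# Toy sketch of the CPU/GPU collaborative flow. Hypothetical names throughout;
# the real system batches queries and overlaps these stages.
import random

random.seed(2)
DB = [[random.random() for _ in range(4)] for _ in range(100)]  # stand-in for raw vectors on SSD

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cpu_candidates(query):
    # Stand-in for navigation-graph traversal over vector IDs in host memory:
    # yields a candidate pool much larger than k but much smaller than the dataset.
    return random.sample(range(len(DB)), 32)

def gpu_approx_dist(query, vid):
    # Stand-in for a PQ distance computed on compressed vectors in GPU HBM:
    # modeled as the true distance plus quantization noise.
    return dist(query, DB[vid]) + random.uniform(0, 0.05)

def search(query, k):
    cands = cpu_candidates(query)                                        # CPU, host memory
    shortlist = sorted(cands, key=lambda v: gpu_approx_dist(query, v))[:4 * k]  # GPU, HBM
    return sorted(shortlist, key=lambda v: dist(query, DB[v]))[:k]       # CPU, raw vectors from SSD

q = [random.random() for _ in range(4)]
res = search(q, k=3)
print(res)
```

Only the final re-ranking step reads raw vectors, which is why shrinking that shortlist (the subject of the components below) directly reduces SSD I/O.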
Three Novel Design Components
1. Multi-Tiered Indexing
FusionANNS employs a carefully designed three-tier architecture:
- SSDs: store the raw, full-precision vectors
- GPU HBM: holds vectors compressed with Product Quantization (PQ)
- Host memory: holds only vector IDs and the navigation graph
This design avoids expensive data swapping between CPUs and GPU while maximizing the utility of each storage tier.
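To see why the compressed vectors fit in GPU HBM, here is a toy Product Quantization sketch in pure Python (illustrative parameters, not FusionANNS's code): each vector is split into M subvectors, and each subvector is replaced by the index of its nearest codeword, so a 128-dim float32 vector (512 bytes) can shrink to just M one-byte codes.

```python
# Toy PQ: encode a vector as M codeword indices, then compute an
# asymmetric (query-to-codeword) approximate distance from the codes alone.
import random

random.seed(0)
D, M, K = 8, 4, 4          # dims, subspaces, codewords per subspace
SUB = D // M               # dimensions per subvector

def split(v):
    return [v[i*SUB:(i+1)*SUB] for i in range(M)]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# "Train" codebooks by sampling K representatives per subspace
# (real PQ runs k-means here).
train = [[random.random() for _ in range(D)] for _ in range(64)]
codebooks = [[split(v)[m] for v in random.sample(train, K)] for m in range(M)]

def encode(v):
    # One small code per subspace: the index of the nearest codeword.
    return [min(range(K), key=lambda k: sqdist(sub, codebooks[m][k]))
            for m, sub in enumerate(split(v))]

def adc_distance(query, codes):
    # Asymmetric distance: raw query subvectors vs. decoded codewords.
    return sum(sqdist(qsub, codebooks[m][codes[m]])
               for m, qsub in enumerate(split(query)))

v = [random.random() for _ in range(D)]
codes = encode(v)
print(codes)                   # M small integers instead of D floats
print(adc_distance(v, codes))  # approximate distance from codes alone
```

Because distances can be approximated from the codes alone, the GPU never needs the raw vectors, which stay on SSD until final re-ranking.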
2. Heuristic Re-Ranking
Eliminates unnecessary I/Os and computations while still guaranteeing high accuracy, using intelligent, feedback-driven query processing to decide how many candidates actually need exact re-ranking.
3. Redundant-Aware I/O Deduplication
Further improves I/O efficiency by identifying and eliminating redundant data transfers during query processing.
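The idea can be sketched as follows (illustrative names and page geometry, not the paper's API): since several candidate vectors often live on the same SSD page, grouping candidates by page lets each page be read once per batch instead of once per candidate.

```python
# Page-level read deduplication (illustrative constants).
PAGE_SIZE = 4096
VEC_BYTES = 512                      # e.g. a 128-dim float32 raw vector
VECS_PER_PAGE = PAGE_SIZE // VEC_BYTES

def page_of(vec_id):
    return vec_id // VECS_PER_PAGE

def dedup_reads(candidate_ids):
    # Map each needed page to every candidate it serves; issue one read per page.
    pages = {}
    for vid in candidate_ids:
        pages.setdefault(page_of(vid), []).append(vid)
    return pages

cands = [3, 9, 10, 17, 11, 8, 3]
plan = dedup_reads(cands)
print(len(cands), "candidate fetches ->", len(plan), "page reads")
```

Here seven candidate fetches collapse into three page reads; at billion scale, with many candidates per query and many concurrent queries, this kind of deduplication removes a large share of redundant SSD traffic.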
Performance Metrics
Throughput:
- Up to 13.1× higher QPS compared to SPANN
- 2-4.9× higher QPS compared to RUMMY
- Over 12,000 QPS for billion-vector datasets
Latency: As low as 15 milliseconds for billion-scale queries
Cost Efficiency:
- 8.8× better cost efficiency vs. SPANN
- 6.8× better cost efficiency vs. RUMMY
- Requires only one entry-level GPU
Hardware Requirements
Unlike systems requiring expensive high-end GPUs or multiple GPUs, FusionANNS achieves exceptional performance with:
- One entry-level GPU
- Standard CPUs
- SSD storage
This makes billion-scale vector search accessible to organizations with moderate hardware budgets.
Use Cases
- Large-scale semantic search
- Real-time recommendation systems
- High-throughput RAG applications
- Production deployments requiring cost-effective billion-scale search
Research Origin
Developed by researchers from Huazhong University of Science and Technology and Huawei Technologies Co., Ltd., published September 2024 (arXiv:2409.16576).
