



An efficient CPU/GPU cooperative processing architecture for billion-scale approximate nearest neighbor search. FusionANNS achieves up to 13.1× higher QPS compared to SPANN and can handle billion-vector datasets with over 12,000 QPS while maintaining 15ms latency using only one entry-level GPU.
FusionANNS is a high-throughput, low-latency, cost-efficient, and high-accuracy ANNS system designed for billion-scale datasets. It combines CPU and GPU compute with SSD storage so that this performance is reached with minimal hardware requirements.
The core innovation lies in CPU/GPU collaborative filtering and re-ranking mechanisms, which significantly reduce I/O operations across CPUs, GPU, and SSDs to break through the I/O performance bottleneck that typically limits billion-scale vector search.
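The general filter-then-re-rank pattern can be illustrated with a minimal Python sketch. It is not the FusionANNS implementation: the `quantize` helper is a crude scalar quantizer standing in for whatever compression a real system uses, the brute-force scan stands in for the filtering stage, and every name and parameter here (`filter_then_rerank`, `n_candidates`, `step`) is hypothetical.

```python
import heapq
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vec, step=0.25):
    """Crude scalar quantizer (assumption): snap each coordinate to a grid."""
    return [round(x / step) * step for x in vec]


def filter_then_rerank(query, raw_vectors, compressed_vectors, k, n_candidates):
    # Stage 1 (filtering): rank all vectors by approximate distance,
    # touching only the small compressed copies.
    candidates = heapq.nsmallest(
        n_candidates,
        range(len(compressed_vectors)),
        key=lambda i: squared_l2(query, compressed_vectors[i]),
    )
    # Stage 2 (re-ranking): read raw vectors for the surviving candidates
    # only, and order them by exact distance.
    return heapq.nsmallest(
        k, candidates, key=lambda i: squared_l2(query, raw_vectors[i])
    )


random.seed(0)
raw = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(1000)]
compressed = [quantize(v) for v in raw]
query = raw[42]
top = filter_then_rerank(query, raw, compressed, k=5, n_candidates=50)
print(top)
```

In this toy setup only 50 of the 1,000 raw vectors are read during re-ranking, which is the kind of I/O saving the filtering stage is meant to deliver.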
FusionANNS employs a three-tier index layout that spans host memory, GPU memory, and SSD storage.
This design avoids expensive data swapping between CPUs and GPU while maximizing the utility of each storage tier.
Intelligent query processing strategies eliminate unnecessary I/Os and computations while guaranteeing high accuracy.
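One plausible way to skip unnecessary I/Os during re-ranking is to process candidates in mini-batches and stop once the top-k result stops changing. The Python sketch below illustrates that idea only. The stopping rule, `batch_size`, `patience`, and the `fetch_raw` callback (a stand-in for an SSD read) are assumptions made for illustration, not details taken from the text above.

```python
import heapq


def rerank_with_early_stop(query, candidate_ids, fetch_raw, k,
                           batch_size=4, patience=2):
    """Re-rank candidates in mini-batches; stop once the top-k is stable.

    candidate_ids must be ordered from most to least promising.
    fetch_raw(i) is a hypothetical callback that returns raw vector i.
    Returns (ids ordered by exact distance, number of raw-vector reads).
    """
    best = []  # max-heap on distance, stored as (-distance, id)
    stable_batches = 0
    reads = 0
    for start in range(0, len(candidate_ids), batch_size):
        before = {i for _, i in best}
        for i in candidate_ids[start:start + batch_size]:
            vec = fetch_raw(i)
            reads += 1
            d = sum((x - y) ** 2 for x, y in zip(query, vec))
            if len(best) < k:
                heapq.heappush(best, (-d, i))
            elif d < -best[0][0]:
                # Closer than the current worst top-k entry: replace it.
                heapq.heapreplace(best, (-d, i))
        after = {i for _, i in best}
        # Count consecutive mini-batches that left the top-k set unchanged.
        stable_batches = stable_batches + 1 if after == before else 0
        if stable_batches >= patience:
            break
    ranked = [i for _, i in sorted(best, key=lambda t: -t[0])]
    return ranked, reads


# Toy data: 1-D vectors whose value equals their id; the query sits at 0.
vectors = {i: [float(i)] for i in range(20)}
ranked, reads = rerank_with_early_stop(
    [0.0], list(range(20)), vectors.__getitem__, k=3
)
print(ranked, reads)
```

On this toy input the loop stops after three mini-batches, so 12 of the 20 candidates are read. The trade-off is that a stricter `patience` reads more data and a looser one risks missing a late improvement.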
I/O deduplication further improves I/O efficiency by identifying and eliminating redundant data transfers during query processing.
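A simple form of redundant-transfer elimination is to merge requests that land on the same SSD page and to skip pages that are already buffered. The sketch below assumes fixed-size vectors stored contiguously by id. The 128-byte vector size, 4 KiB page size, and the `plan_page_reads` helper are hypothetical choices for illustration, not the system's actual layout.

```python
from collections import defaultdict


def plan_page_reads(vector_ids, vector_bytes=128, page_bytes=4096,
                    cached_pages=()):
    """Group requested vectors by SSD page and drop pages already in memory.

    Returns (sorted list of pages to read, mapping page -> requested ids).
    """
    per_page = page_bytes // vector_bytes  # vectors that fit in one page
    by_page = defaultdict(list)
    for vid in vector_ids:
        by_page[vid // per_page].append(vid)
    cached = set(cached_pages)
    to_read = sorted(p for p in by_page if p not in cached)
    return to_read, dict(by_page)


ids = [0, 5, 31, 32, 40, 1000]
pages, mapping = plan_page_reads(ids)
print(pages)    # six requested vectors collapse into three page reads
pages_warm, _ = plan_page_reads(ids, cached_pages=[1])
print(pages_warm)
```

With 32 vectors per page, the six requested ids map to only three distinct pages, and a page already held in a buffer is not read again.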
Throughput: Over 12,000 QPS on billion-vector datasets, up to 13.1× higher than SPANN
Latency: As low as 15 milliseconds for billion-scale queries
Cost Efficiency: Unlike systems that require expensive high-end GPUs or multiple GPUs, FusionANNS reaches this performance with a single entry-level GPU.
This makes billion-scale vector search accessible to organizations with moderate hardware budgets.
Developed by researchers from Huazhong University of Science and Technology and Huawei Technologies Co., Ltd., published September 2024 (arXiv:2409.16576).