



Published in July 2025 (arXiv:2507.10070), this research paper presents a GPU-driven asynchronous I/O framework for billion-scale approximate nearest neighbor search. The system targets the data-movement bottleneck between storage and compute that limits large-scale vector search.
For billion-scale datasets exceeding GPU memory:
The key innovation is overlapping I/O and computation:
Unlike CPU-managed I/O:
Algorithms to predict which data will be needed next based on graph traversal patterns
Methods to maximize overlap between I/O and computation phases
Strategies for efficiently managing limited GPU memory as a cache for SSD data
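The three ideas above (prefetch prediction, I/O and compute overlap, and a small cache in front of slow storage) can be illustrated with a toy sketch. This is not the paper's implementation: it is a CPU-only simulation under stated assumptions, in which a thread pool stands in for asynchronous SSD reads, an LRU cache stands in for limited GPU memory, and the prefetch heuristic (request the neighbors of whichever node currently looks best) is a simple guess chosen for illustration. All names here (`LRUCache`, `greedy_search`, `request`, `load`) are hypothetical.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class LRUCache:
    """Fixed-capacity cache; a stand-in for limited GPU memory (assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        while len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_search(graph, storage, query, start, cache_capacity=4):
    """Greedy graph walk that issues reads before the vectors are needed.

    `storage` is any mapping node -> vector; it stands in for SSD-resident data.
    """
    cache = LRUCache(cache_capacity)
    pending = {}  # node -> Future for a read that is still in flight
    stats = {"sync_reads": 0, "prefetched": 0, "cache_hits": 0}

    with ThreadPoolExecutor(max_workers=2) as pool:
        def request(node):
            # Start an asynchronous read unless the data is cached or in flight.
            if node not in cache.data and node not in pending:
                pending[node] = pool.submit(storage.__getitem__, node)

        def load(node):
            vec = cache.get(node)
            if vec is not None:
                stats["cache_hits"] += 1
                return vec
            future = pending.pop(node, None)
            if future is not None:
                stats["prefetched"] += 1
                vec = future.result()  # waits only if the read has not finished
            else:
                stats["sync_reads"] += 1
                vec = storage[node]  # blocking read: no overlap was possible
            cache.put(node, vec)
            return vec

        current = start
        current_dist = squared_distance(load(current), query)
        while True:
            # Issue all neighbor reads up front so they overlap with scoring.
            for n in graph[current]:
                request(n)
            best, best_dist = current, current_dist
            for n in graph[current]:
                d = squared_distance(load(n), query)
                if d < best_dist:
                    best, best_dist = n, d
                    # Speculative prefetch: the next hop is likely through n.
                    for m in graph[n]:
                        request(m)
            if best == current:
                return current, current_dist, stats
            current, current_dist = best, best_dist
```

Calling `greedy_search(graph, storage, query, start)` with an adjacency mapping and a node-to-vector mapping returns the closest node found, its squared distance, and counters showing how many reads were served from the cache, from a prefetch, or synchronously. In a real system the same structure would map onto GPU-issued storage requests and device-memory buffers, which this sketch does not attempt to model.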
As vector datasets grow, the storage-compute interface becomes the limiting factor. This research provides practical techniques for efficiently bridging SSD storage and GPU computation, which is essential for making billion-scale search economical.
ArXiv preprint arXiv:2507.10070 (2025) with detailed algorithms and experimental results.