
    Breaking the Storage-Compute Bottleneck in Billion-Scale ANNS

    A 2025 research paper presenting a GPU-driven asynchronous I/O framework for billion-scale approximate nearest neighbor search. The system addresses the fundamental bottleneck of data movement between storage and compute in large-scale vector search.


    About this tool

    Overview

    Published in July 2025 (arXiv:2507.10070), this paper presents a GPU-driven asynchronous I/O framework that breaks through the storage-compute bottleneck limiting billion-scale vector search systems.

    The Bottleneck Problem

    For billion-scale datasets exceeding GPU memory:

    • Data must be loaded from storage (SSD) during search
    • I/O bandwidth becomes the limiting factor
    • GPU compute sits idle waiting for data
    • Traditional synchronous I/O wastes resources

    Asynchronous I/O Framework

    The key innovation is overlapping I/O and computation:

    1. Prefetch Next Data: While GPU processes current batch, asynchronously load next batch
    2. Pipeline Execution: Continuous stream of data to GPU
    3. Minimize Idle Time: Keep GPU utilized while I/O happens in background
    4. Adaptive Scheduling: Adjust prefetch based on query patterns
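The prefetch-and-pipeline idea above can be sketched with a simple double-buffered loop. This is an illustrative Python sketch, not the paper's implementation: `load_batch` and `process_on_gpu` are hypothetical stand-ins for an SSD read and a GPU kernel, and a background thread plays the role of the asynchronous I/O engine.

```python
from concurrent.futures import ThreadPoolExecutor

def load_batch(batch_id):
    # Stand-in for an asynchronous SSD read (hypothetical helper).
    return [batch_id] * 4

def process_on_gpu(batch):
    # Stand-in for GPU distance computation over the batch.
    return sum(batch)

def pipelined_search(num_batches):
    """Overlap I/O and compute: while the 'GPU' processes batch i,
    batch i+1 is already being fetched in the background."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as io:
        next_fut = io.submit(load_batch, 0)              # prefetch first batch
        for i in range(num_batches):
            batch = next_fut.result()                    # blocks only if I/O lags
            if i + 1 < num_batches:
                next_fut = io.submit(load_batch, i + 1)  # async prefetch next
            results.append(process_on_gpu(batch))        # compute overlaps I/O
    return results

print(pipelined_search(3))  # -> [0, 4, 8]
```

The key property is that the `result()` call only stalls when I/O is slower than compute; otherwise the GPU stage never waits.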

    GPU-Driven Design

    Unlike CPU-managed I/O:

    • GPU directly controls data movement
    • Reduces CPU bottleneck
    • Lower latency for I/O decisions
    • Better alignment with GPU compute patterns

    Performance Benefits

    • Throughput: Maximizes GPU utilization by eliminating I/O wait time
    • Latency: Reduces query latency through intelligent prefetching
    • Scalability: Enables billion-scale search with a single GPU
    • Efficiency: Better resource utilization vs. synchronous approaches

    Technical Contributions

    Smart Prefetching

    Algorithms to predict which data will be needed next based on graph traversal patterns

    Overlap Optimization

    Methods to maximize overlap between I/O and computation phases
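The payoff of overlap can be quantified with a simple pipeline model: with full overlap, per-batch cost drops from the sum of the I/O and compute times to the maximum of the two, plus a one-time pipeline fill. The numbers below are made up for illustration.

```python
def serial_time(t_io, t_compute, n_batches):
    # No overlap: every batch pays for I/O plus compute.
    return n_batches * (t_io + t_compute)

def pipeline_time(t_io, t_compute, n_batches):
    # Full overlap: one I/O to fill the pipeline, then the
    # slower stage dominates each subsequent batch.
    return t_io + n_batches * max(t_io, t_compute)

# e.g. 4 ms SSD read, 3 ms GPU compute, 100 batches (hypothetical figures)
print(serial_time(4, 3, 100))    # -> 700
print(pipeline_time(4, 3, 100))  # -> 404
```

Under this model the search becomes bound by whichever stage is slower, which is exactly why keeping the I/O stage fed matters.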

    Memory Management

    Strategies for efficiently managing limited GPU memory as a cache for SSD data
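Treating GPU memory as a cache over SSD-resident data can be sketched with a least-recently-used policy. This is one plausible strategy, not necessarily the eviction policy the paper uses; `load_from_ssd` is a hypothetical callback standing in for a storage read.

```python
from collections import OrderedDict

class GPUCache:
    """GPU memory as an LRU cache over SSD-resident vector pages
    (illustrative policy sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> data, oldest first

    def get(self, page_id, load_from_ssd):
        if page_id in self.pages:            # hit: no I/O needed
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        data = load_from_ssd(page_id)        # miss: read from SSD
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:  # evict least recently used
            self.pages.popitem(last=False)
        return data

cache = GPUCache(capacity=2)
reads = []
for pid in [1, 2, 1, 3, 2]:
    cache.get(pid, lambda p: reads.append(p) or p * 10)
print(reads)  # SSD reads actually issued -> [1, 2, 3, 2]
```

The repeated access to page 1 is served from the cache, so only four of the five lookups touch storage.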

    Use Cases

    • Billion-scale semantic search on single GPU
    • Cost-effective large-scale deployments
    • Systems where dataset >> GPU memory
    • Applications requiring both scale and speed

    Significance

    As vector datasets grow, the storage-compute interface becomes critical. This research provides practical techniques for efficiently bridging SSD storage and GPU computation—essential for making billion-scale search economical.

    Availability

    Available as arXiv preprint arXiv:2507.10070 (2025), with detailed algorithms and experimental results.


    Information

    Website: arxiv.org
    Published: Mar 20, 2026

    Categories

    Research Papers & Surveys

    Tags

    • #GPU Acceleration
    • #Storage
    • #Algorithms
    • #Scalable

    Similar Products

    FusionANNS

    An efficient CPU/GPU cooperative processing architecture for billion-scale approximate nearest neighbor search. FusionANNS achieves up to 13.1× higher QPS compared to SPANN and can handle billion-vector datasets with over 12,000 QPS while maintaining 15ms latency using only one entry-level GPU.

    OrchANN

    A unified I/O orchestration framework for skewed out-of-core vector search that addresses the challenge of billion-scale ANN search when the dataset exceeds available memory. OrchANN optimizes I/O operations for graph-based indexes stored on disk.

    Scalable Distributed Vector Search

    A research paper on accuracy-preserving index construction for distributed vector search systems. Published in 2025, it addresses the challenge of maintaining search quality while distributing vector indexes across multiple nodes.

    RUMMY

    A GPU-accelerated vector query processing system that supports large vector datasets beyond GPU memory. RUMMY uses reordered pipelining to efficiently overlap data transmission and GPU computation, achieving up to 135× better performance than traditional GPU-based approaches.

    Amazon S3 Vector Search

    Leveraging Amazon S3 as a storage layer for vector databases, enabling 70-95% cost reduction for certain use cases. S3's low storage costs make it attractive for large-scale vector datasets with appropriate access patterns.

    Milvus

    Milvus is a mature, open-source vector database maintained by Zilliz, supporting large-scale similarity search with multiple indexing strategies and GPU acceleration. It includes variants such as Milvus Lite (lightweight version), Milvus Standalone (single-machine deployment), and Milvus Distributed (Kubernetes-based deployment for large scale).
