RUMMY

RUMMY is a GPU-accelerated vector query processing system built on CUDA. It handles datasets larger than GPU memory through reordered pipelining and cluster-based retrofitting. It supports batch queries and reports up to 135x speedup over traditional GPU methods and up to 23x over CPU-only baselines for large-scale similarity search and maximum inner product search (MIPS).
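As a rough illustration of what answering a batch of MIPS queries against a dataset larger than device memory involves, the sketch below streams the dataset in chunks and keeps a running top-k per query. It is a CPU-only, pure-Python sketch under stated assumptions, not RUMMY's implementation: it does not model reordered pipelining or cluster-based retrofitting, and the name `chunked_batch_mips` and the chunk layout are hypothetical.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple

Vector = Sequence[float]
Chunk = List[Tuple[int, Vector]]  # (vector_id, vector) pairs; hypothetical layout


def inner_product(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def chunked_batch_mips(
    queries: Sequence[Vector], chunks: Iterable[Chunk], k: int
) -> List[List[Tuple[float, int]]]:
    """Return the top-k (score, vector_id) pairs per query, best first.

    Each chunk stands in for a slice of the dataset that is small enough
    to fit in device memory. Only one chunk is touched at a time, and the
    whole query batch is scored against it before moving on.
    """
    # One min-heap per query; the root is the weakest of the current top-k.
    heaps: List[List[Tuple[float, int]]] = [[] for _ in queries]
    for chunk in chunks:
        for qi, query in enumerate(queries):
            heap = heaps[qi]
            for vector_id, vector in chunk:
                score = inner_product(query, vector)
                if len(heap) < k:
                    heapq.heappush(heap, (score, vector_id))
                elif score > heap[0][0]:
                    heapq.heapreplace(heap, (score, vector_id))
    return [sorted(heap, reverse=True) for heap in heaps]


# Toy data: four 2-d vectors split into two chunks, two queries, k = 2.
database = [(0, [1, 0]), (1, [0, 1]), (2, [1, 1]), (3, [-1, 0])]
chunks = [database[:2], database[2:]]
queries = [[1, 0], [0, 2]]
results = chunked_batch_mips(queries, chunks, k=2)
```

Scoring the whole batch against each chunk means every chunk is loaded once per batch rather than once per query, which is the general reason batching helps when data movement dominates.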

Information

Website: github.com

Published: Apr 23, 2026

Tags

#gpu-acceleration #cuda #high-performance #scalable

Similar Products

PilotANN

Memory-bounded GPU-accelerated framework for graph-based ANN vector search using CUDA and LibTorch, optimized for large-scale workloads beyond GPU memory. Features batch processing for high efficiency; outperforms CPU-only ANN in speed for similarity search in vector databases.

FusionANNS

An efficient CPU/GPU cooperative processing architecture for billion-scale approximate nearest neighbor search. FusionANNS achieves up to 13.1× higher QPS compared to SPANN and can handle billion-vector datasets with over 12,000 QPS while maintaining 15ms latency using only one entry-level GPU.

NVIDIA cuVS

NVIDIA cuVS is a GPU-accelerated approximate nearest neighbor search library that uses CUDA for high-performance CAGRA, HNSW, and IVF-PQ indexes on billion-scale datasets. It supports batch queries for high-throughput operation, making it suited to large-scale similarity search and real-time recommendations. It delivers up to 12x faster index building and 8x lower query latency compared to CPU-only implementations like Milvus.

cuVS

NVIDIA RAPIDS cuVS is a GPU-accelerated library for vector search and clustering with CUDA-optimized HNSW, IVF, CAGRA, and PQ implementations. Supports batch queries for high QPS, suited for large-scale similarity search in GenAI apps. Achieves up to 12x faster indexing and lower latency vs CPU-only alternatives like FAISS CPU.

Juno — Optimizing ANNS with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping

ASPLOS 2024 paper introducing Juno, a system that accelerates high-dimensional approximate nearest neighbor search using sparsity-aware algorithms and GPU ray-tracing (RT) core mapping for hardware-level computation acceleration.

Breaking the Storage-Compute Bottleneck in Billion-Scale ANNS

A 2025 research paper presenting a GPU-driven asynchronous I/O framework for billion-scale approximate nearest neighbor search. The system addresses the fundamental bottleneck of data movement between storage and compute in large-scale vector search.
