



A GPU-accelerated vector query processing system that uses CUDA to handle datasets larger than GPU memory via reordered pipelining and cluster-based retrofitting. It supports batch queries, with reported speedups of up to 135x over traditional GPU methods and up to 23x over CPU-only processing, for large-scale similarity search and maximum inner product search (MIPS).
RUMMY is the first GPU-accelerated vector query processing system that achieves high performance and supports large vector datasets beyond GPU memory. Developed by researchers from Peking University, the system was presented at USENIX NSDI '24.
RUMMY addresses the challenge of processing vector queries on datasets that exceed GPU memory capacity. Its core innovation is a reordered pipelining technique that combines three key ideas to achieve high performance with limited GPU memory.
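To give a rough feel for why the order of work matters when a dataset does not fit in device memory, here is a minimal pure-Python sketch. It is not RUMMY's implementation and does not reproduce its three ideas, which this page does not detail. It assumes a toy setup in which vectors are grouped into clusters, only one cluster can be resident in "device memory" at a time, and each query scans a fixed list of clusters. The function `search_batch`, the `probes` lists, and the sample data are all hypothetical names chosen for illustration.

```python
def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search_batch(clusters, probes, queries, reorder):
    """Return (best vector id per query, number of cluster loads).

    clusters: dict mapping cluster id -> list of (vector_id, vector)
    probes:   probes[q] is the list of cluster ids that query q scans
    queries:  list of query vectors
    reorder:  if True, group the work by cluster instead of by query
    """
    # One unit of work is "query q scans cluster c".
    work = [(q, c) for q, cs in enumerate(probes) for c in cs]
    if reorder:
        # Group all queries that touch the same cluster together,
        # so that cluster only has to be loaded once.
        work.sort(key=lambda qc: qc[1])

    best = {}        # query index -> (score, vector_id)
    loads = 0        # simulated host-to-device transfers
    resident = None  # the single cluster currently "on the device"
    for q, c in work:
        if c != resident:
            resident = c
            loads += 1
        for vec_id, vec in clusters[c]:
            score = inner_product(queries[q], vec)
            if q not in best or score > best[q][0]:
                best[q] = (score, vec_id)
    return {q: vec_id for q, (_, vec_id) in best.items()}, loads


# Hypothetical toy data: three clusters, three queries.
clusters = {
    0: [(0, [1.0, 0.0]), (1, [0.9, 0.1])],
    1: [(2, [0.0, 1.0]), (3, [0.1, 0.9])],
    2: [(4, [0.7, 0.7])],
}
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probes = [[0, 2], [1, 2], [0, 1, 2]]

naive_result, naive_loads = search_batch(clusters, probes, queries, reorder=False)
grouped_result, grouped_loads = search_batch(clusters, probes, queries, reorder=True)
print(naive_result == grouped_result, naive_loads, grouped_loads)
```

In this toy setting both orders return the same nearest vectors, but grouping the work by cluster cuts the simulated transfers from 7 to 3. A real system would additionally overlap transfers with GPU computation, which this sketch does not model.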
RUMMY is open-source and available on GitHub at pkusys/Rummy.