



The Deep1B dataset is a billion-scale benchmark for vector database performance testing. It contains 96-dimensional deep learning embeddings and is used in ANN-Benchmarks and Big-ANN to measure queries per second (QPS), latency, and recall at scale. Its realistic neural feature distributions make it well suited to scalability validation, and it is an important input when selecting production vector databases for billion-vector workloads. Deep1B is a dataset that benchmarks are built on, whereas VectorDBBench is a benchmark of full systems.
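Of the metrics named above, recall is the one that depends on ground-truth neighbors. A minimal sketch of recall@k follows, assuming each query has a list of retrieved IDs and a list of exact nearest-neighbor IDs; the function name `recall_at_k` and the toy IDs are illustrative and not taken from ANN-Benchmarks or Big-ANN.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Mean fraction of the true top-k neighbors that appear in the retrieved top-k."""
    hits = 0
    for found, truth in zip(retrieved, ground_truth):
        # Count IDs that appear in both the retrieved and the exact top-k lists.
        hits += len(set(found[:k]) & set(truth[:k]))
    return hits / (len(ground_truth) * k)


# Toy example with two queries (hypothetical IDs, not real Deep1B data).
retrieved = [[1, 2, 3], [4, 5, 6]]
ground_truth = [[1, 2, 9], [4, 5, 6]]
print(recall_at_k(retrieved, ground_truth, k=3))
```

The first query finds 2 of its 3 true neighbors and the second finds all 3, so the result is 5/6.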
Deep1B (DEEP1B) is a collection of one billion image embeddings compressed to 96 dimensions. This dataset provides a large-scale, real-world proxy for testing distributed systems and GPU-accelerated search.
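To give a sense of why this scale stresses distributed and GPU-based systems, the raw size of the vectors can be estimated with simple arithmetic. The sketch below assumes each component is stored as a 4-byte float32, which is an assumption about the storage format rather than something stated here, and it ignores index overhead and per-vector headers.

```python
NUM_VECTORS = 1_000_000_000  # one billion embeddings
DIM = 96                     # dimensions per embedding
BYTES_PER_FLOAT32 = 4        # assumption: components stored as float32


def raw_size_bytes(num_vectors, dim, bytes_per_component=BYTES_PER_FLOAT32):
    """Size of the uncompressed vector data, excluding any index or file headers."""
    return num_vectors * dim * bytes_per_component


size = raw_size_bytes(NUM_VECTORS, DIM)
print(f"{size / 1e9:.0f} GB")  # prints "384 GB"
```

Under that assumption the raw vectors alone exceed the memory of a typical single machine or GPU, which is why sharding or compression usually enters the picture.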
Deep1B is particularly valuable for:
While SIFT1B uses hand-crafted 128D features, Deep1B uses learned 96D embeddings, making it more representative of modern AI applications that use neural network embeddings.
Available for download from http://corpus-texmex.irisa.fr/. Axel is recommended for faster downloads of large-scale datasets.
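Once downloaded, the vectors need to be parsed. The sketch below assumes the file uses the `.fvecs` layout documented on the corpus-texmex site, where each vector is stored as a 4-byte integer dimension followed by that many float32 components. Whether a given Deep1B download uses this layout is an assumption to verify against the file you actually obtain. The helper name `read_fvecs` and the file `tiny.fvecs` are illustrative.

```python
import os
import tempfile

import numpy as np


def read_fvecs(path, max_vectors=None):
    """Memory-map an .fvecs file and return its vectors as an (n, dim) float32 array."""
    # The first 4 bytes hold the dimension of the first vector.
    dim = int(np.fromfile(path, dtype=np.int32, count=1)[0])
    # Each record is (dim + 1) 4-byte words: one header word plus dim floats.
    data = np.memmap(path, dtype=np.float32, mode="r").reshape(-1, dim + 1)
    if max_vectors is not None:
        data = data[:max_vectors]
    # Drop the per-vector dimension header, keeping only the components.
    return data[:, 1:]


# Round-trip check on a tiny synthetic file (not real Deep1B data).
vectors = np.arange(12, dtype=np.float32).reshape(3, 4)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.fvecs")
    with open(path, "wb") as f:
        for v in vectors:
            np.array([len(v)], dtype=np.int32).tofile(f)
            v.tofile(f)
    loaded = np.array(read_fvecs(path))  # copy out before the file is removed

print(loaded.shape)
```

Memory-mapping avoids reading the whole file at once, and `max_vectors` makes it easy to work with a prefix of the dataset while prototyping.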
Free dataset.