    Deep1B Dataset

    Billion-scale benchmark dataset of 96-dimensional image embeddings produced by a deep neural network. It serves as a real-world proxy for testing distributed systems and GPU-accelerated vector search at scale.

    About this tool

    Overview

    Deep1B (also written DEEP1B) is a collection of one billion image embeddings, each compressed to 96 dimensions. Because the vectors are real neural network outputs, the dataset is a realistic large-scale workload for testing distributed systems and GPU-accelerated search.

    Dataset Characteristics

    • Size: 1 billion vectors
    • Dimensions: 96-dimensional compressed embeddings
    • Source: Deep learning image features
    • Type: Neural network-generated embeddings
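
The characteristics above imply a raw storage footprint that is worth working out before planning a benchmark. The sketch below does that arithmetic. It assumes each component is stored as a 4-byte float32, which the listing does not state, and it ignores any file headers or index overhead. The function name `raw_size_bytes` is illustrative only.

```python
def raw_size_bytes(num_vectors: int, dim: int, bytes_per_component: int) -> int:
    """Bytes needed to store the vectors contiguously, with no index overhead."""
    return num_vectors * dim * bytes_per_component


NUM_VECTORS = 1_000_000_000  # from the characteristics above
DIM = 96                     # from the characteristics above
FLOAT32_BYTES = 4            # assumption: components are stored as float32

size = raw_size_bytes(NUM_VECTORS, DIM, FLOAT32_BYTES)
print(f"{size / 1e9:.0f} GB ({size / 2**30:.1f} GiB)")  # prints "384 GB (357.6 GiB)"
```

Under that assumption the uncompressed vectors alone exceed the memory of a single typical GPU, which is one reason the dataset is used to exercise distributed and GPU-accelerated search.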

    Key Features

    • Lower dimensionality (96D) than the 128D SIFT descriptors used in SIFT1B
    • Reflects modern deep learning embedding characteristics
    • Suitable for testing neural embedding search systems
    • Real-world distribution of learned features

    Significance

    Deep1B is particularly valuable for:

    • Evaluating modern embedding-based search systems
    • Testing distributed vector databases
    • Benchmarking GPU-accelerated vector search
    • Assessing performance with neural embeddings vs. hand-crafted features
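
Benchmarks of this kind usually compare an approximate search result against exact nearest neighbours and report recall. The following is a minimal sketch of that measurement on a small synthetic sample, not on Deep1B itself. The random Gaussian vectors, the sample size of 500, and the "search only the first half" stand-in for an approximate index are all assumptions made for illustration.

```python
import heapq
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query, base, k):
    """Indices of the k nearest base vectors, found by exhaustive search."""
    return heapq.nsmallest(k, range(len(base)), key=lambda i: squared_l2(query, base[i]))


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbours that appear in the approximate result."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


random.seed(0)
DIM = 96  # matches the Deep1B dimensionality; the data itself is synthetic
base = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(500)]
query = [random.gauss(0, 1) for _ in range(DIM)]

truth = exact_top_k(query, base, k=10)
# Stand-in for an approximate index: exhaustive search over half of the base set.
approx = exact_top_k(query, base[:250], k=10)
print(f"recall@10 = {recall_at_k(approx, truth):.2f}")
```

In a real evaluation the `approx` list would come from the system under test, and recall would be averaged over many queries alongside throughput and latency.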

    Comparison with SIFT1B

    While SIFT1B uses hand-crafted 128D features, Deep1B uses learned 96D embeddings, making it more representative of modern AI applications that use neural network embeddings.

    Dataset Access

    Available for download from http://corpus-texmex.irisa.fr/. A multi-connection downloader such as Axel is recommended for faster downloads of files this large.
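
Once a file is downloaded it has to be parsed. The sketch below reads vectors in the `.fvecs` layout that the TEXMEX corpus site describes for its datasets: each record is a 4-byte little-endian integer giving the dimension, followed by that many float32 values. Whether a given Deep1B file uses exactly this layout is an assumption to check against the download page. The helper names `iter_fvecs` and `write_fvecs` are illustrative, and the demo round-trips an in-memory buffer rather than a real dataset file.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, List


def iter_fvecs(stream: BinaryIO) -> Iterator[List[float]]:
    """Yield vectors one at a time from a stream in the assumed .fvecs layout."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of file
        if len(header) != 4:
            raise ValueError("truncated dimension header")
        (dim,) = struct.unpack("<i", header)
        payload = stream.read(4 * dim)
        if len(payload) != 4 * dim:
            raise ValueError("truncated vector record")
        yield list(struct.unpack(f"<{dim}f", payload))


def write_fvecs(stream: BinaryIO, vectors: Iterable[List[float]]) -> None:
    """Write vectors in the same layout (used here only to build demo data)."""
    for vec in vectors:
        stream.write(struct.pack("<i", len(vec)))
        stream.write(struct.pack(f"<{len(vec)}f", *vec))


# Round-trip two 96-dimensional vectors through an in-memory buffer.
buf = io.BytesIO()
write_fvecs(buf, [[0.5] * 96, [1.0] * 96])
buf.seek(0)
vectors = list(iter_fvecs(buf))
print(len(vectors), len(vectors[0]))  # prints "2 96"
```

Streaming one record at a time keeps memory use flat, which matters when the full file does not fit in RAM. For a real file, pass `open(path, "rb")` in place of the buffer.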

    Applications

    • Neural embedding search evaluation
    • Deep learning-based retrieval systems
    • Modern vector database benchmarking
    • Distributed search system testing

    Information

    Website: corpus-texmex.irisa.fr
    Published: Mar 8, 2026

    Categories

    Benchmarks & Evaluation

    Tags

    #Benchmark #Datasets #Deep Learning

    Similar Products

    SIFT1B Dataset

    Billion-scale benchmark dataset containing one billion 128-dimensional SIFT descriptors extracted from images. Widely used standard for evaluating approximate nearest neighbor search algorithms at scale.

    BEIR

    BEIR (Benchmarking IR) is a benchmark suite for evaluating information retrieval and vector search systems across multiple tasks and datasets. Useful for comparing vector database performance.

    IntelLabs's Vector Search Datasets

    A collection of datasets curated by Intel Labs specifically for evaluating and benchmarking vector search algorithms and databases.

    MTEB Leaderboard

    Massive Text Embedding Benchmark leaderboard covering 58 datasets across 112 languages and 8 embedding tasks. Industry-standard benchmark for comparing text embedding models.

    Big-ANN Benchmarks

    Billion-scale approximate nearest neighbor search benchmark competition. Features datasets such as SIFT1B and Deep1B, with standardized evaluation metrics for comparing vector search algorithms at scale.

    SISAP Indexing Challenge

    An annual competition focused on similarity search and indexing algorithms, including approximate nearest neighbor methods and high-dimensional vector indexing, providing benchmarks and results relevant to vector database research.

    Copyright © 2025 Awesome Vector Databases. All rights reserved.