
    SIFT1B Dataset

    Billion-scale benchmark dataset containing one billion 128-dimensional SIFT descriptors extracted from images. Widely used as a standard for evaluating approximate nearest neighbor (ANN) search algorithms at scale.


    About this tool

    Overview

    SIFT1B (also known as BigANN or ANN_SIFT1B) consists of one billion 128-dimensional SIFT (Scale-Invariant Feature Transform) descriptors extracted from images. Released in September 2010, it remains a fundamental benchmark for large-scale vector search evaluation.

    Dataset Characteristics

    • Size: 1 billion vectors
    • Dimensions: 128-dimensional SIFT descriptors
    • Source: Image feature descriptors
    • Format: .bvecs files (one unsigned byte per vector component), suitable for similarity search
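    The SIFT1B base, learning and query vectors are distributed as .bvecs files. A minimal stdlib sketch of a reader follows, assuming the texmex layout in which each record is a little-endian 32-bit dimension d followed by d unsigned bytes. The helper names read_bvecs and write_bvecs are hypothetical, not part of any published tool.

```python
import io
import struct
from typing import BinaryIO, Iterator, List


def read_bvecs(f: BinaryIO, count: int = -1) -> Iterator[List[int]]:
    """Yield up to `count` vectors (all of them if count < 0) from a .bvecs stream.

    Assumed record layout: little-endian int32 dimension d, then d uint8 components.
    """
    read = 0
    while count < 0 or read < count:
        header = f.read(4)
        if not header:
            return  # clean end of file
        if len(header) != 4:
            raise ValueError("truncated dimension header")
        (d,) = struct.unpack("<i", header)
        body = f.read(d)
        if len(body) != d:
            raise ValueError("truncated vector body")
        yield list(body)  # iterating over bytes gives ints in 0..255
        read += 1


def write_bvecs(f: BinaryIO, vectors: List[List[int]]) -> None:
    """Write vectors in the same assumed layout (handy for building test files)."""
    for v in vectors:
        f.write(struct.pack("<i", len(v)))
        f.write(bytes(v))  # raises ValueError if a component is outside 0..255


# Usage: round-trip two 128-dimensional vectors through an in-memory file.
buf = io.BytesIO()
write_bvecs(buf, [[0] * 128, list(range(128))])
buf.seek(0)
vecs = list(read_bvecs(buf))
```

    Streaming record by record keeps memory flat, which matters for a base file holding a billion vectors; for bulk loading, a NumPy memory map over the file is a common alternative.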

    Significance for Evaluation

    SIFT1B plays a critical role in evaluating vector search algorithms by providing:

    • A consistent, reproducible basis for comparison
    • Billion-scale data to stress-test ANN algorithms
    • A fixed query set with precomputed ground-truth nearest neighbors
    • An industry-standard benchmark for performance claims
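    Because SIFT1B ships with precomputed ground-truth neighbors for its queries, search accuracy can be scored exactly. A minimal sketch of recall@R, assuming one common convention from the product-quantization literature (the fraction of queries whose true nearest neighbor appears among the first R returned ids); other suites, such as Big-ANN Benchmarks, may report a different recall definition. The function name recall_at_r is hypothetical.

```python
from typing import Sequence


def recall_at_r(results: Sequence[Sequence[int]],
                ground_truth: Sequence[Sequence[int]],
                r: int) -> float:
    """Fraction of queries whose true nearest neighbor is in the top r results.

    results[q] is the ranked id list returned for query q;
    ground_truth[q][0] is the exact nearest neighbor of query q.
    """
    if len(results) != len(ground_truth):
        raise ValueError("results and ground_truth must cover the same queries")
    if not results:
        raise ValueError("need at least one query")
    hits = sum(1 for res, gt in zip(results, ground_truth) if gt[0] in res[:r])
    return hits / len(results)


# Usage: two queries; the first finds its true neighbor (id 1) at rank 2.
results = [[3, 1, 2], [5, 4, 9]]
ground_truth = [[1, 8], [7, 6]]
print(recall_at_r(results, ground_truth, 1))  # 0.0
print(recall_at_r(results, ground_truth, 2))  # 0.5
```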

    Related Datasets

    • SIFT1M: 1 million 128-dimensional SIFT descriptors (a smaller version for initial testing)
    • GIST1M: 1 million 960-dimensional GIST descriptors
    • Deep1B: 1 billion 96-dimensional deep learning image features

    Dataset Access

    Laurent Amsaleg (CNRS/IRISA) and Hervé Jégou (Facebook AI Research) have waived all copyright and related rights. The datasets can be downloaded from http://corpus-texmex.irisa.fr/

    Axel, a command-line download accelerator that uses multiple connections, is recommended for downloading BIGANN faster.

    Usage in Research

    SIFT1B is extensively used in:

    • ANN algorithm benchmarking
    • Vector database performance evaluation
    • Scalability testing
    • Algorithm comparison studies
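    ANN benchmarking on SIFT1B compares an index's answers with exact nearest neighbors under Euclidean distance. A brute-force sketch of that exact search follows; it is only practical on small slices (for example SIFT1M-sized subsets), since an exhaustive scan of one billion vectors per query is precisely the cost ANN indexes are built to avoid. The names squared_l2 and exact_knn are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; ranking by it gives the same order as L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(base: Sequence[Sequence[float]],
              query: Sequence[float],
              k: int) -> List[int]:
    """Return ids of the k base vectors closest to `query`, nearest first."""
    # heapq.nsmallest scans the whole base set but keeps only k candidates;
    # (distance, id) tuples break distance ties by the smaller id.
    best: List[Tuple[float, int]] = heapq.nsmallest(
        k, ((squared_l2(v, query), i) for i, v in enumerate(base)))
    return [i for _, i in best]


# Usage: tiny 2-D example (real SIFT vectors have 128 components).
base = [[0, 0], [5, 5], [1, 1], [9, 9]]
print(exact_knn(base, [1, 2], k=2))  # [2, 0]
```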

    Information

    Website: corpus-texmex.irisa.fr
    Published: Mar 8, 2026

    Categories

    Benchmarks & Evaluation

    Tags

    #Benchmark #Datasets #Ann

    Similar Products

    Big-ANN Benchmarks

    Billion-scale approximate nearest neighbor search benchmark competition. Features datasets such as SIFT1B and Deep1B, with standardized evaluation metrics for comparing vector search algorithms at scale.

    Deep1B Dataset

    Billion-scale benchmark dataset containing 96-dimensional deep learning image embeddings. Provides a real-world proxy for testing distributed systems and GPU-accelerated vector search at scale.

    WEAVESS

    WEAVESS is an open-source benchmarking and evaluation framework for graph-based approximate nearest neighbor (ANN) search methods, providing code and experiments for large-scale vector similarity search. It is useful for researchers and practitioners comparing vector indexing algorithms for vector databases and AI search applications.

    BEIR

    BEIR (Benchmarking IR) is a benchmark suite for evaluating information retrieval and vector search systems across multiple tasks and datasets. Useful for comparing vector database performance.

    ANN-Benchmarks

    ANN-Benchmarks is a benchmarking platform specifically for evaluating the performance of approximate nearest neighbor (ANN) search algorithms, which are foundational to vector database evaluation and comparison.

    Zeng, Xianzhi, et al. "CANDY: A Benchmark for Continuous Approximate Nearest Neighbor Search with Dynamic Data Ingestion."

    A 2024 paper introducing CANDY, a benchmark for continuous ANN search with a focus on dynamic data ingestion, crucial for next-generation vector databases.

    All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this repository, related repositories, and associated websites are for identification purposes only. The use of these names, logos, and brands does not imply endorsement, affiliation, or sponsorship. This directory may include content generated by artificial intelligence.
    Copyright © 2025 Awesome Vector Databases. All rights reserved.