Intel Labs Vector Search Datasets

A collection of datasets curated by Intel Labs specifically for evaluating and benchmarking vector search algorithms and databases.

About this tool

Features

  • Provides code to generate several datasets for similarity search benchmarking and evaluation.
  • Datasets are based on high-dimensional vectors from recent deep learning models.
  • Includes multiple datasets (see respective folders: dpr, openimages, rqa, text, wit).
  • Each dataset comes with its own README file for details and usage instructions.
  • Useful for researchers and developers working on vector search, similarity search, and related benchmarking tasks.
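
As an illustration of the kind of benchmarking these datasets support, here is a minimal sketch of a recall@k evaluation. It does not use the repository's code or data: the vectors are random stand-ins, the function names (inner_product, exact_top_k, recall_at_k) are hypothetical, and the choice of inner product as the similarity measure is an assumption. The "approximate" search is simulated by searching only the first half of the base vectors.

```python
import random


def inner_product(a, b):
    """Similarity between two vectors (assumed metric for this sketch)."""
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(base, query, k):
    """Brute-force search: indices of the k base vectors most similar to query."""
    scores = [(inner_product(vec, query), idx) for idx, vec in enumerate(base)]
    # Highest similarity first; ties broken by lower index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scores[:k]]


def recall_at_k(ground_truth, candidates, k):
    """Average fraction of the true top-k neighbors present in each candidate list."""
    hits = sum(
        len(set(gt[:k]) & set(cand[:k]))
        for gt, cand in zip(ground_truth, candidates)
    )
    return hits / (k * len(ground_truth))


# Synthetic stand-in data (hypothetical sizes, not taken from the repository).
rng = random.Random(0)
dim, n_base, n_query, k = 16, 200, 5, 10
base = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_base)]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_query)]

# Ground truth comes from exhaustive search over all base vectors.
ground_truth = [exact_top_k(base, q, k) for q in queries]

# Simulated approximate index: only the first half of the base set is searched,
# so returned indices still refer to positions in the full base list.
half = base[: n_base // 2]
approx = [exact_top_k(half, q, k) for q in queries]

print("recall@k, exact vs. itself:", recall_at_k(ground_truth, ground_truth, k))
print("recall@k, simulated approximate search:", recall_at_k(ground_truth, approx, k))
```

In an actual benchmark, the base and query vectors would come from one of the generated datasets, and the candidate lists would come from the index or database under test.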

Notes

  • Project Status: Not under active management. Intel has ceased development, maintenance, and contributions to this project.
  • Users interested in further development or maintenance are encouraged to fork the repository.

Source

https://github.com/IntelLabs/VectorSearchDatasets

Tags

datasets, vector-search, benchmark, evaluation

Category

Curated Resource Lists

Information

Publisher: Fox
Website: github.com
Published: May 13, 2025