IntelLabs's Vector Search Datasets
A collection of datasets curated by Intel Labs specifically for evaluating and benchmarking vector search algorithms and databases.
Features
- Provides code to generate several datasets for similarity search benchmarking and evaluation.
- Datasets are based on high-dimensional vectors from recent deep learning models.
- Includes multiple datasets (see the respective folders: dpr, openimages, rqa, text, wit).
- Each dataset comes with its own README file with details and usage instructions.
- Useful for researchers and developers working on vector search, similarity search, and related benchmarking tasks.
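The features above center on benchmarking similarity search over high-dimensional vectors. As a minimal sketch of the kind of evaluation such datasets support (not code from the repository; `exact_knn` and `recall_at_k` are illustrative names, and the random vectors stand in for real dataset embeddings), one typically computes exact nearest neighbors as ground truth and scores an approximate index by recall:

```python
import numpy as np

def exact_knn(queries, base, k=10):
    """Brute-force ground-truth k-NN under squared L2 distance."""
    # ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2, computed with broadcasting
    d = (np.sum(queries**2, axis=1, keepdims=True)
         - 2.0 * queries @ base.T
         + np.sum(base**2, axis=1))
    return np.argsort(d, axis=1)[:, :k]

def recall_at_k(approx_ids, exact_ids):
    """Average fraction of true neighbors recovered per query."""
    hits = [len(set(a) & set(e)) / len(e)
            for a, e in zip(approx_ids, exact_ids)]
    return float(np.mean(hits))

# Synthetic stand-ins for a base set and query set of embeddings
rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 64)).astype(np.float32)
queries = rng.standard_normal((5, 64)).astype(np.float32)

gt = exact_knn(queries, base, k=10)
print(recall_at_k(gt, gt))  # a perfect index scores recall 1.0
```

In practice the ground-truth indices would be computed once per dataset and reused to score many index configurations.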
Notes
- Project Status: Not under active management. Intel has ceased development, maintenance, and contributions to this project.
- Users interested in further development or maintenance are encouraged to fork the repository.
Source
https://github.com/IntelLabs/VectorSearchDatasets
Tags
datasets, vector-search, benchmark, evaluation
Category
Curated Resource Lists
Similar Products
- BEIR (Benchmarking IR) is a benchmark suite for evaluating information retrieval and vector search systems across multiple tasks and datasets. Useful for comparing vector database performance.
- An annual competition focused on similarity search and indexing algorithms, including approximate nearest neighbor methods and high-dimensional vector indexing, providing benchmarks and results relevant to vector database research.
- The open-source repository containing the implementation, configuration, and scripts of VectorDBBench, enabling users to run standardized benchmarks across multiple vector database systems locally or in CI.
- A massive text embedding benchmark for evaluating the quality of text embedding models, crucial for vector database applications.
- ANN-Benchmarks is a benchmarking platform specifically for evaluating the performance of approximate nearest neighbor (ANN) search algorithms, which are foundational to vector database evaluation and comparison.
- A 2024 paper introducing CANDY, a benchmark for continuous ANN search with a focus on dynamic data ingestion, crucial for next-generation vector databases.