Intel Labs Vector Search Datasets
A collection of datasets curated by Intel Labs for evaluating and benchmarking vector search algorithms and databases.
Features
- Provides code to generate several datasets for similarity search benchmarking and evaluation.
- Datasets are based on high-dimensional vectors from recent deep learning models.
- Includes multiple datasets (see the respective folders: dpr, openimages, rqa, text, wit).
- Each dataset comes with its own README file for details and usage instructions.
- Useful for researchers and developers working on vector search, similarity search, and related benchmarking tasks.
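
To illustrate how datasets of this kind are typically used in benchmarking, here is a minimal sketch in plain Python. It does not use the repository's code or data: the vectors are randomly generated, and the helper names `exact_top_k` and `recall_at_k` are hypothetical, introduced only for this example. It computes exact nearest neighbors by brute force as ground truth, then scores a stand-in "approximate" search with recall@k.

```python
import random


def exact_top_k(base, query, k):
    """Return indices of the k base vectors closest to query (squared Euclidean distance)."""
    dists = [
        (sum((b - q) ** 2 for b, q in zip(vec, query)), idx)
        for idx, vec in enumerate(base)
    ]
    dists.sort()
    return [idx for _, idx in dists[:k]]


def recall_at_k(ground_truth, candidates, k):
    """Average fraction of the true top-k neighbors found in each candidate top-k list."""
    hits = sum(
        len(set(gt[:k]) & set(cand[:k]))
        for gt, cand in zip(ground_truth, candidates)
    )
    return hits / (k * len(ground_truth))


# Synthetic data (an assumption for illustration; real datasets would be loaded from disk).
rng = random.Random(0)
dim, n_base, n_query, k = 16, 200, 5, 10
base = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_base)]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_query)]

# Ground truth: exact neighbors over the full base set.
ground_truth = [exact_top_k(base, q, k) for q in queries]

# Stand-in for an approximate index: search only the first half of the base set.
# Indices stay comparable because the first half is a prefix of `base`.
approx = [exact_top_k(base[: n_base // 2], q, k) for q in queries]

print("recall@%d of the stand-in search: %.2f" % (k, recall_at_k(ground_truth, approx, k)))
```

In an actual evaluation, the stand-in search would be replaced by the index or database under test, and the base and query vectors would come from one of the generated datasets.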
Notes
- Project Status: Not under active management. Intel has ceased development, maintenance, and contributions to this project.
- Users interested in further development or maintenance are encouraged to fork the repository.
Source
https://github.com/IntelLabs/VectorSearchDatasets
Tags
datasets, vector-search, benchmark, evaluation
Category
Curated Resource Lists