IntelLabs's Vector Search Datasets
A collection of datasets curated by Intel Labs specifically for evaluating and benchmarking vector search algorithms and databases.
About this tool
Features
- Provides code to generate several datasets for similarity search benchmarking and evaluation.
- Datasets are based on high-dimensional vectors from recent deep learning models.
- Includes multiple datasets (see respective folders: dpr, openimages, rqa, text, wit).
- Each dataset comes with its own README file for details and usage instructions.
- Useful for researchers and developers working on vector search, similarity search, and related benchmarking tasks.
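As an illustration of the kind of benchmarking these datasets support, the sketch below computes exact nearest neighbors by brute force and scores a candidate result list with recall@k. It is not code from the repository: the function names `exact_top_k` and `recall_at_k` are hypothetical, inner-product similarity is an assumption, and random vectors stand in for a real dataset, whose loading steps are described in each folder's README.

```python
import numpy as np

def exact_top_k(base, queries, k):
    """Brute-force search: ids of the k base vectors with the highest
    inner product for each query, sorted from best to worst."""
    scores = queries @ base.T                              # (n_queries, n_base)
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]   # unordered top-k ids
    rows = np.arange(queries.shape[0])[:, None]
    order = np.argsort(-scores[rows, top], axis=1)         # sort those k by score
    return top[rows, order]

def recall_at_k(candidate_ids, true_ids):
    """Fraction of true neighbors that appear in the candidate lists."""
    hits = sum(len(set(c) & set(t)) for c, t in zip(candidate_ids, true_ids))
    return hits / true_ids.size

# Synthetic stand-in data (assumption): 1000 base vectors, 10 queries, 64 dims.
rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 64)).astype(np.float32)
queries = rng.standard_normal((10, 64)).astype(np.float32)

ground_truth = exact_top_k(base, queries, k=10)
```

In practice, the ids returned by an approximate index would be passed as `candidate_ids`, with the brute-force result serving as the reference.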
Notes
- Project Status: Not under active management. Intel has ceased development, maintenance, and contributions to this project.
- Users interested in further development or maintenance are encouraged to fork the repository.
Source
https://github.com/IntelLabs/VectorSearchDatasets
Tags
datasets, vector-search, benchmark, evaluation
Category
Curated Resource Lists