    All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this repository, related repositories, and associated websites are for identification purposes only. The use of these names, logos, and brands does not imply endorsement, affiliation, or sponsorship. This directory may include content generated by artificial intelligence.
Copyright © 2025 Awesome Vector Databases. All rights reserved.

    ToolSearch Dataset

    Benchmark dataset for evaluating tool retrieval systems in AI Agent applications. Provides test cases for assessing how well systems can select the most relevant tools from large tool repositories based on conversational context and task objectives.

    https://huggingface.co/datasets/bowang0911/ToolSearch
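To make the evaluation task concrete, here is a minimal sketch of the kind of metric such a benchmark supports: recall@k over a ranked list of retrieved tools. The tool names, query, and ground-truth labels below are illustrative assumptions, not taken from the ToolSearch data itself.

```python
# Hedged sketch: scoring a tool-retrieval system with recall@k.
# Tool names and relevance labels here are invented for illustration;
# the actual ToolSearch schema may differ.

def recall_at_k(retrieved: list[str], relevant: list[str], k: int) -> float:
    """Fraction of ground-truth relevant tools found in the top-k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

# Hypothetical test case: a retriever's ranked output for one
# conversational query, plus the tools labeled relevant for it.
retrieved = ["web_search", "calculator", "calendar", "email_send"]
relevant = ["web_search", "calendar"]

print(recall_at_k(retrieved, relevant, 2))  # 0.5: only web_search in top 2
print(recall_at_k(retrieved, relevant, 3))  # 1.0: both relevant tools in top 3
```

Averaging this score over all test cases gives a single retrieval-quality number per system, which is how leaderboard-style comparisons across tool repositories are typically built.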


    Information

Website: huggingface.co
Published: Apr 4, 2026

    Categories

    1 Item
    Benchmarks & Evaluation

    Tags

    3 Items
#tool-retrieval #agent #benchmark

    Similar Products

    6 result(s)

    MTEB Leaderboard

    Massive Text Embedding Benchmark leaderboard covering 58 datasets across 112 languages and 8 embedding tasks. Industry-standard benchmark for comparing text embedding models.

    Featured

    LongMemEval

    Comprehensive benchmark for evaluating long-term memory in chat assistants with 500 manual questions testing information extraction, multi-session reasoning, and temporal reasoning across 115K-1.5M tokens.

    BEIR Benchmark

    A heterogeneous benchmark for evaluating information retrieval models across 18 diverse datasets and 9 different retrieval tasks. BEIR (Benchmarking IR) measures zero-shot retrieval performance, testing how well models generalize without task-specific fine-tuning, making it a standard evaluation for embedding models and retrieval systems.

    MTEB

    Massive Text Embedding Benchmark (MTEB) - a comprehensive benchmark for evaluating text embedding models across 8 embedding tasks and 58 datasets in 112 languages. Provides a standardized leaderboard for comparing embedding quality across classification, clustering, retrieval, reranking, semantic textual similarity, and summarization tasks.

    BigVectorBench

    An innovative benchmark suite for thoroughly evaluating vector database performance on heterogeneous data embeddings and compound queries for real-world multimodal applications.

    MTEB (Massive Text Embedding Benchmark)

Comprehensive benchmark suite for evaluating embedding models across 58 datasets spanning 112 languages and eight task types, including retrieval, clustering, and semantic similarity; it is the standard for comparing embedding quality.