    All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this repository, related repositories, and associated websites are for identification purposes only. The use of these names, logos, and brands does not imply endorsement, affiliation, or sponsorship. This directory may include content generated by artificial intelligence.
Copyright © 2025 Awesome Vector Databases. All rights reserved.

    ToolSearch Dataset

    Benchmark dataset for evaluating tool retrieval systems in AI Agent applications. Provides test cases for assessing how well systems can select the most relevant tools from large tool repositories based on conversational context and task objectives.

    https://huggingface.co/datasets/bowang0911/ToolSearch
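To make the evaluation task concrete, here is a minimal sketch of the kind of metric such a benchmark supports: recall@k over a ranked list of retrieved tools. The tool names, query, and ground-truth labels below are illustrative assumptions, not taken from the ToolSearch data itself.

```python
# Hedged sketch: scoring a tool-retrieval system with recall@k.
# Tool names and relevance labels here are invented for illustration;
# the actual ToolSearch schema may differ.

def recall_at_k(retrieved: list[str], relevant: list[str], k: int) -> float:
    """Fraction of ground-truth relevant tools found in the top-k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

# Hypothetical test case: a retriever's ranked output for one
# conversational query, plus the tools labeled relevant for it.
retrieved = ["web_search", "calculator", "calendar", "email_send"]
relevant = ["web_search", "calendar"]

print(recall_at_k(retrieved, relevant, 2))  # 0.5: only web_search in top 2
print(recall_at_k(retrieved, relevant, 3))  # 1.0: both relevant tools in top 3
```

Averaging this score over all test cases gives a single retrieval-quality number per system, which is how leaderboard-style comparisons across tool repositories are typically built.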


    Information

Website: huggingface.co
Published: Apr 4, 2026

    Categories

    1 Item
    Benchmarks & Evaluation

    Tags

    3 Items
#tool-retrieval #agent #benchmark

    Similar Products

    6 result(s)

    MTEB Leaderboard

    Massive Text Embedding Benchmark leaderboard covering 58 datasets across 112 languages and 8 embedding tasks. Industry-standard benchmark for comparing text embedding models.

    Featured

    LongMemEval

    Comprehensive benchmark for evaluating long-term memory in chat assistants with 500 manual questions testing information extraction, multi-session reasoning, and temporal reasoning across 115K-1.5M tokens.

    BEIR Benchmark

    A heterogeneous benchmark for evaluating information retrieval models across 18 diverse datasets and 9 different retrieval tasks. BEIR (Benchmarking IR) measures zero-shot retrieval performance, testing how well models generalize without task-specific fine-tuning, making it a standard evaluation for embedding models and retrieval systems.

    MTEB

    Massive Text Embedding Benchmark (MTEB) - a comprehensive benchmark for evaluating text embedding models across 8 embedding tasks and 58 datasets in 112 languages. Provides a standardized leaderboard for comparing embedding quality across classification, clustering, retrieval, reranking, semantic textual similarity, and summarization tasks.

    BigVectorBench

    An innovative benchmark suite for thoroughly evaluating vector database performance on heterogeneous data embeddings and compound queries for real-world multimodal applications.

    MTEB (Massive Text Embedding Benchmark)

Comprehensive benchmark suite for evaluating embedding models across 58 datasets spanning 112 languages and eight task types, including retrieval, clustering, and semantic similarity; it is the standard for comparing embedding quality.