
    LongMemEval

Comprehensive benchmark for evaluating long-term memory in chat assistants, with 500 manually created questions testing information extraction, multi-session reasoning, and temporal reasoning across contexts of 115K to 1.5M tokens.


    About this tool

    Overview

    LongMemEval is a comprehensive benchmark designed to evaluate five core long-term memory abilities of chat assistants: information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention. The benchmark consists of 500 manually created questions and was accepted to ICLR 2025.

    Benchmark Characteristics

The benchmark simulates interaction histories of roughly 115,000 tokens (LongMemEval_S) and up to 1.5 million tokens (LongMemEval_M), with multi-session, multi-turn conversations and realistic distractors. LongMemEval presents a significant challenge to existing systems: commercial chat assistants and long-context LLMs show a 30% accuracy drop when required to memorize information across sustained interactions.
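The multi-session setup described above can be sketched as a simple evaluation loop. This is a minimal illustration, not the official harness: the field names (`question`, `answer`, `haystack_sessions`) and the substring-match scoring are assumptions about the released JSON layout.

```python
# Minimal sketch of a LongMemEval-style evaluation loop.
# Field names (question, answer, haystack_sessions) and the substring
# scoring rule are illustrative assumptions, not the official harness.

def build_prompt(record):
    """Flatten every haystack session into one long chat history,
    then append the benchmark question."""
    turns = []
    for session in record["haystack_sessions"]:
        for turn in session:
            turns.append(f"{turn['role']}: {turn['content']}")
    history = "\n".join(turns)
    return f"{history}\n\nQuestion: {record['question']}\nAnswer:"

def score(records, answer_fn):
    """Accuracy over records; answer_fn stands in for the memory
    system under test (prompt string in, answer string out)."""
    correct = sum(
        record["answer"].strip().lower() in answer_fn(build_prompt(record)).lower()
        for record in records
    )
    return correct / len(records)
```

In the LongMemEval_S setting the flattened history is on the order of 115K tokens, so `answer_fn` would typically wrap a long-context model or a retrieval pipeline rather than pass the full prompt verbatim.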

    Recent 2026 Performance Breakthroughs

    Several organizations have achieved remarkable results on this benchmark in early 2026:

1. Supermemory (March 2026): Supermemory reports ~99% on LongMemEval_S using its ASMR (Agentic Search and Memory Retrieval) technique, a multi-agent orchestrated pipeline rather than a traditional RAG setup.

2. Mastra's Observational Memory (February 2026): With gpt-5-mini, Observational Memory scored 94.87%, the highest score reported on the benchmark at the time. With GPT-4o, it achieved 84.23%, outperforming the oracle configuration.

3. Emergence AI (February 2026): Using RAG-like methods, Emergence AI achieved 86% accuracy on LongMemEval at latency comparable to competing systems.

    The benchmark and code are publicly available on GitHub and Hugging Face for researchers to test their memory systems.
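For researchers picking up the release, loading it might look like the following. The file layout assumed here (a JSON list of records, each carrying a question and a reference answer) is a guess for illustration; check the actual files in the GitHub or Hugging Face release.

```python
# Hedged sketch: loading a LongMemEval release file from disk.
# The layout (a JSON list of records, each with at least a question
# and a reference answer) is an assumption, not verified against
# the actual release files.
import json

def load_benchmark(path):
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    # Keep only records that carry the fields an evaluation needs.
    return [r for r in records if "question" in r and "answer" in r]
```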

    Pricing

    Open-source benchmark, free to use.


    Information

Website: github.com
Published: Mar 24, 2026

    Categories

    Benchmarks & Evaluation

    Tags

#Benchmark #Agent Memory #Evaluation

    Similar Products

    MTEB Leaderboard

    Massive Text Embedding Benchmark leaderboard covering 58 datasets across 112 languages and 8 embedding tasks. Industry-standard benchmark for comparing text embedding models.

    BEIR Benchmark

    A heterogeneous benchmark for evaluating information retrieval models across 18 diverse datasets and 9 different retrieval tasks. BEIR (Benchmarking IR) measures zero-shot retrieval performance, testing how well models generalize without task-specific fine-tuning, making it a standard evaluation for embedding models and retrieval systems.

    ViDoRe Benchmark

    Visual Document Retrieval benchmark designed to evaluate embedding models and retrieval systems on visually rich documents containing tables, charts, diagrams, and complex layouts. The standard benchmark for assessing multi-modal document understanding and retrieval performance.

    MTEB (Massive Text Embedding Benchmark)

Comprehensive benchmark suite for evaluating embedding models across 58 datasets spanning 112 languages and eight task types, including retrieval, clustering, and semantic similarity; the standard for comparing embedding quality.

    MMTEB

    Massive Multilingual Text Embedding Benchmark covering over 500 quality-controlled evaluation tasks across 250+ languages, representing the largest multilingual collection of embedding model evaluation tasks.

    SISAP Indexing Challenge

    An annual competition focused on similarity search and indexing algorithms, including approximate nearest neighbor methods and high-dimensional vector indexing, providing benchmarks and results relevant to vector database research.
